MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning

Video causal reasoning aims to achieve a high-level understanding of video content from a causal perspective. However, current video reasoning tasks are limited in scope, primarily executed in a question-answering paradigm and focusing on short videos containing only a single event and simple causal relationships, lacking comprehensive and structured causality analysis for videos with multiple events. To fill this gap, we introduce a new task and dataset, Multi-Event Causal Discovery (MECD). It aims to uncover the causal relationships between events distributed chronologically across long videos. Given visual segments and textual descriptions of events, MECD requires identifying the causal associations between these events to derive a comprehensive, structured event-level video causal diagram explaining why and how the final result event occurred. To address MECD, we devise a novel framework inspired by the Granger Causality method, using an efficient mask-based event prediction model to perform an Event Granger Test, which estimates causality by comparing the predicted result event when premise events are masked versus unmasked. Furthermore, we integrate causal inference techniques such as front-door adjustment and counterfactual inference to address challenges in MECD like causality confounding and illusory causality. Experiments validate the effectiveness of our framework in providing causal relationships in multi-event videos, outperforming GPT-4o and VideoLLaVA by 5.7% and 4.1%, respectively.
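
The masked-versus-unmasked comparison behind the Event Granger Test can be illustrated with a minimal sketch. This is not the framework's implementation: the feature vectors, the `toy_predict` stand-in for the event prediction model, the squared-error measure, and the `threshold` value are all hypothetical choices made for illustration. The sketch flags a premise event as causal when masking it noticeably worsens the prediction of the result event.

```python
from typing import Callable, List, Optional, Sequence

Event = Sequence[float]  # hypothetical per-event feature vector
Predictor = Callable[[List[Optional[Event]]], List[float]]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def event_granger_test(
    predict: Predictor,
    premises: List[Event],
    result: Event,
    threshold: float = 0.1,  # assumed cutoff, not taken from the paper
) -> List[bool]:
    """Return one flag per premise event.

    A flag is True when masking that premise increases the error of the
    predicted result event by more than `threshold`.
    """
    full: List[Optional[Event]] = list(premises)
    base_error = squared_error(predict(full), result)
    flags = []
    for k in range(len(premises)):
        # Replace the k-th premise with a mask (None) and predict again.
        masked = full[:k] + [None] + full[k + 1:]
        masked_error = squared_error(predict(masked), result)
        flags.append(masked_error - base_error > threshold)
    return flags


def toy_predict(events: List[Optional[Event]]) -> List[float]:
    """Stand-in predictor: the result depends only on premises 0 and 2."""
    visible = [e if e is not None else [0.0, 0.0] for e in events]
    return [visible[0][0] + visible[2][0], visible[0][1] + visible[2][1]]


premises = [[1.0, 0.0], [5.0, 5.0], [0.0, 1.0]]
result = [1.0, 1.0]
flags = event_granger_test(toy_predict, premises, result)
print(flags)  # [True, False, True]
```

In this toy setting the second premise has no influence on the predictor, so masking it leaves the prediction unchanged and it is not flagged, while masking either of the other two premises degrades the prediction.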