
ENTER: Event Based Interpretable Reasoning for VideoQA

Hammad Ayyubi* ♦ Junzhang Liu* ♦ Ali Asgarov† Zaber Hakim† Najibul Sarker† Zhecan Wang♦ Chia-Wei Tang† Hani Alomari† Md. Atabuzzaman† Xudong Lin♦ Naveen Reddy Dyava♦ Shih-Fu Chang♦ Chris Thomas†

Abstract

In this paper, we present ENTER, an interpretable Video Question Answering (VideoQA) system based on event graphs. Event graphs convert videos into graphical representations, where video events form the nodes and event-event relationships (temporal/causal/hierarchical) form the edges. This structured representation offers several benefits: 1) interpretable VideoQA via generated code that parses the event graph; 2) incorporation of contextual visual information into the reasoning process (code generation) via event graphs; 3) robust VideoQA via hierarchical iterative updates of the event graphs. Existing interpretable VideoQA systems are often top-down, disregarding low-level visual information in the reasoning-plan generation, and are brittle. Bottom-up approaches, while producing responses from visual data, lack interpretability. Experimental results on NExT-QA, IntentQA, and EgoSchema demonstrate that our method not only outperforms existing top-down approaches and achieves competitive performance against bottom-up approaches, but, more importantly, offers superior interpretability and explainability in the reasoning process.
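To make the event-graph idea concrete, below is a minimal, hypothetical sketch of what such a representation and a piece of generated query code could look like. The class names (`EventNode`, `EventGraph`), the relation labels, and the `answer_why` routine are illustrative assumptions for this sketch, not the paper's actual implementation, which generates reasoning code dynamically per question.

```python
from dataclasses import dataclass, field

@dataclass
class EventNode:
    """A single video event, e.g. "the boy picks up the ball" (hypothetical schema)."""
    event_id: str
    description: str
    start_time: float  # seconds into the video
    end_time: float

@dataclass
class EventGraph:
    """Events as nodes; temporal/causal/hierarchical relations as labeled edges."""
    nodes: dict[str, EventNode] = field(default_factory=dict)
    # edges[(src, dst)] = relation label, e.g. "before", "causes", "sub-event-of"
    edges: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_event(self, node: EventNode) -> None:
        self.nodes[node.event_id] = node

    def add_relation(self, src: str, dst: str, relation: str) -> None:
        self.edges[(src, dst)] = relation

# A toy graph for the question "Why did the boy fall?"
graph = EventGraph()
graph.add_event(EventNode("e1", "boy runs on wet floor", 2.0, 4.0))
graph.add_event(EventNode("e2", "boy falls", 4.0, 5.0))
graph.add_relation("e1", "e2", "causes")

# Illustrative "generated code": collect events whose causal edges end at the queried event.
def answer_why(graph: EventGraph, target_description: str) -> list[str]:
    causes = []
    for (src, dst), rel in graph.edges.items():
        if rel == "causes" and target_description in graph.nodes[dst].description:
            causes.append(graph.nodes[src].description)
    return causes

print(answer_why(graph, "falls"))  # -> ['boy runs on wet floor']
```

Because the answer is produced by explicit code traversing an explicit graph, each prediction can be traced back to the events and relations that supported it, which is the interpretability benefit the abstract describes.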

