UDK-VQA Data Generation Framework

The UDK-VQA framework is a data generation framework jointly proposed in 2024 by the Shanghai Artificial Intelligence Laboratory, Beijing Institute of Technology, Zhejiang University, and the University of Hong Kong. It aims to help multimodal large models answer questions that depend on real-time information. The related paper is "SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge".

The core purpose of the UDK-VQA framework is to enhance existing Large Vision-Language Models (LVLMs) so that they can handle Visual Question Answering (VQA) tasks that require the latest knowledge. Because LVLMs cannot be updated often enough to keep up with new knowledge, they frequently fail in scenarios that require up-to-date information. For example, an LVLM released in January 2024 will not know who sang the theme song of a movie released in April 2024.

To address this problem, the researchers proposed a plug-and-play framework that supplies LVLMs with the latest knowledge at inference time through Internet search, an approach known as Internet-Augmented Generation (IAG). By training a hierarchical filtering model, the framework effectively and efficiently selects the most helpful content from the web pages returned by the search engine and uses it to prompt the LVLM with up-to-date knowledge.
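The search-then-filter idea can be illustrated with a small Python sketch. It assumes a two-level design: a coarse filter ranks whole pages by their title and snippet, then a fine filter ranks text segments from the surviving pages, and the top segments are prepended to the question. The framework itself trains a model for this filtering; here a simple keyword-overlap score (`overlap_score`) stands in for it. All names (`WebPage`, `hierarchical_filter`, `build_prompt`) and parameters (`top_pages`, `top_segments`, `max_words`) are hypothetical and not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class WebPage:
    """A search-engine result (hypothetical structure)."""
    title: str
    snippet: str
    content: str


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question: str, text: str) -> float:
    """Stand-in for a trained filter: fraction of question tokens found in text."""
    q_tokens = tokenize(question)
    if not q_tokens:
        return 0.0
    return len(q_tokens & tokenize(text)) / len(q_tokens)


def split_segments(content: str, max_words: int = 50) -> list[str]:
    """Split page content into fixed-size word windows."""
    words = content.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def hierarchical_filter(question: str, pages: list[WebPage],
                        top_pages: int = 2, top_segments: int = 3) -> list[str]:
    # Level 1 (coarse): rank whole pages by title + snippet, keep the best few.
    kept_pages = sorted(
        pages,
        key=lambda p: overlap_score(question, f"{p.title} {p.snippet}"),
        reverse=True,
    )[:top_pages]
    # Level 2 (fine): rank content segments of the surviving pages only.
    segments = [seg for p in kept_pages for seg in split_segments(p.content)]
    segments.sort(key=lambda s: overlap_score(question, s), reverse=True)
    return segments[:top_segments]


def build_prompt(question: str, segments: list[str]) -> str:
    """Prepend the filtered web knowledge to the question for the LVLM."""
    context = "\n".join(f"- {s}" for s in segments)
    return f"Reference knowledge from web search:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy example: in practice the image would be passed to the LVLM with this text prompt.
pages = [
    WebPage("Movie X theme song announced",
            "Singer Y performs the theme song of Movie X",
            "The theme song of Movie X is performed by Singer Y. It was released in April 2024."),
    WebPage("Weather today", "Sunny skies", "Expect sunny skies across the region."),
    WebPage("Cooking tips", "How to boil pasta", "Boil water, add salt, then add pasta."),
]
question = "Who sings the theme song of Movie X?"
segments = hierarchical_filter(question, pages)
prompt = build_prompt(question, segments)
print(prompt)
```

One plausible reason for a hierarchical design is cost: the cheap page-level pass discards most results, so only a few pages' contents need to be scored at the finer segment level.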

In addition, to train this filtering model and evaluate the framework's performance, the researchers proposed a pipeline that automatically generates news-related VQA samples; the resulting dataset is named UDK-VQA.