AlgoPuzzleVQA Multimodal Algorithmic Puzzle Dataset
AlgoPuzzleVQA is a multimodal reasoning dataset built by the Singapore University of Technology and Design. It is designed to challenge and evaluate how well multimodal language models solve algorithmic puzzles that require visual understanding, language understanding, and complex algorithmic reasoning.
The dataset contains 18 different puzzles covering diverse mathematical and algorithmic topics, including Boolean logic, combinatorics, graph theory, optimization, and search. The puzzles are generated automatically from human-written code, so both the reasoning complexity and the size of the dataset can be scaled arbitrarily. Every puzzle has an exact solution that an algorithm can compute, without tedious manual calculation.
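To illustrate what code-generated puzzles with algorithmically exact answers can look like, here is a minimal Python sketch. It is not taken from the AlgoPuzzleVQA code base: the grid shortest-path puzzle, the function names (`generate_grid`, `shortest_path_length`, `make_puzzle`), and the parameters `size`, `wall_prob`, and `seed` are all hypothetical, and rendering the grid as an image is omitted.

```python
import random
from collections import deque


def generate_grid(size, wall_prob=0.2, seed=0):
    """Build a size x size grid; True marks a wall. (Hypothetical puzzle type.)"""
    rng = random.Random(seed)  # seeded so every instance is reproducible
    grid = [[rng.random() < wall_prob for _ in range(size)] for _ in range(size)]
    grid[0][0] = False                # start cell is always open
    grid[size - 1][size - 1] = False  # goal cell is always open
    return grid


def shortest_path_length(grid):
    """Exact answer via breadth-first search; returns -1 if the goal is unreachable."""
    n = len(grid)
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        if (r, c) == (n - 1, n - 1):
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < n and 0 <= nc < n
            if inside and not grid[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return -1


def make_puzzle(size, seed):
    """Pair a generated instance with its question and ground-truth answer."""
    grid = generate_grid(size, seed=seed)
    return {
        "grid": grid,
        "question": "How many moves does the shortest path from the "
                    "top-left cell to the bottom-right cell take?",
        "answer": shortest_path_length(grid),
    }


# Larger `size` gives harder instances; more seeds give more instances.
puzzles = [make_puzzle(size=6, seed=s) for s in range(3)]
```

In this sketch, difficulty scales with `size` and the number of instances scales with the number of seeds, while the solver guarantees that each stored answer is exact.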
AlgoPuzzleVQA can serve as a benchmark of multimodal reasoning, used to evaluate and advance the ability of multimodal language models to solve complex problems that combine vision, language understanding, and algorithmic reasoning.