AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning about abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate LVLM hallucinations, they mainly rely on hand-crafted corner cases whose failure patterns may hardly generalize, and finetuning on them could undermine their validity. These limitations motivate us to develop the first automatic benchmark generation approach, AUTOHALLUSION, which harnesses a few principal strategies to create diverse hallucination examples. It probes the language modules in LVLMs for context cues and uses them to synthesize images by: (1) adding objects abnormal to the context cues; (2) for two co-occurring objects, keeping one and excluding the other; or (3) removing objects closely tied to the context cues. It then generates image-based questions whose ground-truth answers contradict the language module's prior. A model has to overcome contextual biases and distractions to reach correct answers, while incorrect or inconsistent answers indicate hallucinations. AUTOHALLUSION enables us to create new benchmarks at minimal cost and thus overcomes the fragility of hand-crafted benchmarks. It also reveals common failure patterns and reasons, providing key insights to detect, avoid, or control hallucinations. Comprehensive evaluations of top-tier LVLMs, e.g., GPT-4V(ision), Gemini Pro Vision, Claude 3, and LLaVA-1.5, show success rates of 97.7% and 98.7% in inducing hallucinations on the synthetic and real-world datasets of AUTOHALLUSION, respectively, paving the way for a long battle against hallucinations.
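
The abstract's generation-and-detection loop can be summarized in a minimal sketch. This is not the authors' implementation: the helper names (probe_context_cues, synthesize_image, query_lvlm) are hypothetical stand-ins for the paper's probing, image-editing, and question-answering components, and the three edits and mismatch check only illustrate the strategies and the hallucination criterion described above.

```python
"""Illustrative sketch of the AUTOHALLUSION loop (not the authors' code).

All helpers are assumed to be supplied by the caller; only the control flow
mirrors the abstract: probe context cues, edit the image with one of three
strategies, ask a question whose ground truth is fixed by the edit, and flag
a hallucination when the model's answer contradicts that ground truth.
"""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    image: bytes        # synthesized or edited scene image
    question: str       # image-grounded probing question
    ground_truth: str   # answer determined by the edit we made


def build_examples(
    scene: bytes,
    probe_context_cues: Callable[[bytes], List[str]],      # objects the language prior expects in this scene
    synthesize_image: Callable[[bytes, str, str], bytes],   # (scene, op, object) -> edited image
    abnormal_object: str = "fire hydrant",                  # placeholder: any object unlikely under the cues
) -> List[Example]:
    """Create hallucination-inducing examples with the three editing strategies."""
    cues = probe_context_cues(scene)
    examples: List[Example] = []

    # (1) Insert an object that is abnormal for the scene's context cues.
    img = synthesize_image(scene, "insert", abnormal_object)
    examples.append(Example(img, f"Is there a {abnormal_object} in the image?", "yes"))

    # (2) For two strongly co-occurring objects, keep one and exclude the other.
    if len(cues) >= 2:
        kept, excluded = cues[0], cues[1]
        img = synthesize_image(scene, "remove", excluded)
        examples.append(Example(img, f"Is there a {excluded} in the image?", "no"))

    # (3) Remove an object closely tied to the context cues.
    if cues:
        img = synthesize_image(scene, "remove", cues[0])
        examples.append(Example(img, f"Is there a {cues[0]} in the image?", "no"))

    return examples


def hallucinated(
    example: Example,
    query_lvlm: Callable[[bytes, str], str],  # (image, question) -> model answer
) -> bool:
    """An answer contradicting the edit-determined ground truth flags a hallucination."""
    answer = query_lvlm(example.image, example.question).strip().lower()
    return answer != example.ground_truth
```

In the full pipeline, consistency across related questions (e.g., existence vs. spatial relations of the same edited object) would also be checked, so self-contradictory answers count as hallucinations even when one of them happens to be correct.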