Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images
Rykov, Elisei ; Petrushina, Kseniia ; Titova, Kseniia ; Razzhigaev, Anton ; Panchenko, Alexander ; Konovalov, Vasily
Published: 5/21/2025

Abstract
Measuring how real images look is a complex task in artificial intelligence research. For example, an image of a boy with a vacuum cleaner in a desert violates common sense. We introduce a novel method, which we call Through the Looking Glass (TLG), to assess image common sense consistency using Large Vision-Language Models (LVLMs) and a Transformer-based encoder. By leveraging LVLMs to extract atomic facts from these images, we obtain a mix of accurate facts. We proceed by fine-tuning a compact attention-pooling classifier over the encoded atomic facts. TLG achieves new state-of-the-art performance on the WHOOPS! and WEIRD datasets while relying on only a compact fine-tuned component.
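
The abstract does not spell out the classifier architecture, so the following is only a minimal sketch of what attention pooling over encoded atomic facts can look like in general, not the authors' implementation. It assumes each atomic fact has already been encoded into a fixed-size vector; the names `attention_pool`, `weirdness_probability`, `query`, `clf_weights`, and `clf_bias`, as well as the toy two-dimensional embeddings, are hypothetical and chosen for illustration. In a real setting the query and classifier parameters would be learned during fine-tuning.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(fact_embeddings, query):
    """Pool a variable number of fact vectors into one vector.

    Each fact is scored against a (hypothetical, normally learned) query
    vector; the softmax of those scores weights the sum of the facts.
    """
    scores = [dot(query, e) for e in fact_embeddings]
    weights = softmax(scores)
    dim = len(fact_embeddings[0])
    pooled = [
        sum(w * e[i] for w, e in zip(weights, fact_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


def weirdness_probability(fact_embeddings, query, clf_weights, clf_bias):
    """Linear classifier plus sigmoid on top of the pooled vector."""
    pooled, _ = attention_pool(fact_embeddings, query)
    logit = dot(clf_weights, pooled) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy, made-up embeddings standing in for three encoded atomic facts.
facts = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# A zero query scores every fact equally, so pooling reduces to the mean.
mean_pooled, uniform_weights = attention_pool(facts, [0.0, 0.0])

# A query aligned with the first fact concentrates the weight on it.
_, focused_weights = attention_pool(facts, [10.0, 0.0])
```

The point of the pooling step is that an image may yield any number of atomic facts, while the classifier needs a single fixed-size input; the attention weights let the model emphasize the facts most indicative of a common sense violation.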