AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

Sunghyun Ahn, Youngwan Jo, Kijung Lee, Sein Kwon, Inpyo Hong, Sanghyun Park
Abstract

Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. Consequently, users must retrain models or develop separate AI models for new environments, which requires machine-learning expertise, high-performance hardware, and extensive data collection, limiting the practical usability of VAD. To address these challenges, this study proposes the customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model. C-VAD treats user-defined text as an abnormal event and detects frames containing the specified event in a video. We implemented AnyAnomaly effectively using context-aware visual question answering, without fine-tuning the large vision-language model (LVLM). To validate the effectiveness of the proposed model, we constructed C-VAD datasets and demonstrated the superiority of AnyAnomaly. Furthermore, our approach showed competitive performance on VAD benchmark datasets, achieving state-of-the-art results on the UBnormal dataset and outperforming other methods in generalization across all datasets. Our code is available online at github.com/SkiddieAhn/Paper-AnyAnomaly.
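The core idea of C-VAD — posing a user-defined event as a visual question to a frozen vision-language model and scoring each frame by the answer — can be illustrated with a minimal sketch. This is not the authors' implementation; the `vqa_fn` callable, the prompt wording, and the binary scoring are all assumptions standing in for the paper's context-aware VQA pipeline.

```python
from typing import Callable, List

def score_frames(frames: List[object],
                 event_text: str,
                 vqa_fn: Callable[[object, str], str]) -> List[float]:
    """Score each frame for a user-defined abnormal event via VQA.

    `vqa_fn` is a hypothetical stand-in for a frozen LVLM: it takes a
    frame and a question and returns a free-form answer string. No
    fine-tuning is involved; the event is specified purely as text.
    """
    question = (f"Does this frame show the event: '{event_text}'? "
                "Answer yes or no.")
    scores = []
    for frame in frames:
        answer = vqa_fn(frame, question).strip().lower()
        # Map the model's yes/no answer to a binary anomaly score.
        scores.append(1.0 if answer.startswith("yes") else 0.0)
    return scores
```

Because the abnormal event is just a text argument, the same pipeline adapts to a new environment by changing `event_text`, with no retraining — the property the abstract highlights.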