Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

Recent studies have highlighted a practical setting of unsupervised anomaly detection (UAD) that builds a unified model for multi-class images. Despite various advancements addressing this challenging task, detection performance under the multi-class setting still lags far behind that of state-of-the-art class-separated models. Our research aims to bridge this substantial performance gap. In this paper, we introduce a minimalistic reconstruction-based anomaly detection framework, namely Dinomaly, which leverages pure Transformer architectures without relying on complex designs, additional modules, or specialized tricks. Within this powerful framework, consisting only of Attentions and MLPs, we identify four simple components that are essential to multi-class anomaly detection: (1) Foundation Transformers that extract universal and discriminative features, (2) a Noisy Bottleneck where pre-existing Dropouts do all the noise-injection work, (3) Linear Attention that naturally cannot focus, and (4) Loose Reconstruction that does not force layer-to-layer, point-by-point reconstruction. Extensive experiments are conducted across popular anomaly detection benchmarks including MVTec-AD, VisA, and Real-IAD. Our proposed Dinomaly achieves impressive image-level AUROCs of 99.6%, 98.7%, and 89.3% on the three datasets respectively, which is not only superior to state-of-the-art multi-class UAD methods, but also surpasses the most advanced class-separated UAD records.
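The "Linear Attention that naturally cannot focus" component refers to the general family of kernelized attention mechanisms, which replace the softmax with a positive feature map so that attention weights are smooth and cannot concentrate sharply on single tokens. As a hedged illustration only (the elu+1 feature map and NumPy formulation below are common choices in the linear-attention literature, not necessarily the paper's exact implementation), a minimal sketch:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized (linear) attention: out_i = phi(q_i) @ (sum_j phi(k_j) v_j^T)
    normalized by phi(q_i) @ sum_j phi(k_j). Illustrative sketch only."""
    # Feature map phi(x) = elu(x) + 1, which keeps all values positive
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(q), phi(k)
    kv = k.T @ v                    # (d, d_v): aggregate keys and values first
    z = q @ k.sum(axis=0)           # (N,): per-query normalizer
    return (q @ kv) / (z[:, None] + eps)
```

Because the key-value aggregation is computed once and reused for every query, the cost is linear in sequence length N rather than quadratic, and the resulting attention distribution is inherently diffuse compared to softmax attention.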