Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection

Unsupervised image Anomaly Detection (UAD) aims to learn robust and discriminative representations of normal samples. While separate per-class solutions incur expensive computation and limited generalizability, this paper focuses on building a unified framework for multiple classes. Under such a challenging setting, popular reconstruction-based networks that assume continuous latent representations always suffer from the "identical shortcut" issue, where both normal and abnormal samples are well recovered and therefore difficult to distinguish. To address this pivotal issue, we propose a hierarchical vector quantized prototype-oriented Transformer under a probabilistic framework. First, instead of learning continuous representations, we preserve the typical normal patterns as discrete iconic prototypes, and confirm the importance of vector quantization in preventing the model from falling into the shortcut. The vector quantized iconic prototypes are integrated into the Transformer for reconstruction, so that an abnormal data point is flipped to a normal data point. Second, we investigate a carefully designed hierarchical framework to relieve the codebook collapse issue and replenish frail normal patterns. Third, a prototype-oriented optimal transport method is proposed to better regulate the prototypes and hierarchically evaluate the anomaly score. Evaluated on the MVTec-AD and VisA datasets, our model surpasses state-of-the-art alternatives and offers good interpretability. The code is available at https://github.com/RuiyingLu/HVQ-Trans.
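To illustrate why discrete prototypes block the "identical shortcut", here is a minimal sketch of plain vector quantization, assuming a single flat codebook of already-learned normal prototypes. Every feature is replaced by its nearest prototype, so an off-distribution feature cannot be copied through and is instead "flipped" to a normal pattern, leaving a large residual. The names `quantize` and `reconstruct`, the toy codebook, and the use of Euclidean residual distance as the anomaly score are all hypothetical simplifications: this is not the HVQ-Trans implementation, which uses hierarchical codebooks inside a Transformer and a prototype-oriented optimal transport score.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature: Vector, codebook: List[Vector]) -> Tuple[int, float]:
    """Return the index of the nearest prototype and its squared distance."""
    best = min(range(len(codebook)), key=lambda k: squared_distance(feature, codebook[k]))
    return best, squared_distance(feature, codebook[best])


def reconstruct(features: List[Vector], codebook: List[Vector]) -> Tuple[List[List[float]], List[float]]:
    """Replace each feature by its nearest prototype (hypothetical helper).

    The per-feature score is the Euclidean residual between the input and
    its quantized reconstruction: a simple stand-in for an anomaly score.
    """
    reconstructed, scores = [], []
    for f in features:
        k, d = quantize(f, codebook)
        reconstructed.append(list(codebook[k]))
        scores.append(math.sqrt(d))
    return reconstructed, scores


# Toy usage: two hypothetical "normal" prototypes and three input features.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.0], [0.9, 1.1], [3.0, 4.0]]  # the last one is off-distribution
recon, scores = reconstruct(features, codebook)
# recon == [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
```

The abnormal feature `[3.0, 4.0]` is mapped to the normal prototype `[1.0, 1.0]`, so its residual (about 3.61) is far larger than those of the two normal features (about 0.1 and 0.14). With a continuous latent space, by contrast, a sufficiently expressive decoder could reproduce `[3.0, 4.0]` almost exactly and hide the anomaly.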