You Only Train Once: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment

Although recent efforts in image quality assessment (IQA) have achieved promising performance, there still exists a considerable gap compared to the human visual system (HVS). One significant disparity lies in humans' seamless transition between full-reference (FR) and no-reference (NR) tasks, whereas existing models are constrained to either FR or NR tasks. This disparity implies the necessity of designing two distinct systems, thereby greatly diminishing the model's versatility. Therefore, our focus lies in unifying FR and NR IQA under a single framework. Specifically, we first employ an encoder to extract multi-level features from input images. Then a Hierarchical Attention (HA) module is proposed as a universal adapter for both FR and NR inputs to model the spatial distortion at each encoder stage. Furthermore, considering that different distortions contaminate encoder stages and damage image semantic meaning differently, a Semantic Distortion Aware (SDA) module is proposed to examine feature correlations between shallow and deep layers of the encoder. By adopting HA and SDA, the proposed network can effectively perform both FR and NR IQA. When trained independently on NR or FR IQA tasks, our model outperforms existing models and achieves state-of-the-art performance. Moreover, when trained jointly on NR and FR IQA tasks, it further improves NR IQA performance while remaining on par with the state of the art in FR IQA. You only train once to perform both IQA tasks. Code will be released at: https://github.com/BarCodeReader/YOTO.
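To make the unified pipeline concrete, below is a minimal PyTorch sketch. Only the overall structure comes from the abstract: a multi-stage encoder, one HA adapter per stage that accepts either FR (distorted plus reference) or NR (distorted only) inputs, and an SDA module correlating shallow and deep features. Every internal detail (the toy linear encoder, the attention-based HA, the cosine-correlation SDA, and the regression head) is a hypothetical stand-in, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalAttention(nn.Module):
    """One HA adapter (sketch): self-attention for NR, cross-attention for FR."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dist, ref=None):
        # NR mode: attend within the distorted features.
        # FR mode (assumed): let distorted features attend to the reference.
        kv = dist if ref is None else ref
        out, _ = self.attn(dist, kv, kv)
        return self.norm(dist + out)


class SemanticDistortionAware(nn.Module):
    """SDA (sketch): pooled cosine correlation between shallow and deep features."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, shallow, deep):
        s = F.normalize(self.proj(shallow).mean(dim=1), dim=-1)
        d = F.normalize(deep.mean(dim=1), dim=-1)
        return (s * d).sum(dim=-1, keepdim=True)  # (B, 1) correlation score


class UnifiedIQA(nn.Module):
    """You-only-train-once model: one forward pass serves both FR and NR IQA."""

    def __init__(self, dim=64, num_stages=3):
        super().__init__()
        # Toy encoder over patch tokens; the paper would use a pretrained
        # multi-stage backbone instead.
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(num_stages)
        )
        self.ha = nn.ModuleList(HierarchicalAttention(dim) for _ in range(num_stages))
        self.sda = SemanticDistortionAware(dim)
        self.head = nn.Linear(dim + 1, 1)  # quality score from features + correlation

    def encode(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # multi-level features, one per encoder stage
        return feats

    def forward(self, dist_tokens, ref_tokens=None):
        dist_feats = self.encode(dist_tokens)
        ref_feats = (
            self.encode(ref_tokens) if ref_tokens is not None
            else [None] * len(dist_feats)
        )
        # HA adapts each stage, handling FR and NR inputs uniformly.
        adapted = [ha(d, r) for ha, d, r in zip(self.ha, dist_feats, ref_feats)]
        corr = self.sda(adapted[0], adapted[-1])  # shallow vs. deep correlation
        pooled = adapted[-1].mean(dim=1)          # (B, dim)
        return self.head(torch.cat([pooled, corr], dim=-1)).squeeze(-1)


model = UnifiedIQA()
dist = torch.randn(2, 196, 64)  # distorted image as patch tokens
ref = torch.randn(2, 196, 64)   # reference image as patch tokens
print(model(dist, ref).shape)   # FR prediction: torch.Size([2])
print(model(dist).shape)        # NR prediction: torch.Size([2])
```

The final two calls illustrate the abstract's central claim: the same trained weights score an image with or without a reference, so no second, task-specific system is needed.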