Restore Anything Model via Efficient Degradation Adaptation

With the proliferation of mobile devices, the need for an efficient model to restore any degraded image has become increasingly significant and impactful. Traditional approaches typically involve training dedicated models for each specific degradation, resulting in inefficiency and redundancy. More recent solutions either introduce additional modules to learn visual prompts, significantly increasing model size, or incorporate cross-modal transfer from large language models trained on vast datasets, adding complexity to the system architecture. In contrast, our approach, termed RAM, takes a unified path that leverages inherent similarities across various degradations to enable both efficient and comprehensive restoration through a joint embedding mechanism, without scaling up the model or relying on large multimodal models. Specifically, we examine the sub-latent space of each input, identifying key components and reweighting them in a gated manner. This intrinsic degradation awareness is further combined with contextualized attention in an X-shaped framework, enhancing local-global interactions. Extensive benchmarking in an all-in-one restoration setting confirms RAM's SOTA performance, reducing model complexity by approximately 82% in trainable parameters and 85% in FLOPs. Our code and models will be publicly available.
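
As a rough illustration of the gated component-reweighting idea described above, the sketch below decomposes a per-image feature map with a truncated SVD and rescales each retained component with a learned sigmoid gate. The module name `GatedSubspaceReweight`, the SVD-based decomposition, the choice of rank, and the small gating MLP are illustrative assumptions for exposition only, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class GatedSubspaceReweight(nn.Module):
    """Illustrative sketch: decompose a feature map into its leading
    components via truncated SVD and reweight each component with a
    learned gate, emphasizing degradation-relevant directions."""

    def __init__(self, channels: int, rank: int = 8):
        super().__init__()
        # Assumes rank <= min(channels, H * W) of the incoming features.
        self.rank = rank
        # Small MLP mapping the top singular values to per-component gates.
        self.gate = nn.Sequential(
            nn.Linear(rank, rank),
            nn.GELU(),
            nn.Linear(rank, rank),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from an encoder.
        b, c, h, w = x.shape
        flat = x.reshape(b, c, h * w)                      # (B, C, HW)
        # Per-sample SVD of the flattened feature matrix.
        u, s, vh = torch.linalg.svd(flat, full_matrices=False)
        k = self.rank
        u_k, s_k, vh_k = u[..., :k], s[..., :k], vh[..., :k, :]
        # Gate each retained component based on its singular value.
        g = self.gate(s_k)                                 # (B, k) in (0, 1)
        s_gated = s_k * g
        # Recombine the reweighted components into a feature map.
        recon = u_k @ torch.diag_embed(s_gated) @ vh_k     # (B, C, HW)
        return recon.reshape(b, c, h, w)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    out = GatedSubspaceReweight(channels=64, rank=8)(feats)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch the gating is purely input-conditioned, which mirrors the abstract's claim of degradation awareness without adding large prompt modules; the actual RAM design combines this kind of intrinsic reweighting with contextualized attention in an X-shaped framework, which is not reproduced here.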