SwinIA: Self-Supervised Blind-Spot Image Denoising without Convolutions

Self-supervised image denoising implies restoring the signal from a noisy image without access to the ground truth. State-of-the-art solutions for this task rely on predicting masked pixels with a fully-convolutional neural network. This most often requires multiple forward passes, information about the noise model, or intricate regularization functions. In this paper, we propose a Swin Transformer-based Image Autoencoder (SwinIA), the first fully-transformer architecture for self-supervised denoising. The flexibility of the attention mechanism helps to fulfill the blind-spot property that convolutional counterparts normally approximate. SwinIA can be trained end-to-end with a simple mean squared error loss without masking and does not require any prior knowledge about clean data or noise distribution. Simple to use, SwinIA establishes the state of the art on several common benchmarks.
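
To make the blind-spot property concrete, the following is a minimal NumPy sketch of one way attention can enforce it. It is an illustration under stated assumptions, not the SwinIA architecture: it uses a single attention layer over a flattened toy image, builds queries from positional features only, and masks the diagonal of the attention matrix so that the prediction for a pixel never depends on that pixel's own noisy value. The names `blind_spot_attention`, `denoise_and_loss`, `w_key`, and `w_value` are hypothetical and chosen for this example. Under these assumptions, the mean squared error can be computed directly against the noisy input with no input masking.

```python
import numpy as np

def blind_spot_attention(queries, keys, values):
    """Single-head attention in which token i never attends to token i."""
    n, d = queries.shape
    scores = queries @ keys.T / np.sqrt(d)               # (n, n) attention logits
    np.fill_diagonal(scores, -np.inf)                    # blind spot: forbid self-attention
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)                             # exp(-inf) = 0 on the diagonal
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

def denoise_and_loss(noisy, positions, w_key, w_value):
    """Toy blind-spot predictor with an MSE loss against the noisy input itself."""
    queries = positions                   # assumption: queries carry position only
    keys = positions + noisy @ w_key      # keys mix position and pixel content
    values = noisy @ w_value              # values carry pixel content
    prediction, weights = blind_spot_attention(queries, keys, values)
    loss = float(np.mean((prediction - noisy) ** 2))
    return prediction, weights, loss

rng = np.random.default_rng(0)
n_pixels, dim = 16, 8
noisy = rng.normal(size=(n_pixels, 1))         # flattened toy "image" of 16 pixels
positions = rng.normal(size=(n_pixels, dim))   # stand-in for positional features
w_key = rng.normal(size=(1, dim))              # hypothetical projection weights
w_value = np.ones((1, 1))

prediction, weights, loss = denoise_and_loss(noisy, positions, w_key, w_value)
```

Because the diagonal is masked and the queries ignore pixel content, changing the noisy value of one pixel leaves the prediction at that pixel unchanged while the predictions at other pixels may change. This is the property that allows the noisy image to serve as its own training target in this sketch.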