Frame-Recurrent Video Super-Resolution

Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate a single high-resolution (HR) frame and run this scheme in a sliding-window fashion over the entire video, effectively treating the problem as a large number of separate multi-frame super-resolution tasks. This approach has two main weaknesses: 1) each input frame is processed and warped multiple times, increasing the computational cost, and 2) each output frame is estimated independently, conditioned only on the input frames, limiting the system's ability to produce temporally consistent results.

In this work, we propose an end-to-end trainable frame-recurrent video super-resolution framework that uses the previously inferred HR estimate to super-resolve the subsequent frame. This naturally encourages temporally consistent results and reduces the computational cost by warping only one image in each step. Furthermore, due to its recurrent nature, the proposed method is able to assimilate a large number of previous frames without increased computational demands. Extensive evaluations and comparisons with previous methods validate the strengths of our approach and demonstrate that the proposed framework significantly outperforms the current state of the art.
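To make the recurrence concrete, the sketch below illustrates one step of the scheme described above: motion is estimated between consecutive LR frames, the previously inferred HR estimate is warped with the upscaled flow field (the only warp per step), mapped back onto the LR grid via a space-to-depth rearrangement, and passed together with the current LR frame through the super-resolution network. This is a minimal PyTorch sketch under stated assumptions, with placeholder modules `flow_net` and `sr_net` and an assumed scale factor of 4; it is not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` with a dense flow field via bilinear sampling.

    image: (N, C, H, W); flow: (N, 2, H, W) in pixel units (x, y)."""
    n, _, h, w = image.shape
    # Build a normalized sampling grid in [-1, 1] as expected by grid_sample.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=image.dtype, device=image.device),
        torch.arange(w, dtype=image.dtype, device=image.device),
        indexing="ij",
    )
    grid_x = (xs + flow[:, 0]) / (w - 1) * 2 - 1
    grid_y = (ys + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(image, grid, align_corners=True)

def frvsr_step(lr_frame, prev_lr_frame, prev_hr_estimate, flow_net, sr_net, scale=4):
    """One recurrent super-resolution step (hypothetical module names).

    flow_net and sr_net stand in for the flow-estimation and
    super-resolution networks; their architectures are not specified here."""
    # 1. Estimate LR motion between the current and previous input frames.
    lr_flow = flow_net(torch.cat((lr_frame, prev_lr_frame), dim=1))  # (N, 2, h, w)
    # 2. Upscale the flow to HR resolution (values rescaled to HR pixel
    #    units) and warp the previous HR estimate -- the only warp per step.
    hr_flow = scale * F.interpolate(
        lr_flow, scale_factor=scale, mode="bilinear", align_corners=False
    )
    warped_hr = warp(prev_hr_estimate, hr_flow)
    # 3. Space-to-depth the warped HR estimate so it aligns with the LR grid,
    #    then predict the new HR frame from it plus the current LR frame.
    warped_lr_depth = F.pixel_unshuffle(warped_hr, scale)  # (N, C*scale^2, h, w)
    hr_estimate = sr_net(torch.cat((lr_frame, warped_lr_depth), dim=1))
    return hr_estimate
```

At inference time, the estimate can be initialized for the first frame (e.g. with a black image or a bicubic upscale; the exact initialization is an assumption here) and the step applied frame by frame, so each input frame is processed and warped exactly once while information from all previous frames propagates through the recurrent HR estimate.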