XVFI: eXtreme Video Frame Interpolation

In this paper, we first present a dataset (X4K1000FPS) of 4K videos at 1000 fps with extreme motion to the research community for video frame interpolation (VFI), and propose an extreme VFI network, called XVFI-Net, that is the first to handle VFI for 4K videos with large motion. The XVFI-Net is based on a recursive multi-scale shared structure that consists of two cascaded modules: one for bidirectional optical flow learning between the two input frames (BiOF-I) and one for bidirectional optical flow learning from the target frame to the input frames (BiOF-T). The optical flows are stably approximated by a complementary flow reversal (CFR) proposed in the BiOF-T module. During inference, the BiOF-I module can start at any scale of the input while the BiOF-T module operates only at the original input scale, so that inference can be accelerated while maintaining highly accurate VFI performance. Extensive experimental results show that our XVFI-Net can successfully capture the essential information of objects with extremely large motions and complex textures, while the state-of-the-art methods exhibit poor performance. Furthermore, our XVFI-Net framework also performs comparably on a previous lower-resolution benchmark dataset, which demonstrates the robustness of our algorithm as well. All source code, pre-trained models, and the proposed X4K1000FPS dataset are publicly available at https://github.com/JihyongOh/XVFI.
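
The inference-time behavior described above (BiOF-I starting at a chosen scale, BiOF-T running only at the original scale) can be illustrated with a small scheduling sketch. This is not the released implementation: the function name `inference_schedule`, the downscaling factor of 2 per scale level, and the assumption that BiOF-I is applied at every scale from the starting scale down to scale 0 are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# One step of the schedule: (module name, scale level, height, width).
Step = Tuple[str, int, int, int]


def inference_schedule(height: int, width: int, start_scale: int) -> List[Step]:
    """List which module would run at which resolution, coarsest first.

    Assumptions (not stated in the abstract): each scale level halves the
    resolution, and BiOF-I runs at every level from start_scale down to 0.
    """
    if start_scale < 0:
        raise ValueError("start_scale must be non-negative")

    steps: List[Step] = []
    # BiOF-I: recursive coarse-to-fine passes, starting at any chosen scale.
    for scale in range(start_scale, -1, -1):
        factor = 2 ** scale  # assumed downscaling factor per level
        steps.append(("BiOF-I", scale, height // factor, width // factor))
    # BiOF-T: operates only at the original input scale (scale 0).
    steps.append(("BiOF-T", 0, height, width))
    return steps


if __name__ == "__main__":
    # Example: a 4096x2160 frame pair with a hypothetical starting scale of 3.
    for step in inference_schedule(2160, 4096, 3):
        print(step)
```

Under these assumptions, a larger starting scale adds coarse BiOF-I passes that see a wider effective motion range, while the single BiOF-T pass keeps the cost at the original scale fixed.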