CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering

Zhengqing Wang, Yuefan Wu, Jiacheng Chen, Fuyang Zhang, Yasutaka Furukawa

Abstract

This paper proposes a neural rendering approach that represents a scene as "compressed light-field tokens (CLiFTs)", retaining rich appearance and geometric information of a scene. CLiFT enables compute-efficient rendering with compressed tokens, while allowing the number of tokens used to represent a scene or render a novel view to be changed with a single trained network. Concretely, given a set of images, a multi-view encoder tokenizes the images together with their camera poses. Latent-space K-means then uses the tokens to select a reduced set of rays as cluster centroids. The multi-view "condenser" compresses the information of all the tokens into the centroid tokens to construct CLiFTs. At test time, given a target view and a compute budget (i.e., the number of CLiFTs), the system collects the specified number of nearby tokens and synthesizes a novel view using a compute-adaptive renderer. Extensive experiments on the RealEstate10K and DL3DV datasets quantitatively and qualitatively validate our approach, achieving significant data reduction with comparable rendering quality and the highest overall rendering score, while providing trade-offs among data size, rendering quality, and rendering speed.
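The two selection steps described above (latent-space K-means picking centroid tokens, and test-time collection of a budgeted number of nearby tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmeans_select` and `nearest_tokens` are hypothetical, tokens are plain lists of floats rather than encoder outputs, "nearby" is assumed to mean smallest Euclidean distance to a target position, and the learned encoder, condenser, and renderer are not modeled at all.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_select(tokens, k, iters=10, seed=0):
    """Run plain K-means on token features and return the indices of the
    real tokens closest to the final cluster means (at most k indices)."""
    rng = random.Random(seed)
    means = [list(tokens[i]) for i in rng.sample(range(len(tokens)), k)]
    for _ in range(iters):
        # Assignment step: each token joins the cluster of its nearest mean.
        groups = [[] for _ in range(k)]
        for t in tokens:
            j = min(range(k), key=lambda c: sq_dist(t, means[c]))
            groups[j].append(t)
        # Update step: recompute each mean; an empty cluster keeps its old mean.
        for c, g in enumerate(groups):
            if g:
                means[c] = [sum(col) / len(g) for col in zip(*g)]
    # Snap each mean to the nearest real token, so every centroid is an
    # actual token (assumption: this mirrors "selecting rays as centroids").
    chosen = []
    for m in means:
        idx = min(range(len(tokens)), key=lambda i: sq_dist(tokens[i], m))
        if idx not in chosen:
            chosen.append(idx)
    return chosen


def nearest_tokens(positions, target, budget):
    """Return the indices of the `budget` positions closest to `target`.
    Assumption: proximity is Euclidean distance in position space."""
    order = sorted(range(len(positions)),
                   key=lambda i: sq_dist(positions[i], target))
    return order[:budget]


# Toy data: six 2-D "tokens" forming two well-separated groups.
toy_tokens = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroid_ids = kmeans_select(toy_tokens, k=2)
budgeted_ids = nearest_tokens(toy_tokens, target=(9, 10), budget=2)
```

In this sketch the compute budget is simply the `budget` argument: a larger value hands more tokens to whatever renderer consumes them, which corresponds to the trade-off among data size, rendering quality, and rendering speed that the abstract describes.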