LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

Structured document understanding has recently attracted considerable attention and made significant progress, owing to its crucial role in intelligent document processing. However, most existing models can only handle document data in the specific language(s) (typically English) included in their pre-training collection, which is extremely limited. To address this issue, we propose a simple yet effective Language-independent Layout Transformer (LiLT) for structured document understanding. LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models. Experimental results on eight languages show that LiLT achieves competitive or even superior performance on diverse widely used downstream benchmarks, enabling language-independent benefit from the pre-training of document layout structure. Code and model are publicly available at https://github.com/jpWang/LiLT.
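
As a minimal sketch of the fine-tuning workflow described above, the snippet below loads a pre-trained LiLT checkpoint through the HuggingFace `transformers` integration and runs one token-classification training step on a toy example. The checkpoint name (`SCUT-DLVCLab/lilt-roberta-en-base`), the label count, and the toy words/boxes are assumptions for illustration only; the repository linked above provides the canonical pre-training and fine-tuning scripts.

```python
# Hedged sketch: fine-tuning LiLT for token classification (e.g., entity labeling).
# Assumes the HuggingFace `transformers` LiLT integration and the public
# "SCUT-DLVCLab/lilt-roberta-en-base" checkpoint; not the authors' exact recipe.
import torch
from transformers import AutoTokenizer, LiltForTokenClassification

checkpoint = "SCUT-DLVCLab/lilt-roberta-en-base"  # layout flow + RoBERTa text flow
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = LiltForTokenClassification.from_pretrained(checkpoint, num_labels=7)  # 7 is a placeholder

# One toy example: words with their bounding boxes normalized to a 0-1000 grid.
words = ["Invoice", "No.", "12345"]
boxes = [[82, 41, 170, 62], [176, 41, 210, 62], [216, 41, 290, 62]]

encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# LiLT expects one bounding box per token; repeat each word's box for its sub-tokens.
word_ids = encoding.word_ids(batch_index=0)
bbox = [[0, 0, 0, 0] if i is None else boxes[i] for i in word_ids]
encoding["bbox"] = torch.tensor([bbox])

# Dummy all-zero labels, one per token; a real dataset would supply true tags.
labels = torch.zeros(encoding["input_ids"].shape, dtype=torch.long)
outputs = model(**encoding, labels=labels)
outputs.loss.backward()  # one fine-tuning step (optimizer update omitted for brevity)
```

To adapt the model to a new language as the abstract suggests, one would pair the pre-trained layout flow with an off-the-shelf textual model for that language (e.g., a multilingual backbone) and fine-tune on the target-language documents in the same way.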