PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction

Document pair extraction aims to identify key and value entities as well as their relationships from visually-rich documents. Most existing methods divide it into two separate tasks: semantic entity recognition (SER) and relation extraction (RE). However, simply concatenating SER and RE serially can lead to severe error propagation, and it fails to handle cases such as multi-line entities in real scenarios. To address these issues, this paper introduces a novel framework, PEneo (Pair Extraction new decoder option), which performs document pair extraction in a unified pipeline, incorporating three concurrent sub-tasks: line extraction, line grouping, and entity linking. This approach alleviates the error accumulation problem and can handle multi-line entities. Furthermore, to better evaluate the model's performance and to facilitate future research on pair extraction, we introduce RFUND, a re-annotated version of the commonly used FUNSD and XFUND datasets, to make them more accurate and cover realistic situations. Experiments on various benchmarks demonstrate PEneo's superiority over previous pipelines, boosting the performance by a large margin (e.g., 19.89%-22.91% F1 score on RFUND-EN) when combined with various backbones like LiLT and LayoutLMv3, showing its effectiveness and generality. Code and the new annotations are available at https://github.com/ZeningLin/PEneo.
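
As an illustration only, the sketch below shows one way the outputs of the three sub-tasks could be merged into key-value pairs. It is not the PEneo decoder: the function name `decode_pairs`, the edge-list input format, and the toy data are all assumptions made for this example. Line extraction is assumed to have already selected the lines, line grouping is represented as edges between lines of the same entity, and entity linking as edges from a key line to a value line.

```python
from collections import defaultdict


def decode_pairs(lines, grouping_edges, linking_edges):
    """Merge hypothetical sub-task outputs into (key_text, value_text) pairs.

    lines:          dict of line_id -> text, ids in reading order
                    (assumed output of line extraction)
    grouping_edges: list of (i, j), lines i and j belong to the same entity
                    (assumed output of line grouping)
    linking_edges:  list of (k, v), key line k is linked to value line v
                    (assumed output of entity linking)
    """
    # Union-find over line ids; the root of each entity is its smallest id.
    parent = {i: i for i in lines}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in grouping_edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Collect the lines of each entity in reading order.
    members = defaultdict(list)
    for i in sorted(lines):
        members[find(i)].append(i)

    def entity_text(root):
        return " ".join(lines[i] for i in members[root])

    # Lift line-level links to entity-level pairs, dropping duplicates.
    pairs, seen = [], set()
    for k, v in linking_edges:
        rk, rv = find(k), find(v)
        if (rk, rv) not in seen:
            seen.add((rk, rv))
            pairs.append((entity_text(rk), entity_text(rv)))
    return pairs


# Toy input (made up for this sketch): two multi-line entities and one
# single-line pair.
lines = {0: "Date of", 1: "birth:", 2: "12 March", 3: "1990",
         4: "Name:", 5: "Jane Doe"}
pairs = decode_pairs(lines,
                     grouping_edges=[(0, 1), (2, 3)],
                     linking_edges=[(0, 2), (4, 5)])
print(pairs)
```

In this toy setting a multi-line key such as "Date of" / "birth:" is reassembled before pairing, which is the kind of case a serial SER-then-RE pipeline is said to struggle with.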