
A Graphical Approach to Document Layout Analysis

Wang, Jilin; Krumdick, Michael; Tong, Baojia; Halim, Hamima; Sokolov, Maxim; Barda, Vadym; Vendryes, Delphine; Tanner, Chris
Abstract

Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets, while being an order of magnitude smaller than existing models. In particular, the 4-million-parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.
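
To make the graph framing concrete, the following is a minimal sketch of treating a page as a graph of text boxes and reading layout segments off its connected components. It is not GLAM itself: the `Box` schema, the `gap` distance, and the `max_gap` threshold are hypothetical, and a fixed distance rule stands in for the learned edge predictions a graph neural network would provide. Node classification (text, title, figure) is omitted.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """A text box as it might come from PDF metadata (hypothetical schema)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def gap(a: Box, b: Box) -> float:
    """Distance between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def build_edges(boxes, max_gap):
    """Connect every pair of boxes whose gap is at most max_gap.

    Assumption: this heuristic replaces a learned edge classifier.
    """
    return [(i, j) for i, j in combinations(range(len(boxes)), 2)
            if gap(boxes[i], boxes[j]) <= max_gap]


def segment(n_nodes, edges):
    """Group nodes into segments: connected components via union-find."""
    parent = list(range(n_nodes))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n_nodes):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative page: one heading and a two-line paragraph.
boxes = [
    Box("Quarterly Report", 50, 40, 300, 60),
    Box("Revenue grew in", 50, 100, 300, 112),
    Box("the third quarter.", 50, 114, 300, 126),
]
edges = build_edges(boxes, max_gap=5.0)
segments = segment(len(boxes), edges)
print(segments)  # [[0], [1, 2]]
```

Here the two paragraph lines are joined by an edge and form one segment, while the heading stays on its own. In a learned setting, the edge list would come from model predictions rather than a distance threshold, and each resulting segment would then receive a class label.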