DoPTA-HR | 0.970 | 0.957 | 0.949 | 0.977 | 0.944 | 0.895 | DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | - |
ResNext-101-32×8d | 0.968 | 0.940 | 0.935 | 0.976 | 0.930 | 0.862 | Vision Grid Transformer for Document Layout Analysis | |
UDoc | 0.964 | 0.937 | 0.939 | 0.973 | 0.939 | 0.885 | Unified Pretraining Framework for Document Understanding | - |
GLAM | 0.206 | 0.862 | 0.722 | 0.868 | 0.878 | 0.800 | A Graphical Approach to Document Layout Analysis | |
TRDLU | 0.966 | 0.975 | 0.959 | 0.976 | 0.958 | 0.921 | Transformer-based Approach for Document Understanding | - |
Faster RCNN | 0.937 | 0.883 | 0.902 | 0.954 | 0.910 | 0.826 | PubLayNet: largest dataset ever for document layout analysis | |
Mask RCNN | 0.949 | 0.886 | 0.910 | 0.960 | 0.916 | 0.840 | PubLayNet: largest dataset ever for document layout analysis | |
VGT | 0.971 | 0.968 | 0.962 | 0.981 | 0.950 | 0.939 | Vision Grid Transformer for Document Layout Analysis | |
BEiT-B | 0.957 | 0.924 | 0.931 | 0.973 | 0.934 | 0.866 | BEiT: BERT Pre-Training of Image Transformers | |