HyperAI

Document Image Classification on RVL-CDIP

Metrics

Accuracy
Parameters
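
The page does not define how Accuracy is computed. A minimal sketch follows, assuming it is the usual top-1 classification accuracy (the fraction of test images whose predicted class matches the ground-truth class). The function name `accuracy` and the sample label lists are hypothetical and are not taken from any of the listed papers.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the fraction of predictions that equal the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted class index matches the true one.
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical class indices for five document images (illustration only).
predicted = [3, 7, 7, 0, 12]
actual = [3, 7, 2, 0, 12]
print(f"{accuracy(predicted, actual):.2%}")  # 4 of 5 match
```

The Parameters column would then be the number of trainable weights in each model, reported in millions (M).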

Results

Performance results of various models on this benchmark

| Model | Accuracy | Parameters | Paper Title | Repository |
|---|---|---|---|---|
| LayoutLMV3Large | 95.93% | 368M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | |
| Multimodal (MobileNetV2) | 92.2% | 12M | Multimodal Side-Tuning for Document Classification | |
| DiT-B | 92.11% | 87M | DiT: Self-supervised Pre-training for Document Image Transformer | |
| RoBERTa base | 90.06% | 125M | RoBERTa: A Robustly Optimized BERT Pretraining Approach | |
| DocFormerBASE | 96.17% | 183M | DocFormer: End-to-End Transformer for Document Understanding | |
| Pre-trained EfficientNet | 92.31% | - | Improving accuracy and speeding up Document Image Classification through parallel systems | |
| DocFormer large | 95.50% | 536M | DocFormer: End-to-End Transformer for Document Understanding | |
| Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50 | 90.97% | - | Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification | |
| Transfer Learning from VGG16 trained on Imagenet | 92.21% | - | Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks | |
| LayoutXLM | 95.21% | - | LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | |
| TILT-Base | 95.25% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| Pre-trained LayoutLM | 94.42% | 160M | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | |
| DiT-L | 92.69% | 304M | DiT: Self-supervised Pre-training for Document Image Transformer | |
| VLCDoC | 93.19% | 217M | VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | - |
| TILT-Large | 95.52% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| StrucTexTv2 (small) | 93.4% | 28M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | |
| DocXClassifier-B | 94.00% | 95.4M | DocXClassifier: High Performance Explainable Deep Network for Document Image Classification | |
| EAML | 97.70% | - | EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification | - |
| AlexNet + spatial pyramidal pooling + image resizing | 90.94% | - | Analysis of Convolutional Neural Networks for Document Image Classification | - |
| LiLT[EN-R]BASE | 95.68% | - | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | |