Document Image Classification on RVL-CDIP
Metrics
Accuracy
Parameters
Results
Performance of various models on this benchmark
| Model name | Accuracy | Parameters | Paper title | Repository |
| --- | --- | --- | --- | --- |
| LayoutLMV3Large | 95.93% | 368M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | |
| Multimodal (MobileNetV2) | 92.2% | 12M | Multimodal Side-Tuning for Document Classification | |
| DiT-B | 92.11% | 87M | DiT: Self-supervised Pre-training for Document Image Transformer | |
| Roberta base | 90.06% | 125M | RoBERTa: A Robustly Optimized BERT Pretraining Approach | |
| DocFormerBASE | 96.17% | 183M | DocFormer: End-to-End Transformer for Document Understanding | |
| Pre-trained EfficientNet | 92.31% | - | Improving accuracy and speeding up Document Image Classification through parallel systems | |
| DocFormer large | 95.50% | 536M | DocFormer: End-to-End Transformer for Document Understanding | |
| Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50 | 90.97% | - | Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification | |
| Transfer Learning from VGG16 trained on Imagenet | 92.21% | - | Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks | |
| LayoutXLM | 95.21% | - | LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | |
| TILT-Base | 95.25% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| Pre-trained LayoutLM | 94.42% | 160M | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | |
| DiT-L | 92.69% | 304M | DiT: Self-supervised Pre-training for Document Image Transformer | |
| VLCDoC | 93.19% | 217M | VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | - |
| TILT-Large | 95.52% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| StrucTexTv2 (small) | 93.4% | 28M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | |
| DocXClassifier-B | 94.00% | 95.4M | DocXClassifier: High Performance Explainable Deep Network for Document Image Classification | |
| EAML | 97.70% | - | EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification | - |
| AlexNet + spatial pyramidal pooling + image resizing | 90.94% | - | Analysis of Convolutional Neural Networks for Document Image Classification | - |
| LiLT[EN-R]BASE | 95.68% | - | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | |