Document Image Classification on RVL-CDIP
Metrics
Accuracy
Parameters
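For reference, the two reported columns are typically computed as top-1 accuracy on the 16-class RVL-CDIP test split and the total trainable parameter count. A minimal PyTorch sketch, assuming a generic image-classification `model` and a `test_loader` over the test split (both placeholders, not taken from this page):

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    # "Parameters" column: total number of trainable weights (reported in millions).
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def top1_accuracy(model: torch.nn.Module, test_loader) -> float:
    # "Accuracy" column: fraction of test documents whose predicted class
    # (out of the 16 RVL-CDIP categories) matches the ground-truth label.
    model.eval()
    correct, total = 0, 0
    for images, labels in test_loader:
        preds = model(images).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```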
Results
Performance results of the various models on this benchmark.
| Model Name | Accuracy | Parameters | Paper Title |
|---|---|---|---|
| LayoutLMv3-Large | 95.93% | 368M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking |
| Multimodal (MobileNetV2) | 92.2% | 12M | Multimodal Side-Tuning for Document Classification |
| DiT-B | 92.11% | 87M | DiT: Self-supervised Pre-training for Document Image Transformer |
| RoBERTa-base | 90.06% | 125M | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
| DocFormer-Base | 96.17% | 183M | DocFormer: End-to-End Transformer for Document Understanding |
| Pre-trained EfficientNet | 92.31% | - | Improving accuracy and speeding up Document Image Classification through parallel systems |
| DocFormer-Large | 95.50% | 536M | DocFormer: End-to-End Transformer for Document Understanding |
| Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50 | 90.97% | - | Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification |
| Transfer Learning from VGG16 trained on ImageNet | 92.21% | - | Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks |
| LayoutXLM | 95.21% | - | LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding |
| TILT-Base | 95.25% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer |
| Pre-trained LayoutLM | 94.42% | 160M | LayoutLM: Pre-training of Text and Layout for Document Image Understanding |
| DiT-L | 92.69% | 304M | DiT: Self-supervised Pre-training for Document Image Transformer |
| VLCDoC | 93.19% | 217M | VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification |
| TILT-Large | 95.52% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer |
| StrucTexTv2 (small) | 93.4% | 28M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training |
| DocXClassifier-B | 94.00% | 95.4M | DocXClassifier: High Performance Explainable Deep Network for Document Image Classification |
| EAML | 97.70% | - | EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification |
| AlexNet + spatial pyramidal pooling + image resizing | 90.94% | - | Analysis of Convolutional Neural Networks for Document Image Classification |
| LiLT[EN-R]-Base | 95.68% | - | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding |
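As a usage illustration only (not taken from this leaderboard), the DiT entries above have a publicly released checkpoint fine-tuned on RVL-CDIP. A minimal sketch with the Hugging Face `transformers` API, assuming the `microsoft/dit-base-finetuned-rvlcdip` checkpoint and a placeholder image path:

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "microsoft/dit-base-finetuned-rvlcdip"  # DiT-B fine-tuned on RVL-CDIP
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open("document_page.png").convert("RGB")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits

# Map the highest-scoring logit back to one of the 16 RVL-CDIP classes.
predicted_label = model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)  # e.g. "invoice" or "letter"
```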