Document Image Classification on RVL-CDIP
Metrics
Accuracy
Parameters
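For reference, the two reported columns are typically computed as top-1 accuracy on the 16-class RVL-CDIP test split and the total trainable parameter count. A minimal PyTorch sketch, assuming a generic image-classification `model` and a `test_loader` over the test split (both placeholders, not taken from this page):

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    # "Parameters" column: total number of trainable weights (reported in millions).
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def top1_accuracy(model: torch.nn.Module, test_loader) -> float:
    # "Accuracy" column: fraction of test documents whose predicted class
    # (out of the 16 RVL-CDIP categories) matches the ground-truth label.
    model.eval()
    correct, total = 0, 0
    for images, labels in test_loader:
        preds = model(images).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```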
Results
Performance results of the various models on this benchmark.
| Model Name | Accuracy | Parameters | Paper Title |
|---|---|---|---|
| LayoutLMv3-Large | 95.93% | 368M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking |
| Multimodal (MobileNetV2) | 92.2% | 12M | Multimodal Side-Tuning for Document Classification |
| DiT-B | 92.11% | 87M | DiT: Self-supervised Pre-training for Document Image Transformer |
| RoBERTa-base | 90.06% | 125M | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
| DocFormer-Base | 96.17% | 183M | DocFormer: End-to-End Transformer for Document Understanding |
| Pre-trained EfficientNet | 92.31% | - | Improving accuracy and speeding up Document Image Classification through parallel systems |
| DocFormer-Large | 95.50% | 536M | DocFormer: End-to-End Transformer for Document Understanding |
| Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50 | 90.97% | - | Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification |
| Transfer Learning from VGG16 trained on ImageNet | 92.21% | - | Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks |
| LayoutXLM | 95.21% | - | LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding |
| TILT-Base | 95.25% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer |
| Pre-trained LayoutLM | 94.42% | 160M | LayoutLM: Pre-training of Text and Layout for Document Image Understanding |
| DiT-L | 92.69% | 304M | DiT: Self-supervised Pre-training for Document Image Transformer |
| VLCDoC | 93.19% | 217M | VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification |
| TILT-Large | 95.52% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer |
| StrucTexTv2 (small) | 93.4% | 28M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training |
| DocXClassifier-B | 94.00% | 95.4M | DocXClassifier: High Performance Explainable Deep Network for Document Image Classification |
| EAML | 97.70% | - | EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification |
| AlexNet + spatial pyramidal pooling + image resizing | 90.94% | - | Analysis of Convolutional Neural Networks for Document Image Classification |
| LiLT[EN-R]-Base | 95.68% | - | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding |
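As a usage illustration only (not taken from this leaderboard), the DiT entries above have a publicly released checkpoint fine-tuned on RVL-CDIP. A minimal sketch with the Hugging Face `transformers` API, assuming the `microsoft/dit-base-finetuned-rvlcdip` checkpoint and a placeholder image path:

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "microsoft/dit-base-finetuned-rvlcdip"  # DiT-B fine-tuned on RVL-CDIP
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open("document_page.png").convert("RGB")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits

# Map the highest-scoring logit back to one of the 16 RVL-CDIP classes.
predicted_label = model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)  # e.g. "invoice" or "letter"
```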