
Efficient ViTs on ImageNet-1K with DeiT-T

Metrics

GFLOPs
Top-1 Accuracy (%)
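
For context, GFLOPs counts the floating-point operations of a single forward pass, and Top-1 Accuracy is the percentage of ImageNet-1K validation images whose highest-scoring class matches the ground-truth label. The sketch below shows one common way to measure both for the DeiT-T baseline; it assumes the `timm` and `fvcore` packages, and the model name and `top1_accuracy` helper are illustrative rather than part of this benchmark.

```python
import torch
import timm  # model zoo; an assumed dependency, not part of the benchmark
from fvcore.nn import FlopCountAnalysis  # FLOP counter; also an assumption

# DeiT-T baseline from timm (the "Base (DeiT-T)" row in the results table).
model = timm.create_model("deit_tiny_patch16_224", pretrained=True).eval()

# GFLOPs: operation count for one 224x224 forward pass. Note that fvcore
# counts a fused multiply-add as a single operation, so the exact figure
# can differ slightly from a given paper's convention.
flops = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224))
print(f"GFLOPs: {flops.total() / 1e9:.2f}")

# Top-1 Accuracy: share of validation images whose single highest-scoring
# class equals the label; `loader` is any ImageNet-1K validation DataLoader.
@torch.no_grad()
def top1_accuracy(model: torch.nn.Module, loader) -> float:
    correct = total = 0
    for images, labels in loader:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total
```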

Results

Performance results of various models on this benchmark

| Model Name | GFLOPs | Top-1 Accuracy (%) | Paper Title |
|---|---|---|---|
| SPViT (1.0G) | 1.0 | 72.2 | SPViT: Enabling Faster Vision Transformers via Soft Token Pruning |
| LTMP (60%) | 0.8 | 71.5 | Learned Thresholds Token Merging and Pruning for Vision Transformers |
| MCTF ($r=20$) | 0.6 | 71.4 | Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers |
| PS-ViT | 0.7 | 72.0 | Patch Slimming for Efficient Vision Transformers |
| LTMP (45%) | 0.7 | 69.8 | Learned Thresholds Token Merging and Pruning for Vision Transformers |
| ToMe ($r=16$) | 0.6 | 70.7 | Token Merging: Your ViT But Faster |
| DPS-ViT | 0.6 | 72.1 | Patch Slimming for Efficient Vision Transformers |
| MCTF ($r=8$) | 1.0 | 72.9 | Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers |
| BAT | 0.8 | 72.3 | Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers |
| ToMe ($r=12$) | 0.8 | 71.4 | Token Merging: Your ViT But Faster |
| SPViT (0.9G) | 0.9 | 72.1 | SPViT: Enabling Faster Vision Transformers via Soft Token Pruning |
| S$^2$ViTE | 0.9 | 70.1 | Chasing Sparsity in Vision Transformers: An End-to-End Exploration |
| Evo-ViT | 0.8 | 72.0 | Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer |
| HVT-Ti-1 | 0.6 | 69.6 | Scalable Vision Transformers with Hierarchical Pooling |
| SPViT | 1.0 | 70.7 | Pruning Self-attentions into Convolutional Layers in Single Path |
| Base (DeiT-T) | 1.2 | 72.2 | Training data-efficient image transformers & distillation through attention |
| eTPS | 0.8 | 72.3 | Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers |
| MCTF ($r=16$) | 0.7 | 72.7 | Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers |
| LTMP (80%) | 1.0 | 72.0 | Learned Thresholds Token Merging and Pruning for Vision Transformers |
| PPT | 0.8 | 72.1 | PPT: Token Pruning and Pooling for Efficient Vision Transformers |
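
Most methods in the table trade accuracy for FLOPs by shrinking the token count inside the transformer; the $r$ in the ToMe and MCTF rows is the number of tokens removed per block. As a rough, non-authoritative illustration of that idea, here is a toy PyTorch sketch of ToMe-style bipartite soft matching. The published ToMe implementation additionally matches on attention keys, uses size-weighted averaging, and protects the class token; none of that is shown here, and all names are illustrative.

```python
import torch

def bipartite_soft_matching(x: torch.Tensor, r: int) -> torch.Tensor:
    """Toy ToMe-style merge: reduce a (B, N, C) token tensor to N - r tokens."""
    B, N, C = x.shape
    a, b = x[:, ::2, :], x[:, 1::2, :]         # alternate tokens into sets A and B
    a_n = a / a.norm(dim=-1, keepdim=True)     # cosine similarity between the sets
    b_n = b / b.norm(dim=-1, keepdim=True)
    scores = a_n @ b_n.transpose(-1, -2)       # (B, N/2, N/2)

    best_val, best_idx = scores.max(dim=-1)    # each A token's closest B token
    order = best_val.argsort(dim=-1, descending=True)
    merged, kept = order[:, :r], order[:, r:]  # merge the r most similar A tokens

    # Average each merged A token into its matched B token.
    dst = best_idx.gather(1, merged).unsqueeze(-1).expand(-1, -1, C)
    src = a.gather(1, merged.unsqueeze(-1).expand(-1, -1, C))
    b = b.scatter_reduce(1, dst, src, reduce="mean", include_self=True)

    a_kept = a.gather(1, kept.unsqueeze(-1).expand(-1, -1, C))
    return torch.cat([a_kept, b], dim=1)       # N - r tokens survive

# DeiT-T has 196 patch tokens of width 192; with r=16 each block drops 16.
tokens = torch.randn(2, 196, 192)
print(bipartite_soft_matching(tokens, r=16).shape)  # torch.Size([2, 180, 192])
```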