| Model | Accuracy (%) | Paper | Code |
| --- | --- | --- | --- |
| GPT-3 175B (1-shot) | 71.2 | Language Models are Few-Shot Learners | |
| ST-MoE-32B 269B (fine-tuned) | 95.2 | ST-MoE: Designing Stable and Transferable Sparse Expert Models | - |
| SparseGPT (175B, 4:8 sparsity) | 68.35 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | |
| LLaMA 13B + CFG (0-shot) | 79.1 | Stay on topic with Classifier-Free Guidance | - |
| LLaMA 65B + CFG (0-shot) | 84.2 | Stay on topic with Classifier-Free Guidance | - |
| LLaMA 3 8B + MoSLoRA (fine-tuned) | 90.5 | Mixture-of-Subspaces in Low-Rank Adaptation | |