Language Modelling on C4
Metrics
Perplexity (lower is better; see the computation sketch below)
Steps
TPUv3 Hours
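
Perplexity is the exponential of the average per-token negative log-likelihood on the evaluation set, so lower values indicate a better language model. Below is a minimal sketch of how the metric is computed; the `perplexity` helper is illustrative and not taken from any benchmark codebase:

```python
import math

def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    token_log_probs: log p(token | context) for each evaluated token (base e).
    Perplexity = exp(-mean log-likelihood); lower is better.
    """
    log_probs = list(token_log_probs)
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)

# Example: assigning probability 0.1 to every token gives perplexity 10,
# i.e. the model is as uncertain as a uniform 10-way choice per token.
print(perplexity([math.log(0.1)] * 4))  # -> 10.0
```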
Results
Performance results of the various models on this benchmark:
Model | Perplexity | Steps | TPUv3 Hours | Paper Title | Repository |
---|---|---|---|---|---|
Primer | 12.35 | 1M | 17.3K | Primer: Searching for Efficient Transformers for Language Modeling | |
Zeropoint LLM.int8 13B (vector-wise + decomp) | 12.45 | - | - | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | |
T5++ | 12.69 | 1M | 16.5K | Primer: Searching for Efficient Transformers for Language Modeling | |
Original T5 | 13.25 | 1M | 15.7K | Primer: Searching for Efficient Transformers for Language Modeling | |
LLM.float32 6.7B | 13.3 | - | - | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | |
LLM.float32 2.7B | 14.43 | - | - | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | |
N-Grammer 343M | 14.79 | - | - | N-Grammer: Augmenting Transformers with latent n-grams | |
N-Grammer 288M | 15.01 | - | - | N-Grammer: Augmenting Transformers with latent n-grams | |
LLM.float32 1.3B | 15.91 | - | - | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | |