Language Modelling on WikiText-103
Metrics
Number of params
Test perplexity
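Test perplexity is the exponential of the mean negative log-likelihood per token over the test set; lower is better. Below is a minimal sketch of that computation, assuming per-token log-probabilities have already been obtained from some language model; the function name and example values are illustrative, not part of this benchmark.

```python
import math

def perplexity(token_log_probs):
    """Compute perplexity from per-token natural-log probabilities.

    token_log_probs: iterable of log p(token_i | context), one entry per
    test-set token. Perplexity = exp(-mean(log probability)).
    """
    log_probs = list(token_log_probs)
    avg_nll = -sum(log_probs) / len(log_probs)  # mean negative log-likelihood
    return math.exp(avg_nll)

# Example: a model that assigns probability 0.05 to every test token
# has perplexity 1 / 0.05 = 20.
print(perplexity([math.log(0.05)] * 1000))  # ≈ 20
```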
Results
Performance results of various models on this benchmark
| Model name | Number of params | Test perplexity | Paper Title |
| --- | --- | --- | --- |
| Transformer-XL Large + Phrase Induction | 257M | 17.4 | Improving Neural Language Models by Segmenting, Attending, and Predicting the Future |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization |
| Reformer 125M | - | 26.0 | Reformer: The Efficient Transformer |
| LSTM | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks |
| Transformer-XL Large | 257M | 18.3 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| GRU | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? |
| Subformer | 96M | 20.39 | Subformer: A Parameter Reduced Transformer |
| Routing Transformer | - | 15.8 | Efficient Content-Based Sparse Attention with Routing Transformers |
| SRU++ Base | 148M | 18.3 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute |
| SRU++ Large | 234M | 17.1 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute |
| DIFFQ (λ=1, g=16) | - | 18.0 | Differentiable Model Compression via Pseudo Quantization Noise |
| Transformer+SSA+Self-ensemble | - | 17.18 | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles |
| GPT-2 Large | 774M | 22.05 | Language Models are Unsupervised Multitask Learners |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation |
| Staged Training | 247M | 17.56 | Shortformer: Better Language Modeling using Shorter Inputs |
| Hybrid H3 (355M) | 355M | 16.9 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models |
| Transformer-XL (RMS dynamic eval) | 257M | 16.4 | Dynamic Evaluation of Transformer Language Models |
| Decay RNN | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? |
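The "Number of params" column is typically the total count of trainable weights in the reported model. A minimal sketch of how such a count is commonly obtained with PyTorch; the tiny stand-in model below is purely illustrative and is not one of the listed systems.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Count trainable parameters of a PyTorch module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy example: embedding (1000 x 64) plus a linear output layer (64 x 1000 + 1000 bias).
tiny_lm = nn.Sequential(
    nn.Embedding(num_embeddings=1000, embedding_dim=64),
    nn.Linear(64, 1000),
)
print(count_params(tiny_lm))  # 129000
```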