Language Modelling on WikiText-103
Evaluation Metrics
Number of params
Test perplexity
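Both metrics are straightforward to reproduce. The sketch below shows one common way to compute them, assuming a Hugging Face causal language model and the `transformers`/`datasets` libraries; the model name, context length, and stride are illustrative assumptions, not the settings used by the entries in this table, and published numbers also depend on tokenization and context-handling details, so results will not match the leaderboard exactly.

```python
# Minimal sketch: parameter count and sliding-window test perplexity on WikiText-103.
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-large"  # illustrative assumption; any causal LM can be scored this way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# "Number of params" is simply the total parameter count of the model.
n_params = sum(p.numel() for p in model.parameters())
print(f"Number of params: {n_params / 1e6:.0f}M")

# Tokenize the WikiText-103 test split as one long token stream.
test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids
seq_len = ids.size(1)

# Sliding-window evaluation: each window scores only its last `trg_len` tokens,
# so every scored token sees up to `max_len` tokens of left context.
max_len, stride = 1024, 512  # illustrative choices; papers differ on context handling
nll_sum, n_scored, prev_end = 0.0, 0, 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_len, seq_len)
    trg_len = end - prev_end
    input_ids = ids[:, begin:end]
    labels = input_ids.clone()
    labels[:, :-trg_len] = -100  # ignore context-only tokens in the loss
    with torch.no_grad():
        loss = model(input_ids, labels=labels).loss  # mean NLL over scored tokens
    nll_sum += loss.item() * trg_len
    n_scored += trg_len
    prev_end = end
    if end == seq_len:
        break

# Test perplexity = exp(average per-token negative log-likelihood).
print(f"Test perplexity: {math.exp(nll_sum / n_scored):.2f}")
```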
Evaluation Results
Performance results of each model on this benchmark
| Model Name | Number of params | Test perplexity | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| Transformer-XL Large + Phrase Induction | 257M | 17.4 | Improving Neural Language Models by Segmenting, Attending, and Predicting the Future | - |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | - |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization | - |
| Reformer 125M | - | 26.0 | Reformer: The Efficient Transformer | - |
| LSTM | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks | - |
| Transformer-XL Large | 257M | 18.3 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| GRU | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |
| Subformer | 96M | 20.39 | Subformer: A Parameter Reduced Transformer | - |
| Routing Transformer | - | 15.8 | Efficient Content-Based Sparse Attention with Routing Transformers | - |
| SRU++ Base | 148M | 18.3 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| SRU++ Large | 234M | 17.1 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| DIFFQ (λ=1, g=16) | - | 18.0 | Differentiable Model Compression via Pseudo Quantization Noise | - |
| Transformer+SSA+Self-ensemble | - | 17.18 | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles | - |
| GPT-2 Large | 774M | 22.05 | Language Models are Unsupervised Multitask Learners | - |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation | - |
| Staged Training | 247M | 17.56 | Shortformer: Better Language Modeling using Shorter Inputs | - |
| Hybrid H3 (355M) | 355M | 16.9 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | - |
| Transformer-XL (RMS dynamic eval) | 257M | 16.4 | Dynamic Evaluation of Transformer Language Models | - |
| Decay RNN | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |