Language Modelling on WikiText-103
Evaluation Metrics: Number of params, Test perplexity
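For reference, test perplexity is the exponentiated average negative log-likelihood that a model assigns to the held-out test tokens, so lower is better:

```latex
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\!\left(w_i \mid w_{<i}\right)\right)
```

where N is the number of tokens in the WikiText-103 test set and p_θ is the model's predicted next-token distribution. Perplexities are only directly comparable when computed over the same vocabulary; WikiText-103 results are conventionally reported at the word level.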
Evaluation Results
Performance results of each model on this benchmark
| Model name | Number of params | Test perplexity | Paper Title |
| --- | --- | --- | --- |
| Transformer-XL Large + Phrase Induction | 257M | 17.4 | Improving Neural Language Models by Segmenting, Attending, and Predicting the Future |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization |
| Reformer 125M | - | 26.0 | Reformer: The Efficient Transformer |
| LSTM | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks |
| Transformer-XL Large | 257M | 18.3 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| GRU | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? |
| Subformer | 96M | 20.39 | Subformer: A Parameter Reduced Transformer |
| Routing Transformer | - | 15.8 | Efficient Content-Based Sparse Attention with Routing Transformers |
| SRU++ Base | 148M | 18.3 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute |
| SRU++ Large | 234M | 17.1 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute |
| DIFFQ (λ=1, g=16) | - | 18.0 | Differentiable Model Compression via Pseudo Quantization Noise |
| Transformer+SSA+Self-ensemble | - | 17.18 | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles |
| GPT-2 Large | 774M | 22.05 | Language Models are Unsupervised Multitask Learners |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation |
| Staged Training | 247M | 17.56 | Shortformer: Better Language Modeling using Shorter Inputs |
| Hybrid H3 (355M) | 355M | 16.9 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models |
| Transformer-XL (RMS dynamic eval) | 257M | 16.4 | Dynamic Evaluation of Transformer Language Models |
| Decay RNN | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? |
The full leaderboard for this benchmark contains 89 entries; the table above shows 20 of them.
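As an illustration of how these numbers are measured, the sketch below estimates test perplexity for a pretrained GPT-2 model on the WikiText-103 test split using the Hugging Face transformers and datasets libraries. This is a minimal sketch under assumed hub identifiers ("gpt2-large" and "wikitext", "wikitext-103-raw-v1"), not the evaluation script behind the leaderboard: each paper uses its own tokenization and context-handling scheme, so the subword-level value printed here will not match the 22.05 word-level figure reported for GPT-2 Large.

```python
import math
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Minimal sketch: estimate test perplexity of a pretrained GPT-2 on WikiText-103.
# Model/dataset identifiers and the fixed-chunk evaluation are illustrative choices.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2-large").to(device).eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-large")

test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_len = 1024                       # GPT-2 context window
n_tokens = encodings.input_ids.size(1)
nll_sum, count = 0.0, 0

with torch.no_grad():
    for begin in range(0, n_tokens, max_len):
        chunk = encodings.input_ids[:, begin : begin + max_len].to(device)
        if chunk.size(1) < 2:        # need at least one next-token prediction
            continue
        out = model(chunk, labels=chunk)   # out.loss = mean NLL over this chunk
        n_pred = chunk.size(1) - 1         # tokens actually predicted in the chunk
        nll_sum += out.loss.item() * n_pred
        count += n_pred

print(f"subword-level test perplexity ≈ {math.exp(nll_sum / count):.2f}")
```

A sliding-window evaluation, which re-scores each token with more left context, gives a tighter (lower) estimate than fixed non-overlapping chunks, at the cost of more compute.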