Language Modelling On Text8
Evaluation Metric
Bit per Character (BPC)
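The leaderboard metric, Bit per Character (BPC), is the model's average negative log-likelihood per character expressed in base 2; lower is better. As a minimal sketch (the function name and the example numbers are illustrative, not drawn from the leaderboard), converting a summed cross-entropy measured in nats into BPC looks like this:

```python
import math

def bits_per_character(total_nll_nats: float, num_characters: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a character
    sequence into bits per character: average over characters and change
    the log base from e to 2."""
    return total_nll_nats / (num_characters * math.log(2))

# Illustrative numbers only: a model averaging 0.78 nats per character
# scores roughly 1.13 BPC, in the range of the stronger entries below.
print(bits_per_character(total_nll_nats=0.78 * 1_000_000, num_characters=1_000_000))
```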
Evaluation Results
Performance results of each model on this benchmark
| Model Name | Bit per Character (BPC) | Paper Title |
| --- | --- | --- |
| td-LSTM (Zhang et al., 2016) | 1.63 | Architectural Complexity Measures of Recurrent Neural Networks |
| td-LSTM-large | 1.49 | Architectural Complexity Measures of Recurrent Neural Networks |
| BFN | 1.41 | Bayesian Flow Networks |
| Unregularised mLSTM | 1.40 | Multiplicative LSTM for sequence modelling |
| BN LSTM | 1.36 | Recurrent Batch Normalization |
| LayerNorm HM-LSTM | 1.29 | Hierarchical Multiscale Recurrent Neural Networks |
| Large mLSTM +emb +WN +VD | 1.27 | Multiplicative LSTM for sequence modelling |
| Large RHN | 1.27 | Recurrent Highway Networks |
| Bipartite flows (8 flows) | 1.23 | Discrete Flows: Invertible Generative Models of Discrete Data |
| mLSTM + dynamic eval | 1.19 | Dynamic Evaluation of Neural Sequence Models |
| 12-layer Character Transformer Model | 1.18 | Character-Level Language Modeling with Deeper Self-Attention |
| PAR Transformer 24B | 1.18 | Pay Attention when Required |
| GAM-RHN-10 | 1.157 | Recurrent Highway Networks with Grouped Auxiliary Memory |
| 64-layer Character Transformer Model | 1.13 | Character-Level Language Modeling with Deeper Self-Attention |
| 12L Transformer + 8K adaptive span | 1.11 | Adaptive Attention Span in Transformers |
| BP-Transformer - 12 Layers | 1.11 | BP-Transformer: Modelling Long-Range Context via Binary Partitioning |
| All-attention network - 18 layers | 1.11 | Augmenting Self-attention with Persistent Memory |
| Transformer-LS (small) | 1.09 | Long-Short Transformer: Efficient Transformers for Language and Vision |
| All-attention network - 36 layers | 1.08 | Augmenting Self-attention with Persistent Memory |
| Transformer-XL - 24 layers | 1.08 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |