Language Modelling
Language Modelling On Enwiki8
Evaluation Metrics
Bit per Character (BPC) (see the sketch below)
Number of params
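Bit per Character (BPC) is the test-set cross-entropy expressed in bits: the average negative log2-probability the model assigns to each character of enwik8, so lower is better. The snippet below is a minimal sketch of converting a summed negative log-likelihood (in nats) into BPC; the function name and the numbers are illustrative only and are not taken from HyperAI or from any paper listed here.

```python
import math

def bits_per_character(total_nll_nats: float, num_characters: int) -> float:
    """Convert a summed negative log-likelihood in nats into bits per character."""
    # nats -> bits: divide by ln(2), then average over the character count.
    return total_nll_nats / (num_characters * math.log(2))

# Sanity check with synthetic numbers (not taken from the leaderboard):
# a model that spends exactly 1.06 bits on each of 1,000,000 characters.
total_nll = 1.06 * 1_000_000 * math.log(2)   # back-convert bits -> nats
print(round(bits_per_character(total_nll, 1_000_000), 3))   # 1.06

# Reference point: a uniform distribution over all 256 byte values
# would cost log2(256) = 8.0 bits per character.
print(math.log2(256))   # 8.0
```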
Evaluation Results
Performance results of each model on this benchmark
| Model Name | Bit per Character (BPC) | Number of params | Paper Title |
|---|---|---|---|
| LSTM (7 layers) | 1.67 | - | Generating Sequences With Recurrent Neural Networks |
| Hypernetworks | 1.34 | 27M | HyperNetworks |
| SHA-LSTM (4 layers, h=1024, no attention head) | 1.33 | 51M | Single Headed Attention RNN: Stop Thinking With Your Head |
| LN HM-LSTM | 1.32 | 35M | Hierarchical Multiscale Recurrent Neural Networks |
| ByteNet | 1.31 | - | Neural Machine Translation in Linear Time |
| Recurrent Highway Networks | 1.27 | 46M | Recurrent Highway Networks |
| Large FS-LSTM-4 | 1.25 | 47M | Fast-Slow Recurrent Neural Networks |
| Large mLSTM | 1.24 | 46M | Multiplicative LSTM for sequence modelling |
| AWD-LSTM (3 layers) | 1.232 | 47M | An Analysis of Neural Language Modeling at Multiple Scales |
| Cluster-Former (#C=512) | 1.22 | - | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding |
| LSTM | 1.195 | 48M | Mogrifier LSTM |
| Mogrifier LSTM | 1.146 | 48M | Mogrifier LSTM |
| 64-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention |
| SHA-RNN (4 layers, h=1024, single attention head) | 1.076 | 52M | Single Headed Attention RNN: Stop Thinking With Your Head |
| SHA-RNN (4 layers, h=1024, attention head per layer) | 1.068 | 54M | Single Headed Attention RNN: Stop Thinking With Your Head |
| Transformer-XL (12 layers) | 1.06 | 41M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Transformer (64 layers) | 1.06 | 235M | Character-Level Language Modeling with Deeper Self-Attention |
| Skip Cross-Head Transformer-XL | 1.033 | 41M | Memory-efficient Stochastic methods for Memory-based Transformers |
| Transformer-XL (18 layers) | 1.03 | 88M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Transformer+SSA | 1.024 | - | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles |