Single layer bilstm distilled from BERT | 90.7 | Distilling Task-Specific Knowledge from BERT into Simple Neural Networks | |
USE_T+CNN (lrn w.e.) | 87.21 | Universal Sentence Encoder | |
CNN-multichannel [kim2013] | 88.1 | Convolutional Neural Networks for Sentence Classification | |