Sentiment Classification using Document Embeddings trained with Cosine Similarity

Tan Thongtan, Tanasanee Phienthrakul

Abstract

In document-level sentiment classification, each document must be mapped to a fixed-length vector. Document embedding models do this by mapping documents to dense, low-dimensional vectors in a continuous vector space. This paper proposes training document embeddings with cosine similarity instead of the dot product. Experiments on the IMDB dataset show that using cosine similarity improves classification accuracy over using the dot product. Combining these embeddings with features from a Naive Bayes weighted bag of n-grams achieves a new state-of-the-art accuracy of 97.42%. Code to reproduce all experiments is publicly available at https://github.com/tanthongtan/dv-cosine.
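The abstract's central change is to replace the dot product with cosine similarity when training document embeddings. The sketch below illustrates that change. It is not the authors' code, which is in the linked repository. It assumes a PV-DBOW-style objective with negative sampling: a document vector is scored against the vector of a word (or n-gram) that occurs in the document, and against randomly sampled "negative" words. The scale factor `alpha` and all function names are hypothetical.

The sketch also shows a Naive Bayes log-count-ratio weighting of binarized n-gram features in the style of NBSVM. It assumes the two feature sets are combined by simple concatenation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product, the usual scoring function in PV-DBOW-style training."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity: the dot product of the two vectors after L2 normalisation."""
    nu = math.sqrt(dot(u, u))
    nv = math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # convention for zero vectors (an assumption of this sketch)
    return dot(u, v) / (nu * nv)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def neg_sampling_loss(
    doc_vec: Vector,
    target_vec: Vector,
    negative_vecs: List[Vector],
    score: Callable[[Vector, Vector], float] = cosine,
    alpha: float = 1.0,
) -> float:
    """Negative-sampling loss for one (document, word-or-n-gram) pair.

    `score` is either `dot` or `cosine`; `alpha` is a hypothetical scale
    factor applied to the score before the sigmoid.
    """
    loss = -log_sigmoid(alpha * score(doc_vec, target_vec))
    for neg in negative_vecs:
        loss -= log_sigmoid(-alpha * score(doc_vec, neg))
    return loss


def nb_log_count_ratio(
    pos_counts: Sequence[float], neg_counts: Sequence[float], smoothing: float = 1.0
) -> List[float]:
    """NBSVM-style log-count ratio r for each n-gram.

    pos_counts[i] / neg_counts[i]: number of positive / negative training
    documents containing n-gram i (i.e. summed binarized features).
    """
    p = [smoothing + c for c in pos_counts]
    q = [smoothing + c for c in neg_counts]
    p_total, q_total = sum(p), sum(q)
    return [math.log((pi / p_total) / (qi / q_total)) for pi, qi in zip(p, q)]


def combined_features(
    doc_vec: Vector, binary_ngrams: Sequence[int], r: Sequence[float]
) -> List[float]:
    """Concatenate a document embedding with its NB-weighted binarized n-grams."""
    return list(doc_vec) + [ri * fi for ri, fi in zip(r, binary_ngrams)]


if __name__ == "__main__":
    d, w = [0.5, 1.0], [1.0, 2.0]
    negatives = [[-1.0, 0.2]]
    for name, fn in (("dot", dot), ("cosine", cosine)):
        print(name, neg_sampling_loss(d, w, negatives, score=fn, alpha=6.0))
    r = nb_log_count_ratio([3, 0], [0, 3])
    print(combined_features(d, [1, 1], r))
```

Cosine similarity lies in [-1, 1], so sigmoid(cos) can only range from about 0.27 to 0.73. Without some scaling, the loss cannot approach zero, which is why the sketch exposes `alpha`. Unlike the dot product, the cosine score also cannot be raised just by increasing the norm of the document vector.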
