3 years ago

Md. Jahidul Islam Razin M. F. Mridha Md. Abdul Karim S M Rafiuddin Tahira Alam

Long Short-Term Memory Networks (LSTM)

Abstract

Business sentiment analysis (BSA) is a significant and popular topic in natural language processing. It is a form of sentiment analysis applied to business purposes. Various categories of sentiment analysis techniques, such as lexicon-based techniques and different types of machine learning algorithms, have been applied to sentiment analysis in languages such as English, Hindi, and Spanish. In this paper, long short-term memory (LSTM), a type of recurrent neural network, is applied to business sentiment analysis. The LSTM model is used as a modified approach that avoids the vanishing gradient problem of the conventional recurrent neural network (RNN). A product review dataset is used to evaluate the modified RNN model. In this experiment, 70% of the data is used to train the LSTM and the remaining 30% is used for testing. The result of the modified RNN model is compared with those of other conventional RNN models. The proposed model performs better than the other conventional RNN models, achieving an accuracy of around 91.33%.

One-sentence Summary

The authors propose a modified long short-term memory (LSTM) recurrent neural network that mitigates the vanishing gradient problem of conventional architectures to improve business sentiment analysis, achieving approximately 91.33% accuracy on a product review dataset split 70% for training and 30% for testing while outperforming standard RNN baselines.

Key Contributions

This work proposes a modified recurrent neural network architecture that integrates long short-term memory (LSTM) units to classify product reviews into positive, neutral, and negative categories.
The modified architecture mitigates the vanishing gradient problem inherent in conventional recurrent networks, enabling the model to capture both short- and long-term dependencies in sequential text.
Evaluations on a product review dataset demonstrate that the proposed model achieves 91.33% accuracy, outperforming standard recurrent neural networks and feed-forward network baselines.

Introduction

Business sentiment analysis automates the extraction of customer opinions from massive volumes of unstructured text, enabling companies to forecast market trends, refine marketing strategies, and track consumer behavior. Traditional machine learning approaches rely on manual feature engineering and struggle with scale, while conventional recurrent neural networks frequently fail to capture long-range textual dependencies due to the vanishing gradient problem. The authors leverage a modified recurrent neural network architecture built on Long Short-Term Memory units to effectively model sequential data while resolving gradient degradation. Applied to a product review dataset, their approach achieves 91.33 percent accuracy, outperforming standard RNN and feed-forward baselines and providing a reliable tool for automated three-way sentiment classification.

Dataset

Dataset Composition & Sources: The authors source their data from the Amazon Review Information Dataset (ARD), originally compiled via web scraping and APIs. While the full ARD contains 142.8 million ratings with extensive metadata, the authors extract a focused subset of 25,000 product reviews.
Subset Breakdown & Split: The selected reviews are categorized into positive, neutral, and negative sentiment classes. The authors partition this subset into a 70% training set and a 30% testing set.
Text Cleaning & Preprocessing: Raw review text undergoes strict cleaning to strip HTML tags and punctuation, which are replaced with spaces. Single-character tokens and multiple consecutive spaces are then removed. The authors note that the cleaned text is naturally divided into emotion-based business categories. No cropping strategy is applied, and the original ARD metadata is acknowledged but not integrated into the processing pipeline.
Vectorization, Model Integration & Training: Each token is converted into a fixed-dimensional vector ($v \in \mathbb{R}^{1 \times d}$) using Word2Vec embeddings. The resulting vector sequences are fed into an LSTM network, which processes them left to right. The LSTM output passes through a Dense layer with sigmoid activation to generate a final probability score between 0.0 and 1.0. Training is capped at 50 epochs to limit overfitting, and the model ultimately achieves approximately 96.23% training accuracy and 91.33% testing accuracy.
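
The cleaning and splitting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `clean_review` and `train_test_split`, the regular expressions, the shuffling, and the seed are all assumptions, since the summary only states which elements are removed and that the split is 70/30.

```python
import random
import re

# Illustrative patterns (assumed, not taken from the paper).
TAG_RE = re.compile(r"<[^>]+>")           # HTML tags
PUNCT_RE = re.compile(r"[^\w\s]|_")       # punctuation, including underscores
SINGLE_CHAR_RE = re.compile(r"\b\w\b")    # single-character tokens
SPACES_RE = re.compile(r"\s+")            # runs of whitespace


def clean_review(text: str) -> str:
    """Replace tags and punctuation with spaces, drop 1-char tokens, collapse spaces."""
    text = TAG_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    text = SINGLE_CHAR_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and split it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    print(clean_review("<p>Great phone, a bit pricey!</p>"))
    train, test = train_test_split(range(10))
    print(len(train), len(test))
```

Note that removing single-character tokens also drops single digits and the article "a", which follows from the rule as stated in the summary.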

Method

The authors use long short-term memory (LSTM) networks to address the limitations of standard recurrent neural networks (RNNs) in capturing long-term dependencies within sequential text data for business sentiment analysis. The core of the approach is to replace the standard RNN cell with an LSTM cell, which uses gating mechanisms to control information flow and mitigate the vanishing gradient problem. The paper's framework diagram gives an overview of the RNN architecture, in which the recurrent weight matrix $A$ propagates information through time. The LSTM cell replaces the simple hidden layer with a memory block containing specialized gates. This block processes the current input $x_t$, the previous hidden state $h_{t-1}$, and the previous cell state $C_{t-1}$ to produce the updated cell state $C_t$ and the current hidden state $h_t$.

The LSTM computation proceeds in four main steps. First, the forget gate $f_t$ and input gate $i_t$ are computed using sigmoid activation functions; they determine what information to discard from the previous cell state and what new information to store, respectively. The equations for these gates are $f_t = \sigma(x_t U^f + h_{t-1} W^f)$ and $i_t = \sigma(x_t U^i + h_{t-1} W^i)$. Second, the cell state is updated by combining the previous cell state $C_{t-1}$ with the candidate cell state $\tilde{C}_t$, which is generated by applying a hyperbolic tangent to the combined input and hidden state: $\tilde{C}_t = \tanh(x_t U^g + h_{t-1} W^g)$. The updated cell state is then $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, where $\odot$ denotes element-wise multiplication. Third, the output gate $o_t$ is computed using a sigmoid activation function: $o_t = \sigma(x_t U^o + h_{t-1} W^o)$. Finally, the new hidden state $h_t$ is produced by applying a hyperbolic tangent to the updated cell state and multiplying it element-wise by the output gate activation: $h_t = \tanh(C_t) \odot o_t$. This process is illustrated in the paper's detailed LSTM cell diagram.
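
The four steps map directly onto a single forward step in NumPy. The sketch below follows the equations as written, including their omission of bias terms. The names `lstm_step` and `init_params`, the dictionary keys, and the random initialization scale are illustrative assumptions rather than details from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. x_t: (1, d); h_prev and c_prev: (1, n)."""
    # Step 1: forget and input gates.
    f_t = sigmoid(x_t @ params["Uf"] + h_prev @ params["Wf"])
    i_t = sigmoid(x_t @ params["Ui"] + h_prev @ params["Wi"])
    # Step 2: candidate cell state and cell-state update (element-wise products).
    c_tilde = np.tanh(x_t @ params["Ug"] + h_prev @ params["Wg"])
    c_t = f_t * c_prev + i_t * c_tilde
    # Step 3: output gate.
    o_t = sigmoid(x_t @ params["Uo"] + h_prev @ params["Wo"])
    # Step 4: new hidden state.
    h_t = np.tanh(c_t) * o_t
    return h_t, c_t


def init_params(d, n, seed=0):
    """Small random weights: U* has shape (d, n), W* has shape (n, n)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in "figo":
        params["U" + gate] = rng.normal(scale=0.1, size=(d, n))
        params["W" + gate] = rng.normal(scale=0.1, size=(n, n))
    return params


if __name__ == "__main__":
    d, n = 4, 3
    params = init_params(d, n)
    h, c = np.zeros((1, n)), np.zeros((1, n))
    # Process a toy sequence of five token vectors left to right.
    for x_t in np.random.default_rng(1).normal(size=(5, 1, d)):
        h, c = lstm_step(x_t, h, c, params)
    print(h.shape, c.shape)
```

Because the cell state is updated additively through $f_t$ and $i_t$ rather than being repeatedly squashed, gradients can flow across many time steps, which is the mechanism behind the vanishing-gradient mitigation discussed above.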

The overall model architecture for sentiment analysis, depicted in the paper's architecture diagram, consists of multiple LSTM layers. The input sequence is fed into the first LSTM layer, which generates a sequence of hidden states. These hidden states are passed to subsequent LSTM layers, allowing the model to capture increasingly complex features. After the final LSTM layer, a dense layer with a sigmoid activation function produces the final output. The model is trained using a multi-model approach, in which separate LSTM models are trained on data categorized as positive, negative, and neutral. For a new input review, each trained model evaluates the review, and the model with the smallest error value determines the sentiment label. This architecture is designed to overcome the vanishing gradient problem and to handle the sequential nature of text data in business sentiment analysis.
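
The selection rule at the end of this description amounts to an argmin over per-class error values. The summary does not say how each model's error is computed, so the sketch below treats it as an opaque callable; `select_label` and the toy error functions are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Dict


def select_label(review, error_fns: Dict[str, Callable[[object], float]]) -> str:
    """Return the label whose model reports the smallest error for this review."""
    errors = {label: fn(review) for label, fn in error_fns.items()}
    return min(errors, key=errors.get)


if __name__ == "__main__":
    # Toy stand-ins for three trained models: each "error" is the distance
    # between a precomputed review score and a made-up class prototype.
    toy_error_fns = {
        "positive": lambda score: abs(score - 0.9),
        "neutral": lambda score: abs(score - 0.5),
        "negative": lambda score: abs(score - 0.1),
    }
    print(select_label(0.8, toy_error_fns))
```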

Experiment

The evaluation tests the trained RNN-LSTM architecture on previously unseen product reviews to validate its use for automated business sentiment classification. Initial experiments indicate that the model maps new text to distinct sentiment categories by applying probability thresholds. A subsequent comparison shows the architecture outperforming the baseline methods on accuracy.

The authors track training and testing accuracy across epochs. Both improve as training progresses, with training accuracy exceeding testing accuracy at every epoch. The final testing accuracy is higher than that of the other models in the comparison section.

The model pairs an LSTM architecture with a dense output layer and is evaluated on unseen product reviews. Classification decisions are made by applying thresholds to the prediction probability, and the resulting accuracy is higher than that of other models reported in the literature.

For business sentiment analysis, the prediction probability is mapped onto graded categories such as excellent, good, bad, and very bad, with higher values indicating more positive sentiment. The classification system uses a multi-valued encoding scheme for the positive, neutral, and negative sentiment classes. The model achieves higher accuracy than other models, including KNN- and SVM-based approaches.
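
Mapping a sigmoid output to graded categories can be sketched as a simple threshold lookup. The summary does not report the thresholds used in the paper, so the cutoffs 0.75, 0.5, and 0.25 below are purely hypothetical placeholders, as is the function name `grade`.

```python
# Hypothetical cutoffs, checked from most to least positive.
DEFAULT_CUTOFFS = ((0.75, "excellent"), (0.5, "good"), (0.25, "bad"))


def grade(probability: float, cutoffs=DEFAULT_CUTOFFS, lowest: str = "very bad") -> str:
    """Map a prediction probability in [0, 1] to a graded sentiment label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    for threshold, label in cutoffs:
        if probability >= threshold:
            return label
    return lowest


if __name__ == "__main__":
    for p in (0.9, 0.6, 0.3, 0.1):
        print(p, grade(p))
```

Higher probabilities map to more positive labels, matching the ordering described above; only the specific boundary values are assumed.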

In summary, the authors evaluate an LSTM-based sentiment classification model through iterative training and testing on unseen product review data. Predictive accuracy improves over successive epochs while the gap between training and testing performance stays stable. Probability thresholds combined with multi-category sentiment encoding allow the approach to capture graded review classifications. The model outperforms traditional baselines such as KNN and SVM in the reported comparisons.

A Long Short-Term Memory (LSTM) Model for Business Sentiment Analysis Based on Recurrent Neural Network
