TWITTER SENTIMENT ANALYSIS WITH LSTM
Every second, millions of people share their opinions about products, topics, and people on Twitter. This social media platform has become a giant dataset of ideas and comments: labeling comments on a topic as positive or negative yields a strong signal about how well a business is performing. In other words, Twitter sentiment analysis lets you track what is being said about your product or service on social media and helps you detect angry customers or negative mentions before they escalate.
In this project, a Deep Learning model was built that can detect whether a random tweet is positive or negative, regardless of subject or product.
The dataset is the sample Twitter dataset from the NLTK corpus (twitter_samples), which contains 10,000 tweets (5,000 positive, 5,000 negative).
positive tweets:
negative tweets:
MAIN STEPS OF THE PROJECT:
1. Preprocessing:
Several preprocessing steps are applied to the words of the tweet sentences in order to obtain the best numeric representation for training the Deep Learning model. These are:
- Remove stopwords, remove redundant frequent tweet tokens and characters (RT, https, # etc.), apply a stemmer, remove punctuation.
- All sentences are tokenized into words.
- <start> and <end> tags are added to every sentence. (This helps the model locate the actual sentence inside the fixed-size vector that each sentence becomes after padding.)
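The cleaning, tokenizing, and tagging steps above can be sketched in plain Python. The stopword list here is a tiny illustrative subset, and the suffix-stripping rule is a crude stand-in for a real stemmer such as NLTK's PorterStemmer:

```python
import re
import string

# Tiny illustrative stopword set; the real NLTK list is much larger.
STOPWORDS = {"i", "am", "this", "the", "a", "an", "is", "are", "and", "to", "of"}

def preprocess(tweet):
    """Clean one tweet and return its token list with <start>/<end> tags."""
    t = tweet.lower()
    t = re.sub(r"^rt\s+", "", t)               # drop retweet marker
    t = re.sub(r"https?://\S+", "", t)         # drop URLs
    t = re.sub(r"[@#]\w+", "", t)              # drop mentions and hashtags
    t = t.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    tokens = [w for w in t.split() if w not in STOPWORDS]
    # Crude suffix stripping as a stand-in for a real stemmer.
    tokens = [re.sub(r"(ing|ed|s)$", "", w) if len(w) > 4 else w for w in tokens]
    return ["<start>"] + tokens + ["<end>"]

print(preprocess("RT @user I am loving this product! https://t.co/xyz #happy"))
# -> ['<start>', 'lov', 'product', '<end>']
```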
2. Encoding:
- After the preprocessing step, a corpus is created that includes all preprocessed words from the positive and negative tweet sentences.
- Each word in the corpus is assigned a unique, randomly chosen number, and these mappings are stored in a dictionary.
- All sentences are converted to numeric form using this encoding dictionary.
- Padding is applied to the numeric sentence vectors based on the maximum sentence length. (This step is necessary because the input layer of the neural network has a fixed size, so every input vector must have the same length.)
- Sentence labels (classes) are appended as the last column of each sentence vector.
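The encoding, padding, and labeling steps can be sketched as follows; the example sentences, word ids, and labels are illustrative (in the project the vocabulary comes from the full corpus):

```python
def build_vocab(sentences):
    """Assign each distinct word a unique integer id (0 is reserved for padding)."""
    vocab = {"<pad>": 0}
    for sent in sentences:
        for word in sent:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode_and_pad(sentences, vocab):
    """Map words to ids and right-pad every sentence to the maximum length."""
    max_len = max(len(s) for s in sentences)
    return [[vocab[w] for w in s] + [0] * (max_len - len(s)) for s in sentences]

sents = [["<start>", "great", "phone", "<end>"],
         ["<start>", "bad", "<end>"]]
vocab = build_vocab(sents)
padded = encode_and_pad(sents, vocab)
# Append the class label (1 = positive, 0 = negative) as the last column.
rows = [vec + [label] for vec, label in zip(padded, [1, 0])]
print(rows)  # -> [[1, 2, 3, 4, 1], [1, 5, 4, 0, 0]]
```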
3. Creating Dataset:
- The structured data is split into training and test sets, converted into tensor dataset format, and then grouped into batches according to the batch size.
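Assuming PyTorch as the framework, the split-and-batch step might look like this; the toy tensor stands in for the real encoded data:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Toy encoded sentences (last column = label), standing in for the real data.
data = torch.tensor([[1, 2, 3, 0, 1],
                     [1, 4, 3, 0, 0],
                     [1, 5, 6, 3, 1],
                     [1, 7, 3, 0, 0]])
features, labels = data[:, :-1], data[:, -1].float()

dataset = TensorDataset(features, labels)
train_set, test_set = random_split(dataset, [3, 1])   # 75/25 split
train_loader = DataLoader(train_set, batch_size=2, shuffle=True)

for batch_x, batch_y in train_loader:
    print(batch_x.shape, batch_y.shape)
```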
4. Creating LSTM Model:
- Since the order and position of the words in a sentence carry the actual information to be extracted, an LSTM model was used, which is well suited to sequential data.
- The LSTM model is configured. The components of the model decided in this configuration are:
  - type of layers (embedding, LSTM, linear, etc.)
  - number of layers (1 embedding layer, 2 linear layers, 4 LSTM layers)
  - dimension of layers (e.g. how many neurons the embedding layer has)
  - type of activation functions (e.g. the last neuron is activated by a sigmoid, because it must produce a binary output)
- Hyperparameters are determined. These are:
  - Learning rate (0.001)
  - Epochs (10)
  - Optimizer function (Adam)
  - Loss function (Binary Cross Entropy Loss)
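A sketch of such a model in PyTorch, matching the layer counts above; the embedding, hidden, and intermediate dimensions (64, 128, 32) are illustrative choices, not the project's actual values:

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Embedding -> 4-layer LSTM -> two linear layers -> sigmoid."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=4, batch_first=True)
        self.fc1 = nn.Linear(hidden_dim, 32)
        self.fc2 = nn.Linear(32, 1)

    def forward(self, x):
        embedded = self.embedding(x)            # (batch, seq, embed_dim)
        _, (hidden, _) = self.lstm(embedded)    # hidden: (layers, batch, hidden_dim)
        out = torch.relu(self.fc1(hidden[-1]))  # final hidden state of last layer
        return torch.sigmoid(self.fc2(out)).squeeze(1)

model = SentimentLSTM(vocab_size=1000)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()

probs = model(torch.randint(1, 1000, (8, 20)))  # batch of 8 padded sentences
```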
5. Training and Evaluation:
- After the weight updates in each epoch, the trained model was evaluated; according to the results, the cost function reached a minimum in a short time.
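A minimal, self-contained version of the training/evaluation loop with these hyperparameters; the tiny stand-in model and synthetic data exist only so that the snippet runs on its own (in the project, the LSTM model and the batched loaders from the previous steps are used):

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Tiny stand-in model and synthetic data so the loop is runnable end to end.
model = nn.Sequential(nn.Embedding(10, 8), nn.Flatten(), nn.Linear(8 * 4, 1), nn.Sigmoid())
x = torch.randint(1, 10, (16, 4))
y = (x[:, 0] > 5).float()                       # synthetic binary labels
loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()

for epoch in range(10):
    model.train()
    for bx, by in loader:                       # one weight update per batch
        optimizer.zero_grad()
        loss = criterion(model(bx).squeeze(1), by)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():                       # evaluate after each epoch
        preds = (model(x).squeeze(1) > 0.5).float()
        acc = (preds == y).float().mean().item()
    print(f"epoch {epoch + 1}: loss {loss.item():.4f}, acc {acc:.2f}")
```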
Evaluation result for each epoch:
BONUS:
Let’s give an example tweet to our trained model and see what happens:
Here is the sample tweet:
After preprocessing and encoding, it looks like this:
And when this vector is fed into the model…
The result says that the example tweet is 94% positive!
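Reading off that verdict is just a matter of thresholding the sigmoid output at 0.5; a self-contained sketch in which a stub model returns a fixed probability (the real model and encoding come from the steps above, and the encoded vector here is hypothetical):

```python
import torch

def model(x):
    """Stub standing in for the trained LSTM; returns a fixed probability."""
    return torch.tensor([0.94])

encoded = torch.tensor([[1, 7, 12, 3, 2, 0, 0, 0]])  # hypothetical encoded tweet
prob = model(encoded).item()
label = "positive" if prob > 0.5 else "negative"
print(f"{prob:.0%} {label}")  # -> 94% positive
```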