Sentiment Analysis of Movie Reviews with Google’s BERT

Performing NLP tasks by leveraging the transformer encoder architecture (BERT) rather than traditional LSTMs

Hargurjeet
5 min read · Dec 10, 2021

BERT is currently being used at Google to optimize the interpretation of user search queries. BERT excels at several functions that make this possible, including:

Sequence-to-sequence based language generation tasks such as:

  • Question answering
  • Abstractive summarization
  • Sentence prediction
  • Conversational response generation

BERT is based on the transformer encoder architecture. The important question is why the transformer architecture was needed and how it improves on traditional LSTM RNNs.

The shortcomings of LSTM RNNs

  • LSTMs are computationally expensive and therefore slow.
  • They are not truly bidirectional: learning in each direction happens independently, so some of the true context is lost.
  • Because the data is read sequentially, they do not make good use of GPUs, which favour parallel processing.

Here comes the saviour: Transformers

Transformers tend to do better than LSTMs in the following ways:

  • They process all words simultaneously, which reduces computation and enables parallel processing, making them GPU friendly.
  • They tend to learn the context of the text better.

The transformer contains two blocks: an encoder and a decoder.

You can learn about the functioning of the encoder and decoder in detail here.

Below, I try to give you the intuition.

The encoder takes all the input tokens simultaneously and processes them into word embeddings. The embeddings are vectors that capture the contextual meaning of the words.

The decoder takes the encoder output together with the words generated so far and produces the next word.

When we stack up encoders we get BERT: Bidirectional Encoder Representations from Transformers.

In the current article, we try to predict movie review sentiment using BERT.

Table Of Contents

  1. About the Dataset
  2. Data Import
  3. Creating Training and Validation Set
  4. Select Bert and Preprocessing module
  5. Passing Data to Preprocessing Module & Bert
  6. Training and Evaluations
  7. Saving and Re-Loading the Model
  8. Predictions
  9. Summary
  10. References

№1: About the Dataset

This notebook trains a sentiment analysis model to classify movie reviews as positive or negative, based on the text of the review.

The dataset contains 50,000 movie reviews collected from the IMDB movie database. The dataset can be downloaded here.

№2: Data Import

For processing the textual information, the following libraries need to be installed (a setup sketch follows this list):

  • TensorFlow
  • models (tf-models-official) — for accessing BERT
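Below is a minimal setup sketch, assuming the "models" dependency is the tf-models-official package and that tensorflow-text provides the ops used by the BERT preprocessing model; package versions are not pinned here.

```python
# Minimal setup sketch (assumed packages, not the post's exact install cell)
!pip install -q tensorflow-text
!pip install -q tf-models-official

import os
import shutil

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text         # registers the ops used by the BERT preprocessing model
from official.nlp import optimization  # AdamW optimizer used later for fine-tuning
```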

Now we download the dataset using the Keras utility. We then access the training dataset and remove the unwanted files.
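Here is a sketch of that download step, assuming the "unwanted files" are the unlabeled unsup reviews; the archive URL is the standard aclImdb location.

```python
# Download and extract the IMDB reviews archive with the Keras utility.
url = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
dataset = tf.keras.utils.get_file(
    "aclImdb_v1.tar.gz", url, untar=True, cache_dir=".", cache_subdir=""
)

dataset_dir = os.path.join(os.path.dirname(dataset), "aclImdb")
train_dir = os.path.join(dataset_dir, "train")

# The 'unsup' folder holds unlabeled reviews, which we don't need for
# supervised sentiment classification (assumed to be the unwanted files).
shutil.rmtree(os.path.join(train_dir, "unsup"))
```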

№3: Creating Training and Validation Set

Now we define the training, validation and test sets. There is a caveat to keep in mind before you use the Keras text_dataset_from_directory method to define them. The way we define the training and validation sets is pretty standard, nothing fancy.

Caveat: the reviews must sit under their target folder, i.e. a positive review must be under the pos folder and a negative review under the neg folder.

Accessing a few samples along with their target values:
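A sketch of building the three sets and peeking at a batch; the batch size, validation split and seed are illustrative choices.

```python
AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32
seed = 42

# Training and validation sets come from the same folder via validation_split.
raw_train_ds = tf.keras.preprocessing.text_dataset_from_directory(
    "aclImdb/train", batch_size=batch_size,
    validation_split=0.2, subset="training", seed=seed)
class_names = raw_train_ds.class_names
train_ds = raw_train_ds.cache().prefetch(buffer_size=AUTOTUNE)

val_ds = tf.keras.preprocessing.text_dataset_from_directory(
    "aclImdb/train", batch_size=batch_size,
    validation_split=0.2, subset="validation", seed=seed
).cache().prefetch(buffer_size=AUTOTUNE)

test_ds = tf.keras.preprocessing.text_dataset_from_directory(
    "aclImdb/test", batch_size=batch_size
).cache().prefetch(buffer_size=AUTOTUNE)

# Look at a few reviews and their labels (0 = neg, 1 = pos).
for text_batch, label_batch in raw_train_ds.take(1):
    for i in range(3):
        print("Review:", text_batch.numpy()[i][:100])
        print("Label :", label_batch.numpy()[i], f"({class_names[label_batch.numpy()[i]]})")
```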

№4: Select Bert and Preprocessing module

Out of the different BERT variants available, you can select the required configuration from the dropdown.

As we have limited compute available, it is advisable to use the model with the fewest parameters, so I use the Small BERT variant whose name ends in A-2.

A preprocessing model also needs to be selected to match the chosen BERT model. The utility below auto-selects the right preprocessing model. The importance of the preprocessing model is explained in the upcoming cells.
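Here is a stripped-down sketch of that selection, reduced to the two handles we actually need; the TF Hub URLs and version numbers are assumptions based on the Small BERT naming scheme, not copied from the original notebook.

```python
# Small BERT variant (2 layers, hidden size 128, 2 attention heads) and its
# matching preprocessing model (handles/versions are assumptions).
bert_model_name = "small_bert/bert_en_uncased_L-2_H-128_A-2"

tfhub_handle_encoder = (
    "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1")
tfhub_handle_preprocess = (
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

print("BERT model selected         :", tfhub_handle_encoder)
print("Preprocessing model selected:", tfhub_handle_preprocess)
```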

Note: the full dropdown version of this selection code looks pretty haywire in this post, but if you open it in Google Colab it renders much better.

№5: Passing Data to Preprocessing Module & Bert

Below we pass a sample text to the preprocessing model. This model emits fixed-length sequences of 128 tokens, so longer inputs are truncated and shorter ones padded. The preprocessing model converts the text into three keys:

  • input_mask — each real token is marked with a 1, but you will notice two additional 1's. For instance, for "hello world" we get four 1's instead of two (one per word). BERT puts a special token at the beginning and a separator token at the end of every sentence, hence the two additional token values for each input.
  • input_type_ids — this is zero for single-sentence inputs like ours; it is used to distinguish sentence pairs.
  • input_word_ids — the token value for each word. Note that the first token, the special [CLS] token, is always 101, and the [SEP] separator token at the end is always 102.
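A sketch of that preprocessing call; the example sentence and the printed values are illustrative, with the token ids taken from the standard uncased BERT vocabulary.

```python
# Wrap the preprocessing model as a Keras layer and run a toy sentence through it.
bert_preprocess_model = hub.KerasLayer(tfhub_handle_preprocess)

text_test = ["hello world"]
text_preprocessed = bert_preprocess_model(text_test)

print(list(text_preprocessed.keys()))
# ['input_word_ids', 'input_mask', 'input_type_ids']

print(text_preprocessed["input_word_ids"][0, :6])
# [101, 7592, 2088, 102, 0, 0]  -> [CLS] hello world [SEP] padding...

print(text_preprocessed["input_mask"][0, :6])
# [1, 1, 1, 1, 0, 0]            -> four 1's for a two-word sentence

print(text_preprocessed["input_type_ids"][0, :6])
# [0, 0, 0, 0, 0, 0]            -> all zeros for single-sentence input
```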

Using the BERT model

The BERT output is passed to a small neural network head and the output probability is calculated. BERT performs the contextual word embedding; the rest of the classification is handled by the neural network head.

We calculate the loss using binary cross-entropy, since this is a two-class classification model.
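Below is a sketch of such a classifier, assuming a single dropout layer and a one-unit dense head on top of BERT's pooled output; the layer names and the dropout rate are illustrative choices.

```python
def build_classifier_model():
    # Raw strings go in; preprocessing and the BERT encoder live inside the model.
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
    preprocessing_layer = hub.KerasLayer(tfhub_handle_preprocess, name="preprocessing")
    encoder_inputs = preprocessing_layer(text_input)
    encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name="BERT_encoder")
    outputs = encoder(encoder_inputs)
    net = outputs["pooled_output"]                        # sentence-level embedding
    net = tf.keras.layers.Dropout(0.1)(net)
    net = tf.keras.layers.Dense(1, activation=None, name="classifier")(net)  # raw logit
    return tf.keras.Model(text_input, net)

classifier_model = build_classifier_model()

# Binary cross-entropy on logits for the two-class (pos/neg) problem.
loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
metrics = tf.metrics.BinaryAccuracy()
```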

№6: Training and Evaluations

We train the model for 3 epochs and define a few hyperparameters. Let's not go into the details of these parameters and keep this tutorial simple.
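A training sketch under those defaults; the AdamW optimizer from tf-models-official, the 3e-5 learning rate and the 10% warmup are assumptions in the spirit of the usual BERT fine-tuning recipe.

```python
epochs = 3
steps_per_epoch = tf.data.experimental.cardinality(train_ds).numpy()
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1 * num_train_steps)

# AdamW with a linear warmup/decay schedule (assumed hyperparameters).
optimizer = optimization.create_optimizer(
    init_lr=3e-5,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmup_steps,
    optimizer_type="adamw")

classifier_model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
history = classifier_model.fit(x=train_ds, validation_data=val_ds, epochs=epochs)

# Evaluate on the held-out test set.
test_loss, test_accuracy = classifier_model.evaluate(test_ds)
```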

Training and validation loss converge well, so our model does not seem to be overfitting, which is great news.

№7: Saving and Re-Loading the Model

We now learn to save and reuse the trained model, which saves us all the training time. To save the model we use the model.save method, and to reload it we use tf.saved_model.load.

You can see below that both the saved and the reloaded model give the same prediction, which confirms the saved model is working as expected.
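A sketch of the save/reload round trip; the export path and the example review are illustrative.

```python
saved_model_path = "./imdb_bert_model"   # illustrative path

# Save in the TensorFlow SavedModel format, then reload it.
classifier_model.save(saved_model_path, include_optimizer=False)
reloaded_model = tf.saved_model.load(saved_model_path)

examples = ["this is such an amazing movie!"]
original_results = tf.sigmoid(classifier_model(tf.constant(examples)))
reloaded_results = tf.sigmoid(reloaded_model(tf.constant(examples)))

print("original model :", original_results.numpy())
print("reloaded model :", reloaded_results.numpy())   # should match the original
```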

№8: Predictions

We check the predictions on the samples below.
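A prediction sketch on a few hand-written reviews (the example sentences are illustrative); the sigmoid turns the raw logit into the probability that a review is positive.

```python
examples = [
    "The movie was great!",
    "The movie was okay.",
    "The movie was terrible...",
]

results = tf.sigmoid(reloaded_model(tf.constant(examples)))
for review, score in zip(examples, results.numpy()):
    print(f"{review:30s} -> positive probability {score[0]:.3f}")
```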

№9: Summary

We performed sentiment classification using BERT through the following steps:

  • Imported the dataset into our environment.
  • Used the Keras utility function tf.keras.preprocessing.text_dataset_from_directory to create the training and validation sets.
  • Selected the BERT model and the corresponding preprocessing model.
  • Passed the input data to the preprocessing model and inspected the various keys it generates.
  • Passed the output of BERT to a neural network head to make the predictions.
  • Trained the model, keeping the hyperparameters at their default values.
  • Learned to save and reload the trained model.
  • Finally, made predictions using the trained model.

№10: References

I hope you had a wonderful time reading the blog. If you liked what you learned today, please feel free to give a 👏.

Please feel free to reach out to me over LinkedIn
