Sentiment Analysis in Data Science using VADER

Have you ever written a 10-paragraph review on Amazon because you were so dissatisfied with a product? Do you think someone at Amazon’s office is reading your reviews right now and knows exactly how you feel? With today’s technology, companies can collect mountains of customer feedback. Yet it is still impossible for humans to analyze all of it manually without error or bias.

Sentiment analysis is the contextual mining of text to identify and extract subjective information from source material. It helps a business understand the social sentiment around its brand, product, or service while monitoring online conversations.

For this short project (GitHub repo), I am going to show you how to analyze the sentiments of customers who use Amazon Alexa using VADER.

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media and customer reviews. It quantifies how positive or negative a text is, as well as the intensity of that emotion, and it does not require any training data. VADER also handles emoticons, slang, conjunctions, capitalized words, punctuation, and more, which is why it works well on social media text. To calculate the sentiment score of an entire text, VADER scans the text for known sentiment features, modifies their intensity and polarity according to its rules, sums the scores of the features it found, and normalizes the final score to a range between -1 (most extreme negative) and +1 (most extreme positive).
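To make the scan, adjust, sum, and normalize steps concrete, here is a minimal toy sketch. It is not the VADER implementation: the lexicon entries, the booster and negation constants, and the names `TOY_LEXICON`, `toy_compound`, and `normalize` are all assumptions made up for illustration. The one piece taken from VADER itself is the shape of the normalization, x / sqrt(x² + alpha) with alpha = 15.

```python
import math

# Hypothetical mini-lexicon with made-up valence scores. The real VADER
# lexicon contains thousands of human-rated entries.
TOY_LEXICON = {"love": 3.0, "great": 3.0, "good": 2.0, "bad": -2.5, "terrible": -2.0}
BOOSTERS = {"very": 0.3, "extremely": 0.3}  # illustrative intensifier bump
NEGATIONS = {"not", "never", "no"}          # illustrative negation words
NEGATION_SCALAR = -0.75                     # illustrative: flip and dampen


def normalize(score, alpha=15):
    """Squash an unbounded sum of valences into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(text):
    """Scan tokens, adjust valence by the preceding word, sum, then normalize."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token.strip(".,!?"))
        if valence is None:
            continue  # not a known sentiment feature
        previous = tokens[i - 1] if i > 0 else ""
        if previous in BOOSTERS:
            # Push the valence further away from zero.
            valence += math.copysign(BOOSTERS[previous], valence)
        elif previous in NEGATIONS:
            # Reverse the polarity and weaken it slightly.
            valence *= NEGATION_SCALAR
        total += valence
    return normalize(total)


for sentence in ["good", "very good", "not good", "terrible and bad"]:
    print(f"{sentence!r}: {toy_compound(sentence):+.3f}")
```

The real tool applies many more rules (capitalization, punctuation emphasis, "but" clauses, emoticons), so this sketch only shows why a booster raises the score, a negation flips it, and the final value always stays between -1 and +1.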

Let’s get started by importing necessary libraries and packages.

The dataset I’m using for sentiment analysis of Amazon Alexa reviews comes from Kaggle. It contains ratings between 1 and 5, review dates, and customer feedback on their experience with a variety of Alexa products.

The dataset’s rating column contains the ratings given by the users of Amazon Alexa on a scale of 1 to 5, where 5 is the best rating a user can give.
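As a rough sketch of how the rating distribution could be tallied before plotting it, the snippet below counts each rating and its share of the total. The helper name `rating_distribution` and the sample values are hypothetical; they are not the Kaggle data or the code used in this project.

```python
from collections import Counter


def rating_distribution(ratings):
    """Return {rating: (count, share_of_total)} ordered by rating."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {rating: (counts[rating], counts[rating] / total) for rating in sorted(counts)}


# Hypothetical sample values, only for demonstration.
sample_ratings = [5, 5, 4, 5, 1, 3, 5, 2, 5, 4]
for rating, (count, share) in rating_distribution(sample_ratings).items():
    print(f"{rating} stars: {count} reviews ({share:.0%})")
```

With the real dataset, the values of the rating column would be passed in place of `sample_ratings`.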

The rating distribution shows that most customers gave their Amazon Alexa a rating of 5, which suggests that most of them are happy with the product.

Now let’s move on to the task of sentiment analysis of Alexa’s reviews. The “verified_reviews” column of the dataset contains all the reviews given by Amazon Alexa’s customers. So let’s add new columns to this data as positive, negative, neutral, and compound by calculating the sentiment scores of the reviews:
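A minimal sketch of that step is shown below, assuming each review is a dictionary with a `verified_reviews` key. The helper `add_sentiment_columns` and the placeholder `stub_scorer` are hypothetical names introduced here; the stub only stands in for a real scorer so the example runs without extra packages.

```python
def add_sentiment_columns(rows, score_fn, text_key="verified_reviews"):
    """Return copies of rows with negative/neutral/positive/compound keys added.

    score_fn must return a dict with the keys 'neg', 'neu', 'pos' and
    'compound', which is the shape VADER's polarity_scores produces.
    """
    scored = []
    for row in rows:
        scores = score_fn(str(row.get(text_key, "")))
        scored.append({
            **row,
            "negative": scores["neg"],
            "neutral": scores["neu"],
            "positive": scores["pos"],
            "compound": scores["compound"],
        })
    return scored


def stub_scorer(text):
    """Placeholder scorer with fixed outputs (an assumption, not VADER)."""
    if "love" in text.lower():
        return {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.6}
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}


reviews = [{"verified_reviews": "Love my Echo!"}, {"verified_reviews": "It plays music."}]
for row in add_sentiment_columns(reviews, stub_scorer):
    print(row)
```

In practice, `score_fn` would be the `polarity_scores` method of a `SentimentIntensityAnalyzer` instance, available from either `nltk.sentiment.vader` (after downloading the `vader_lexicon` resource) or the `vaderSentiment` package. With a pandas DataFrame, the same idea is usually written by applying that method to the `verified_reviews` column.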

Overall, customers have a neutral to great experience with Amazon Alexa: over 1,000 reviews are positive, and only about 97 out of 3,150 reviews are negative.
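Counts like these depend on how scores are mapped to labels, and the rule used for this project is not shown here. One common convention, suggested in VADER's own documentation, treats a compound score of at least 0.05 as positive and at most -0.05 as negative, with everything in between neutral. The sketch below assumes that convention; `label_from_compound` and the sample scores are hypothetical.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label using a symmetric threshold."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores, only for demonstration.
sample_scores = [0.82, 0.0, -0.34, 0.04, 0.51]
label_counts = Counter(label_from_compound(score) for score in sample_scores)
print(label_counts)
```

A different rule, such as picking whichever of the positive, negative, and neutral scores is largest, would produce different counts from the same reviews.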

Finally, let’s have a look at the compound score for the positive and negative labels.
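As a text-only companion to a box plot, the following sketch summarizes compound scores per label using the standard library. The function `compound_summary` and the sample pairs are hypothetical and only illustrate the grouping; they are not results from the Alexa dataset.

```python
from collections import defaultdict
from statistics import mean, median


def compound_summary(records):
    """Group (label, compound) pairs and summarize each group's scores."""
    groups = defaultdict(list)
    for label, compound in records:
        groups[label].append(compound)
    return {
        label: {
            "n": len(scores),
            "mean": mean(scores),
            "median": median(scores),
            "min": min(scores),
            "max": max(scores),
        }
        for label, scores in groups.items()
    }


# Hypothetical (label, compound) pairs, only for demonstration.
sample = [("positive", 0.75), ("positive", 0.5), ("negative", -0.25), ("negative", 0.25)]
for label, stats in compound_summary(sample).items():
    print(label, stats)
```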

As we can see from the box plot above, the positive labels achieved a much higher compound score, with the majority above 0.5. The negative labels got a lower compound score, but it is fairly neutral, with the mean and median staying around 0.

In conclusion, sentiment analysis helps businesses improve their customer service and products, since it gives them a sneak peek into their customers’ emotions and satisfaction levels. If you liked this post, also check out Andy Kim’s TEDx Talk about harnessing the power of machine learning to analyze human communication.
