AI for Women

Twitter Sentiment Analysis for Data Science Using Python

Twitter is a great place to share knowledge, spread awareness, let out some steam, and put forward personal opinions. As of early 2019, Twitter is said to have about 126 million daily users. This means that, by analysing the Tweets of all of these users on a particular subject, we can figure out the average overall opinion of people about that subject.

For example, during the Election Season, by studying the Tweets of people about a particular party or candidate, we can predict the possible outcome of the election. This can be done through something called ‘Sentiment Analysis’.

What is Twitter Sentiment Analysis?

Twitter Sentiment Analysis is, in very simple terms, the process of analysing people’s Tweets on a specific topic in order to understand how they feel about it.

How does it Work?

Twitter Sentiment Analysis is a part of NLP (Natural Language Processing). It uses Data Mining to develop conclusions for further use. It involves:

  1. Scraping Twitter to collect relevant Tweets as our data.
  2. Cleaning this data.
  3. Finding the polarity of each of these Tweets.
  4. Calculating the number of positive, negative, and neutral sentiments.
  5. Obtaining a conclusion on this analysis.

Applications of Twitter Sentiment Analysis

Knowing people’s sentiments about a particular topic can benefit the following sectors –

  1. Politics: As mentioned before, we can use TSA to figure out the political stance and views of the majority of a country's citizens, and to predict the possible outcome of an upcoming election. Candidates and parties can use this information to plan their campaigns accordingly.
  2. Business: Companies can use TSA to grow their business, by understanding what people feel about their products, services, ad campaigns, and new ideas. Accordingly, they can plan out their strategies, and improve their products based on people’s needs.
  3. Social Situations: Many people respond or react to social situations through Twitter, and using TSA on these Tweets can help us to understand the gravity of certain situations. For example, when a devastating flood occurs in a country, the news of it begins trending, and this initiates public action and encourages people to contribute in their own way to help the people in need.

Now that we know what Twitter Sentiment Analysis is, we can go through the code required to perform it.

Note: Most of the code examples that I found online were outdated, or they simply did not work on my system (I still don't know why). So after a lot of searching and testing, I finally came up with a version that works and is accurate in its results. That's why I have put it up for you, to save you the trouble of spending a lot of time figuring it out for yourself.

The Twitter Sentiment Analysis Project

In this Program, we will be performing Sentiment Analysis (with Python) on Tweets that we collect about a particular topic, after which we will also use a Pie Chart, Word Cloud, and Histogram to visualise our data better.

Requirements:

  1. Twitter Developer Account (You can search online to learn how to create one).
  2. Python Libraries for Twitter.
  3. Python Libraries for Sentiment Analysis.
  4. Python Libraries for Data Visualisation.
  5. Other Python Libraries.
  6. [Optional] Jupyter Notebooks Environment (which makes coding easier).

Let us now go through the necessary Python Libraries –

TWITTER LIBRARIES
Tweepy: It allows Python to interact with Twitter and use its API.
OAuthHandler: A class provided by Tweepy that handles token-based (OAuth) authentication for accessing Twitter data.

SENTIMENT ANALYSIS LIBRARIES
TextBlob: It is an NLP library that is used to analyse textual data. In this case, it is used for sentiment analysis.

DATA VISUALISATION LIBRARIES
Matplotlib: It allows us to create 2D Graphs and plots.
Matplotlib.pyplot: It makes plotting much more convenient by enabling us to change certain features of our Graphs based on our requirements.
WordCloud: As the name suggests, it allows us to create a WordCloud with our data, i.e., a form of data visualisation that shows us the frequency of each word based on the size of its appearance.

OTHER LIBRARIES
Pandas: It is used to make our data manipulation much easier.
Re: ‘Regular Expression’ helps us to find particular strings or sets of strings in our data. Here, we use it for cleaning the data.

The Code

We shall now begin our Python Program. Start by importing all these Modules into your environment. You may need to install them first. Search online to learn how to install these modules onto your system. Don’t worry, it’s very easy.
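If you use pip, all of the required third-party modules can usually be installed in one go (the package names below are the standard PyPI names; adjust the command if your setup differs, e.g. `pip3` on some systems):

```shell
pip install tweepy textblob matplotlib pandas wordcloud
```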

import re 

import tweepy 

from tweepy import OAuthHandler 

from textblob import TextBlob 

import matplotlib.pyplot as plt

import pandas as pd

from wordcloud import WordCloud

Next, we define our ‘Consumer Key’, ‘Consumer Secret’, ‘Access Token’, and ‘Access Token Secret’, which we obtain from our Twitter Developer Account. Remember, all four of these are top secret, so don’t tell anyone!

consumer_key = 'xxxxx'
consumer_secret = 'xxxxx' 
access_token = 'xxxxx'
access_token_secret = 'xxxxx'

We then provide the code to authorise our access to Tweets.

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)


api = tweepy.API(auth)

We now create a function to clean up our Tweets by removing unnecessary characters and retaining the rest.

def remove_url(txt):
    # Remove URLs and any character that is not a letter, digit, space, or tab.
    return " ".join(re.sub(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split())

You can either take the topic as input from a user, or type in your own. Just to make it more flexible, we will program our code to input a topic from a user.

word = input("Enter the word here: ")

The program searches for relevant Tweets by filtering out the ones that do not contain the keyword, i.e., the topic that the user typed in. We have programmed it to input 100 such tweets, but you can always modify that number based on what you require.

filtered = word + " -filter:retweets"


# Note: in Tweepy v4 and later, api.search was renamed to api.search_tweets.
tweets = tweepy.Cursor(api.search,
                       q=filtered,
                       lang="en").items(100)

tweets

Remove URLs from the Tweets to avoid any inaccuracy.

cleantweets = [remove_url(tweet.text) for tweet in tweets]

We now create TextBlob objects for the Tweets.

sentiment_objects = [TextBlob(tweet) for tweet in cleantweets]

sentiment_objects[0].polarity, sentiment_objects[0]

After this, we create a list of Polarity values and Tweet Text. Just to check, we print the value of the ‘0th’ row. Then we print all the Sentiment Values from row 0 to row 99.

sentiment_values = [[tweet.sentiment.polarity, str(tweet)] for tweet in sentiment_objects]

sentiment_values[0]

sentiment_values[0:100]

Create a Data Frame with all these sentiment values.

sentiment_df = pd.DataFrame(sentiment_values, columns=["polarity", "tweet"])

sentiment_df

We save the polarity column as ‘n’, and store it as the series ‘m’. (Selecting a single DataFrame column already returns a pandas Series, so pd.Series(n) here is effectively a copy.)

n=sentiment_df["polarity"]

m=pd.Series(n)

m

We then initialise the variables, ‘pos’, ‘neg’, and ‘neu’.

pos=0
neg=0
neu=0

We create a ‘for loop’ to classify the sentiment values as positive (if it is greater than 0), negative (if it is less than 0), and neutral (if it is equal to 0). We also count the number of each respective class and display it.

for items in m:
    if items>0:
        print("Positive")
        pos=pos+1
    elif items<0:
        print("Negative")
        neg=neg+1
    else:
        print("Neutral")
        neu=neu+1
        
print(pos,neg,neu)

Let us now visualise the data.

First, we will create a pie chart to compare our results.

pieLabels=["Positive","Negative","Neutral"]

populationShare=[pos,neg,neu]

figureObject, axesObject = plt.subplots()

axesObject.pie(populationShare,labels=pieLabels,autopct='%1.2f',startangle=90)

axesObject.axis('equal')

plt.show()

We then display the percentage of Twitter users in our sample who feel a certain sentiment towards the topic. Since we collected exactly 100 Tweets, the raw counts happen to equal percentages, but dividing by the total keeps the output correct even if you change the sample size.

total = pos + neg + neu

print("%.2f percent of twitter users feel positive about %s" % (pos * 100 / total, word))

print("%.2f percent of twitter users feel negative about %s" % (neg * 100 / total, word))

print("%.2f percent of twitter users feel neutral about %s" % (neu * 100 / total, word))

Let us now create a Histogram to compare our data.

fig, ax = plt.subplots(figsize=(8, 6))

# Plot histogram of the polarity values
sentiment_df.hist(bins=[-1, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1],
             ax=ax,
             color="purple")

plt.title("Sentiments from the Tweets")
plt.show()

Finally, we will create a Word Cloud for our data.

all_words = ' '.join([text for text in cleantweets])
wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words)

plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()

There you have it! You have now completed your very own Twitter Sentiment Analysis Project using Python. It’s much easier than it sounds, and it looks very impressive on your Resume. You can also access a lot of data and do plenty of personal research using Twitter. Just remember to follow the guidelines and rules provided by the website.

If you would like to learn how to generate live Tweets from a particular Twitter Handle, visit this Github Repository:
https://github.com/nikitasilaparasetty/Quick-and-Easy-Twitter-Sentiment-Analysis-Projects/blob/master/My%20Project%206%20(b)%20-%20Twitter%20Sentiment%20Analysis%20-%20Generating%20Live%20Tweets%20from%20a%20Twitter%20Handle.ipynb
