The First GOP Debate



Sentiment analysis refers to the use of text analysis and linguistics to extract subjective information from a text source. It is widely used to understand customer sentiment in order to improve marketing strategies. The technique aims to determine the attitude of a speaker or writer towards a selected topic, or the overall contextual polarity of a text.

The recent rise of social media and blogs has fueled the practice of sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations.

Here we look at Twitter data pertaining to the first GOP debate, held in Ohio in August 2015, and try to understand the foreign policy focus of the debate. With the controversial Iran nuclear deal, the GOP has expressed dissatisfaction with the Obama administration. The Twitter data can be found on Kaggle.

The data consists of the following columns:

 [1] "id"                        "candidate"                 "candidate_confidence"
 [4] "relevant_yn"               "relevant_yn_confidence"    "sentiment"
 [7] "sentiment_confidence"      "subject_matter"            "subject_matter_confidence"
[10] "candidate_gold"            "name"                      "relevant_yn_gold"
[13] "retweet_count"             "sentiment_gold"            "subject_matter_gold"
[16] "text"                      "tweet_coord"               "tweet_created"
[19] "tweet_id"                  "tweet_location"            "user_timezone"

The subject_matter column records the topic of each tweet. The subjects are:

 [1] ""                                     "Abortion"
 [3] "Foreign Policy"                       "FOX News or Moderators"
 [5] "Gun Control"                          "Healthcare (including Medicare)"
 [7] "Immigration"                          "Jobs and Economy"
 [9] "LGBT issues"                          "None of the above"
[11] "Racial issues"                        "Religion"
[13] "Women's Issues (not abortion though)"

The empty string "" means that no subject was assigned. As said before, we focus here on the Foreign Policy tweets.
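The filtering step can be sketched as follows. The original analysis was done in R; this is a small Python illustration that uses only the column names listed above. The sample rows, the candidate names and the helper filter_by_subject are made up for the example and do not come from the dataset.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the dataset.
SAMPLE_CSV = """id,candidate,sentiment,subject_matter,text
1,Candidate A,Negative,Foreign Policy,The Iran deal came up again
2,Candidate B,Positive,Jobs and Economy,Talking about jobs tonight
3,Candidate C,Neutral,,No subject assigned here
4,Candidate A,Negative,Foreign Policy,Russia and China were mentioned
"""


def filter_by_subject(rows, subject):
    """Return the rows whose subject_matter equals the given subject."""
    return [row for row in rows if row["subject_matter"] == subject]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
foreign_policy = filter_by_subject(rows, "Foreign Policy")
print(len(foreign_policy), "of", len(rows), "sample tweets are about Foreign Policy")
```

With the real file, the same function would be applied to the rows read from the Kaggle CSV instead of the in-memory sample.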

After some filtering and analysis, we display the frequent terms that were used in Tweets related to the US Foreign Policy issues.

The big issues include Russia, China, Iran and Israel. We look for words that are associated with each of these terms, keeping only those with an association score of at least 0.2.
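One common way to score word association is the Pearson correlation between the per-tweet counts of two terms. The post does not say which measure was used, so treat the sketch below as an assumption: it is a plain Python illustration with invented tweets and hypothetical helper names, and only the 0.2 threshold comes from the text.

```python
import math


def term_vector(docs, term):
    """Count how often `term` occurs in each tokenized document."""
    return [doc.count(term) for doc in docs]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def find_associations(docs, term, min_corr=0.2):
    """Return {word: correlation} for words correlated with `term` at or above min_corr."""
    vocab = sorted({w for doc in docs for w in doc} - {term})
    target = term_vector(docs, term)
    scores = {w: pearson(target, term_vector(docs, w)) for w in vocab}
    return {w: s for w, s in scores.items() if s >= min_corr}


# Invented example tweets, already lower-cased.
tweets = [
    "iran deal is a bad deal",
    "the iran deal again",
    "israel and the west bank",
    "west bank settlements israel",
    "russia and china",
]
docs = [t.split() for t in tweets]
associations = find_associations(docs, "iran")
for word, score in sorted(associations.items(), key=lambda kv: -kv[1]):
    print(word, round(score, 2))
```

In this toy corpus "deal" comes out strongly associated with "iran", while "west" and "bank" do not, since they only appear in the Israel tweets.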

Word Association with respect to the term ‘Russia’

Word Association with respect to the term ‘China’

Word Association with respect to the term ‘Israel’


Word Association with respect to the term ‘Iran’


These associations match intuition: the words ‘west’ and ‘bank’ relate to the ongoing Israel-Palestine conflict, the word ‘deal’ relates to Iran, and so on.

The Twitter data was collected over a span of 12 hours. During these 12 hours, several aspects of foreign policy were raised, and we now try to model them. Topic models are a suite of techniques that uncover the hidden structure of a set of documents. Here we use latent Dirichlet allocation (LDA) to find this hidden structure. LDA assumes that each document is a mixture of a small number of topics and that each word in the document is attributable to one of the document’s topics.
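To make the LDA assumption concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the code behind the post, which used R. The toy documents, the hyperparameters alpha and beta, the iteration count and the function names are all assumptions chosen for illustration, and the toy uses 2 topics only to keep it small.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (doc_topic, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word v in topic k
    topic_total = [0] * num_topics                              # tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample: P(topic) is proportional to
                # (doc-topic count + alpha) * (topic-word count + beta) / (topic size + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n most frequent words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Invented toy documents; the post itself fits 5 topics on the real tweets.
docs = [
    "iran nuclear deal iran deal".split(),
    "iran deal sanctions nuclear".split(),
    "isis syria iraq isis".split(),
    "syria iraq isis insurgency".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print("topic", k, words)
```

The doc_topic counts give each document's mixture over topics, and the topic_word counts give each topic's most typical words, which is the "hidden structure" the paragraph above refers to.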

Here we try to find 5 topics that might be a part of the Foreign Policy issues of the USA. 

When we plot these topics over time, we get an interesting result:

The tweets focused mostly on the ongoing insurgency in Iraq and Syria (ISIS), Russia, Iran and Israel.

Locations as seen in red

We can go a step further and locate the sources of these tweets using the coordinates given. Here we use the rworldmap package in R.
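Before the points can be put on a map, the tweet_coord strings have to be turned into numbers. The sketch below is a Python illustration only. It assumes each value looks like "[latitude, longitude]" and that many rows are empty; both the format and the helper parse_coord are assumptions, so check them against the actual file.

```python
import ast


def parse_coord(raw):
    """Return (lat, lon) from a string like "[41.5, -81.7]", or None if unusable."""
    if not raw or not raw.strip():
        return None
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(value, (list, tuple)) or len(value) != 2:
        return None
    lat, lon = value
    if not all(isinstance(x, (int, float)) for x in (lat, lon)):
        return None
    # Assumed order: latitude first, longitude second.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return float(lat), float(lon)


# Invented example values.
raw_values = ["[41.5, -81.7]", "", "not a coordinate", "[200, 0]"]
points = [p for p in (parse_coord(v) for v in raw_values) if p is not None]
print(points)
```

The resulting list of (latitude, longitude) pairs is what a mapping library would then draw as points.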

How were these Tweets distributed over time?

Here we look at the bigger picture and ask how all the tweets, not just the foreign policy ones, are distributed over time. We transform the tweet_created variable into the time units we need and plot the number of tweets per unit.
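The transformation of tweet_created can be sketched as below. This is a Python illustration, not the original R code. The timestamp format "2015-08-07 09:54:46 -0700" is an assumption about how the column is stored, and the sample timestamps and the helper count_by are invented for the example.

```python
from collections import Counter
from datetime import datetime

# Assumed layout of tweet_created; adjust if the file differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def count_by(timestamps, unit):
    """Count tweets per month, day or hour, keyed by zero-padded strings."""
    fmt = {"month": "%m", "day": "%d", "hour": "%H"}[unit]
    parsed = (datetime.strptime(ts, TIMESTAMP_FORMAT) for ts in timestamps)
    return Counter(dt.strftime(fmt) for dt in parsed)


# Invented sample timestamps.
sample = [
    "2015-08-06 19:05:10 -0700",
    "2015-08-06 19:45:00 -0700",
    "2015-08-07 08:15:30 -0700",
    "2015-08-07 08:59:59 -0700",
    "2015-08-07 20:01:00 -0700",
]
months = count_by(sample, "month")
days = count_by(sample, "day")
hours = count_by(sample, "hour")
for hour in sorted(hours):
    print(hour, hours[hour])
```

The three counters correspond to the month, day and hour views discussed in the rest of this section, and each can be passed to a bar chart.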



How are the Tweets Distributed over the month?

The above plot does not tell us anything about the pattern.


What about days?


We see that the number of tweets increased substantially on the second day, after the GOP debate.

What about hours?

We observe that the number of tweets was substantial in hours 08 to 09 and in hours 19 and 20. These may have been the hours just before and after the debate.


Thank You for Reading and Have a nice Weekend!

The code can be found here

