
Friday, September 15, 2017

Twitter Sentiment Analysis of March Madness

With March Madness officially over (and congrats to the Louisville Cardinals on a great season; thanks for winning my bracket pool), I thought I should pass along an article I came across about an application of Twitter sentiment analysis in the world of sports. The article, at http://www.vertica.com/2013/03/20/a-method-to-the-march-madness/, details how researchers used Twitter data to do sentiment analysis on NCAA basketball. They presented their model and results recently at the MIT Sloan Sports Analytics Conference (which I really want to attend some day), with the goal of seeing whether Twitter sentiment could predict the level of success of certain teams. The researchers hypothesized that teams or players with a large number of tweets about them were more likely to be successful, simply because so many people were talking about them. Unfortunately, since the Sloan Conference was held before the actual start of what is commonly known as "March Madness," aka the NCAA Men's Basketball Tournament, the Twitter data used was an approximately one-week sample spanning the end of the regular season and the beginning of the conference tournament games.
 
Michigan's Trey Burke, Source: http://isportsweb.com/2011/11/15/michigan-basketball-trey-burke-leads-wolverines-victory/

The researchers limited the tweets they looked at to teams ranked in the top 25 of the AP poll at the time the data was collected, as well as top scorers from across college basketball. They were able to data-mine almost 500,000 tweets in the week-long period, and they show some interesting results when breaking the sentiment analysis down by team and player. Unsurprisingly, Michigan's Trey Burke was the leader in positive sentiment during the time period. This should not be a ground-breaking revelation: at the conclusion of the season he was named the National Player of the Year by the AP, and anyone who watched him carry Michigan to the national title game over the past 2+ weeks knows how good he is. Apparently, though, being good means getting a lot of Twitter love. Most of the players heading the largest sentiment graphic were well-known stars, although I did notice an absence of players from smaller schools having big years, such as Creighton's Doug McDermott. I guess when you are not on TV every week the Twitterverse isn't all that interested in you. The other player I found a surprising amount of love for was Kentucky center Nerlens Noel. For those of you who don't know, Noel injured a knee earlier in the year and was not even playing at the time the data was mined, but he still got a surprising amount of sentiment. My only guess is that UK fans were collectively whining about how they would have made the tourney had he not been injured.

Sad UK Fan, Source: http://www.crimsoncast.com/2012/03/pre-game-meal-kentucky-wants-revenge/imagescau0mx6c-2/

Data about individual teams is only briefly mentioned in the article, with a single supporting graphic showing the sentiment for the Kansas Jayhawks, but the authors did include a note that traditional powerhouse teams led the way in tweet volume, which was not at all surprising. The final thing that really surprised me was the volume of tweets coming from overseas, particularly London. The United Kingdom is not generally known as a basketball country, sticking mostly to soccer, but tweets out of London outpaced many major American cities. Even more impressive is that most people there are asleep when games are being played over here, meaning the Brits are dedicated enough to check up on happenings around college hoops the next day, then tweet about it retroactively. I got some cool insights out of this article, and it concludes with a challenge to all of us to use their HP Vertica platform to build a model combining Twitter sentiment analysis and statistics to try to predict the winner. It's a little late to try it out until next year, but I will be keeping a close eye on their blog to see if anyone gave it their best shot.


Sunday, August 6, 2017

Trident ML Sentiment Analysis Classifier

Trident-ML comes with a pre-trained Twitter sentiment classifier. This post shows a very basic example of how to use that classifier in Storm to classify the sentiment of a piece of text; the classifier returns true (positive) or false (negative).

First, create a Maven project (e.g. with groupId="com.memeanalytics" artifactId="trident-sentiment-classifier"). The complete source code of the project can be downloaded from this link:

https://dl.dropboxusercontent.com/u/113201788/storm/trident-sentiment-classifier.tar.gz

To start, we need to configure the pom.xml file in the project.

Configure pom.xml:

First, add the Clojars repository to the repositories section:

<repositories>
  <repository>
    <id>clojars</id>
    <url>http://clojars.org/repo</url>
  </repository>
</repositories>

Next, add the Storm dependency to the dependencies section:

<dependency>
  <groupId>storm</groupId>
  <artifactId>storm</artifactId>
  <version>0.9.0.1</version>
  <scope>provided</scope>
</dependency>

Next, add the trident-ml dependency to the dependencies section (for text classification):

<dependency>
  <groupId>com.github.pmerienne</groupId>
  <artifactId>trident-ml</artifactId>
  <version>0.0.4</version>
</dependency>

Next, add the exec-maven-plugin to the build/plugins section (for executing the Maven project):

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.2.1</version>
  <executions>
    <execution>
      <goals>
        <goal>exec</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <includeProjectDependencies>true</includeProjectDependencies>
    <includePluginDependencies>false</includePluginDependencies>
    <executable>java</executable>
    <classpathScope>compile</classpathScope>
    <mainClass>com.memeanalytics.trident_sentiment_classifier.App</mainClass>
  </configuration>
</plugin>

Next, add the maven-assembly-plugin to the build/plugins section (for packaging the Maven project into a jar that can be submitted to a Storm cluster):

<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.2.1</version>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <mainClass></mainClass>
      </manifest>
    </archive>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Sentiment Classification in Trident topology using Trident-ML implementation

Once the pom.xml is updated, we can build a Trident topology that uses Trident-ML's TwitterSentimentClassifier in a DRPC stream to classify the sentiment of text. This is implemented in the main class shown below:

package com.memeanalytics.trident_sentiment_classifier;

import com.github.pmerienne.trident.ml.nlp.TwitterSentimentClassifier;

import storm.trident.TridentTopology;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.LocalDRPC;
import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;

public class App
{
    public static void main( String[] args )
    {
        // In-process DRPC server and Storm cluster, for local testing only
        LocalDRPC drpc = new LocalDRPC();

        LocalCluster cluster = new LocalCluster();
        Config config = new Config();

        cluster.submitTopology("SentimentClassifierDemo", config, buildTopology(drpc));

        // Give the topology a moment to start before sending requests
        try {
            Thread.sleep(2000);
        } catch (InterruptedException ex) {
            ex.printStackTrace();
        }

        // Each call sends one text string to the "classify" DRPC function
        System.out.println(drpc.execute("classify", "Have a nice day!"));
        System.out.println(drpc.execute("classify", "I feel really bad!"));
        System.out.println(drpc.execute("classify", "Whatever, i dont really care"));
        System.out.println(drpc.execute("classify", "feel sleepy zzzz...."));

        // Tear everything down once the demo requests are done
        cluster.killTopology("SentimentClassifierDemo");
        cluster.shutdown();
        drpc.shutdown();
    }

    private static StormTopology buildTopology(LocalDRPC drpc)
    {
        TridentTopology topology = new TridentTopology();

        // Read the request text from "args" and append the predicted "sentiment" field
        topology.newDRPCStream("classify", drpc)
                .each(new Fields("args"), new TwitterSentimentClassifier(), new Fields("sentiment"));

        return topology.build();
    }
}

The DRPC stream lets the user pass a text string to the TwitterSentimentClassifier, which then emits a "sentiment" field containing the predicted label for that text (true for positive; false for negative).
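Since drpc.execute returns a plain string, it can help to turn that string into a readable label. The following is only a minimal sketch: it assumes the returned string is JSON-like and ends with the boolean sentiment, for example [["Have a nice day!",true]]. That format is an assumption and should be checked against what your topology actually prints. The class name DrpcResultReader and the method describe are made up for this illustration and are not part of Storm or Trident-ML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DrpcResultReader {
    // Assumed result shape: a JSON-like string whose last value is a boolean,
    // directly followed by the two closing brackets, e.g. ...,true]]
    private static final Pattern TRAILING_BOOLEAN =
            Pattern.compile("(true|false)\\s*\\]\\s*\\]\\s*$");

    /** Returns "positive", "negative", or "unknown" if no trailing boolean is found. */
    public static String describe(String drpcResult) {
        if (drpcResult == null) {
            return "unknown";
        }
        Matcher matcher = TRAILING_BOOLEAN.matcher(drpcResult);
        if (!matcher.find()) {
            return "unknown";
        }
        return Boolean.parseBoolean(matcher.group(1)) ? "positive" : "negative";
    }

    public static void main(String[] args) {
        // Hypothetical sample string, not captured from a real run
        String sample = "[[\"Have a nice day!\",true]]";
        System.out.println(describe(sample));
    }
}
```

In the App class, this could be used by wrapping each call, as in DrpcResultReader.describe(drpc.execute("classify", "Have a nice day!")). Anything that does not match the assumed shape falls back to "unknown" rather than throwing an exception.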

Next, copy the following two files into the "src/main/resources" folder under the project root folder:

twitter-sentiment-classifier-classifier.json:
https://github.com/pmerienne/trident-ml/blob/master/src/main/resources/twitter-sentiment-classifier-classifier.json

twitter-sentiment-classifier-extractor.json:
https://github.com/pmerienne/trident-ml/blob/master/src/main/resources/twitter-sentiment-classifier-extractor.json

This step matters: without these files you may get a FileNotFoundException at runtime.
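One way to catch a missing file early is to check the classpath before starting the topology. The sketch below assumes the classifier loads the two JSON files as classpath resources under exactly these names, which is inferred from the file names above and not verified against the Trident-ML source. The class ResourceCheck and its method missingResources are hypothetical helpers written for this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResourceCheck {
    /** Returns the names from the given list that cannot be found on the classpath. */
    public static List<String> missingResources(List<String> names) {
        ClassLoader loader = ResourceCheck.class.getClassLoader();
        List<String> missing = new ArrayList<String>();
        for (String name : names) {
            // getResource returns null when the resource is not on the classpath
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // File names taken from the two links above; classpath loading is an assumption
        List<String> required = Arrays.asList(
                "twitter-sentiment-classifier-classifier.json",
                "twitter-sentiment-classifier-extractor.json");
        List<String> missing = missingResources(required);
        if (missing.isEmpty()) {
            System.out.println("All classifier resources found on the classpath.");
        } else {
            System.out.println("Missing from the classpath: " + missing);
        }
    }
}
```

Calling missingResources at the top of App.main and stopping when the list is not empty gives a clearer message than a stack trace from inside the topology.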

Once the coding is complete, we can run the project by navigating to the project root folder and running the following command:

> mvn compile exec:java
