Sentiment analysis, Elasticsearch and Kibana
From the Twitter streaming API grab tweets for a live TV show whilst
it’s showing, classify by contestant, analyze sentiment and provide a
real-time dashboard into the outlook.
Try to predict good bets on the winners. Optional: Gamble. :-)
I picked the final of the BBC’s “Strictly Come Dancing” in the UK. My wife is an
avid viewer, myself less, so I’m generally consigned to the laptop whilst it’s
on. This was a welcome distraction.
During the show I gathered 63,000 tweets, peaking at 35 per second at the
busiest point. Elasticsearch and Kibana didn’t break a sweat.
Most tweets were very easy to classify by one of the 4 contestants - they tend to mention
a name or twitter account and the names were all distinct and unambiguous (Natalie Gumede, Abbey Clancy, Sophie Ellis-Bextor, Susanna Reid).
If a tweet was ambiguous, ie. mentioned more than one
contestant it’d be tagged with both, but only those tagged with a single
contestant were used.
The Sentiment classifier classified tweets into one of: Negative, Neutral or
Interestingly, the sentiment analysis didn’t provide any huge revelations. Most
tweeters were largely positive (there were no big upsets during the dancing, I’m
reliably informed), so relative proportions of negative/neutral/positive for
each contestant were very similar.
I noticed the SentiStrength library produced a few misclassifications for the
negative examples - classifying such short text as tweets is a particularly hard
problem to do accurately.
In a show with stronger opinions it may be this could play a more significant
Kibana is a great tool for visualizing time series data in Elasticsearch. It’s
aimed at dashboards for log files and this problem was a remarkably good fit,
The above screenshot was after the first elimination of Sophie - it was easy to
adjust the query at this point to exclude those tweets. On the timeseries graph
you can spot the peaks as each contestants performed for the 1st and 2nd time.
I placed some small bets on a betting market whilst the show was running. The
best bet was on the 4th place position. This didn’t happen as the market had
assumed (Sophie went out, not Natalie) but twitter correctly predicted this one,
so the odds I got of 3.9 produced decent winnings.
The 1st place didn’t predict a clear winner - it was close enough that I figured
hedging my bets made most sense at that point, so made minimal winnings of a
couple of pounds on this bet.
If there’s the demand, I’ll clean up the code and put it on github.
There are comments.