Auto Sentiment Analysis Failing? Context is King

UK company FreshMinds Research recently ran a test: it pulled social media commentary about Starbucks through several popular analytics tools that offer automated sentiment analysis of the text gathered.  They found that flipping a coin to determine the sentiment of each individual comment would have been more accurate than what the tools reported.

FreshMinds analyzed over 19,000 online conversations with tools from Alterian, Biz360, Brandwatch, Nielsen, Radian6, Scoutlabs and Sysomos.  All content was centered on Starbucks.

The good news is that aggregate-level reporting of sentiment (the overall average) was between 60% and 80% in agreement with manual coding by trained staff.  Not bad.  The bad news?  Only about a third of individual comments were accurately coded.

Somehow, the automation errors roughly canceled out, so the aggregate sentiment across all conversations wasn’t off by much.  But if you want to dig deeper into individual conversations, either for more insight or to engage in the conversation, the likelihood of finding the right positive or negative comments is not very high at all.
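To see how this can happen, here’s a toy simulation (invented numbers, not FreshMinds’ data): if a tool’s mistakes are symmetric across sentiment classes, the overall positive/negative/neutral split stays close to the truth even when most individual labels are wrong.

```python
import random

random.seed(1)

# Toy illustration only – not FreshMinds' data.  Truth is split evenly
# across three sentiment classes; the "tool" keeps the correct label
# about a third of the time and otherwise guesses one of the other two.
CLASSES = ["positive", "negative", "neutral"]
N = 19_000

truth = [random.choice(CLASSES) for _ in range(N)]

def auto_code(label: str) -> str:
    if random.random() < 1 / 3:
        return label                                              # correct
    return random.choice([c for c in CLASSES if c != label])      # symmetric error

coded = [auto_code(t) for t in truth]

accuracy = sum(c == t for c, t in zip(coded, truth)) / N
for cls in CLASSES:
    print(cls, round(truth.count(cls) / N, 3), round(coded.count(cls) / N, 3))
print("per-comment accuracy:", round(accuracy, 3))
```

Per-comment accuracy hovers around one in three, yet the coded class shares land within a couple of points of the true shares – roughly the pattern FreshMinds observed, where aggregates looked respectable while individual codes were unreliable.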

Their report is an excellent overview of these seven tools and how they perform across geographies and content sources.  And, as a side note, it’s a great marketing effort to get you and me to pull down their paper in exchange for contact information.

It’s not surprising to me that these tools are still so far off.  It’s a micro-representation of a macro-level challenge facing most research firms, agencies, and marketers today:  putting things into context from a people-centric approach.  We have so much data today that making it both accurate and actionable requires a more concerted effort to put everything into context, mirroring the reality of human decision-making and behavior as much as possible.

I’m sure some combination of neural networks, complexity science, and/or agent-based simulation tools eventually will yield “smarter” sentiment analysis tools to speed up the process of sifting through thousands of lines of text-based data.  Those pursuing that dream should not lose sight of the biggest mystery to solve:  understanding the meaning of words within a human context.

The FreshMinds report is definitely worth the read.  I’m curious what the makers of these tools would have to say about their report.

Thanks to Research (the magazine) for the heads up on the white paper release.

About The Author


Other posts by Maury


4 Comments


  1.


    Thanks for highlighting the abilities (and inabilities) of automated analysis. Often we think technology can always produce accurate results, but with complex concepts like sentiment, mathematical algorithms just can’t capture the nuances within the words.

    That said, we still need ways to speed the analysis of vast amounts of information such as company commentary. Content analysis has been around for a long time. By using the proper statistical approaches, the accuracy is sure to increase. Just as your article suggests, multiple measures of sentiment may lead to better analysis.

  2.

    Calvin – totally agree. I think one of the keys will be going beyond statistical tools to those that are more adept at pattern recognition and non-linear trends. The challenge is that many of these tools are used and promoted today without any caveat about what really can and can’t be done with them with any degree of accuracy or confidence.

  3.

    Hi there,
    Brandwatch is one of the tools in the report, and as you might expect we have a lot to say about this subject. There are a couple of problems with disclosure, however:

    1. It’s a complex area, and much of the valid and useful explanation is rather technical (we use SVMs, for example, and if you read up on them you’ll see what I mean).

    2. Most of the best explanations I can give for our approach, both now and, more importantly, in future, are rather sensitive – I wouldn’t want to discuss a lot of it in public.

    3. It takes time to explain – time which i) is scarce on this side and ii) tends to bore the hell out of people on the other side :)

    So with that said, let’s crack on.

    Brandwatch’s automated sentiment analysis systems are all implemented with a set of machine learning algorithms.

    These algorithms need to learn how to do these things before doing them. And they need to learn to behave in as human-like a manner as possible, so that what they do is not counter-intuitive to users.

    We teach them these skills by first having our data analysts annotate a small subset of documents, and then showing this data to the algorithms so that they can find ways to imitate it. It is very much like teaching somebody a new skill by acting it out in front of them several times.

    Brandwatch uses Support Vector Machines. SVMs learn by finding and memorising representative examples in their training data. (A “support vector” is essentially a representative training example turned into a vector of numbers.)

    For instance, a sentiment annotator implemented with an SVM will detect and remember training resources that are representative of each sentiment class.

    The principles behind SVMs are not very different from high-school geometry – except that, instead of working with the 2 or 3 dimensions we humans find comfortable, SVMs work easily with thousands of them.

    When an SVM receives a new piece of text to classify, it figures out where in the space of memorised training data the new resource falls. A new resource falls into the region containing the memorised training resources most similar to it. The SVM then classifies the new resource as belonging to that region’s category.
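    That picture of memorised examples and regions can be sketched in a few lines. To be clear, this is not Brandwatch’s code, and a real SVM learns a maximum-margin boundary rather than comparing against every example; the toy below (with an invented mini training set) just illustrates the underlying idea of turning text into vectors and labelling new text by its nearest labelled neighbour.

```python
import math
import string
from collections import Counter

# Invented mini training set – a stand-in for the "memorised
# representative examples", not real Brandwatch training data.
TRAIN = [
    ("love the new latte, it tastes great", "positive"),
    ("great service and friendly staff", "positive"),
    ("terrible coffee, a total waste of money", "negative"),
    ("slow service and a cold, bitter drink", "negative"),
]

def vectorize(text: str) -> Counter:
    """Bag-of-words: text becomes a vector of term counts."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    """Similarity between two term-count vectors (0 = no overlap)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text: str) -> str:
    """Label new text with the class of its most similar training example."""
    vec = vectorize(text)
    _, label = max(TRAIN, key=lambda ex: cosine(vec, vectorize(ex[0])))
    return label

print(classify("the staff were friendly and the latte was great"))  # positive
print(classify("such a bitter drink and slow service"))             # negative
```

    An actual SVM additionally boils the memorised set down to the few examples (the support vectors) that define the class boundaries, which is part of what makes the approach practical at scale.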

    So, you’re right, context is indeed extremely important. And in our experience there are two parts to this context:

    1. Isolating the text which ‘talks about’ the brand or query. This is, as one of our guys might say, a non-trivial problem. How can you say which parts of a page of text actually refer to the subject you are analysing? Moreover, how can you program a machine to do it? It’s tough, and it’s something we’re working on improving.

    2. Representative training data. The SVMs are only as good as the training they get, so we have to make sure the dataset used to train them is as close as possible to the subject matter being analysed. In the case of the Fresh Networks analysis, they didn’t tell us what they were going to look at, and Starbucks fell into our general food-and-drink industry, which has had rather broad training – 50,000 data points covering brands from Coke and Evian to Jamie Oliver. The upshot is that accuracy is OK for some queries and not very good for others. If we had trained the system on Starbucks mentions (it would take around 500 data points), the results would have been a lot better. That’s something we do with clients, but clearly not in blind tests.
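    The first of those two problems, isolating text that actually refers to the query, can be made concrete with a deliberately naive sketch (my example, not Brandwatch’s method): filter a page down to sentences that mention the brand by name. The hard part is everything this misses – pronouns, “their new store”, sentiment spread across sentences.

```python
import re

# Naive first pass: keep only sentences that name the brand explicitly
# (here "Starbucks" as an example query).  Real systems must also resolve
# pronouns and indirect references, which is the genuinely hard part.
def brand_sentences(text: str, brand: str) -> list:
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if brand.lower() in s.lower()]

page = ("I walked past Starbucks today. The weather was awful. "
        "Starbucks was packed as usual! My bus was late.")
print(brand_sentences(page, "Starbucks"))
# → ['I walked past Starbucks today.', 'Starbucks was packed as usual!']
```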

    So in summary – it’s reasonably early days for text and context analysis, but I’m pretty sure that over the course of the next few years big strides will be made. It’s difficult to say exactly when automated text analysis will be good enough or clearly the best choice. Rather, it’s more likely to creep up on us, and before we know it accurate automated sentiment analysis of anything on the web will be a given. I’d give it 3 years till we get to that point.

    I hope that’s helpful – as you can probably tell, I could go on and on about this stuff :)


  4.

    Giles -

    Very interesting stuff. And it makes complete sense to me that human “training” is required to provide appropriate context for the industry/category/situation/etc. I wonder how many different nuances within a particular category or industry are required before you have sufficient training to apply the tool to a new data set without further training? I suppose it has everything to do with the type of question from which the data originated OR the type of site in which the text was submitted (posting to a blog vs. responding to a question vs. a short burst like a tweet or Facebook status update, etc.).

    I totally agree that it will happen. I’d love to see if anyone is trying to apply complexity-science techniques to this problem. Of course, the challenge you still face is getting text into a workable data set… therein lies the core issue of assigning meaning to text.

    Thanks for sharing.
