Sentiment Analysis is Worthless

Disclaimer: The article does not assume that readers have a data science background and thus excludes and masks any complexities behind sentiment analysis or data science.

Opinion mining has reached its peak with the introduction of tools that facilitates sharing ideas and thoughts with the public. Although subjectivity of opinions affects how factual information is, sentiment analysis plays a huge role in studying a targeted group’s perception of a certain entity or event. To mention a few applications where sentiment analysis shines: Discovering a public event’s reaction, improving the customer satisfaction process, and studying a certain brand’s or an entity’s reputation. However, there’s a huge disconnection between the mentioned valuable applications and sentiment analysis, thus, I will try to connect the dots here and illustrate how sentiment analysis should fulfil business needs. Let’s start with a brief explanation of how sentiment analysis works and then move to satisfy the title’s claim.

Sentiment Analysis

Sentiment analysis as a part of natural language processing is the task of discovering a certain text’s emotional tone that is perceived by readers. It receives a text and outputs how positive, negative, or neutral it is. There are other categories as well that are used for sentiment analysis such as [“Angry”, “Sad”, “Happy”, “Excited”] or [1, 2, 3, 4, 5] similar to a rating that goes from 1 being very negative to 5 that is very positive, and so on. I have chosen to group the techniques in terms of their limitations and end results, which will fall into two groups.



There are many words that we categorize conceptually as negative, positive, or neutral. And that’s the very first trials of sentiment classification in the literature that was born right after the outburst of subjectivity analysis (Detecting whether a text is opinionated or not) in the 1990s where the paper “Recognizing subjective sentences: a computational investigation of narrative text” has given a huge contribution to.

Short Overview

Word-level-based models at their core check whether the text has more positive words/phrases than negative words or vice-versa, and then classifies based on that. I won’t go deeper on how it does that as there are many well-known approaches such as looking at the language morphology of a word, using hand-crafted rules, automated “rules” through machine learning, looking at the semantics of words. But the important point to take is that it only operates at the word level and doesn’t go far with the whole text’s semantics. Now let’s see how that works straightforwardly by only focusing on one category: “Negative” Sentiment.

Figure 2 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!

The example is pretty simple here (“Wrong” & “Bad”) but what if it was negating a positive word like saying “Not Good” or “Not Correct”? Here we move to negation handling (Still word-level) where we check words surrounding a positive/negative word and see if they were negating their positivity/negativity.

Figure 3 — Translation: I told the cashier Khalid that my order is not correct, and he said that he can’t change it. The service is not good at all!

This solves the problem of negation. However, what if we have different examples like this:

Figure 4 — Translation: I got the wrong order but the cashier Khalid has solved my problem immediately
Figure 5 — Translation: The pistachio latte’s taste is too bitter. Couldn’t finish it!!

Word-level-based approaches struggle with these kinds of examples where we have in Figure 4 a negative word that precedes “But” and then the negativity gets canceled by “solved my problem” and turns into a positive text. Figure 5 on the other hand falls into a deeper issue where we have the word in Arabic “مر” that might refer to “Pass” or “Bitter” and it can only be resolved by using Arabic diacritics that not so many people use, or employing an extremely complicated parser. The two problems can be solved through the usage of context and semantics.



Words are never independent in a text, each word can change the meaning or opinion of the whole text. Although some natural language processing tasks can run away from the burden of context inclusion (A deeper dive into the semantics of words and their “interactions”), sentiment analysis cannot.

Time-Line Summary

Many trials in the past used rule-based approaches along with word morphology in order to include some semantics, then a movement towards models that try to create groups of words that are similar and by that, documents/sentences will have multiple topics based on the words mentioned (Topic Modeling) where Latent Dirichlet Allocation in 2003 wins as the strongest contributor. After that, deep learning has taken a long course starting from word-level semantics where the star was Word2Vec by Tomas Mikolov through “Efficient Estimation of Word Representations in Vector Space” paper and then moving towards context-level semantics (Contextualized Embedding), until reaching to Transformers to solve many efficiency and quality issues. The basic idea is that there was a huge past where the byproduct is the introduction of models that cater for the context and semantics of words within documents (There’s a huge amazing work on interpreting gigantic deep learning architectures, so the idea that these models cannot be interpreted is not fully true especially when analyzing the core concept of transformers; Attention)

Onto a quick simple example whereby the model includes a contextual representation of text and can understand that the word “مر” is not “pass” but “bitter”.

Figure 6 — Translation: The pistachio latte’s taste is too bitter. Couldn’t finish it!!

Sentiment Analysis and Business Value Disconnect


When we have millions of documents that could be coming from app store or google play comments for an app, google reviews for a place, complaints about a company, twitter region or hashtags tweets…etc. Applying sentiment analysis and getting 10% positive, 20% neutral, and 70% negative for an app or a Twitter hashtag let’s say, is basically useless due to the loss of connecting it to a certain topic. Knowing that some hashtag is too negative only tells you the what, not the why.

You might say that I’ll just filter the text by a keyword but that keyword was chosen by you, not the data! How many words are you going to account for? Are these words being used by customers? Heavily? The data (reviews, comments, tweets) should drive the process of deciding which aspects, or more elaborately, which collection of hundreds of keywords that you should look for. The key takeaway is that you need to know what the aspects are to know what exactly is so positive or negative about your place, app, Twitter marketing campaign, or generally speaking, your business, and then improve.


We ( have researched this subject in order to solve this problem in a different methodology than what is well-known in the literature due to the following reasons:

  1. Scarce Arabic NLP literature
  2. Arabic NLP datasets are of low quality
  3. Arabic NLP base components-of-the-shelf have low quality
  4. Inherent domain-specificity for well-known algorithmic approaches in terms of practicality and generality

We have released our first Generalized Hybrid Aspect-Sentiment Detection and Tracking model which Figure-7 illustrates only its core capability (The model is integrated within Bloom System that is part of Customer-Success platform)

Figure 7 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!

One more thing to notice is that the sentiment has gone through multiple layers of indexing and statistical calculations in order to be served as a comparable metric to the CSAT Score used in Customer-Success Management. However, the aforementioned does not address the issue!

Deeper Dive !

We have discovered that aspects are also not enough. We want to know a very well fine-grained problem specification of the aspects given in Figure 7. What was bad about customer-service above is “Order Exchange” & “Wrong Order” that should be detected by looking at “cannot change it” (ما اقدر اغير) and “Wrong Order” (طلبي غلط). Hence, through a combination of contextualized modeling and graph theory (our first text representation layer to solve the issue), we are currently researching in fully connecting the dots until reaching the core of the problem where Figure 8 will elaborate:

Figure 8 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!

By that, can now discover:

  1. What the total CSAT Score is for a business
  2. Why the total CSAT Score is as such
  3. How to change the CSAT Score

and automatically generate an actionable well-defined recommendation that fits our Decision-Making Platform.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s