Sentiment Analysis is Worthless

Disclaimer: This article does not assume that readers have a data science background, so it leaves out or simplifies the complexities behind sentiment analysis and data science.

Opinion mining has reached its peak with the introduction of tools that facilitate sharing ideas and thoughts with the public. Although the subjectivity of opinions limits how factual the information is, sentiment analysis plays a huge role in studying a targeted group’s perception of a certain entity or event. To mention a few applications where sentiment analysis shines: gauging the public’s reaction to an event, improving the customer satisfaction process, and studying the reputation of a brand or an entity. However, there is a huge disconnect between these valuable applications and sentiment analysis. I will try to connect the dots here and illustrate how sentiment analysis should fulfil business needs. Let’s start with a brief explanation of how sentiment analysis works and then move on to justify the title’s claim.

Sentiment Analysis

Sentiment analysis, as a part of natural language processing, is the task of discovering the emotional tone of a text as perceived by its readers. It receives a text and outputs how positive, negative, or neutral it is. Other category sets are used as well, such as [“Angry”, “Sad”, “Happy”, “Excited”] or [1, 2, 3, 4, 5], which works like a rating from 1 (very negative) to 5 (very positive). I have chosen to group the techniques by their limitations and end results, which gives two groups.

Word-Level

Intuition

There are many words that we conceptually categorize as negative, positive, or neutral. This intuition drove the very first attempts at sentiment classification in the literature, which emerged right after the surge of subjectivity analysis (detecting whether a text is opinionated or not) in the 1990s, to which the paper “Recognizing subjective sentences: a computational investigation of narrative text” contributed greatly.

Short Overview

Word-level models, at their core, check whether the text has more positive words/phrases than negative ones or vice versa, and then classify based on that. I won’t go deeper into how they do that, as there are many well-known approaches, such as looking at the morphology of a word, using hand-crafted rules, learning “rules” automatically through machine learning, or looking at the semantics of words. The important point is that these models operate only at the word level and do not go far into the semantics of the whole text. Now let’s see how that works by focusing on only one category: “Negative” sentiment.

Figure 2 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!
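
To make the counting idea concrete, here is a minimal sketch in Python applied to the English translation of Figure 2. The tiny lexicons and the function names are illustrative assumptions, not a real sentiment resource or the approach of any specific library.

```python
import re

# Tiny hand-made lexicons, purely illustrative (not a real sentiment resource).
POSITIVE_WORDS = {"good", "great", "correct", "fast", "friendly"}
NEGATIVE_WORDS = {"bad", "wrong", "slow", "rude", "bitter"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_level_sentiment(text: str) -> str:
    """Classify by comparing the counts of positive and negative lexicon words."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


review = ("I told the cashier Khalid that I got the wrong order, and he said "
          "that he can't change it, what a bad service!")
print(word_level_sentiment(review))  # negative ("wrong" and "bad" are the only hits)
```

Nothing in this sketch looks beyond individual words, which is exactly the limitation discussed in this section.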

The example here is pretty simple (“Wrong” & “Bad”), but what if the text negates a positive word, as in “Not Good” or “Not Correct”? Here we move to negation handling (still word-level), where we check the words surrounding a positive/negative word to see whether they negate its positivity/negativity.

Figure 3 — Translation: I told the cashier Khalid that my order is not correct, and he said that he can’t change it. The service is not good at all!
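
Negation handling can be sketched in the same spirit. The snippet below, run on the English translation of Figure 3, flips the polarity of a lexicon word when a negator appears in the two tokens before it. The lexicons, the list of negators, and the window size are all assumptions chosen for illustration.

```python
import re

# Illustrative lexicons and negators; a real system would use a curated resource.
POSITIVE_WORDS = {"good", "correct", "great"}
NEGATIVE_WORDS = {"bad", "wrong", "bitter"}
NEGATORS = {"not", "no", "never"}
WINDOW = 2  # how many preceding tokens to scan for a negator (an arbitrary choice)


def negation_aware_score(text: str) -> int:
    """Sum word polarities, flipping a word's polarity when a negator precedes it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive lexicon word, -1 for a negative one, 0 otherwise.
        polarity = (token in POSITIVE_WORDS) - (token in NEGATIVE_WORDS)
        if polarity == 0:
            continue
        preceding = tokens[max(0, i - WINDOW):i]
        if any(word in NEGATORS for word in preceding):
            polarity = -polarity
        score += polarity
    return score


review = ("I told the cashier Khalid that my order is not correct, and he said "
          "that he can't change it. The service is not good at all!")
print(negation_aware_score(review))  # -2: "correct" and "good" are both negated
```

A score below zero is read as negative sentiment, a score above zero as positive.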

This solves the problem of negation. However, what if we have different examples like this:

Figure 4 — Translation: I got the wrong order but the cashier Khalid has solved my problem immediately
Figure 5 — Translation: The pistachio latte’s taste is too bitter. Couldn’t finish it!!

Word-level approaches struggle with these kinds of examples. In Figure 4, a negative word precedes “But”, and then the negativity gets canceled by “solved my problem”, which turns the text positive. Figure 5, on the other hand, falls into a deeper issue: the Arabic word “مر” might mean “Pass” or “Bitter”, and the ambiguity can only be resolved by using Arabic diacritics, which few people write, or by employing an extremely complicated parser. Both problems can be solved through the use of context and semantics.

Context-Level

Intuition

Words are never independent in a text; each word can change the meaning or opinion of the whole text. Although some natural language processing tasks can avoid the burden of including context (a deeper dive into the semantics of words and their “interactions”), sentiment analysis cannot.

Timeline Summary

Many early attempts used rule-based approaches along with word morphology in order to include some semantics. Then came a movement towards models that group similar words together, so that documents/sentences are assigned multiple topics based on the words mentioned (Topic Modeling), where Latent Dirichlet Allocation (2003) stands out as the strongest contributor. After that, deep learning took a long course, starting from word-level semantics, where the star was Word2Vec, introduced by Tomas Mikolov in the paper “Efficient Estimation of Word Representations in Vector Space”, then moving towards context-level semantics (Contextualized Embedding), and finally reaching Transformers, which solved many efficiency and quality issues. The basic idea is that this long history produced models that cater for the context and semantics of words within documents. (There is a large body of amazing work on interpreting gigantic deep learning architectures, so the idea that these models cannot be interpreted is not fully true, especially when analyzing the core concept of Transformers: Attention.)

Now for a quick, simple example in which the model uses a contextual representation of the text and can understand that the word “مر” means “bitter”, not “pass”.

Figure 6 — Translation: The pistachio latte’s taste is too bitter. Couldn’t finish it!!
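
The models mentioned above are far richer than anything that fits in a few lines, but the underlying idea, letting the neighbouring words decide the sense, can be illustrated with a toy overlap count. This is a deliberately simplified stand-in, not a contextual embedding model; the cue words are hypothetical, and English glosses replace the neighbouring Arabic words for readability.

```python
# Hypothetical cue words for the two senses of the Arabic word "مر".
# English glosses stand in for the neighbouring Arabic words to keep the sketch readable.
SENSE_CUES = {
    "bitter": {"taste", "latte", "coffee", "drink", "sweet", "finish"},
    "pass": {"street", "car", "time", "by", "road", "yesterday"},
}


def disambiguate(context_words: list[str]) -> str:
    """Pick the sense whose cue words overlap most with the surrounding words."""
    context = {word.lower() for word in context_words}
    overlaps = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    # On a tie, max() returns the first sense listed in SENSE_CUES.
    return max(overlaps, key=overlaps.get)


# Glosses of the words around "مر" in the latte review.
latte_context = ["the", "pistachio", "latte", "taste", "is", "too", "finish", "it"]
print(disambiguate(latte_context))  # bitter
```

A contextual model learns such associations from data instead of relying on hand-written cue lists, but the decision is driven by the same signal: the surrounding words.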

Sentiment Analysis and Business Value Disconnect

Disconnection

Suppose we have millions of documents, which could be App Store or Google Play comments for an app, Google reviews for a place, complaints about a company, tweets from a region or a hashtag, etc. Applying sentiment analysis and getting, say, 10% positive, 20% neutral, and 70% negative for an app or a Twitter hashtag is basically useless, because the result is not connected to any specific topic. Knowing that some hashtag is very negative only tells you the what, not the why.

You might say, “I’ll just filter the text by a keyword”, but that keyword was chosen by you, not by the data! How many words are you going to account for? Are these words being used by customers? Heavily? The data (reviews, comments, tweets) should drive the process of deciding which aspects, or more precisely, which collection of hundreds of keywords, you should look for. The key takeaway is that you need to know what the aspects are in order to know what exactly is so positive or negative about your place, app, Twitter marketing campaign, or, generally speaking, your business, and then improve.
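
As a rough illustration of letting the data point to the aspects, the sketch below first computes the overall sentiment shares (the what) and then counts the most frequent terms inside the negative reviews (a hint at the why). The reviews, their labels, and the stopword list are made up for the example, and plain term counting is only a stand-in for proper aspect detection.

```python
from collections import Counter
import re

# Hypothetical reviews that already carry a sentiment label from some classifier.
REVIEWS = [
    ("The cashier gave me the wrong order", "negative"),
    ("Wrong order again and the cashier refused to change it", "negative"),
    ("The latte was too bitter", "negative"),
    ("Lovely place and friendly staff", "positive"),
    ("Opening hours are on the door", "neutral"),
]
STOPWORDS = {"the", "and", "me", "to", "it", "was", "too", "are", "on", "again", "gave"}


def sentiment_shares(reviews):
    """Overall share of each sentiment label: the 'what'."""
    counts = Counter(label for _, label in reviews)
    return {label: count / len(reviews) for label, count in counts.items()}


def top_terms(reviews, label, n=3):
    """Most frequent non-stopword terms among reviews with the given label."""
    words = Counter()
    for text, review_label in reviews:
        if review_label == label:
            tokens = re.findall(r"[a-z]+", text.lower())
            words.update(t for t in tokens if t not in STOPWORDS)
    return words.most_common(n)


print(sentiment_shares(REVIEWS))       # 60% of these reviews are negative
print(top_terms(REVIEWS, "negative"))  # "cashier", "wrong", "order" appear twice each
```

The shares alone say that most reviews are negative; the term counts suggest that wrong orders and the cashier are where to look first.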

Connection

We (noura.ai) have researched this subject in order to solve this problem with a methodology different from what is well known in the literature, for the following reasons:

  1. Scarce Arabic NLP literature
  2. Arabic NLP datasets are of low quality
  3. Off-the-shelf Arabic NLP base components are of low quality
  4. Well-known algorithmic approaches are inherently domain-specific, which limits their practicality and generality

We have released our first Generalized Hybrid Aspect-Sentiment Detection and Tracking model; Figure 7 illustrates only its core capability. (The model is integrated within the Bloom System, which is part of the Customer-Success platform.)

Figure 7 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!

One more thing to notice is that the sentiment has gone through multiple layers of indexing and statistical calculations in order to be served as a metric comparable to the CSAT Score used in Customer-Success Management. However, all of the above still does not address the issue!

Deeper Dive!

We have discovered that aspects are not enough either. We want a fine-grained specification of the problem behind each aspect given in Figure 7. What was bad about customer service above is “Order Exchange” and “Wrong Order”, which should be detected by looking at “cannot change it” (ما اقدر اغير) and “Wrong Order” (طلبي غلط). Hence, through a combination of contextualized modeling and graph theory (our first text representation layer to solve the issue), we are currently researching how to fully connect the dots until we reach the core of the problem, as Figure 8 illustrates:

Figure 8 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!

With that, noura.ai can now discover:

  1. What the total CSAT Score is for a business
  2. Why the total CSAT Score is as such
  3. How to change the CSAT Score

and automatically generate an actionable well-defined recommendation that fits our Decision-Making Platform.

The Hard Truth about Data Science

One of the life-changing decisions that you must have felt uneasy about is the career path to follow. You must have asked yourself: what will happen if I choose this path and it turns out not to interest me at all, or if I only realize that after a couple of years? In this article, I want to focus on choosing the path of becoming a data scientist, on the other side of data science that is not well known to newcomers, and on what data and data science mean outside the scientific realm.

Dilemma of Choice

With the sudden peak in popularity that Harvard Business Review contributed to in 2012, when it labeled “Data Scientist” the sexiest job of the 21st century, businesses started looking for data scientists to employ (even when they sometimes didn’t need to). Consequently, ambitious students started riding this wave of demand by choosing this path.

If you were to search Google now for “Why should I learn data science”, you would find multiple reasons that can be summarized as follows: becoming good at problem-solving, having a lucrative career path, or the very high market demand. These reasons are too broad, not exclusive to data science, and never guaranteed, and there might well be better alternatives. However, they are repeated everywhere while missing one main point: people will never be great at something unless they are fully devoted to it, and those popularizing data science unknowingly mask some of the challenges that must be overcome to be successful. Hence, the title of this article.

Concealed Side

There’s always a difficult side to any field. Let’s elaborate on the predicaments and challenges that data scientists might face but that are not usually well known.

Reading, Reading, and Reading

Not many people enjoy reading every day, and some of them are newcomers to data science. Data science is about reading books, academic literature, articles, and so on. To come up with truly valuable ideas that improve your output, you must read a great deal. Following data scientists on social media platforms, subscribing to research organizations’ email lists (my favorite email list is DeepAI), and always being up to date is a must; your eyes must be everywhere. Most of what you think about is a byproduct of knowledge you have been introduced to, so be sure to have an abundance of it.

Furthermore, you have strong backup when trying to detect and fix programming errors: exceptions are raised, the program crashes, the output is clearly wrong, etc. Not so much with “Theoretical Bugs”. These bugs are very good at hiding, and you will never catch them unless you are a dedicated reader; you must understand the inner workings of what you are applying in great depth. Theoretical Bugs sometimes get detected after days, weeks, or months, or never, in which case the model’s true quality is nowhere near what has been reported.

Living Under Uncertainty

Imagine working for a whole month on a project and then throwing it all away. How would that make you feel? Many people cannot accept failure and never let go. They fall into a spiral of bad performance or of repeated attempts to revive a machine learning project that is already a lost cause. Data science is uncertain, and it always will be; that’s why it is distinguished by the word science. Managers must understand this uncertainty as well. To lead a successful data science project that is unique and valuable, you have to accept failure and be the first person to support the team, as failure is not easy to digest.

To account for the risk of failure (for AI projects), I have briefly summarized some points that boost the probability of success, or at least mitigate the impact of failure:

  • Switch your data science jargon off and accurately define and communicate the business requirements
  • Research heavily in order to define the algorithmic approaches and the model’s quality KPI in alignment with business needs (e.g. “Based on these references, we’re confident in setting > 85% accuracy as the KPI for use case X”)
  • Be clear with stakeholders about requirements and KPIs. Communicate exactly what the quality metric means (further information in the Communication section).
  • Choose at least 3-5 fallback approaches in case the first approach fails, and make sure your timeline is buffered for this.
  • Fail fast, and let go if there’s no hope of achieving value or of pushing the deadline

Communication

You must have heard the phrase “Explain it like I’m 5” before; data science communication is all about this. Translating extreme complexity into minimal simplicity is the hardest skill for data scientists to improve, because the better you get, the more complexity you will face, and the harder it will be. To mention a few cases where proper communication (AI-specific) is a must:

  • Project Initiation: Convincing stakeholders to initiate a project requires that they grasp what the end goal is. You need to simulate what it looks like and always attach it to a business value. If your main goal is, for example, to directly support a decision-making process in a certain industry, then when presenting the project you should focus on simulating a decision-making scenario that the data science project helps with.
  • Limitations: Limitations are unknown to stakeholders but very well studied by data scientists. They must be clarified from the beginning and documented with a focus on the cannot’s. For example: “The project cannot do X”.
  • Timeline: The choice of project timeline should align with the project’s value, and a proper Work Breakdown Structure must be prepared and communicated throughout the project’s life.
  • Performance Report and Continuous Monitoring: You must communicate your model’s KPI beforehand, and sometimes you have to bring examples, because people perceive numbers differently. 85% accuracy might sound great to a person, but when shown an example, it becomes, for the same person, garbage! (I usually like flipping the quality metric by saying, for example, “we will make 15 mistakes out of 100 predictions” instead of saying “85% accuracy”.) Also, mistakes can happen when monitoring the model’s performance in production, so you always have to be ready to offer a proper defense or a proper retrospective when presented with mistakes. One thing that, unfortunately, is rarely included in a data science curriculum is interpretability. You need to know why the model has predicted an “Apple” instead of an “Orange”, and this is where the conundrum peaks! Some projects are critical, and every prediction carries a burden of responsibility, so account for the need for interpretability if the project expects it.
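
The flipped framing in the last point is simple arithmetic, sketched below; the function names are only illustrative.

```python
def expected_mistakes(accuracy: float, predictions: int = 100) -> int:
    """Restate an accuracy figure as the expected number of wrong predictions."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # round() absorbs floating-point noise such as 15.000000000000002.
    return round((1.0 - accuracy) * predictions)


def describe(accuracy: float, predictions: int = 100) -> str:
    """Phrase the quality metric in terms of mistakes instead of accuracy."""
    mistakes = expected_mistakes(accuracy, predictions)
    return f"Expect about {mistakes} mistakes out of {predictions} predictions."


print(describe(0.85))  # Expect about 15 mistakes out of 100 predictions.
```

Scaling `predictions` to the real daily volume often makes the number even more tangible for stakeholders.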

Bright Side

Allow me to coat this field with fascination, using my own definitions and sacrificing some of the scientific jargon.

“Data” in a Different Dimension

Data is our way of representing the real world around us in a slightly different format from what we’re used to. It is a way to share information with others more accurately, and a method that allows us to play with this information easily using a machine. It’s a technique for convincing others with evidence, and a way of capturing moments and occurrences of real-life events to be used later. Your five senses can be considered data channels to your brain, just as you can consider your phone’s camera its sense of sight and its microphone its sense of hearing. Each type of computer has such channeling mechanisms, through which it can receive different data in different formats. What then? The data will sit there without any use. Here comes data science!

“Data Science” in a Different Dimension

Data science is an interdisciplinary field that uses scientific methods, statistics, mathematics, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

V. Dhar

Let’s throw that away for a bit and go with a simpler overview. We previously mentioned that data is just a representation of the real world (texts, sounds, images, numbers, etc.), but on its own this has no value. Data science transforms this representation into another representation that people can relate to; it adds value and information, turning the vague data flowing around us into things that are easily understood. After that, it affects our decisions: it makes us realize things that we didn’t know before, it changes our actions, and it might also be used to predict what will happen if an action is changed. It might also tell us things that we could not have known without studying them, or, even if we have studied them, tell us in a faster and more evidence-based way.

Imagine that you spend some amount of money every day. Wouldn’t it be useful to see where you spend that money, on a monthly basis, broken down by type of spending? Also, if only you were able to estimate next month’s budget, you would know in advance whether you have to ask a friend for some money next month or reduce how much you spend every day.
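
A minimal sketch of both ideas, assuming a hypothetical list of expense records: it totals the spending per month and category, then uses the average monthly total as a naive estimate for next month. A real forecast would need a better model than a plain average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical expense records: (ISO date, category, amount).
EXPENSES = [
    ("2024-01-03", "food", 40.0),
    ("2024-01-15", "transport", 20.0),
    ("2024-01-20", "food", 60.0),
    ("2024-02-02", "food", 50.0),
    ("2024-02-10", "transport", 30.0),
    ("2024-03-05", "food", 70.0),
    ("2024-03-18", "transport", 30.0),
]


def monthly_totals_by_category(expenses):
    """Sum spending per (month, category); the month is the 'YYYY-MM' prefix of the date."""
    totals = defaultdict(float)
    for date, category, amount in expenses:
        totals[(date[:7], category)] += amount
    return dict(totals)


def estimate_next_month(expenses):
    """Naive estimate: the average of the total spent in each past month."""
    per_month = defaultdict(float)
    for date, _, amount in expenses:
        per_month[date[:7]] += amount
    return mean(per_month.values())


print(monthly_totals_by_category(EXPENSES))
print(estimate_next_month(EXPENSES))  # 100.0 (months total 120, 80, and 100)
```

Comparing the estimate with next month's expected income is what tells you whether to borrow or to cut down on daily spending.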

Why Learn Data Science?

“Why Learn Data Science?” is an interesting question… or (Questions Alert!) is it? How interesting is it? Why is it interesting? And for whom exactly is it interesting? How many people find it interesting? How many people find it boring? Can I compare how interesting that question is with respect to other questions? But wait, how can I represent the concept “Interesting”? Also, can I predict the number of people who would be interested in that question this year and in the coming year? Can I predict whether a person would be interested in that question or not before I ask?

Can I (Brainstorming Alert!) answer these questions by just seeing how many people searched for that question on Google? Or how many people have clicked on websites that answer that question? Or publish a survey that includes that exact question along with related questions, then publish the survey without that question and try to predict whether a person would answer “I am interested in that question” based on their other answers? Or can I just calculate the number of junior data scientists in a region at a certain time?

Data Science will give you the ability to ask questions about anything you see, read, or listen to in your everyday life, whether it is as simple as the question above or as hard as the Large Hadron Collider problem. It will make you capable of thinking of multiple approaches to overcome problems or answer questions. It will turn your thought process into that of an analytical thinker; it will change how you make decisions, how you receive factual claims from people, and how you assess how truthful those claims are. It will provide you with a logical, analytical framework within which you can tell when to accept a claim, reject a claim, or stay neutral.

Data Science is more of a lifestyle and a philosophy than just a career.