SentiSquare | Newsroom | Automation Dilemma: How accurate does AI need to get to be worth deploying?

Automation Dilemma: How accurate does AI need to get to be worth deploying?

Published on 22 October 2020 by David Radosta

Even as artificial intelligence lacks the scope of human brain capabilities, “Business AI” has already seen success in carrying out narrowly defined operational tasks. AI guru Kai-Fu Lee, former director of Google China, gives some examples of opportunities lying in the tagged data of companies — such as insurance fraud identification and cancer identification in X-ray scans. A lot of potential also lies in sorting and analysing customer messages — the jargon term for which is ‘text classification’.

AI thus provides insight into unstructured data, and can even automate tasks previously done by humans. One of the obstacles to widespread use is the issue of accuracy; AI makes mistakes.

We often come across the idea that task automation requires a success rate close to 100%. This is how Tomáš Brychcín, Natural Language Processing (NLP) scientist and CEO at SentiSquare, responds: “Remember that humans are not quite there either; the norm is around 85%. In principle, AI cannot surpass human ability in text classification, since it is trained on human-generated data. Even so, we are getting solid results in real-world cognitive automation. That is because AI works in a drastically different way to humans.”

The Myth of 100% Accuracy

Does it mean that the human brain is simply superior to AI? Not quite; comparing AI with human accuracy is tricky. It requires clarity about how we measure it and where our benchmarks are. Imagine the case of a 100% accurate classification of text pieces such as emails. The task is to assign each incoming email to a specialist inbox. A 100%-accurate agent would need to know the category definitions and company’s inner workings through and through. Even so, there is uncertainty in some emails as to where they belong. If multiple agents carry out the same task, their agreement rate is never 100%. Usually, there are differences in more than 15% of cases. Many companies are surprised by that!
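The agreement rate mentioned above can be measured directly: given the labels two agents assign to the same batch of emails, the share of matching labels is their observed agreement. A minimal sketch in Python (the category names and label sequences are purely illustrative, not real SentiSquare data):

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two annotators assign the same category."""
    assert len(labels_a) == len(labels_b), "both annotators must label the same batch"
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Two agents categorising the same eight emails (hypothetical labels).
agent_1 = ["billing", "complaint", "question", "billing", "request", "complaint", "question", "billing"]
agent_2 = ["billing", "complaint", "question", "request", "request", "complaint", "billing", "billing"]

print(agreement_rate(agent_1, agent_2))  # 0.75, i.e. disagreement in 25% of cases
```

Even on this toy batch the two agents disagree on a quarter of the emails, which is exactly the "more than 15% of cases" effect that surprises companies when they first measure it.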

A 100% accuracy rate is therefore practically unattainable. Yet, to err is human, and to find the right person for a task is divine. That applies especially in the very real context of high agent turnover and onboarding challenges in customer service.

But where do we set the objective if ‘correctness’ is so problematic? We need to consider the difficulty of the task. How many categories are there? Do they overlap? Sometimes, with a complex categorisation system, an accuracy average of 70% is a solid result for a team of agents. That is quite far from the initial 100% benchmark which a perfectionist would cling to.

The Classification Dilemma

Classifying incoming customer emails is one frequent scenario, and a potential automation target, in large companies. Emails need to be tagged with the correct categories and dispatched to the appropriate specialists. A similar process can also occur after an interaction: for instance, call centre agents categorise calls after the customer hangs up, so that management knows what is going on at any given time and what the trends are.

Of course, an essential question is how complex the classification task should be. If the categories are few, the classification will be easy and high accuracy attainable — for example, inbound emails and/or calls can be tagged as questions, requests, or complaints. That would make the process fast and undemanding. However, such simplicity rarely reflects a large firm’s operational needs — most industries require a more specialized system.
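A routing scheme with only a few broad categories can be sketched in a handful of lines. The keyword rules below are purely hypothetical (production systems at this scale are statistically trained, not hand-written), but they show what tagging mail as a question, request, or complaint amounts to:

```python
# Hypothetical keyword rules for a three-category routing scheme.
RULES = {
    "complaint": ["disappointed", "unacceptable", "not working", "refund"],
    "request":   ["please send", "i would like", "can you change"],
    "question":  ["how do i", "what is", "when will", "?"],
}

def classify(email_text):
    """Return the first category whose keywords appear in the email, else a fallback."""
    text = email_text.lower()
    for category, keywords in RULES.items():
        if any(kw in text for kw in keywords):
            return category
    return "other"  # uncategorised mail still needs a human look

print(classify("I am disappointed, the router is not working."))  # complaint
print(classify("When will my contract end?"))                     # question
```

The brittleness is visible immediately: every new phrasing needs a new keyword, which is one reason the simple three-bucket scheme rarely survives contact with a large firm's real traffic.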

It helps the response process a great deal if each incoming query is matched to a specific resolution scenario, rather than a broad, generic category. For example, in an electricity distribution firm, it is important to distinguish between precisely defined cases such as a new customer either moving into a new house with existing supply, or switching from a different supplier. Knowing immediately which scenario to apply to an incoming query enables speedy resolution — and creates a potential for automating the resolution process itself in the future.

All that is why most firms opt for more detailed and narrowly defined categories. The operational fit is necessary, and the reporting output becomes much more valuable. On the other hand, it gets more difficult to keep the classification quick and accurate. An ideal solution would capture the best of both worlds: deep real-time insight combined with an efficient process that does not pile up backlogs.

The more granular the categories, though, the more problems human agents face. With increasing complexity, they start to make more mistakes. Worse, some take shortcuts and put everything into a couple of their favourite categories. Mishaps happen even more frequently on Friday afternoons; in other words, performance varies among agents and over time. No wonder; manual text classification is a repetitive and thankless task. Different people cope in different ways and make different sorts of mistakes.

The issues facing individual agents mean that achieving a smooth flow of dispatching and categorisation is a challenge for contact centres. Human error plays out in multiple ways. Running the show costs a lot: qualified labour is expensive and agents need to be trained properly. In fact, according to contact centre experience, it takes an agent up to three minutes on average to dispatch one email. When there is a query surge and the average resolution time increases, though, the backlog of queries can snowball. To add insult to injury, incorrectly dispatched queries either bounce back to the backlog, or take up the time of a specialist who is supposed to take care of customers.
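The snowball effect is easy to quantify. Assuming, purely for illustration, that each agent dispatches one email every three minutes, a sustained surge that outpaces the team's capacity makes the queue grow for as long as it lasts:

```python
def backlog_after(minutes, emails_per_minute_in, agents, minutes_per_email=3.0, start=0.0):
    """Queue length after a period of steady inflow, given the team's dispatch capacity.

    Simplified model with illustrative numbers: constant inflow, constant
    per-email handling time, and no backlog carried below zero.
    """
    capacity_per_minute = agents / minutes_per_email
    net_growth = emails_per_minute_in - capacity_per_minute
    return max(0.0, start + net_growth * minutes)

# Ten agents can dispatch 10 / 3 ≈ 3.33 emails per minute.
normal = backlog_after(minutes=480, emails_per_minute_in=3.0, agents=10)
surge  = backlog_after(minutes=480, emails_per_minute_in=6.0, agents=10)
print(normal)  # 0.0 — the team keeps up
print(surge)   # ≈1280 emails left over after one eight-hour shift
```

Doubling the inflow does not merely double the waiting time; it leaves a four-figure pile of unread mail at the end of a single shift, which is the backlog dynamic described above.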

How AI beats humans

While accuracy remains a key metric, the principal reasons for adopting AI lie elsewhere: in the disciplines where humans fall short.

  1. SPEED: Here's the biggest difference: AI sorts an email in a split second, while a human agent needs up to three minutes on average.

  2. COST-EFFICIENCY is a clear advantage. With inbound interactions at tens of thousands per month, AI-powered automation reduces labour costs.

  3. RELIABILITY: AI behaves stably and consistently. It doesn't need coffee breaks, it doesn't have bad days and it always uses the same logic when making decisions.

  4. FLEXIBILITY: AI adapts to change better than humans do. Both need to "retrain", but a human takes a while to get used to the new setup. AI will start following the new rules as soon as it learns about them.

“But keep in mind that even in terms of accuracy, AI is at a near-human level now,” Brychcín points out. “Put together, these benefits make cognitive automation through AI a crucial source of competitive advantage.”

Cognitive automation of text-related tasks is finally possible. AI can help make the dispatching of customer emails faster and more efficient in a number of ways — and that is not all. To keep up with the increasing load, customer service will need to automate more processes. Companies have only just started to scratch the surface of the mountain of opportunities. One promising case is the quality assessment of messages that agents send to customers. Did the agent show empathy? What about grammar? AI is already there to give answers. When it comes to optimising customer service, only the contact centre manager’s imagination is the limit.

Naturally, text-related tasks ripe for automation can also be found outside the realm of customer service. Think internal ticketing as the closest neighbour, facing similar challenges and gaining the same advantages from cognitive automation. Another thrilling case is automated text mining to uncover hidden gems in a pile of company data.

“Once you get your categories right and your business case ready, the implementation itself is faster and less expensive than you might think,” says Brychcín. “A functioning pilot can be set up within two weeks from the transfer of training data.”


SentiSquare is a technology company that deals with customer-generated text analysis. As one of the few companies in the field, it uses artificial intelligence based on the principles of distributional semantics, which provides many competitive advantages, such as language independence. The company was founded in 2014 as a spin-off by a team of researchers at the Faculty of Applied Sciences of the University of West Bohemia in Pilsen. SentiSquare currently supplies technology to the contact centres of large companies such as T-Mobile, E.ON and Albert.

Media contact:
Lucie Kolářová
+420 603 400 124