AI and voice-assisted apps have become a trending topic across news media, social networks, professional meetups and forums. You have probably watched “Her”, the 2013 American romantic science-fiction drama written, directed and produced by Spike Jonze. The film follows Theodore Twombly (Joaquin Phoenix), a man who develops a relationship with Samantha (Scarlett Johansson), an intelligent computer operating system personified through a female voice. The movie summarises what could be the apogee of voice-assisted apps. Back in 2013, few people imagined that anything close to such apps would exist so soon. But thanks to the rapid growth in processing capacity and cloud technologies, the unthinkable is unfolding before our eyes.

This is a good thing. Voice-assisted apps are a new medium that we must use to the benefit of customers, just as we do with mobile apps and websites. They also provide an opportunity to re-create the once-lost one-on-one relationship with our customers: by designing a voice that represents our brand and values, by recognising the customer’s emotional state, and by personalising the experience to keep them satisfied. Adopting voice-assisted apps brings many benefits for businesses, which I will cover in a separate article. This article focuses on a few very important considerations to keep in mind when designing a voice-assisted app.

There are a lot of resources on the internet that explain exhaustively the techniques to use when designing a user experience for mobile apps and websites. There are even academic papers that explore how the experience can be taken to another level. The result speaks for itself: no one can deny that interacting with websites and mobile apps has become pleasant. In contrast, few articles cover how to design a voice-assisted app, and most academic papers focus on natural language processing algorithms, deep learning, or the combination of the two that led to the rise of the talking machines we see today. In this article, I would like to talk about a few things you need to do when designing a voice-assisted application for your business.

Voice-assisted apps are intelligent but not human

The very first consideration is to acknowledge that the ability to talk to a human does not make a computer human. Your NLP algorithms are trained to predict the intent behind what a user says with good accuracy, by analysing each word in the sentence and the contexts in which that word has been used. This is already problematic: depending on the nature of the conversations that go through the app, the algorithms can easily pick up a word that is being used abusively and learn it in a new context, and soon the app will start talking back to your customers in a manner that is strange, offensive or inappropriate. One way to overcome this is to keep an eye on the training data. Sometimes reviewing all of it is impossible given the amount of data your app produces. Nevertheless, implement steps to make sure that samples incorrectly matched to your pre-defined user intents are removed. In the same manner, assign sentences that missed their intent to the correct pre-defined intent. The rule of thumb is “train your model as you would teach your child. Always be with it (the model) morning, afternoon and night”.
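When there is too much data to review by hand, one possible step is to send only the doubtful samples to a human reviewer. The sketch below is a minimal illustration of that idea and not a prescribed method: the `Sample` structure, the `confidence` score and the `0.6` threshold are all assumptions, and your NLP platform may expose this information differently.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One logged utterance (hypothetical structure)."""
    text: str
    predicted_intent: str
    confidence: float  # assumed to be reported by the NLP model, 0.0 to 1.0


def split_for_review(samples, threshold=0.6):
    """Separate samples we trust from samples a human should check."""
    keep, review = [], []
    for sample in samples:
        if sample.confidence >= threshold:
            keep.append(sample)
        else:
            review.append(sample)
    return keep, review


samples = [
    Sample("add surfing to my calendar", "add_event", 0.93),
    Sample("whatever, you useless thing", "add_event", 0.41),
]
keep, review = split_for_review(samples)
for sample in review:
    print("Needs review:", sample.text)
```

A reviewer can then remove each flagged sample or re-label it with the right intent before the next training run.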

Think of user intents vertically and horizontally

When designing your app, think of the problem you want to solve with the voice-assisted app and outline possible solutions. Borrowing from big O analysis, treat the number of turns between the user and the voice app as the cost of a solution, and project vertically how deep the conversation must go until the problem is resolved. Always aim for the solution with the shortest path. The deeper the conversation, the more the user loses track, and the conversation becomes unnatural and probably useless.
Once you have found the shortest path, think horizontally. Imagine a tree structure with one root node and multiple nodes or leaves that expand horizontally:

  • Each node represents a possible intent that you expect from a customer who is responding to or acting on a suggestion or a response from your app. Predict all possible intents and design as many nodes as you need to capture them. I call this a happy path.
  • Ask yourself whether the customer might ask the app to repeat what it has just said. Then prepare a node to catch that intent and keep the user happy.
  • Don’t allow silence to prevail. If the user is not responding, show that you are still with them and kindly remind them what the conversation was about. Don’t overdo the reminding: know when to stop, and always say goodbye by telling the user that you will be happy to talk when they are back.
  • Manage fallbacks gracefully. Sometimes the user will say something that you don’t understand. In that situation, let the user know that you don’t understand, apologise if appropriate and, most importantly, let them know that you are learning and improving. End the answer with possible alternatives so that the user can try to express the same intent in a way that you may understand.
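The advice on silence can be sketched as a small rule: remind the user a limited number of times, then say goodbye. The function name, the wording and the limit of two reminders below are assumptions chosen for illustration.

```python
MAX_REMINDERS = 2  # hypothetical limit; tune it for your own users


def on_silence(reminders_sent, topic):
    """Return (reply, end_conversation) for a turn where the user said nothing."""
    if reminders_sent < MAX_REMINDERS:
        reply = f"Are you still there? We were talking about {topic}."
        return reply, False
    reply = "I'll let you go for now. I'm always happy to talk when you're back. Goodbye!"
    return reply, True


reply, done = on_silence(0, "your calendar")
print(reply, done)
```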

To summarise this section, think of each conversation flow as a stick-like rectangle: short in depth but wide.
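To check whether a flow really is short and wide, you can model it as a tree and measure it. This is only a sketch: the `Node` class, the helper functions and the intent names are hypothetical and not tied to any voice platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One intent in the conversation flow (hypothetical structure)."""
    intent: str
    children: List["Node"] = field(default_factory=list)


def depth(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if not node.children:
        return 1
    return 1 + max(depth(child) for child in node.children)


def max_width(node):
    """Largest number of intents handled at any single level of the tree."""
    widest, level = 0, [node]
    while level:
        widest = max(widest, len(level))
        level = [child for n in level for child in n.children]
    return widest


# Hypothetical flow: one request followed by the intents we expect next.
flow = Node("add_event", [
    Node("yes"),
    Node("no"),
    Node("another_calendar", [Node("confirm_calendar")]),
    Node("repeat"),
    Node("list_calendars"),
    Node("fallback"),
])

print("depth:", depth(flow), "width:", max_width(flow))
```

This flow is three levels deep and six intents wide at its widest level, which is the shape to aim for. If the depth grows faster than the width, the flow probably needs a shorter path.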

Consistently evaluate the probability with which each intent will be triggered

As explained above, the conversation flow expands horizontally through nodes that correspond to the possible intents behind the user’s words. Mathematically, these nodes are mutually exclusive and together they should define your sample space. Each event (an intent from the user) depends on the previous response provided by the app, and the probabilities of all events in the sample space sum to one. To make this clearer, let’s imagine a scenario where a user asks the app to add a reminder to a calendar. The conversation is as follows:

 User: add “surfing this weekend” to my calendar
 App: Would you like to add it to your birthday calendar?


The user started by expressing the intent to add a planned activity to a calendar. The app replied by suggesting the birthday calendar. At this point we can assume that the user will reply with one of the following intents:

 Yes: if the user is happy to add surfing to their birthday calendar
 No: if the user does not want to add it there
 Another calendar: if the user names a different calendar
 Repeat: if the user did not hear or understand the response
 What other available calendars: if the user wants to hear the available calendars before choosing one
 Fallback: if the user says something the app does not understand.

Our sample space at this level is made of six intents, hence six events. The probability of each event depends on the response, and no two events overlap or occur at the same time. This is good, since it lets us easily evaluate the probability of each event. In the absence of data, you can use your beliefs about the users and about the response the app provides to assign a probability to each event given that response. As you gather more data over time, replace your beliefs with data: for each response, count how many times each event is triggered and divide by the total number of events triggered after that response. This gives you the probability of each event.
Why is it helpful to capture these probabilities? The answer is simple and intuitive: you want the highest probability to belong to the happy path. If, for example, you find that the highest probability belongs to the fallback or to the “No” event, then something is wrong: either the suggestions the app provides are poor, or you need more nodes to capture all possible events (intents). I find this quite useful, and it is the most important step in making sure your trained models are not biased.
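The counting described above can be sketched in a few lines of Python. The intent names, the prior beliefs and the sample log are made up for illustration. The sketch simply switches from beliefs to relative frequencies as soon as any data exists, which is the simplest reading of the approach and not the only one.

```python
from collections import Counter

INTENTS = ["yes", "no", "another_calendar", "repeat", "list_calendars", "fallback"]

# Hypothetical starting beliefs for one app response; they must sum to one.
PRIOR = {"yes": 0.5, "no": 0.1, "another_calendar": 0.15,
         "repeat": 0.05, "list_calendars": 0.1, "fallback": 0.1}


def intent_probabilities(observed, prior=PRIOR):
    """Relative frequency of each intent, or the prior when there is no data."""
    counts = Counter(observed)
    total = sum(counts[intent] for intent in INTENTS)
    if total == 0:
        return dict(prior)
    return {intent: counts[intent] / total for intent in INTENTS}


def most_likely(probabilities):
    """Intent with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Made-up log of the intents triggered after the birthday-calendar suggestion.
log = ["yes"] * 12 + ["no"] * 30 + ["fallback"] * 18
probs = intent_probabilities(log)

if most_likely(probs) != "yes":
    print("Happy path is not the most likely outcome:", most_likely(probs))
```

With this made-up log, “No” is triggered in half of the cases, which suggests that the birthday calendar is a poor suggestion for this request.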

Back up your app with machine learning APIs

I started by saying that voice-assisted apps are intelligent but not human. What I did not mention is that, just like humans who have not had access to a good education, these apps are not really intelligent unless they are able to make educated guesses. Your NLP algorithms might be doing great work in understanding users’ intents, and you may have designed a well-optimised conversation flow, but if they are not backed by an intelligent API, the app is bound to fail. When the app responds, the response needs to match what the user expects, so that the user does not have to make the cognitive effort of working out how the app works and what the best way to talk to it is. There should be no rules on what the user can say to the app. Instead, use the available data and run machine learning algorithms to understand the user’s preferences and expectations. Use the knowledge at your disposal to provide a personalised experience. If the user wants to add “surfing” to the calendar, you should be able to learn from the user’s history and guess the most relevant calendar for the event. Aim to reach the happy path quickly through educated guesses and personalisation.
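As a rough illustration of an educated guess, the sketch below suggests a calendar by counting which calendars the user chose for past events that share words with the new one. It is a deliberately simple stand-in for a real machine learning model, and the history, calendar names and function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical history of (event title, calendar the user chose).
HISTORY = [
    ("surfing with Tom", "sports"),
    ("surfing lesson", "sports"),
    ("Anna's birthday", "birthdays"),
    ("team meeting", "work"),
    ("surfing trip planning", "personal"),
]


def build_index(history):
    """Count, for every word, how often each calendar was chosen."""
    index = defaultdict(Counter)
    for title, calendar in history:
        for word in title.lower().split():
            index[word][calendar] += 1
    return index


def suggest_calendar(title, index, default="personal"):
    """Return the calendar whose past events share the most words with the title."""
    votes = Counter()
    for word in title.lower().split():
        votes.update(index.get(word, {}))
    return votes.most_common(1)[0][0] if votes else default


index = build_index(HISTORY)
print(suggest_calendar("surfing this weekend", index))
```

With this history the app would suggest the sports calendar for “surfing this weekend” and not the birthday calendar, which moves the conversation straight onto the happy path.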

Make the best AI writers and AI UX specialists a core part of your team

You need AI writers and AI UX specialists as much as you need data scientists and software engineers. Talking to a variety of customers from different backgrounds without offending them, managing expectations, and representing your brand with an appropriate voice and intonation all require a set of skills that you should not expect to find in computer engineers. Users are likely to be happy if they feel they have been treated with respect and esteem. The way your voice-assisted app talks to users, the way a response conveys confidence when there is no ambiguity, and the way you apologise when an apology is expected all contribute to higher adoption of your voice-assisted app. Always let your AI UX experts choose which conversation flow provides a good experience to the users. Always make sure that AI writers have written at least five variations of the same response for each intent.
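Once the writers have delivered their variations, the app needs a way to rotate through them. A minimal sketch, assuming the variations live in a plain dictionary keyed by intent; the wording and the function name are invented for illustration.

```python
import random

# Hypothetical wording; in practice these lines come from your AI writers.
RESPONSES = {
    "fallback": [
        "Sorry, I didn't catch that. You can say yes, no, or name a calendar.",
        "I'm still learning and I missed that. Could you say it another way?",
        "Apologies, I didn't understand. Would you like to hear your calendars?",
        "I didn't quite get that. Try saying yes, no, or a calendar name.",
        "Sorry about that, I'm improving every day. Could you rephrase?",
    ],
}


def pick_response(intent, last=None, rng=random):
    """Pick a variation for the intent, avoiding the one used last time."""
    options = [r for r in RESPONSES[intent] if r != last] or RESPONSES[intent]
    return rng.choice(options)


first = pick_response("fallback")
second = pick_response("fallback", last=first)
print(first)
print(second)
```

Avoiding an immediate repeat keeps the app from sounding like a recording when the user triggers the same intent twice in a row.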

Iterate quickly as you learn new things

The most fun part of AI systems is their ability to gather data and to reveal insights and patterns hidden deep in that data. Always keep learning from what you see coming from the users. One good way of learning was explained above in the section on probability, but there are many more ways of learning from data. In any case, the goal is to acquire new knowledge and iterate whenever possible. You may need to add more nodes to capture intents that you had not anticipated but that become obvious as you learn from the data. Identify new opportunities to bring more products into your voice-assisted app. The voice interface is a new medium and we should exploit it to its full potential.


Designing a voice-assisted app isn’t an easy task and requires the involvement of people with different skill sets. AI writers and AI UX experts are needed as much as data scientists and software engineers. Always keep an eye on how the app is performing, and clean its training data from time to time to avoid the biases that come from talking to people with different slang and moods. Design your conversation flow vertically and horizontally so that it forms a stick-like rectangle. Remember that although your app might act intelligently, it is not human, so it won’t understand the depth of human emotions and will fail to satisfy people in some situations. A reasonable step to take is to keep users informed that they are talking to a machine and not a human. The considerations discussed in this article are useful starting points when you set out to design a voice-assisted app. However, designing a conversation flow is a scientific subject and requires more than what is covered in this article. By sharing your experience with other people who manage the same type of voice apps, you will be able to bridge the gaps and gain valuable knowledge that will help you provide the best experience to your customers.
Thanks for reading and please share your thoughts in the comment section below.
