Preparing For Voice UX In What Will Become Standard UI

The more you think about UX, the more it evolves. It feels like yesterday that TVs could only show programs and phones could only let you talk to someone. The devices around us changed, and since the big splash made by the arrival of the Internet, the product scene has never looked back. Product development has spread like wildfire. The Internet has provided a fresh platform for digital products that nobody had conceived of before. People once had to wait in queues to book movie tickets; now they do it on their smartphones. Current products, especially digital ones, focus far more on user experience than those of years past. Tangible products have a physical form to assert their presence; digital products do not. Web products count heavily on user experience to provide positive interaction. So far they have been extremely efficient at that job, and they keep evolving. But like everything, there is a limit to it.

UX design has reached saturation, at least with regard to the interaction medium. Visual product interaction has hit its ceiling. There is not much more you can do with UX design unless you focus on a different medium. And that’s what designers did.

Designers started focusing on verbal interaction to make their products even more user-friendly than the present generation. Apple was one of the first to incorporate voice interaction in its products, in the form of Siri. Since then, we have witnessed the birth of Cortana and Alexa. Voice UX is a novelty at the moment. It needs to evolve further, much as visual UX did, to become accessible to a mass audience.

Voice UX

Just imagine how awesome it would be to command your products to perform a task instead of operating them yourself to get the work done. Voice UX relies solely on the user interacting with the product through voice commands. It is like talking to your friend or your parents, except that they have been replaced by a product. The most straightforward implementation of voice UX at present is one-way, voice-to-visual communication. Think of asking Siri what 5+2 is, and Siri displaying the result ‘7’ on the screen. Asking Alexa for some sports news is a different form of interaction. Unlike Siri in that example, Alexa would read the news aloud. That is voice-to-auditory communication.

Any product built on voice UX depends only on verbal commands, which is an obstacle for the current systems. Let’s see how.

• Assume you want to buy a shirt online. What do you do?
• You visit the online store and look for a shirt you like.
• Then, you select a shirt and the other details (size, color).
• Next, you provide the delivery address and make the payment.
• The shirt arrives at your doorstep.

Through this whole process, from looking for a shirt to confirming payment, how did you interact to buy the shirt? You looked at the elements and responded with actions (mouse clicks or touches, typing the address on a keyboard). The web product, i.e., the online store, performed the task of presenting information in a visual format for you to comprehend and act on. Do you think you could have bought the shirt if you had just kept staring at it? No, right? You had to go through the process of selecting and paying for the shirt by performing physical actions such as moving the mouse or, on mobile devices, touching the screen.

Communication Base

A usual UX has the user looking at the product and interacting through physical actions. These actions make for a standard interaction with the product. The current visual UX system is strict about which actions it accepts, but in a way that does not hinder the user’s intentions.

Have you ever tried Windows Speech Recognition to control your computer? If you have, you will understand what I mean. At present, voice systems cannot process users’ interactions properly because of variations in the input. For products to appeal to customers, they need to understand them. As voice UX calls for verbal interaction from the user, products have to be capable of handling differently worded interactions that carry the same intention. A voice system should be able to recognize that ‘speak’ and ‘tell’ mean the same thing, which not all of the current systems can do.

So here the task of designers begins. Language vocabulary is large. Hence, designers have to base their designs on some standardized set of phrases and keywords to establish consistency in user interaction.
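As a rough illustration of what such a standardized set could look like, here is a minimal Python sketch that maps different trigger words onto one canonical intent. The intent names, the word lists and the `resolve_intent` function are all made up for this example; real assistants rely on far more sophisticated language understanding.

```python
from typing import Optional

# Hypothetical canonical intents and the words that trigger them.
# The vocabulary is illustrative only, not taken from any real assistant.
INTENT_TRIGGERS = {
    "read_aloud": {"speak", "tell", "say", "read"},
    "search": {"find", "search", "look"},
}


def resolve_intent(utterance: str) -> Optional[str]:
    """Return the canonical intent of the first trigger word found, if any."""
    for raw in utterance.lower().split():
        word = raw.strip(".,!?'\"")  # drop surrounding punctuation
        for intent, triggers in INTENT_TRIGGERS.items():
            if word in triggers:
                return intent
    return None  # no known trigger word: the system should ask the user to rephrase
```

With this table, ‘Tell me the sports news’ and ‘Speak the sports news’ both resolve to the same `read_aloud` intent, which is the kind of consistency the designer is after.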


The early stages of evolution have shown some limitations of the voice systems. There is some leeway to visual UX, as visualization is a customer’s primary sense of perception. A user sees a thing and acts accordingly.

User Interactions

Voice UX faces a more substantial hurdle in understanding the user. This is owing to the fact that users’ intentions are the same, but their interactions vary (in the words they use). Hence, designers have to establish uniformity in communication by setting a standard set of commands that regular users can comfortably work with.

Presenting information

Another limitation of voice UX is the way it presents data. Voice systems can only relay information in sequence and hence tend to be slow. In an age where everyone is short of time, latency in interaction is not something a product should offer. A visual UX can present many pieces of information on a screen simultaneously. A voice system can serve only a single piece of information at a time. The remedy? Systems can be made smarter so that they skip unnecessary information by narrowing the results with a few filters. Here’s an example.

Let’s say you want to dine out.

• You ask Siri, ‘Suggest me a place for dinner’, and she starts listing the likely places. All the while, you have to wait until you have heard the details of every diner before making further choices.
• This situation is not ideal. An ideal scenario would be:
• You ask Siri, ‘Suggest me a place for dinner.’
• She says, ‘There are a few choices. How do you intend to go, walking or driving?’
• You say, ‘I’m going to walk.’
• Then she says, ‘You have three choices – Hotel Meridian and Hotel Blue Moon are 5-minute walks; The Adams is a 4-minute walk.’
• You say, ‘Great. I’m up for The Adams.’
• Finally, Siri says, ‘The Adams it is then.’
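The filtering step in that conversation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the walking times of the first three places come from the dialogue above, while the drive times, the fourth place, the 10-minute cutoff and all function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    walk_minutes: int
    drive_minutes: int


# Walking times for the first three places come from the dialogue;
# drive times and the fourth place are made up for illustration.
PLACES = [
    Place("Hotel Meridian", 5, 2),
    Place("Hotel Blue Moon", 5, 2),
    Place("The Adams", 4, 1),
    Place("Harbor Grill", 40, 10),  # hypothetical far-away option
]

MAX_MINUTES = 10  # assumed cutoff for "close enough"


def travel_minutes(place: Place, mode: str) -> int:
    """Travel time for the chosen mode ('walk' or anything else for driving)."""
    return place.walk_minutes if mode == "walk" else place.drive_minutes


def suggest(places, mode, limit=3):
    """Keep only places within the cutoff, nearest first, capped at `limit`."""
    nearby = [p for p in places if travel_minutes(p, mode) <= MAX_MINUTES]
    return sorted(nearby, key=lambda p: travel_minutes(p, mode))[:limit]


def spoken_reply(options, mode):
    """Build one short sentence instead of reading out every detail."""
    parts = [f"{p.name} ({travel_minutes(p, mode)} min)" for p in options]
    return f"You have {len(options)} choices: " + "; ".join(parts)
```

Asking the one clarifying question (walk or drive) lets the system drop the far-away place before it says anything, so the user hears three short options rather than the full list.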

That’s the perfect end. For voice-controlled products to become mainstream, they have to be capable of interacting the way humans communicate naturally. A developer cannot code every aspect of conversation into the product. Like people, voice-enabled devices will have to learn by interacting with their users. For the time being, voice UX can get away with simple interactions because of its novelty. But as time goes by, it will have to evolve.


Then there are concerns about safety. For voice-enabled devices, your voice is the input, not your presence near the machine. Hence, a simple recording of your voice can be the key card that lets your child order a pantry full of cakes without you knowing anything about it. Synthesized voices and voice recordings used to gain access to such systems are an obvious concern. Designers have to work hard to close these loopholes if they want users to have any confidence in their products.

Present UX state

It won’t be wrong to say that the voice UX is currently in its infancy. Ok, not infancy, but more like early childhood. As mentioned many times by now, it has to evolve to prove its significance in the future.

The current UX state allows for simple interactions. Again, think of asking Siri a maths question, or Alexa for some sports news. The current model is safe for easy tasks. There’s a long way to go, and voice UX has even more significant obstacles to tackle than other forms of UX. We have already talked about variations in user interactions, the method of data presentation and even the device responses. The progress has been good so far, especially when you consider that just three years ago there were no Alexas or Cortanas. The point is, any initial phase of development is slow, especially when you are setting up the basics.

We are at a point where voice interaction with devices has made its mark on the UX industry, showcasing massive potential for the future. Voice UX is a breath of fresh air in the UX industry: it has been around for some time but has gained vital importance only recently.

Coping with present and future possibilities

It was about two decades ago that designers began giving UX importance, owing to the proliferation of computers in workplaces. Since then we have almost saturated the visual aspect of UX. Voice UX is a new chapter in the book of UX. It is something both present and future designers have to be aware of, as it holds the key to revolutionizing user experience if its full potential is unleashed. How voice UX develops is up to us: how we build on the present stage and improve the capabilities of hardware and software alike, so that the final result is effective and efficient at its job.

Addressing current issues

But before planning for improvements, we need to solve the current problems with the system. The first and foremost priority should be designing systems that are capable of understanding users beyond the barriers of communication.

Aid for the differently-abled

With appropriate programming, voice systems could be a blessing for the differently-abled. The current setup falls short for such users: voice systems support only audio or visual responses from the product, which shuts out deaf and blind users. With a bit of innovation, blind and deaf users could find these products as usable as anyone else does, and the technology might even help in life-or-death situations.

Form of interactions

The current crop of voice devices cannot evolve. They are programmed in such a way that only makes for predefined responses to any human interaction. This aspect of the present generation makes them first-time charmers. You will be fascinated when you interact for the first time, but as time goes on, they lose their appeal.

Products have to feel more natural in interaction. Sooner or later, they have to lose their machine-like character and project natural human interaction, or else risk stalling progress.

Assurance of safety

Also, the current solutions on the market offer little in the way of security. Future iterations have to incorporate security measures to prevent misuse. They might integrate a system that differentiates between a recording and a live voice, or pair with a wearable such as a Fitbit to detect the user’s presence. The possibilities are endless, and I’m sure the designers will find a way.
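To make the idea concrete, here is a small Python sketch of how such checks could be combined before a sensitive command goes through. Everything in it is an assumption for illustration: the `VoiceRequest` fields, the 0.8 threshold and the rule itself are hypothetical, and it presumes that some other component already produces a speaker-match score, a liveness verdict and a wearable-presence signal.

```python
from dataclasses import dataclass


@dataclass
class VoiceRequest:
    speaker_match: float   # 0..1 similarity to the enrolled voice (hypothetical score)
    is_live: bool          # verdict of an assumed recording/synthesis detector
    wearable_nearby: bool  # assumed signal from a paired wearable
    sensitive: bool        # True for purchases and similar commands


MATCH_THRESHOLD = 0.8  # illustrative value, not a recommendation


def is_authorized(req: VoiceRequest) -> bool:
    """Allow everyday commands on a voice match; demand more for sensitive ones."""
    if req.speaker_match < MATCH_THRESHOLD:
        return False
    if not req.sensitive:
        return True
    # Sensitive commands need a live voice and the owner's wearable close by.
    return req.is_live and req.wearable_nearby
```

Under this rule, a recording of the owner’s voice could still ask for the weather, but it would fail the liveness check when used to place an order.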

UX development has seen drastic advancements in recent years, voice UX being one of them. Verbal systems do inspire some confidence, primarily because they simplify human-device interaction. The field is still a work in progress; hence, one can see various gaps in the systems. These show up as weaknesses but are something that can be solved over time. As already said, voice UX is a novelty. It is not a question of if but of when this space in UX becomes mainstream in the industry.

