UX Design
February 24, 2025

12 steps to follow when crafting a voice multimodal design

Réka Pető

Multimodal design allows users to interact with a digital product in multiple ways, for example, through a mix of touch and voice commands. Users can choose whichever mode is most efficient and convenient for their preferences and situational context.

In this article, we dive into the field of voice-controlled multimodal design. We'll introduce our design experiment, share our findings, and offer recommendations worth knowing if you're thinking about developing voice multimodal digital products or services.

But first, let’s see why we talk about multimodal design at all!

What is Multimodal Design?

Imagine a world where your digital interactions are as seamless and intuitive as speaking to a friend, touching a screen, or seeing visual cues that guide you effortlessly. How would your experience with technology change if you could choose the interaction method that best suits your needs at any given moment? 

This is the promise of multimodal design, an approach that integrates various interaction methods, such as voice commands, touchscreens, and visual cues, to create a more versatile and user-centric experience.

As an illustration, voice multimodal interfaces enable users to perform tasks such as searching for a particular item in an online store using only voice commands (e.g., "I'd like to purchase a small-sized, blue Ralph Lauren shirt") or uploading images by simply saying "upload picture1 from my desktop".

Sounds good, right? Let’s see its advantages from a business point of view.

What Value Does Multimodal Design Bring to Business?

By offering multiple ways for users to interact with technology, companies can cater to a diverse range of preferences and needs. In other words, multimodal design can change the game in digital interaction.

By combining voice, touch, gestures, and more, it can make user experiences smoother and more intuitive. With technology evolving and AI getting smarter, the future of interaction looks even brighter. 

This adaptability not only enhances user satisfaction but also drives higher adoption rates, as users engage with the system in their preferred way. By offering a flexible, user-centered experience, multimodal products will stand out in the crowded marketplace by delivering a rich and engaging user experience.  


What Value Does Multimodal Design Bring to Users?

Multimodal design is crucial for users because it addresses their diverse needs by integrating multiple modes of interaction into a cohesive user experience. This approach enhances accessibility, usability, and engagement, making products more adaptable to various contexts and user preferences. 

Enhanced Accessibility

By using different sensory inputs, we can ensure that users with various needs can easily interact with the technology. For instance, auditory cues can assist visually impaired users, while visual cues benefit those with hearing impairments. 

Additionally, it's important to acknowledge the growing number of children with atypical developmental needs, such as autism and attention disorders like ADHD. Multimodal design can offer significant benefits for these children (who will soon become adults) by enabling them to use digital products in the way that is most comfortable and effective for their individual needs.

A blind user working on a computer

Enhanced User Experience

Besides accessibility, multimodal design can play an important role in enhancing overall user experience, making interactions with digital products and services more enjoyable and engaging. 

More personalized experience

Offering users the flexibility to choose their preferred mode of interaction adds a personalized touch to the experience, catering to individual preferences and making interactions feel more natural and intuitive.

More fun experience

Integrating playful animations, sound effects, or haptic feedback can make interactions more dynamic and entertaining, which will result in deeper engagement.

More unique experience

Though multimodal design has been around for decades, we rarely get the chance to use such interfaces, so exploring multimodal interactions can be a distinctive, memorable experience for users.

Ultimately, the benefits for both users and businesses are closely aligned, as improved user experience drives business success.

We believe that multimodal interfaces will be the future, and pioneers in the field will have a great advantage in engaging and retaining their users. 

Voice Control as the Engine of Multimodal Interfaces

Communication, especially verbal communication such as speech, is an essential part of many of our lives. For most of us, it requires no special equipment, place, or posture; we can speak and make sounds at any time of day and under almost any circumstances.

In light of these facts, it's not surprising that voice control is one of the cornerstones of multimodal design.

Although we hear more and more about voice-controlled digital products, their number is still negligible compared to the underlying technology that has been available for years. Yet voice-controlled design solutions could solve many problems users face. Let’s look at some examples where voice-controlled design already enhances the user experience:

1. Accessibility for People with Disabilities

Example: For individuals with physical disabilities, particularly those with limited mobility or dexterity, voice-controlled devices can be a game-changer. For instance, smart home systems like Amazon Alexa or Google Assistant allow users to control lights, thermostats, and even appliances through simple voice commands, making homes more accessible.

2. Enhanced User Experience in Complex Interfaces

Example: Voice commands can simplify interactions with complex systems. In enterprise settings, voice-controlled software (e.g., Zoho CRM) allows professionals to update CRM entries, schedule meetings, or retrieve information without navigating through multiple screens, thereby saving time and reducing cognitive load.

3. Multitasking and Efficiency

Example: Professionals who need to multitask can benefit from voice-controlled assistants. For instance, Microsoft Cortana can help manage schedules, send emails, or even dictate documents while the user works on other tasks.

4. Voice-Activated Learning and Education

Example: In educational settings, voice-controlled applications can facilitate learning, particularly for young children or those with learning disabilities. Tools like Read&Write support students with reading and writing difficulties, offering a voice-controlled option where students can use speech-to-text to write essays or complete assignments.

But voice-controlled design only works well if it's done right. Otherwise, it only increases the user's frustration and doesn't solve their problem. 

How to Create a Voice Interface Design?

When I first tested our voice multimodal design, I got nervous after just a few seconds: I gave the system the ‘proper’ voice command, but it did not navigate me to the page I wanted.

You can find some funny and not-so-funny examples with Alexa, too. If you have time, I also highly recommend watching this short advertisement on YouTube to get insights into how multimodal design can cause problems if it isn’t designed well.

Previously, we provided an overview of the primary challenges and key insights essential for crafting voice-controlled designs.

This is the story of our personal experiences, the challenges we faced, and the lessons we learned along the way.  Watch the video of the voice-controlled multimodal webshop created by our great colleague, Karthikeyan Krishnamoorthy. If you would like to try it out, open the prototype.

Our primary goal was to explore how sound can enhance an existing graphical user interface (GUI), specifically through the integration of voice in online shopping scenarios. This exploration was not only about improving current designs but also about gaining insights and experience that will enable us to deliver exceptional, innovative solutions for our clients in the future.

Our voice prototype, where besides a traditional search bar, you can enable Alexa

We often do internal projects or training to deepen our expertise and stay at the forefront of industry developments. This commitment to continuous learning ensures that we consistently provide our clients with the highest quality solutions.

Our Top 3 Challenges

We encountered several challenges while prototyping our voice-controlled design concept. Here are the three major hurdles we faced.

1. Creating Seamless Navigation

Ensuring smooth navigation in response to voice commands proved essential for creating a seamless user experience, but it was far from straightforward. For example, when a user gave a command to search for shirts, the system needed to understand not just how to find those items but also how to navigate back to the previous page or section without friction. 

Unlike a simple click, voice navigation requires the system to interpret more nuanced commands, such as "go back," "previous page," or "return to search." This challenge highlighted the importance of building a robust framework for voice-based navigation that could handle various user intents while maintaining a natural and intuitive experience. 

We learned that designing effective voice navigation requires us to think like our users to anticipate their needs, potential frustrations, and how they would naturally expect to move through the system.
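
To make this concrete, here is a minimal sketch in TypeScript, with purely illustrative names and phrases (it is not code from our prototype), of how spoken navigation phrases can be mapped to a single intent and resolved against a simple page history:

```typescript
// Minimal sketch: map spoken navigation phrases to one intent and resolve it
// against a tiny page history. All names (NavIntent, resolveNavIntent,
// pageHistory) are illustrative, not from any specific framework.

type NavIntent = "GO_BACK" | "RETURN_TO_SEARCH" | "UNKNOWN";

const BACK_PHRASES = ["go back", "previous page", "take me back"];
const SEARCH_PHRASES = ["return to search", "back to search", "show my search results"];

function resolveNavIntent(utterance: string): NavIntent {
  const text = utterance.toLowerCase().trim();
  if (BACK_PHRASES.some((p) => text.includes(p))) return "GO_BACK";
  if (SEARCH_PHRASES.some((p) => text.includes(p))) return "RETURN_TO_SEARCH";
  return "UNKNOWN";
}

// A small navigation stack so "go back" has something to return to.
const pageHistory: string[] = ["home", "search-results", "product/blue-shirt"];

function handleNavigation(utterance: string): string {
  switch (resolveNavIntent(utterance)) {
    case "GO_BACK":
      pageHistory.pop(); // leave the current page
      return pageHistory[pageHistory.length - 1]; // land on the previous one
    case "RETURN_TO_SEARCH":
      return "search-results";
    default:
      return pageHistory[pageHistory.length - 1]; // stay put; error handling takes over
  }
}

console.log(handleNavigation("go back")); // "search-results"
```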

2. Product Recognition Accuracy

Product recognition accuracy was another major hurdle, particularly due to the subtle differences in similar-sounding product names. For instance, distinguishing between "shirts" and "shorts" might seem simple to a human ear, but for a voice-controlled system, these nuances can lead to significant errors. 

We found that implementing advanced natural language processing (NLP) and machine learning algorithms was crucial, but even then, training the system to understand context and user intent was key. For example, if a user is already browsing a clothing section, the system should prioritize related items, reducing the likelihood of confusion between similar-sounding products. 
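
As a rough illustration of the idea, the sketch below (TypeScript, with an invented mini catalog and a deliberately naive similarity measure) shows how a small context boost for the category the user is already browsing can tip the balance between similar-sounding product names. A real system would use proper NLP or phonetic matching instead:

```typescript
// Minimal sketch of context-biased product recognition: when two names sound
// alike ("shirts" vs "shorts"), prefer the one that fits what the user is
// already browsing. The catalog and scoring are illustrative only.

interface Product { name: string; category: string; }

const catalog: Product[] = [
  { name: "shirts", category: "clothing" },
  { name: "shorts", category: "clothing" },
  { name: "shoes", category: "footwear" },
];

// Crude similarity: share of characters that match at the same position.
function similarity(a: string, b: string): number {
  const len = Math.min(a.length, b.length);
  let same = 0;
  for (let i = 0; i < len; i++) if (a[i] === b[i]) same++;
  return same / Math.max(a.length, b.length);
}

function recognizeProduct(heard: string, currentCategory: string): Product {
  return catalog
    .map((p) => ({
      product: p,
      score:
        similarity(heard.toLowerCase(), p.name) +
        (p.category === currentCategory ? 0.2 : 0), // context boost
    }))
    .sort((a, b) => b.score - a.score)[0].product;
}

console.log(recognizeProduct("shirts", "clothing").name); // "shirts", not "shorts"
```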

3. Clear Descriptions and Onboarding

Voice interfaces require a completely different approach to onboarding and user guidance compared to traditional visual interfaces. We quickly realized that clear and concise instructions were critical for ensuring users could effectively interact with the system. However, this needed to be balanced carefully to avoid overwhelming or frustrating users. 

Through our journey, we learned that prototyping voice-controlled designs requires a deep understanding of user behavior and expectations. Let’s see our key suggestions that will help you if you ever think about designing voice-controlled digital products.

An in-home Alexa device

12 Suggestions for Designing Voice-Controlled Digital Solutions

Here are our top 12 recommendations. These insights are drawn from our experience and can help you navigate common pitfalls and create a more seamless user experience. 

1. Start with Deep User Research:

Before diving into design, take the time to truly understand your users. Engage them in conversations about their needs, daily challenges, and how voice control might simplify their lives. It’s crucial to ask whether voice control is genuinely the best solution for them. Sometimes, a simpler or more traditional interface might be more effective.

Also, don’t forget to explore the context in which your users will interact with your voice-controlled product. Consider their environment (noisy, quiet, busy, etc.) and tailor the voice interaction to these factors as well, to ensure that it adds value and fits seamlessly into their daily routines.

2. Craft the User Journey with Voice in Mind:

Map out the user journey, focusing on the moments where voice interaction will be most beneficial. Identify the steps where voice commands can simplify tasks or remove friction. Consider common user paths and how voice can enhance these experiences. 

Also, think about scenarios where users might switch between voice and touch, and how to make that transition as smooth as possible.

3. Dive Deep into Command Research:

Conduct thorough research to understand the specific voice commands users are likely to use. This isn’t just about knowing the keywords: the commands need to be intuitive and match users' mental models. Interview users, analyze existing voice interfaces, and identify the most common commands that will take users from point A to B without frustration.

The goal is to make voice interaction feel as natural as possible.

A researcher analyzing data

4. Design Intuitive Voice Commands:

Based on your research, carefully craft voice commands that are easy to remember and align with the way users naturally speak. Avoid complex phrases or jargon. The commands should be concise, straightforward, and should mimic natural conversation as closely as possible. 

Consider the context: what might a user say in this situation, and how can you design the command to match that?
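
One simple way to picture this is a small command catalog in which several natural phrasings map to the same action. The intents and phrases in this TypeScript sketch are illustrative assumptions, not a real product's vocabulary:

```typescript
// Minimal sketch of a command catalog: one action, several natural ways to say it.

const commandCatalog: Record<string, string[]> = {
  ADD_TO_CART: ["add this to my cart", "put it in my basket", "i'll take it"],
  CHECKOUT: ["check out", "proceed to checkout", "i'm done shopping"],
};

function matchCommand(utterance: string): string | null {
  const text = utterance.toLowerCase();
  for (const [intent, phrases] of Object.entries(commandCatalog)) {
    if (phrases.some((p) => text.includes(p))) return intent;
  }
  return null; // unknown command: hand over to error handling (see suggestion 8)
}

console.log(matchCommand("Okay, I'll take it")); // "ADD_TO_CART"
```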

5. Implement Clear Feedback:

Always provide clear feedback after a voice command is given. Users should know whether the system understood their request and what action is being taken. If there’s an error, offer helpful suggestions or alternative commands to keep the user engaged and reduce frustration.

For instance, the system can reply with something like: ‘The blue shirt has been added to your cart. Would you like to continue shopping or proceed to checkout?’
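
A minimal sketch of this kind of feedback could look like the snippet below; the buildFeedback helper and its messages are invented for illustration. In a browser, the same text could also be spoken aloud with the standard speech synthesis API:

```typescript
// Minimal sketch: after every command, tell the user what happened and what
// they can do next. The messages here are illustrative, not from our prototype.

interface CommandResult {
  ok: boolean;
  action?: string; // e.g. "The blue shirt has been added to your cart"
  hint?: string;   // what to try if the command failed
}

function buildFeedback(result: CommandResult): string {
  if (result.ok) {
    return `${result.action}. Would you like to continue shopping or proceed to checkout?`;
  }
  return `Sorry, I didn't catch that. ${result.hint ?? 'You can say "show my cart" or "search for shirts".'}`;
}

// In a browser, the text can also be read out loud:
//   speechSynthesis.speak(new SpeechSynthesisUtterance(feedbackText));

console.log(buildFeedback({ ok: true, action: "The blue shirt has been added to your cart" }));
```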

6. Seamless Navigation and Contextual Awareness:

Ensure that navigation via voice commands is smooth and contextually aware. The system should remember the user's previous actions and provide relevant options without requiring them to start from scratch. This is particularly important for tasks that involve multiple steps or require users to switch between different sections of the product.

For this, iterative testing is key. Test multiple options, learn from what doesn’t work, and refine the experience until it’s intuitive. Remember, voice interactions can’t be seen, so the flow must feel natural and logical.
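
As a rough sketch, contextual awareness can start with a small session object that remembers the user's last search and last viewed product, so a follow-up command like "add it to my cart" makes sense. The SessionContext shape below is an assumption made for illustration only:

```typescript
// Minimal sketch of session context: remember what the user just did so that
// follow-up commands don't force them to start from scratch.

interface SessionContext {
  currentPage: string;
  lastViewedProduct?: string;
  lastSearchQuery?: string;
}

const session: SessionContext = { currentPage: "home" };

function handleUtterance(utterance: string): string {
  const text = utterance.toLowerCase();

  if (text.startsWith("search for ")) {
    session.lastSearchQuery = text.replace("search for ", "");
    session.currentPage = "search-results";
    return `Here are the results for ${session.lastSearchQuery}.`;
  }

  // "it" only works because we remembered the last viewed product.
  if (text.includes("add it to my cart") && session.lastViewedProduct) {
    return `${session.lastViewedProduct} has been added to your cart.`;
  }

  return "Sorry, I'm not sure what you mean.";
}

session.lastViewedProduct = "blue Ralph Lauren shirt";
console.log(handleUtterance("Add it to my cart")); // refers back to the shirt
```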

7. Make Voice Options Obvious and Accessible:

Ensure that voice control is not just a secondary feature, but a core aspect of the user experience. Design your voice-controlled product to be accessible to all users, including those with disabilities. Consider users with visual impairments, hearing loss, or mobility challenges, and ensure the system can be easily navigated by voice alone. Inclusivity should be at the forefront of your design process.

To achieve this, continuous user research is vital to understand what works well and to identify areas for improvement.

8. Design Robust Error Handling:

Voice commands are not always understood perfectly, so be prepared for errors. Design your system to handle misunderstandings gracefully, offering clear and friendly guidance to help users get back on track. Consider how your system will respond if it doesn’t understand a command: does it ask for clarification, offer alternatives, or suggest the next step?

Thoughtful error handling can transform a frustrating moment into a positive user experience, turning potential disappointment into an opportunity for user satisfaction.
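
One way to structure this is to escalate the help you offer as misunderstandings accumulate, instead of repeating the same error message. The thresholds and messages in this TypeScript sketch are illustrative assumptions:

```typescript
// Minimal sketch of graceful error handling: give progressively more concrete
// help after each failed attempt, and offer a non-voice way out in the end.

let failedAttempts = 0;

function handleUnrecognized(): string {
  failedAttempts++;
  if (failedAttempts === 1) {
    return "Sorry, I didn't catch that. Could you say it again?";
  }
  if (failedAttempts === 2) {
    return 'You can try commands like "search for shirts" or "show my cart".';
  }
  // After repeated failures, don't leave the user at a dead end.
  return "I'm still having trouble. You can also tap the search bar to type instead.";
}

function handleRecognized(): void {
  failedAttempts = 0; // a successful command resets the escalation
}
```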

9. Provide a Clear Onboarding Process:

Voice interfaces are still unfamiliar to many users, so a well-designed onboarding process is crucial. Introduce users to the key commands and features gradually, using interactive tutorials or hints. Make sure they understand how to use the system effectively without feeling overwhelmed.

10. Test, Iterate, and Gather Feedback Continuously:

Usability testing is essential at every stage of development. Conduct tests with your target audience in different environments to gather feedback and identify areas for improvement. Use this feedback to iterate on your design, ensuring that the final product meets user needs and expectations.

A low fidelity prototype drawn on paper

11. Assess Your Voice API's Language Support:

In multicultural and multilingual environments, it's essential that your voice-controlled product can understand and process various languages and dialects. An API that supports a broad range of languages will make your product accessible to a wider audience and ensure that users from different linguistic backgrounds can interact with it effectively.
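
If you prototype in the browser, the standard Web Speech API (exposed as webkitSpeechRecognition in Chrome) lets you request a specific recognition language, which is a quick way to check how far built-in support takes you. Browser and platform support vary, so treat this TypeScript sketch as a starting point rather than a guarantee:

```typescript
// Minimal sketch: ask the browser's built-in speech recognition for a
// specific language and log what it hears. Support and the set of
// recognizable languages differ across browsers and platforms.

const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

if (SpeechRecognitionImpl) {
  const recognition = new SpeechRecognitionImpl();
  recognition.lang = "de-DE"; // request German instead of the browser default
  recognition.interimResults = false;

  recognition.onresult = (event: any) => {
    console.log("Heard:", event.results[0][0].transcript);
  };

  recognition.onerror = (event: any) => {
    // e.g. "language-not-supported" or "no-speech"
    console.warn("Recognition error:", event.error);
  };

  recognition.start();
} else {
  console.warn("No built-in speech recognition here; consider a cloud speech API instead.");
}
```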

12. Try Out Protopie:

We suggest using Protopie when adding voice control to your design. It helps you create interactive prototypes to visualize how voice commands work in your interface. Plus, Protopie's user-friendly interface and features make prototyping straightforward, allowing designers to refine their voice-controlled designs efficiently.

The interface of Protopie

Final Thoughts

For future generations, multimodal design will be a fundamental requirement rather than a ‘catchy opportunity’. They will expect to interact with technology in the most convenient and intuitive ways, because that is what they have grown used to. They are growing up in a world where AI can solve their homework in seconds and where technological solutions are available for nearly all their questions. If technology doesn’t offer flexible interaction options, they may become frustrated, and if they are dissatisfied with a digital product or service, they can quickly switch to another option.

Generation Alpha will be a major consumer group within a few years, so we need to start developing with their expectations in mind. Therefore, we strongly advise incorporating multimodal design solutions into your product. We are here to help you with it!

Searching for the right UX agency?

UX studio has successfully worked with over 250 companies worldwide. 

Is there anything we can do for you at this moment? Get in touch with us, and let’s discuss your current challenges. 


Our experts would be happy to assist with the UX strategy, product and user research, or UX/UI design.