
Assessing the 10 Usability Principles for AI Interfaces

As a design agency, we must keep usability heuristics aligned with emerging technologies like AI. This way, we can apply professional, up-to-date evaluation practices for clients developing AI products and experiences. Iterating on these heuristics, the pragmatic rules of thumb that guide evaluation, ensures we stay on top of industry best practices.

This year marks the 30th anniversary of Jakob Nielsen’s 10 usability heuristics. Heuristic evaluation, according to Nielsen, is a method for finding usability issues in a digital interface by having a small set of evaluators judge it against recognized usability principles. By revisiting these usability heuristics, we can maintain their usefulness as guides for creating human-centered AI interfaces. In this article, we will take a look at Nielsen’s 10 usability heuristics and briefly review how the principles hold up on five widely used AI platforms: ChatGPT, Microsoft Copilot, Runway, DALL-E 2, and Gemini.

Revisiting the usability heuristics for human-AI interaction

The 10 Usability Heuristics for Human-Computer Interaction provide a useful framework for evaluating AI systems. 

Created by Copilot

1. Visibility of system status

When users interact with these AI systems, they should be kept informed of what the system is doing. Providing this real-time visibility into the status keeps users aware of progress, sets expectations about response times, and helps them perceive the causality between their inputs and the AI’s outputs. 

Without proper status visibility, users may become confused about why the system is taking so long to produce a response. They may retry inputs, thinking the first one was not received correctly.

  • ChatGPT shows clear indicators when it is searching and when it has completed a response, and displays a typing indicator while generating, aligning well with this heuristic.
  • Copilot, similarly to ChatGPT, indicates when it is processing information, thus maintaining user expectations.
  • Runway displays a progress bar while generating images, though users must adapt to its artistic language. It could be made more apparent when an image is finalized.
  • DALL-E 2 shows a progress animation and text during image generation, which makes the system status unambiguous. There are limited options to leave feedback.
  • Gemini does not indicate when it starts and finishes writing, and sometimes stops writing mid-response.

Adding writing indicators with explanations improves visibility. Overall, these systems should aim to provide clear, transparent feedback at all stages to create a predictable interaction, build user trust over time, and reinforce that the AI is an intelligent system rather than a black box. This transparency is a large part of why most of these tools feel dependable.
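As a rough illustration, a chat interface can model these stages explicitly and map each one to a user-facing label. The states and wording below are a minimal, hypothetical sketch of the pattern, not taken from any of the reviewed products.

```typescript
// Minimal sketch of a status model for an AI chat interface.
// States and labels are illustrative, not from any reviewed product.

type GenerationStatus = "idle" | "submitted" | "streaming" | "done" | "error";

// Map each internal state to a user-facing message so users always
// know what the system is doing.
const statusLabel: Record<GenerationStatus, string> = {
  idle: "",
  submitted: "Request received…",
  streaming: "Writing a response…",
  done: "Response complete",
  error: "Something went wrong – please try again",
};

class StatusIndicator {
  private status: GenerationStatus = "idle";

  set(next: GenerationStatus): void {
    this.status = next;
    this.render();
  }

  private render(): void {
    // A real UI would update a visible element; logging stands in here.
    const label = statusLabel[this.status];
    if (label) console.log(`[status] ${label}`);
  }
}

// Usage: update the indicator at every stage of a generation round-trip.
const indicator = new StatusIndicator();
indicator.set("submitted"); // immediately after the prompt is sent
indicator.set("streaming"); // on the first streamed token
indicator.set("done");      // when the stream closes
```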

2. Match between system and the real world

AI systems should use language, concepts, and examples that correspond to the real world and match the user’s context. Interfaces should rely on terms familiar to users and avoid internal jargon. It is important to follow real-world logic and patterns: systems are more intuitive and usable when they match users’ mental models.

  • ChatGPT uses natural language, making its responses easy to understand. However, it occasionally generates responses that lack practicality or context: in some cases it ignores parts of the prompt, and misunderstandings and AI hallucinations may also cause problems.
  • Copilot’s suggestions may not always align with industry-specific jargon.
  • Runway uses everyday terms to describe iterations of image creation.
  • DALL-E 2 generates realistic images matching text prompts, but it needs to ensure that any textual or visual outputs resonate with users’ expectations and experiences.
  • Gemini writes in fairly natural language, but can sometimes deviate into incoherence.

These systems are still evolving day by day, and their responses are not fully validated yet; the information they provide can contain mistakes or be entirely false. Overall, AI systems should produce outputs that correspond to the user’s prompt in an intuitive, natural way. The closer the match between prompt and response, the easier it is for the user to understand and continue interacting. Matching the real world builds trust and a sense that the AI understands the concepts involved, while a mismatch confuses users and undermines the experience.

3. User control and freedom

It’s essential that these platforms provide users with clear ways to exit or undo actions if they make a mistake or change their mind. If there is an opportunity to easily reverse a step, it can enhance user control and freedom, building confidence in utilizing the full system capabilities.

  • ChatGPT allows the user to provide follow-up corrections and guidance.
  • Runway, Gemini, and DALL-E 2 let users retry with different inputs, but they should offer clear options to adjust or revert generated content, possibly through versioning or editable parameters.
  • Copilot’s suggestions can be accepted, rejected, or ignored.

People feel more comfortable exploring these AI systems knowing they can easily back out of unwanted outcomes. Without clear escape hatches, users can feel trapped once they initiate a process, and may avoid certain interactions for fear of being stuck. This undermines adoption. Most systems currently lack undo or edit features that would give users more control, but they do offer options to start a new chat or provide suggestions for further actions. Some have voice control for adding prompts without typing.
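One way to offer the missing undo is to treat every generation as a version in a history the user can step through. The sketch below is a hypothetical illustration of that pattern, not how any of the reviewed tools actually works.

```typescript
// Hypothetical version history for AI-generated content, sketching how
// "undo" and "redo" could be offered over successive generations.

class VersionHistory<T> {
  private versions: T[] = [];
  private cursor = -1; // index of the currently shown version

  commit(output: T): void {
    // Discard any "redo" branch, then append the new output.
    this.versions = this.versions.slice(0, this.cursor + 1);
    this.versions.push(output);
    this.cursor = this.versions.length - 1;
  }

  undo(): T | undefined {
    if (this.cursor > 0) this.cursor--;
    return this.versions[this.cursor];
  }

  redo(): T | undefined {
    if (this.cursor < this.versions.length - 1) this.cursor++;
    return this.versions[this.cursor];
  }
}

// Usage: every regeneration becomes a reversible step.
const history = new VersionHistory<string>();
history.commit("First draft of the generated text");
history.commit("Second draft after a follow-up prompt");
console.log(history.undo()); // -> "First draft of the generated text"
console.log(history.redo()); // -> "Second draft after a follow-up prompt"
```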

4. Consistency and standards

Adopting common patterns and standards could improve consistency. Also, these platforms must ensure that terminologies, actions, and outcomes are consistent with user expectations across similar tools. Using familiar interaction patterns and outputs reduces the user’s cognitive load. Leveraging familiar mental models makes the AI feel like a natural extension rather than a foreign outlier. Aligning with expectations integrates the experience seamlessly.

  • What ChatGPT and Copilot do is a good example: they adopt conversational UI norms, using common phrases and syntax when responding or generating code. The tone and style are predictable.
  • Runway and DALL-E 2 should generate images following standard perspective, lighting, and composition principles. Editing behaviors in Gemini should also align with typical image manipulation software.

Adhering to consistency and standards streamlines adoption and creates intuitive, easy-to-use experiences. Overall, leveraging established conventions improves usability and accessibility. It demonstrates awareness of how these AI systems fit into the broader technology landscape users are familiar with.

5. Error prevention

The main aim of this heuristic is that the system should minimize errors by both preventing error-prone conditions upfront and detecting potential mistakes before users commit to them. Error prevention in AI involves not only avoiding user mistakes but also anticipating and mitigating errors in AI-generated content. These systems should incorporate guidance that helps users frame requests effectively, and offer real-time adjustments based on potential misunderstandings or inaccuracies in AI outputs. A further characteristic of AI tools is that it is hard to predict all potential errors, so a lot of testing is required to discover edge cases.

  • ChatGPT could warn if a user prompt may lead to harmful, unethical or dangerous content before generating a response. Copilot might flag potentially incorrect code suggestions for user confirmation.
  • Runway and DALL-E 2 can check image generation prompts for issues prior to processing, as they have NSFW filters. Gemini, on the other hand, could analyze edits before applying them to avoid degrading image quality.

Where possible, AI systems should constrain inputs and steer users away from known bad outcomes. When errors cannot be eliminated entirely, the system should ask for clear confirmation before proceeding. Preventing errors not only avoids frustration but builds user trust that the AI will act as an assistant rather than take harmful or unpredictable actions. This provides a layer of safety and control; however, users should always be careful and consider to what extent they trust the AI, as it is not yet capable of recognizing its own mistakes.

Overall, error prevention enhances the experience by guiding users, optimizing for positive outcomes, and minimizing unnecessary mistakes and backtracking. It demonstrates the thoughtfulness and care put into the AI design.
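For instance, a lightweight pre-submission check can surface problems before a slow, costly generation starts. The rules and wording below are hypothetical stand-ins; a production system would call a real moderation service rather than a keyword check.

```typescript
// Illustrative pre-submission check: warn the user before the request
// is sent, rather than failing after generation. All rules are hypothetical.

interface PromptCheck {
  ok: boolean;
  warnings: string[];
}

function checkPrompt(prompt: string): PromptCheck {
  const warnings: string[] = [];

  if (prompt.trim().length === 0) {
    warnings.push("The prompt is empty – describe what you want to generate.");
  }
  if (prompt.length > 4000) {
    warnings.push("The prompt is very long and parts of it may be ignored.");
  }
  // A real system would call a moderation service here; this keyword
  // match only stands in for that step.
  if (/\b(violence|gore)\b/i.test(prompt)) {
    warnings.push("This prompt may violate the content policy.");
  }

  return { ok: warnings.length === 0, warnings };
}

// Usage: surface warnings before committing to generation.
const result = checkPrompt("A photorealistic image of gore");
if (!result.ok) result.warnings.forEach((w) => console.warn(w));
```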

6. Recognition rather than recall

Applying this heuristic means that interfaces should minimize the need for users to memorize information while operating them. Information should remain visible in the interface or conversation so the system can better guide users toward effective prompts and inputs. AI interfaces have to minimize the user’s memory load by making options, commands, and potential actions visible or easily retrievable.

  • ChatGPT relies more on the user remembering what to ask, though it helpfully shows the user’s prompt to provide context for its response.
  • Copilot utilizes conversational memory effectively but could improve by suggesting related queries or commands.
  • Runway and DALL-E 2 provide suggestions based on the user’s inputs, reducing the need for recall. They also show the prompt that was used, and should keep the image generation prompt visibly paired with its results, as sketched after this section.
  • Gemini should preserve the original photo along with edits.

Reducing dependence on memory lessens the user’s cognitive load. They can rely on recognition instead of having to recall details from previous steps. Minimizing recall makes AI systems easier to use for a wider range of users. Interfaces that require heavy memorization create accessibility barriers. 
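A hypothetical data shape for that pairing: each result carries the prompt that produced it, so the interface can always render the two together. The field names below are illustrative, not any tool’s real schema.

```typescript
// Illustrative pairing of each generated result with the prompt that
// produced it, so users can recognize context instead of recalling it.

interface GenerationRecord {
  prompt: string;     // shown alongside the output
  outputUrl: string;  // hypothetical location of the generated image
  createdAt: Date;
}

// Rendering the prompt next to each result keeps it recognizable.
function renderGallery(records: GenerationRecord[]): void {
  for (const r of records) {
    console.log(`"${r.prompt}" -> ${r.outputUrl} (${r.createdAt.toISOString()})`);
  }
}

renderGallery([
  { prompt: "A watercolor fox", outputUrl: "https://example.com/fox.png", createdAt: new Date() },
]);
```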

7. Flexibility and efficiency of use

AI tools should cater to both inexperienced and expert users, offering shortcuts or advanced features that can speed up interactions for frequent, expert users without overwhelming novices.

  • ChatGPT can be helpful for creative writing to a certain extent, and rewards more thoughtful prompts with better responses, but it could accept commands like /rephrase or /expand to modify its responses for power users (a sketch of such commands follows this section).
  • Copilot excels in coding assistance, and it might also offer snippet insertion shortcuts and variable auto-complete for rapid coding.
  • Runway and DALL-E 2 could support batch image generation with templates, chaining, and callbacks for advanced users.
  • Gemini could enable customizable filters and one-click undo/redo.

Enabling efficiency customizations caters to both beginners and experts: novices use the basic interfaces, while veterans can activate advanced options for accelerated workflows. Without shortcuts and tailoring, expert users may find the AI systems limiting. Providing flexibility allows a broader set of users to integrate the AI into their own processes. Overall, allowing power users to optimize interactions to their needs demonstrates thoughtful design. However, these advanced options should stay hidden until consciously activated, to avoid confusing new users.
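The slash commands suggested above could be as simple as a prefix parser that rewrites the input into an explicit instruction. The /rephrase and /expand commands are hypothetical; neither is a real ChatGPT feature.

```typescript
// Sketch of the hypothetical /rephrase and /expand power-user commands
// mentioned above; neither exists in ChatGPT today.

type CommandHandler = (args: string) => string;

const commands: Record<string, CommandHandler> = {
  rephrase: (args) => `Rephrase the previous response. ${args}`.trim(),
  expand: (args) => `Expand the previous response in more detail. ${args}`.trim(),
};

// Turn "/expand focus on examples" into an explicit instruction;
// plain prompts pass through unchanged, so novices are unaffected.
function parseInput(input: string): string {
  const match = input.match(/^\/(\w+)\s*(.*)$/);
  if (!match) return input;
  const handler = commands[match[1]];
  return handler ? handler(match[2]) : input;
}

console.log(parseInput("/expand focus on examples"));
console.log(parseInput("Just a normal prompt")); // unchanged
```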

8. Aesthetic and minimalist design

These interfaces are generally clean and minimalist, adhering to this principle, but continuous evaluation is necessary to ensure that new features or information do not compromise design clarity.

  • ChatGPT and Copilot conversations should stay streamlined, without cluttering the chat with secondary content.
  • Runway and DALL-E’s image generation interfaces should only expose key parameters.
  • Gemini’s photo editing UI should spotlight the core tools needed for common adjustments, hiding advanced technical controls.

Keeping the interfaces simplified and visually minimalist focuses user attention on key content and functionality. Removing irrelevant options reduces cognitive load. Overly dense interfaces overwhelm users, undercut usability, and make the AI feel opaque. An aesthetic, minimalist approach highlights what matters most. Well-designed interfaces should have the visual clarity and power of a sharp photograph: drawing the eye to the subject while fading unnecessary details into the background.

9. Help users recognize, diagnose, and recover from errors

AI-specific errors, such as misunderstanding a prompt or generating inappropriate content, require clear, understandable feedback and straightforward correction paths. When errors inevitably occur, these AI systems should help users understand the problem and how to get back on track. Plain language error messages should explain what happened and why.

  • DALL-E 2 should state “Could not generate image – prompt contains prohibited content” when image generation fails. The error should provide clear, actionable next steps to recover, while defining which type of content is prohibited.
  • ChatGPT, when a prompt is problematic or based on a false premise, could suggest rephrasing it.
  • Copilot could link to the relevant documentation for a code error.
  • When image generation errors occur in Runway or photo editing failures happen in Gemini, plain language error messages should explain the specific issue and how to resolve it. The error could state “Image generation failed – JPEGs not supported. Please upload PNG, GIF or SVG files.” If Gemini runs into issues merging edits, it could indicate “Unable to apply adjustments – image exceeds size limits. Please try reducing resolutions or number of layers.”

Visual treatments like color, icons, and animations should call attention to errors so users do not overlook them. Good error handling guides users to recognize, diagnose, and overcome errors; without support, people can feel confused and frustrated when issues arise. Putting care into errors establishes trust and confidence that the AI can gracefully handle unpredictable situations, and users gain the resilience to productively move forward.
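Structurally, each error message can pair a plain-language diagnosis with concrete recovery steps. The error code, wording, and supported formats below are illustrative, borrowed from the hypothetical examples above rather than from any tool’s real API.

```typescript
// Hypothetical structured error pairing a plain-language diagnosis with
// recovery steps; the code, message, and formats are illustrative only.

interface UserFacingError {
  code: string;
  message: string;         // what happened, in the user's language
  recoverySteps: string[]; // how to get back on track
}

const unsupportedFormat: UserFacingError = {
  code: "UNSUPPORTED_FORMAT",
  message: "Image generation failed – JPEGs are not supported.",
  recoverySteps: [
    "Convert the file to PNG, GIF or SVG.",
    "Upload the converted file and try again.",
  ],
};

// Show both the diagnosis and the next steps, so users can recognize,
// diagnose, and recover without guessing.
function showError(err: UserFacingError): void {
  console.error(err.message);
  err.recoverySteps.forEach((step, i) => console.error(`  ${i + 1}. ${step}`));
}

showError(unsupportedFormat);
```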

10. Help and documentation

While the ideal experience is fully intuitive without assistance, helpful documentation can support users and improve adoption. These AI systems should provide easy access to documentation explaining core capabilities, limitations, and best practices. When these assistants offer help, it’s often scattered, incomplete, or not specific to tasks. More context-sensitive documentation tailored to specific use cases and tutorials integrated into the workflow would be a game-changer. 

  • ChatGPT itself explains how to interact with it, but it could link to FAQs from conversations.
  • Contextual help could be overlaid on the Runway and Gemini interfaces; Gemini currently lacks any built-in help or documentation.
  • DALL-E 2 has basic built-in hints.
  • Microsoft provides Copilot documentation on their site.

Content should be written with the user’s goals and terminology in mind, not developer jargon. Instructions should break down tasks into clear, concise steps. Search should make it easy to find help for common use cases. Tutorials and examples can further build user competence. Thorough documentation increases safe, effective use of the AI. However, over-reliance on help indicates the interfaces could be more intuitive. Strive for self-evident interactions. Well-designed help emphasizes learning over troubleshooting. It demonstrates care for enabling long-term user success, not just resolving immediate confusion. 

Refining Evaluation Heuristics for the AI User Experience

As we have seen, the 10 classic usability heuristics continue to provide a strong foundation for evaluating and enhancing the user experience of AI systems. While the core principles hold up well, some expansion and refinement is likely needed: the key will be accommodating AI’s emergent capabilities while centering human needs and ethics at every turn. With the rapid advancement of AI systems, some aspects of the 10 usability heuristics may need re-examining or amending to continue serving as effective evaluation guidelines. Here are a few thoughts on what may need updating:

  • The “match between system and real world” heuristic could be expanded to also consider whether the system’s capabilities and limitations are communicated clearly to users. As AI grows more capable, setting appropriate user expectations will require even greater care.
  • The “user control and freedom” heuristic should account for the fact that some loss of control is inherent in delegating tasks to AI systems. However, users should still be empowered to override or adjust behaviors where possible.
  • “Error prevention” may need to weigh protection from harmful content against the risk of limiting beneficial creativity and discovery. More nuance may be required in this area.
  • “Help and documentation” should consider whether systems provide context-specific guidance at the moments users need it. Documentation alone may not suffice as AI grows more conversational.
Image generated by DALL-E

Can There Be New Heuristics to Consider?

Although Nielsen’s heuristics are universal and provide a strong foundation, we’ve concluded that in the world of AI, it might be valuable to consider new heuristics specifically tailored to AI user experiences.

By expanding our evaluation frameworks, we can strive to create AI experiences that are not just usable, but socially responsible. While the basics endure, adapting heuristics to new technologies is key to upholding human values amidst AI’s rise. Nielsen’s fundamentals get us started on the right foot, but may only partially cover AI’s expanding terrain. With care and foresight, we can walk further down the path of ethical, humanistic AI UX design. Here are some examples of what can be new elements of the heuristics to consider in the future. 

  • A “Transparency” heuristic could be helpful. The system should be fully open about what it can and cannot do, as well as what data it was trained on. Without transparency, users may not trust the system. Making capabilities, limitations, and training data clear manages expectations and builds user confidence.
  • Users shouldn’t feel like their AI assistant is a black box. Understanding how outputs are generated builds trust and allows for better collaboration.
  • AI assistants shouldn’t amplify biases or promote harmful content. They should be aligned with the user’s ethical principles. Careful design that considers potential biases and incorporates mechanisms for user feedback and control is essential.
  • A “Value alignment” heuristic could help assess whether the system’s goals align with human values. Without shared values, even a capable system can produce harmful outcomes.
  • An “Explainability” heuristic should make it easy for users to understand why certain outputs or behaviors occurred. Lack of explainability erodes trust. The goal is effective communication without excessive personification.
  • “Flexibility and efficiency of use” should also consider customizability and personalization. As AI becomes more integrated into individual workflows, adapting to user needs and contexts will be key. A one-size-fits-all approach does not work in AI: assistants should learn and adapt to individual users’ needs and preferences over time, offering an increasingly personalized experience. Incorporating machine learning that personalizes output, suggests relevant features, and adjusts to user feedback becomes crucial.
  • Consider a new heuristic around “Human-AI collaboration” as well, that assesses how well the system supports fluid teamwork with human counterparts. The goal should be complementary capabilities, not replacement.
  • Expand the “user control” heuristic to also account for user autonomy and consent over data collection. Usage data can improve the system but overreach is a risk.
  • “Bias and fairness” heuristics could help ensure AI systems recognize and mitigate biases in their outputs, providing guidelines for equitable and inclusive design practices.
  • “Privacy and data security”: with AI processing sensitive data, users need assurances about data handling practices, suggesting a heuristic centered on privacy preservation and user trust.

Summary

While the core heuristics remain relevant, some expansion and refining are likely required to accommodate AI’s emerging capabilities and focus on human needs and ethics. 

The analysis also reveals opportunities to iterate on the heuristics to make them even more applicable to AI.

As AI capabilities grow, evaluation guidelines must evolve to maintain human-centric design, trust, and ethical alignment. Refining heuristics like Nielsen’s for the AI context will enable continued assessment of usability and progress toward truly human-friendly AI systems. 

This article explored how to update classic usability heuristics to responsibly guide AI’s rapid growth. By refining enduring principles and proposing new, ethics-focused measures, we can uphold human values as AI capabilities advance.

We have also tested the best AI tools for Design and Research. Check our website for the latest articles. 

Searching for the right UX agency?

UX studio works with rising startups and established tech giants worldwide. 

Should you want to improve the design and performance of your digital product, message us to book a consultation. We will walk you through our design processes and suggest the next steps!

Our experts would be happy to assist with UX strategy, product and user research, or UX/UI design. Check out our full list of services here.

 

Mariann Fülöp

An enthusiastic digital creator who is not afraid of paper & scissors. Nature and coffee addict.
