UX Research
UX Case Study
March 10, 2025

Using data science methods in UX research: a case study

Fanni Zsófia Kelemen

When most people think about UX research, they picture usability tests, user interviews, and carefully crafted surveys. While these traditional methods form the backbone of our work as UX researchers, sometimes we need to think bigger – much bigger. In this case study, we'll share how we combined the powers of UX research and data science to unlock insights that would have been impossible to discover through conventional methods alone.

Background

UX research has always been about understanding how people interact with products and services. But what happens when you have thousands of user comments to analyze? 

In this article, we'll walk you through a recent project where we faced this exact challenge. You'll see how we used data science techniques like topic modeling and sentiment analysis to enhance our traditional UX research methods, what we discovered, and most importantly, how this combination of approaches led to better outcomes for our client.

Image summarizing the two methods. Sentiment analysis: the process of analyzing text to determine its emotional tone (mostly positive or negative). Topic modeling: a type of statistical modeling that identifies groups of similar words in a text.

Before we jump into the project details, let’s see some examples of when data science can be useful in UX research.

Use cases for data science in UX research

In user churn prediction, data science algorithms can analyze patterns in user activity (like login frequency, feature usage, and engagement metrics) to identify early warning signs of users who might leave the product. This allows teams to proactively address issues before users actually churn. 
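
To make this concrete, here is a hypothetical sketch of such a churn model, a logistic regression trained on per-user engagement metrics. The file name, column names, and features are made up for the example and are not tied to any real project.

    # Hypothetical sketch: predicting churn from engagement metrics with scikit-learn.
    # The dataset, column names, and features are illustrative.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    users = pd.read_csv("user_activity.csv")  # one row per user
    X = users[["login_frequency", "features_used", "avg_session_minutes"]]
    y = users["churned"]  # 1 if the user left the product, 0 otherwise

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # How well do we flag at-risk users on held-out data?
    print(classification_report(y_test, model.predict(X_test)))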

Similarly, user segmentation becomes more sophisticated with data science. Instead of relying solely on demographic information or basic usage metrics, machine learning algorithms can identify natural groupings of users based on complex combinations of behaviors, preferences, and interaction patterns (like what features they use and how, when they are active, session duration, etc.). These algorithms might discover that users naturally fall into groups like "power users who mainly work late at night" or "occasional users who focus on specific features". These insights can inform both product development and targeted UX improvements for each segment.
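
As a similarly hypothetical sketch, behavior-based segmentation can start with k-means clustering on a handful of engagement metrics. The metrics, file name, and cluster count below are assumptions for illustration only.

    # Hypothetical sketch: grouping users by behavior with k-means.
    # Column names, file name, and the number of clusters are illustrative.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    users = pd.read_csv("user_metrics.csv")
    behavior = users[["logins_per_week", "avg_session_minutes", "features_used", "night_activity_ratio"]]

    scaled = StandardScaler().fit_transform(behavior)  # put all metrics on the same scale
    kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
    users["segment"] = kmeans.fit_predict(scaled)

    # Profile each segment by its average behavior to give it a name
    # (e.g. "power users who mainly work late at night")
    print(users.groupby("segment")[behavior.columns].mean())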

These examples just scratch the surface of data science applications in UX research. From predicting user satisfaction scores to automating the analysis of open-ended survey responses, data science methods can significantly enhance the modern UX researcher's toolkit.

Banner saying "research-based design for measurable results. Message us."

If your company has encountered similar challenges, or would simply like to become more data-driven and strategic, UX studio can help. By combining UX research with data science methods, we have the best chance to find the answers (and the solutions) you’re looking for. Book a free consultation with us and let's discuss how we can help. 

Case study

The challenge 

It all started when our client approached us with an ambitious goal: developing a new application in a competitive market. Before making this significant investment, they needed data-driven insights to validate their strategy.

Research Objectives

The client sought to understand 3 main things:

  1. The competitive landscape 
    • Deep dive into nine major market competitors
    • How users perceive each competitor
  2. Factors that influence user acquisition
    • How users discover and decide to download competitor apps
    • What encourages them to try a new app
    • What factors influence their willingness to download an app
  3. Factors that influence user retention
    • What makes users stick with a competitor app long-term

Project Constraints

We faced a challenging one-week timeline for the initial research phase. However, we tried to look at the bright side: delivering strong initial insights would unlock resources for deeper investigation. This made rapid, high-quality execution essential.

The solution

To address this request, we decided to leverage data science techniques on existing data to identify key themes and user perceptions. We decided to combine sentiment analysis and topic modeling for two main reasons:

  1. To understand the user sentiment related to each competitor
  2. To extract common patterns about app discovery, adoption, and retention

Our goal was to develop hypotheses from these findings that could inform further user research, such as interviews and surveys, to refine the concept of the new app.

The methods 

The data collection process

In our case, the first step was to get the data that we could base the analysis on. Due to the short timeframe, we wanted to rely on already existing data. However, you can also use sentiment analysis and topic modeling with data that you collected yourself, for example, through surveys. 

There is a huge amount of data available on the internet for any given topic: you can think of reviews, online forums, social media, or scientific publications. We decided to use product reviews of our nine competitors as these offered direct insight into the user sentiment.

To get the data, we used publicly available reviews, and we made sure to collect them only from websites that allow data collection. We ended up with between 2,000 and 4,000 reviews for each competitor.

An iPhone displaying its home screen is shown next to several app reviews with star ratings.
We used similar reviews for our sentiment analysis and topic modeling.

The next step was to organize and clean the data. To do this, we first needed to merge the connected datasets. This means we needed to organize reviews from different sources, about the same competitor, in a single file. To make this process quick and efficient, we developed a custom Python script for data merging. This script also made sure to standardize the data format for consistent analysis.

The Python script we used to merge data files.
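
We can't share the exact script, but a minimal sketch of this kind of merging step, assuming each source exports a CSV and using made-up folder and column names, could look like this with pandas:

    # Minimal sketch of merging review exports into one standardized file per competitor.
    # The folder layout and column names are assumptions, not the actual project setup.
    import glob
    import pandas as pd

    frames = []
    for path in glob.glob("raw_reviews/competitor_a/*.csv"):
        df = pd.read_csv(path)
        # Standardize the format so every source uses the same column names
        df = df.rename(columns={"content": "review_text", "score": "rating"})
        df["source_file"] = path  # keep track of where each review came from
        frames.append(df[["review_text", "rating", "source_file"]])

    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv("merged_reviews/competitor_a.csv", index=False)

Running something like this once per competitor leaves you with one clean, standardized file per competitor, ready for the analysis steps below.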

Sentiment analysis

Sentiment analysis is the process of analyzing text to determine if the emotional tone of the message is positive or negative (or neutral in some cases).

There are many different ways to approach sentiment analysis. The 3 most common ways are the following: 

  1. Lexicon-based methods that rely on a set of predefined rules and heuristics to determine the sentiment of a piece of text
  2. Traditional machine learning approaches that train on labeled data using algorithms
  3. Deep learning solutions using complex neural networks and pre-trained models trained on massive amounts of text data

While rule-based approaches are simpler to implement, deep learning typically offers the highest accuracy. However, it also requires more computational resources and training data.

In our case, we went with the simpler lexicon-based solution, using Python's Natural Language Toolkit (NLTK). The biggest reason was the time pressure: NLTK is beginner-friendly yet powerful, with built-in lexicons and pre-trained models that make implementation straightforward. The library's VADER (Valence Aware Dictionary and Sentiment Reasoner) tool is particularly valuable as it understands emoticons, slang, and punctuation emphasis.

While not as sophisticated as deep learning approaches, NLTK provides good baseline results with minimal setup and computational resources. We used it to get an overview of how users perceive the different competitors. 

To do this, we used a custom-made Python script that does the following (a simplified sketch follows the list):

  • Imports the necessary packages and datasets
  • Pre-processes the text from the dataset
  • Categorizes the sentiment as positive or negative
  • Counts the number of positive and negative reviews
  • Visualizes the positive-negative review ratio in the form of a bar chart and a pie chart
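
To give a sense of what this can look like in practice, below is a simplified sketch using NLTK's VADER. It is not our exact script; the file name, column name, and the zero cutoff on the compound score are assumptions for illustration.

    # Simplified sketch of a VADER-based sentiment script with NLTK.
    # File name, column name, and the zero cutoff are illustrative assumptions.
    import pandas as pd
    import matplotlib.pyplot as plt
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

    reviews = pd.read_csv("merged_reviews/competitor_a.csv")
    sia = SentimentIntensityAnalyzer()

    def label_sentiment(text):
        """Label a review as positive or negative based on VADER's compound score."""
        score = sia.polarity_scores(str(text))["compound"]
        return "positive" if score >= 0 else "negative"

    reviews["sentiment"] = reviews["review_text"].apply(label_sentiment)
    counts = reviews["sentiment"].value_counts()

    # Visualize the positive-negative review ratio as a bar chart and a pie chart
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    counts.plot(kind="bar", ax=ax1)
    counts.plot(kind="pie", ax=ax2, autopct="%1.0f%%")
    ax1.set_ylabel("Number of reviews")
    ax2.set_ylabel("")
    plt.tight_layout()
    plt.show()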

As we can’t show the results from our actual project for data privacy reasons, we are showcasing the results with different data. We ran the same Python script with reviews of a popular neobank to generate the chart that you can see below.

These charts are highly customizable; here, we are using a very simple layout with accessible colors.

After we ran the analysis for each competitor, we compared the results between them. Based on this, we could see which competitors had a high number of negative reviews and we could start digging into the reasons behind this — first, with topic modeling. 

Topic modeling 

Topic modeling is a natural language processing technique that automatically discovers the main themes or topics present within a document or collection of documents. This technique is particularly useful for analyzing large volumes of text data, helping researchers and analysts understand the key subjects being discussed without having to read through every document manually. In our case, we used it to get an overview of the most commonly mentioned topics in the reviews, without having to read through thousands of reviews one by one.

Think of it as organizing a pile of documents into natural categories, like sorting through thousands of news articles and automatically finding that some are about sports, others about politics, and others about technology. The algorithm identifies groups of words that frequently appear together and uses these patterns to uncover the main themes, without needing predefined categories.

For example, in a set of customer reviews, topic modeling might reveal distinct themes like "price," "customer service," and "product quality" based on the words commonly used together in those contexts. 

Topic modeling can also be approached through several methods. We used BERTopic, a BERT-based approach and one of the more advanced methods. It was important for us that the model would produce good results with short sentences and comments. Also, its documentation is quite detailed and well-written, which was a big help for us.

However, it’s important to note that BERTopic assigns each piece of text to a single topic. This means it doesn’t recognize when multiple topics are mentioned in the same review. This was an important limitation that we needed to keep in mind when performing the analysis.

In the picture below, you can see how BERTopic recognizes different topics and tells you how many times each theme has appeared in the text. It also gives you examples for each topic. For privacy reasons, we’re not showing the analysis for our actual project, but instead, the topics identified for a banking app. 

The script that we used for the analysis does the following (a rough sketch follows the list):

  • Imports the necessary libraries and datasets
  • Extracts topics across all documents
  • Displays all the topics that the model identifies
  • Looks at the most frequent topics one by one
  • Visualizes these topics using various methods
  • Creates interactive visualizations so you can zoom in on the things that interest you
  • Visualizes the reviews inside the topics to see if they were assigned correctly, and whether they make sense
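
As a rough sketch of the core of this workflow (again assuming the same merged review file, with illustrative names, rather than our exact script), BERTopic can be used like this:

    # Rough sketch of a BERTopic workflow; the file and column names are assumptions.
    import pandas as pd
    from bertopic import BERTopic

    reviews = pd.read_csv("merged_reviews/competitor_a.csv")
    docs = reviews["review_text"].astype(str).tolist()

    topic_model = BERTopic(language="english", verbose=True)
    topics, probs = topic_model.fit_transform(docs)  # extract topics across all documents

    print(topic_model.get_topic_info())  # all identified topics with sizes and top keywords
    print(topic_model.get_topic(0))      # top words of the largest topic (topic -1 collects outliers)

    # Interactive visualizations (each returns a Plotly figure you can explore)
    topic_model.visualize_topics().show()         # intertopic distance map
    topic_model.visualize_heatmap().show()        # similarity matrix
    topic_model.visualize_hierarchy().show()      # hierarchical clustering
    topic_model.visualize_documents(docs).show()  # individual reviews plotted inside topics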

For example, in the intertopic distance map, you can zoom in and out of certain topics and see their relative distance. You can see a zoomed-in version in the picture below, but if you zoom out, you can also see all topics in a coordinate system. 

Please note that for all the visualizations, we’re using data for a banking app instead of the actual project dataset to respect the privacy of our client. 

Example of an intertopic distance map where several circles represent different topics, with their size indicating topic prevalence.

The similarity matrix visualizes topic similarity using different colors.

Example of a similarity matrix heatmap, showing the similarity scores between different topics. The color gradient represents similarity scores, ranging from light green (low similarity) to dark blue (high similarity).

The hierarchical clustering chart shows you the connections between the topics.

Example of hierarchical clustering chart showing the connections between topics

Finally, the last visualization lets you look at the individual reviews one by one while also seeing their relative distance. If you click on a dot, you can read the actual review word for word. This makes it easier to check the accuracy of the model, as you can see whether the categories make sense.

Example of a scatter plot visualization of documents and topics, where each colored cluster represents a different topic.

The result

As the final step, we manually went through the visualizations that the scripts produced (like the ones we showcased in the previous sections of this article), and extracted insights from them. 

This is important for multiple reasons:

  1. To check if the model correctly categorizes sentiments and identifies topics
  2. To make any necessary changes in the analysis process or the visualizations
  3. To interpret the findings and identify common patterns 

The end result was a document containing the insights we gathered about all nine competitors. For each competitor, we displayed the sentiment analysis bar chart and then wrote down the positive and negative topics based on the topic modeling. We closed the document with a summary of the common patterns, along with some unique features that we could look into further, as discussed with the stakeholders.

Findings

Here are some interesting findings from the sentiment analysis and topic modeling, identified using reviews of a neobank platform:

  1. The positive-negative review ratio of the app is quite good: there are 7.5 times as many positive reviews as there are negative ones. 
  2. Most of the positive reviews focus on topics such as fast and reliable service, user-friendly platform, and smooth transaction processes. 
  3. Most of the negative reviews mention that it’s quite difficult to open or close an account, or to change personal data, like a mobile phone number. It also seems that some users have a hard time logging in once they have set up their account. The lack of security features such as Face ID also came up in some of the reviews.

Overall, our client was very satisfied with the results, especially the details about what users appreciate or dislike about certain competitors. The presentation started a discussion about which other topics we could focus on in the future when developing the concept of the new application. It helped the stakeholders decide which common behavioral patterns they should rely on and which unique app features they should look into.

Banner saying "we deliver results with in-depth research. Book a meeting."

It also formed the basis of future research activities. Based on the findings from this large sample, we could collect our hypotheses and questions for an interview round that would provide crucial qualitative data about the underlying reasons behind the behavioral patterns we identified.

4 things to keep in mind

Through this project, we gained valuable insights into effectively incorporating data science techniques into UX research. In this section, we’ll walk you through the most important takeaways to set you up for success in a similar endeavor. 

1. Setting up takes time

Expect to spend at least 1 day getting your development environment set up with the necessary programming tools and packages. You may encounter setup challenges, but perseverance is key; solutions are often readily available online.

2. Handling errors

Prepare to encounter unexpected error messages. When you hit a roadblock, copy the error and search for it online. Python's large community means forums are filled with discussions and fixes for common issues. Thoroughly reading the documentation for your chosen libraries and models can also provide valuable troubleshooting guidance.

3. Ensuring legal compliance

When working with publicly available data, always adhere to the usage guidelines of your data sources. Maintain legal and ethical standards throughout the data collection process.

4. Staying adaptable

While following tutorials can be helpful, it's crucial to avoid blindly replicating others' methods. Stay flexible and adapt approaches to best suit your unique needs and goals. Experiment with different models and visualizations, and document your learnings along the way. The key is to approach this process with patience and a willingness to iterate. 

Main takeaways

Table summarizing when to use data science methods in UX research and how to plan for them in terms of timeline and tools.

Embracing data science techniques as part of your UX research toolkit can be a valuable, if challenging, endeavor. As professionals, we must resist the temptation to stick to the familiar.

By combining the scale and pattern-finding power of quantitative analysis with the depth and context of qualitative research, we unlock a more holistic understanding of user behaviors and needs.

For example, topic modeling can quickly surface high-level themes across large datasets, which can then be explored in-depth through targeted user interviews. This blended approach offers the best of both worlds, breadth and depth, to uncover insights that inform more impactful product decisions.

While the initial setup process may present challenges, the long-term benefits of integrating data science into UX research are well worth the effort. With patience, resourcefulness, and a commitment to continuous learning, researchers can confidently expand their methodological toolkit and deliver powerful, data-driven outcomes. 

Banner saying "get actionable insights from our researchers."

Need help getting started? Consult our team of expert researchers.