Advanced User Research Techniques: UX Statistics (part 2)

Neil O'Donoghue
9 min read · Jan 2, 2017

--

This is part 2 of Advanced User Research Techniques. This post will focus on the use of Statistics as a User Research Technique. More specifically this post will look at the types of behavioural research, how to prepare data, and the factors to consider when running descriptive statistics including comparing means and the different variables in a test.

Statistics are useful when conducting design research to answer specific questions. They allow for a better understanding of data and allow teams to talk confidently about the numbers. Frequently used research methods for studying interfaces and applications, such as observations, field studies, surveys, usability tests and controlled experiments, are all kinds of empirical investigation that can be categorised into three groups: descriptive investigations, relational investigations and experimental investigations (Shneiderman et al, 2009; Rosenthal and Rosnow, 2008).

Figure 1: Relationships between descriptive research, relational research and experimental research (Lazar et al, 2010)

In a typical research project, a combination of two or even all three investigation types may be used to get a deeper understanding of what happened and why (Lazar et al, 2010).

Descriptive investigations, such as surveys and focus groups, are often the first step of a research project and focus on building an accurate description of a situation or a set of events. For example, a researcher may observe 5 out of 10 kids who watch football on TV being able to hit a target by kicking a football, while only 2 out of 8 kids who don’t watch football hit the target. This might be insightful; however, it does not establish whether there are correlations or relationships between factors, or explain why certain things happen.

Relational investigations enable a researcher to establish whether there are relationships between factors in a situation or set of events. For example, whether teenagers who read books for 2 hours per day spell better than teenagers who don’t. Relational investigations enable discovery of connections between events (spelling ability) and variables (amount of time spent reading). However, relational studies are not suited for determining the causal relationships between multiple factors (Cooper and Schindler, 2000; Rosenthal and Rosnow, 2008).

Experimental investigations enable the identification of causal relationships. They can tell how a situation or set of events occurred and, in some experiments, why it happened. Experimental studies are often used in the field of medicine to identify treatment methods for disease or to create better drugs.

After an experiment is conducted the next step for a researcher is to analyse the results statistically. However, many critical decisions need to be made, such as the type of statistical method to use, the confidence threshold and interpretation of the results. Wrong method selection and misinterpretation of results can lead to false/inaccurate conclusions (Lazar et al, 2010).

Preparing Data

Data collected from experiments, usability tests, field studies and surveys needs to be carefully processed before statistical analysis can be conducted. This is done to identify and rectify errors in the data that might contaminate the entire data set, to identify higher level coding themes, and to organise the data into predefined layouts or formats depending on what software is being used (Delwiche and Slaughter, 2008).

Coding the Data

In many studies, the original data collected, such as demographic information from a survey, will need to be coded before any statistical analysis is conducted. Coding involves assigning a numerical value to a response. For example, when coding gender in a demographic study, females and males could be assigned the numerical values 1 and 0 (female = 1, male = 0). This makes it possible to theme, group and sum up data and values (Lazar et al, 2010).
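As a minimal sketch of what coding can look like in practice, the snippet below maps free-text survey answers onto the female = 1, male = 0 scheme described above. The participant records and field names are hypothetical, made up purely for illustration.

```python
# Hypothetical survey responses; field names and values are illustrative only.
responses = [
    {"participant": "P1", "gender": "female"},
    {"participant": "P2", "gender": "male"},
    {"participant": "P3", "gender": "Female "},  # messy input: capital letter, trailing space
    {"participant": "P4", "gender": "male"},
]

# Coding scheme from the text: female = 1, male = 0.
GENDER_CODES = {"female": 1, "male": 0}


def code_gender(raw):
    """Return the numeric code for a raw answer, or None if it is not in the scheme."""
    return GENDER_CODES.get(raw.strip().lower())


coded = [code_gender(r["gender"]) for r in responses]

# Because female = 1, summing the codes counts the female participants.
female_count = sum(c for c in coded if c is not None)

print(coded)         # [1, 0, 1, 0]
print(female_count)  # 2
```

Returning None for answers outside the scheme keeps unexpected responses visible rather than silently forcing them into one of the two codes.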

Descriptive Statistics

Once the data is cleaned up, it is useful to run descriptive statistics to understand the nature of the data collected, such as the range the data points fall into or how they are distributed. The most commonly used measures are means, medians, modes, variances, standard deviations and ranges.

The central tendency is where the bulk of the data is located and can be measured by the mean, median and mode (Rosenthal and Rosnow, 2008). The mean is the “arithmetic average” and can be used to show how groups relate to each other. If the mean of one group is much higher than that of another group, significance tests, such as a t test, can examine whether the difference is statistically significant (Lazar et al, 2010). The median is the middle score of a data set, and the mode is the value with the greatest frequency in a data set.
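To make the three measures concrete, here is a small sketch using Python's built-in statistics module. The task completion times are invented for illustration; one slow participant is included to show how an outlier pulls the mean away from the median.

```python
import statistics

# Hypothetical task completion times in seconds for one group of participants.
times = [32, 35, 35, 38, 41, 44, 90]

mean_time = statistics.mean(times)      # arithmetic average
median_time = statistics.median(times)  # middle score once sorted
mode_time = statistics.mode(times)      # most frequent value

print(mean_time)    # 45
print(median_time)  # 38
print(mode_time)    # 35
```

Here the single 90 second result lifts the mean to 45 while the median stays at 38, which is one reason it is worth looking at more than one measure.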

The other important group of descriptive measures is the measures of spread (Lazar et al, 2010). These describe how much the data points deviate from the centre, or how spread out the data set is. Spread is measured by the range, variance and standard deviation. The range is the distance between the highest and lowest score, the variance is the mean of the squared distances of all the scores from the mean of the data set, and the standard deviation is the square root of the variance. A common shape for the distribution of a data set is the normal distribution, which is defined by its mean and standard deviation (Image 1). Many attributes from various fields are distributed normally, including ages of populations, student grades and salaries of job types (Lazar et al, 2010). Applied to a UX design project, this information is useful for identifying specific user groups.
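The measures of spread can be sketched the same way. The scores below are made up, and the example uses the population versions of variance and standard deviation because they match the "mean of the squared distances" definition given above.

```python
import statistics

# Hypothetical scores from a usability questionnaire.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(scores) - min(scores)   # distance between highest and lowest score
variance = statistics.pvariance(scores)  # mean of squared distances from the mean
std_dev = statistics.pstdev(scores)      # square root of the variance

print(data_range)  # 7
print(variance)    # 4
print(std_dev)     # 2.0
```

When the data is a sample from a larger population, statistics.variance and statistics.stdev (which divide by n - 1 instead of n) are the usual choice.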

Testing the data set to see whether it is normally distributed is necessary for selecting the type of significance test to conduct. If the data is normally distributed, parametric tests are appropriate. If it is not, the data can be transformed so that it is closer to normal, or non-parametric tests should be considered. Some applications, like Microsoft Excel, offer built-in functionality to help with these checks. For more information on the measures, refer to Hinkle, Wiersma and Jurs (2002) and Rosenthal and Rosnow (2008).
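As a rough first screen before choosing a test, skewness can be calculated by hand. This is only a sketch and not a formal normality test: it assumes that a skewness close to 0 is consistent with a symmetric, possibly normal, data set, and it uses the common rule of thumb that an absolute value above 1 signals strong skew. The data sets are hypothetical.

```python
import statistics


def skewness(data):
    """Population skewness: the mean of the cubed z-scores.

    Roughly 0 for symmetric data, positive when there is a long right tail.
    """
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are identical")
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)


# Hypothetical data sets, for illustration only.
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 1, 1, 2, 2, 3, 12]

print(skewness(symmetric))     # symmetric data: skewness of 0
print(skewness(right_skewed))  # long right tail: clearly positive

# Rule-of-thumb threshold (an assumption of this sketch, not from the source).
needs_attention = abs(skewness(right_skewed)) > 1
print(needs_attention)  # True
```

A histogram or a formal test such as Shapiro-Wilk from a statistics package would normally follow a screen like this.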

Image 1: Normal Distribution Curve (taken from: Medical Dictionary)

Comparing Means

The ultimate aim for any researcher conducting user studies is to find out whether there is any difference between the conditions or groups (Lazar et al, 2010). In any case there are 2 ways to design the test. These are between-subjects and within-subjects.

Between-Subjects: 2 groups of participants are recruited and each group performs tests under different conditions. For example, evaluating the effectiveness of two checkouts; group one uses Amazon checkout to purchase a book, group two uses Ebay checkout to purchase a book.

Within-Subjects: 1 group of participants is recruited and performs tests under all conditions. In this case, the group of participants will purchase a book from Amazon followed by purchasing a book from Ebay.

In either case, the aim is to compare the performance measure of the two groups or conditions to find out whether there’s a difference (between Amazon checkout and Ebay checkout). The choice of design will depend on what information is wanted from the experiment (MacKenzie, 2013). For example, if an experiment seeks to investigate the acquisition of skill over multiple sessions of practice, then within-subjects should be used. If an experiment requires that participants do not carry learning over from one task to the next, then between-subjects is better suited. However, if there is a choice, within-subjects is generally preferred as fewer participants are required, and recruiting, testing and analysis are quicker than for two separate groups (MacKenzie, 2013).

In many studies, 3 or more conditions need to be compared. However, due to variances in the data it is not possible to directly compare the means of multiple conditions at the same time (Lazar et al, 2010). Instead, statistical significance tests need to be used to evaluate the variances. This is done by measuring and comparing variables in a test. There are 3 main variables in a test with multiple conditions. These are:

Independent variables: this is the condition that is changed between tests. In the above example the checkout/website (Amazon or Ebay) is the independent variable.

Controlled variables: these are the factors that are kept the same across conditions so that they do not influence the result. In the Amazon/Ebay checkout example this is the device the participant performs the task on (e.g. mobile/tablet) remaining the same, and the type of measurement used to determine the differences (e.g. time to complete task).

Dependent variables: this is the change that happens as a result of the independent variable changing. In the above example this is time increasing or decreasing as a result of switching from Amazon to Ebay to complete the task.

In any test, independent and controlled variables are conditions the researcher can control while dependent variables are usually outcomes the researcher needs to measure (Oehlert, 2000).

Significance tests estimate the probability of the observed differences occurring by chance. If that probability is less than 5%, then a claim can be made with high confidence that the observed difference is due to the difference in the independent variable (Lazar et al, 2010).
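One way to see what "occurring by chance" means is an exact permutation test, sketched below. This is a swapped-in illustration rather than one of the tests named in this post: it reshuffles the group labels in every possible way and counts how often a difference at least as large as the observed one appears. The checkout times are hypothetical.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical checkout completion times in seconds (between-subjects design).
amazon = [38, 41, 35, 44, 39]
ebay = [52, 49, 55, 47, 51]


def permutation_p_value(group_a, group_b):
    """Share of all possible label assignments whose difference in means
    is at least as large as the one actually observed (two-sided)."""
    pooled = group_a + group_b
    observed = abs(fmean(group_a) - fmean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating point rounding.
        if abs(fmean(a) - fmean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


p_value = permutation_p_value(amazon, ebay)
print(round(p_value, 4))  # 0.0079
print(p_value < 0.05)     # True: below the 5% threshold
```

Only 2 of the 252 possible ways of splitting these ten times into two groups produce a gap this large, so chance alone is an unlikely explanation. Enumerating every split is only practical for small samples like this one.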

Comparing the means of multiple groups can be done using various significance tests. Commonly used tests include t tests and the analysis of variance (ANOVA). When comparing two groups or conditions, the independent-samples t test (for between-subjects designs) or the paired-samples t test (for within-subjects designs) can be used.
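The two t tests differ in what they treat as the unit of comparison, which the following sketch tries to show. It calculates only the t statistic and degrees of freedom, using the standard textbook formulas (the independent version assumes equal variances). The times are hypothetical, and the same numbers are reused for both designs purely to keep the example short.

```python
import math
import statistics

# Hypothetical checkout completion times in seconds.
amazon = [38, 41, 35, 44, 39]
ebay = [52, 49, 55, 47, 51]


def independent_t(a, b):
    """Student's t for two separate groups (between-subjects), equal variances assumed."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.fmean(a) - statistics.fmean(b)) / std_error, df


def paired_t(a, b):
    """Paired t for one group measured twice (within-subjects).

    a[i] and b[i] must come from the same participant.
    """
    diffs = [x - y for x, y in zip(a, b)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / std_error, len(diffs) - 1


t_between, df_between = independent_t(amazon, ebay)
t_within, df_within = paired_t(amazon, ebay)

print(round(t_between, 2), df_between)  # -5.63 8
print(round(t_within, 2), df_within)    # -3.99 4
```

The statistic is then compared with a critical value from a t table (2.306 for 8 degrees of freedom and 2.776 for 4, at the two-tailed 5% level), or handed to a statistics package that reports the p value directly.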

Studies that involve more than two conditions require the use of an ANOVA test. Commonly used ANOVA tests include one-way ANOVA, factorial ANOVA, repeated measures ANOVA and ANOVA for split-plot design (Lazar et al, 2010).
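For the simplest case, a one-way ANOVA with separate groups, the F statistic can be sketched by hand as the ratio of variance between the groups to variance within them. The three checkouts and their task times below are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k = len(groups)      # number of conditions
    n = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical task times in minutes for three checkout designs.
checkout_a = [4, 5, 6]
checkout_b = [6, 7, 8]
checkout_c = [8, 9, 10]

f_stat, df_b, df_w = one_way_anova_f([checkout_a, checkout_b, checkout_c])
print(f_stat, df_b, df_w)  # 12.0 2 6
```

An F of 12 with 2 and 6 degrees of freedom exceeds the 5% critical value of 5.14, which suggests that at least one checkout differs from the others. A follow-up comparison would be needed to say which one.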

Below is a table that summarizes the appropriate significance test for each design.

Figure 2: Commonly used significance tests for comparing means and their application context. (Lazar et al, 2010)
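The selection logic can also be sketched as a small lookup function. This is an assumption-laden reconstruction based on the tests named in the text and on common statistical convention, not a reproduction of the figure. The function name and the "between", "within" and "mixed" labels are hypothetical.

```python
def choose_significance_test(design, conditions, independent_variables=1):
    """Suggest a significance test for comparing means.

    design: "between", "within" or "mixed" (labels invented for this sketch)
    conditions: number of conditions being compared
    independent_variables: number of independent variables in the study
    """
    if conditions < 2:
        raise ValueError("need at least two conditions to compare")
    if design == "mixed":
        # Some variables vary between groups, others within the same participants.
        return "split-plot ANOVA"
    if design == "between":
        if independent_variables > 1:
            return "factorial ANOVA"
        return "independent-samples t test" if conditions == 2 else "one-way ANOVA"
    if design == "within":
        if independent_variables == 1 and conditions == 2:
            return "paired-samples t test"
        return "repeated measures ANOVA"
    raise ValueError(f"unknown design: {design!r}")


print(choose_significance_test("between", 2))  # independent-samples t test
print(choose_significance_test("within", 2))   # paired-samples t test
print(choose_significance_test("between", 3))  # one-way ANOVA
```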

Summary

There are many factors to consider when conducting statistical analysis. Careful collection and preparation of data is essential for grouping and interpreting the data sets and results. Knowledge of the various factors and careful thinking are required to establish the number of conditions, groups and variables associated with the project’s predefined hypotheses. Only after identifying these is it possible to select the appropriate test design (within-subjects vs between-subjects) and the significance test to apply.

Unlike web analytics, statistics can test the relationships between variables to help determine “why” something happens. Using one of the above examples, teenagers who read more may be better at spelling. However, statistics, like analytics, only identify behaviour. They cannot be used to understand attitudes, emotion, motivation and frustration, which are all key components of user experience design.

Applied to agile software development, the use of statistics may be difficult to implement due to quick iterations. It’s also difficult to identify granular, testable hypotheses in short time frames. Running statistical studies within sprints is time consuming and, in smaller organisations with limited resources, could be difficult to include. Drawing from experience, more accessible, less time-consuming and less expensive research methods, such as user testing, interviews and analytics, are more suitable than statistics for gathering insight into user behaviour.

However, in industries where errors could result in harm to an individual or in severe consequences, such as medicine, airline engine turbines or banking software, the use of statistics would play a vital role.

The first post in this series discusses Analytics as an Advanced User Research Technique.

References

Cooper, D. and Schindler, P. (2000) Business Research Methods, 7th edition. Boston: McGraw Hill.

Delwiche, L. and Slaughter, S. (2008) The Little SAS Book: A Primer, 4th edition. Cary, NC: SAS Institute Inc.

Hinkle, D., Wiersma, W. and Jurs, S. (2002) Applied Statistics for the Behavioral Sciences, 5th edition. Houghton Mifflin Company.

Lazar, J., Feng, J.H. and Hochheiser, H. (2010) Research Methods in Human-Computer Interaction. Wiley.

MacKenzie, S. (2013) Within-subjects vs. Between-subjects Designs: Which to Use? Retrieved November 14, 2016, from http://www.yorku.ca/mack/RN-Counterbalancing.html

Oehlert, G. (2000) A First Course in Design and Analysis of Experiments. New York: Freeman and Company.

Rosenthal, R. and Rosnow, R. (2008) Essentials of Behavioral Research: Methods and Data Analysis, 3rd edition. Boston: McGraw Hill.

Shneiderman, B., Plaisant, C., Cohen, M. and Jacobs, S. (2009) Designing the User Interface: Strategies for Effective Human-Computer Interaction, 5th edition. Boston, Massachusetts: Addison-Wesley.

Neil O'Donoghue

Principal Product Designer. Curious about how people think, how things work and what makes something useful, usable and desirable