There are many ways to format rating scales. We recently investigated
Numerous studies highlight both the pros and cons of each of these formats. Our own research on rating scale formats in UX measurement usually shows no significant differences between formats. When a difference is statistically significant, the effect size is usually small. A notable exception in our recent work suggests that three response options are clearly insufficient for accurately measuring attitudes and sentiments, such as the likelihood of recommending a product.
Until recently, we had not investigated differences between numeric and graphic item formats (for example, clicking one to five stars, or dragging a slider). We now have data collected with the UMUX-Lite, a two-item UX questionnaire, using (1) standard linear numeric scales, (2) Amazon-style five-star items, and (3) sliders. In this article, we focus on comparing scores from numeric scales and sliders.
What are sliders?
Figures 1 and 2 show examples of the numerical and slider versions of the UMUX-Lite questionnaire.
Because of their design, sliders require more physical space than numeric scales. Sliders come in a variety of formats. Some display a limited number of response options (for example, five), so the only difference from a standard numeric scale is the interface used to select an answer (click for the numeric scale; drag for the slider). More often, sliders cover a wide range of response options (for example, from 0 to 100, as shown in Figure 2). Some sliders require respondents to drag the handle to the desired position with a mouse, touchscreen, or other pointing device. This interaction is difficult for some users (e.g., Chyung et al., 2018).
Other slider designs let respondents either drag the handle or simply click the desired position, and then fine-tune the answer as needed by combining dragging and clicking (as the sliders in MUIQ do). Sliders of this type are also known as visual analog scales (VAS).
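Because a numeric item and a 0 to 100 slider produce values on different ranges, their scores have to be placed on a common scale before they can be compared. The following is a minimal sketch of one way to do that, assuming simple linear rescaling and a plain average of the two items. The function names, the five-point range, and the example responses are illustrative assumptions, not the scoring procedure used in the study.

```python
def rescale_to_100(response: float, low: int, high: int) -> float:
    """Linearly map a response on [low, high] onto a 0-100 scale."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= response <= high:
        raise ValueError("response is outside the scale range")
    return (response - low) * 100 / (high - low)


def two_item_score(item1: float, item2: float, low: int, high: int) -> float:
    """Average two rescaled items into a single 0-100 score."""
    return (rescale_to_100(item1, low, high) + rescale_to_100(item2, low, high)) / 2


# Hypothetical responses: 4 and 5 on a five-point numeric scale,
# and 80 and 90 on a 0-100 slider.
numeric_score = two_item_score(4, 5, low=1, high=5)
slider_score = two_item_score(80, 90, low=0, high=100)
print(numeric_score, slider_score)  # 87.5 85.0
```

With this kind of rescaling, the lowest option on either format maps to 0 and the highest to 100, so the two formats differ only in how finely the range between them is divided.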
What does the research say?
The published literature is inconsistent about the benefits of sliders over numeric scales. Arguments in favor of sliders are generally based on hypothesized benefits for psychometric quality and respondent engagement, but the empirical results are mixed. For instance:
- Joyce et al. (1975) found that VAS was more sensitive than a four-point numerical scale.
- Sauro and Dumas (2009) found that the SMEQ (Subjective Mental Effort Questionnaire), a 150-point VAS, was slightly more sensitive than the seven-point SEQ.
- A number of researchers have reported better results for numeric scales compared to VAS, in terms of completion time (Couper et al., 2006; Rausch & Zehetleitner, 2014), completion rates (Couper et al., 2006; Davey et al., 2007), and respondent preferences (van Laerhoven et al., 2004; van Schaik & Ling, 2007).
- Respondents, especially in clinical settings, sometimes have more difficulty physically completing a VAS than a numeric scale (Bolognese et al., 2003; Briggs & Closs, 1999; Jensen et al., 1986).
- Toepoel and Funke (2018) found that users interacted with sliders less often than with four radio buttons. They also reported poorer performance for sliders that require dragging than for VAS sliders.
- Several studies comparing VAS with numerical scales ranging from 4 to 20 response options found no significant or practical difference in psychometric properties (Bolognese et al., 2003; Couper et al., 2006; Davey et al., 2007; Larroy, 2002; Lee et al., 2009; Lewis & Erdinç, 2017; Rausch & Zehetleitner, 2014; van Laerhoven et al., 2004; van Schaik & Ling, 2007).
An experiment comparing sliders and numeric scales
Since our main interest is in how differences in item format affect UX measurement, we set up a Greco-Latin square design to support within-subjects comparison of UMUX-Lite scores for several streaming services (Netflix, HBO Now, Amazon Prime Video, Hulu, and Disney+). In total, 335 participants from a US panel agency rated streaming services in May and June 2020. Of that larger sample, 180 participants rated them using both sliders and numeric scales.
In this pilot project, there were three independent variables:
- Item format (linear numeric; VAS slider; see Figures 1 and 2)
- Rating context (rating for most recent service experience; rating for overall service experience)
- Order of presentation (numeric / recent, then slider / overall; numeric / overall, then slider / recent; slider / recent, then numeric / overall; slider / overall, then numeric / recent)
Participants were randomly assigned to one of four orders formed by crossing item format, rating context, and presentation order. This keeps the nuisance variables of rating context and order of presentation under control throughout the experiment. In addition to our own experiments, we use this research design in client projects when we need to control nuisance variables efficiently and improve the precision of measurement.
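The assignment scheme described above can be sketched in a few lines. This is a minimal illustration, assuming balanced (blocked) random assignment so that each of the four orders receives the same number of participants; the source states only that assignment was random. The names `ORDERS` and `assign_orders` and the seed are hypothetical.

```python
import random
from collections import Counter

# The four presentation orders from the design: each pairs
# (item format, rating context) for the first and second rating.
ORDERS = [
    (("numeric", "recent"), ("slider", "overall")),
    (("numeric", "overall"), ("slider", "recent")),
    (("slider", "recent"), ("numeric", "overall")),
    (("slider", "overall"), ("numeric", "recent")),
]


def assign_orders(n_participants: int, seed: int = 0) -> list:
    """Randomly assign participants to orders with balanced group sizes.

    Balanced assignment is an assumption of this sketch, not a detail
    reported in the source.
    """
    rng = random.Random(seed)
    # Repeat the four orders enough times to cover everyone, then shuffle.
    repeats = -(-n_participants // len(ORDERS))  # ceiling division
    pool = (ORDERS * repeats)[:n_participants]
    rng.shuffle(pool)
    return pool


assignments = assign_orders(180)
for order, count in Counter(assignments).items():
    print(order, count)
```

Across the four orders, each item format appears first twice and second twice, and each format is paired with each rating context twice, which is what lets the format comparison be made within subjects without being confounded with context or position.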
Figure 3 shows the key result of the experiment: nearly identical means for numeric scales (83.9) and sliders (83.7). An analysis of variance that simultaneously assessed the statistical significance of all three main effects and their interactions found no significant effects (all p > 0.20 with 176 error df).
Even with the large sample size, this difference (0.2) was not statistically significant (t(179) = 0.48, p = 0.64). The 95% confidence interval for the difference ranges from -0.7 to 1.1, so a difference of 0 is plausible, but a difference greater than 1.1 is not.
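The reported t(179) reflects a paired comparison: each of the 180 participants contributes one numeric score and one slider score, and the test is run on the per-participant differences. The sketch below shows that calculation using only the Python standard library. It substitutes a normal approximation for the t critical value, which is reasonable only for large samples, so its interval will be slightly narrower than an exact t-based one. The function name and the simulated scores are assumptions for illustration and are not the study data.

```python
import math
import random
import statistics
from statistics import NormalDist


def paired_summary(first, second, confidence=0.95):
    """Mean paired difference, its t statistic, and an approximate CI."""
    if len(first) != len(second):
        raise ValueError("both lists need one score per participant")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample SD divided by sqrt(n)).
    se = statistics.stdev(diffs) / math.sqrt(n)
    t_stat = mean_diff / se
    # Normal approximation to the t critical value (large-sample assumption).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "n": n,
        "df": n - 1,
        "mean_diff": mean_diff,
        "t": t_stat,
        "ci": (mean_diff - z * se, mean_diff + z * se),
    }


# Simulated scores for 180 hypothetical participants (not the study data).
rng = random.Random(1)
numeric = [min(100.0, max(0.0, rng.gauss(84, 12))) for _ in range(180)]
slider = [min(100.0, max(0.0, x + rng.gauss(0, 5))) for x in numeric]
print(paired_summary(numeric, slider))
```

If the resulting interval contains 0, as the interval reported above does, the data are consistent with no difference between the two formats.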
When collecting data in UX research, it can be tempting to use sliders instead of the more standard linear numeric scales. It seems that sliders should better capture the feelings and attitudes of the people giving the ratings, and respondents may find them more appealing.
However, previously published research and our own results indicate that sliders offer no particular measurement advantage over commonly used linear numeric scales, especially when the numeric scale has at least five response options.
This does not mean that you cannot use sliders in your designs. But if you do use sliders, you should be aware of their potential disadvantages compared to numeric scales. These include the need for more physical space, the possibility of lower completion rates, and greater difficulty of use for some populations (especially when the user is required to drag the slider to the desired position). All of this can be avoided by using standard linear numeric scales.