We recommend that you begin by answering the questions before interacting with the simulation.
General Instructions
This is a demonstration of a very complex issue. Experts in the field disagree on how to interpret differences on an ordinal scale, so do not be discouraged if it takes you a while to catch on. In this demonstration you will explore the relationship between interval and ordinal scales. The demonstration is based on two brands of baked goods. The data on the left side labeled “interval scores” show the amount of sugar in each of 12 products. There are two columns of data. The column labeled “Brand 1” contains the sugar content of each of the 12 brand-one products. The second column (“Brand 2”) shows the sugar content of the brand-two products. The amount of sugar is measured on an interval scale.
A rater tastes each of the products and rates them on a 5-point “sweetness” scale. Rating scales are typically ordinal rather than interval.
The first objective of this demonstration is to let you see concretely what it means for a scale to be an ordinal scale. The scale at the bottom shows the “mapping” of sugar content onto the ratings. Sugar content between 37 and 43 is rated as 1, between 43 and 49 as 2, and so on. Therefore, the difference between a rating of 1 and a rating of 2 represents, on average, a “sugar difference” of 6. A difference between a rating of 2 and a rating of 3 also represents, on average, a “sugar difference” of 6. Therefore, the original ratings displayed are on an interval scale. They are rounded off, but they are on an interval scale. It is likely that real ratings would not be on an interval scale. You can change the cutoff points between ratings by moving the vertical lines with the mouse. As you change these cutoffs, the ratings change automatically. For example, you might see what the ratings would look like if people did not consider something very sweet (rating of 5) unless it was very, very sweet.
Step By Step Instructions
The first product in brand one has 38 units of sugar. Take a look at the scale at the bottom of the window. It shows that any value between 37 and 43 would be rated 1. That’s why the rating for the first product is a 1. Now look at the 5th product from Brand 1. It has 45 units of sugar. Since this is more than 43, it is rated 2. Examine other products and make sure you understand how the sugar contents combined with the scale produce the ratings.
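The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the simulation's own code: the function name `rate` is hypothetical, and it assumes that a sugar content falling exactly on a cutoff receives the higher rating.

```python
from bisect import bisect_right

# Initial cutoffs between ratings 1|2, 2|3, 3|4 and 4|5, as given in the text.
CUTOFFS = [43, 49, 56, 62]

def rate(sugar, cutoffs=CUTOFFS):
    """Return the 1-5 sweetness rating for a sugar content.

    Assumption: a value exactly on a cutoff gets the higher rating.
    """
    # Count how many cutoffs the value has reached, then shift to a 1-based rating.
    return bisect_right(cutoffs, sugar) + 1

print(rate(38))  # first Brand 1 product -> 1
print(rate(45))  # fifth Brand 1 product -> 2
```

Passing a different `cutoffs` list to `rate` plays the same role as dragging the vertical lines in the simulation.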
This demonstration allows you to change the way sugar units are transformed into ratings. Let’s suppose that our rater would not give a brand a sweetness rating of 1 unless it was truly not sweet at all. For example, the rater might only give ratings of 1 if the sugar content was less than 40. To see what would happen, move the vertical line above 43 to the left. Notice that as you move it, its label reflects its current value. Keep moving it to the left until it equals 39. Now look at the ratings of the products. With our original “mapping,” the first 4 Brand-one products were rated 1. Now only the first product has little enough sugar to get a rating of 1.
Now suppose that our rater was pretty generous in awarding 3’s. Let’s say that all a brand needed to get a 3 was a 43. So move the divider between 2 and 3 from 49 to 43. And, just for the sake of the example, let’s assume that the rater required a product to be very sweet to get a rating of 4. Specifically, let’s say that it needed a sugar content of 60. Move the divider between 3 and 4 from 56 to 60. Notice how the ratings are automatically updated. Finally, let’s assume that our rater does not require much more sweetness in order to give a rating of 5. So let’s leave the cutoff between 4 and 5 at 62.
Our rater is generating very “non-interval” ratings. A difference between a 4 and a 5 could represent, at most, 2 units. In contrast, a difference between a 2 and a 3 could represent as much as 17 units.
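One way to see how uneven the new mapping is: compute the width of the sugar range covered by each interior rating. The sketch below uses the cutoffs reached in the steps above (39, 43, 60, 62). The 2-unit and 17-unit figures appear to correspond to the widths of the bands for ratings 4 and 3; that reading is an assumption, and the variable names are illustrative.

```python
# Cutoffs after the moves described above: 1|2 at 39, 2|3 at 43, 3|4 at 60, 4|5 at 62.
cutoffs = [39, 43, 60, 62]

# Width of the sugar range covered by each interior rating (2, 3 and 4).
# Ratings 1 and 5 are open-ended, so they have no width here.
widths = {
    rating: hi - lo
    for rating, (lo, hi) in enumerate(zip(cutoffs, cutoffs[1:]), start=2)
}

print(widths)  # {2: 4, 3: 17, 4: 2}
```

On an interval scale these widths would be roughly equal; here they range from 2 to 17.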
Now consider how the mapping of the sugar content onto the sweetness rating affects our interpretation of the difference between Brand 1 and Brand 2. The mean difference in sugar content is 5.0. With the original mapping, the mean difference in ratings is 0.69. When we changed the mappings so that the boundary between ratings of 1 and 2 became 39, between 2 and 3 became 43, and the boundary between 3 and 4 became 60, the difference in ratings became 3.23 - 2.85 = 0.38. You can see that the mappings did make a difference. But qualitatively, whether you were looking at the sugar content or the ratings, you would conclude that Brand 2 is sweeter than Brand 1. Experiment by changing the various boundaries. You will find that qualitative conclusions based on the mean ratings are valid.
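The comparison can be reproduced outside the simulation with a short script. The sugar values below are hypothetical, made up for illustration (they are not the simulation's Data Set 1), and the helper names are assumptions. Brand 2 is simply Brand 1 shifted up by 5 units, and a value exactly on a cutoff is assumed to get the higher rating.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical sugar contents; Brand 2 is Brand 1 plus 5 units.
brand1 = [38, 40, 41, 42, 45, 47, 50, 52, 55, 57, 60, 63]
brand2 = [s + 5 for s in brand1]

def rate(sugar, cutoffs):
    # Assumption: a value exactly on a cutoff gets the higher rating.
    return bisect_right(cutoffs, sugar) + 1

def rating_difference(cutoffs):
    """Mean Brand 2 rating minus mean Brand 1 rating under the given cutoffs."""
    return (mean(rate(s, cutoffs) for s in brand2)
            - mean(rate(s, cutoffs) for s in brand1))

original = rating_difference([43, 49, 56, 62])  # initial mapping
modified = rating_difference([39, 43, 60, 62])  # mapping after the moves above

print(f"sugar difference: {mean(brand2) - mean(brand1):.2f}")
print(f"rating difference, original mapping: {original:.2f}")
print(f"rating difference, modified mapping: {modified:.2f}")
```

With these made-up values the size of the rating difference changes between the two mappings, but its sign does not.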
Now choose Data Set 2. Just as with Data Set 1, the mean difference in sugar content is 5.0. The data are quite different, though. Brand 2 has the three lowest sweetness levels as well as the three highest. Brand 1 is in the middle. The initial difference in ratings is 3.08 - 2.62 = 0.46. If you change the boundaries you get slightly different results, but you will probably find that the mean difference on the ratings is not misleading. However, there are ways of getting a misleading result. Notice that there are four 43’s for Brand 1 and that these are associated with sweetness ratings of 2. Move the boundary between ratings of 2 and 3 to 42. Then you will see that the ratings for these products change from 2 to 3. With all the sugar contents of 43 receiving a rating of 3, Brand 1 now has a higher mean sweetness rating than Brand 2 even though the mean sugar content for Brand 2 is higher. This effect can be made even larger by moving the boundary between 4 and 5 to 75. This will lower the ratings for the Brand 2 products with 71 and 72 from 5 to 4, thus lowering the Brand 2 mean without affecting the Brand 1 mean. The mean for Brand 1 will be 3.23 compared to a mean for Brand 2 of 2.92. Again, the important point is that even though Brand 1 has a lower mean sugar content than Brand 2, it has a higher mean rated sweetness score.
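The reversal can also be demonstrated in code. The data and cutoff moves below are hypothetical and chosen only to mimic the pattern described (Brand 2 at both extremes, Brand 1 clustered just below a boundary); they are not the simulation's Data Set 2. The sketch assumes that a value exactly on a cutoff gets the higher rating.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical data: Brand 2 holds the three lowest and three highest values,
# while Brand 1 has a cluster of four products at 47.
brand1 = [47, 47, 47, 47, 50, 50, 52, 52, 54, 54, 57, 57]
brand2 = [38, 39, 40, 50, 51, 52, 53, 54, 55, 71, 72, 74]

def mean_rating(sugars, cutoffs):
    # Assumption: a value exactly on a cutoff gets the higher rating.
    return mean(bisect_right(cutoffs, s) + 1 for s in sugars)

initial = [43, 49, 56, 62]
# Lower the 2|3 boundary below the cluster at 47 and raise the 4|5 boundary
# above Brand 2's highest values.
contrived = [43, 46, 56, 75]

sugar_diff = mean(brand2) - mean(brand1)
initial_diff = mean_rating(brand2, initial) - mean_rating(brand1, initial)
contrived_diff = mean_rating(brand2, contrived) - mean_rating(brand1, contrived)

print(f"sugar difference (Brand 2 - Brand 1): {sugar_diff:.2f}")
print(f"rating difference, initial cutoffs:   {initial_diff:.2f}")
print(f"rating difference, contrived cutoffs: {contrived_diff:.2f}")
```

With these values Brand 2 has the higher mean sugar content throughout, yet the contrived cutoffs give Brand 1 the higher mean rating.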
Summary
When an interval scale such as sugar content is mapped onto a rating scale such as judgment of sweetness, the resulting rating scale is probably not an interval scale. For most real-world situations, the means of ordinal-level rating scales allow valid conclusions about the direction of the means on the interval scale. However, it is theoretically possible for means on the ordinal scale to be in the opposite direction from means on the interval scale. Experts disagree on the importance of this in real-world data analysis. We believe that the chances of misinterpretation with real data are extremely low, and that it is only with contrived artificial data and mappings of the interval to the ordinal scale that these problems occur.
Questions
Begin by answering the questions, even if you have to guess. The first time you answer the questions you should not check your answers. Once you have answered all the questions, answer them again using the simulation to help you. The second time through click the “Check Answer” button to get feedback.