The other day, I was thinking back to the year 2000 and the website, Hot-or-Not. If you are not familiar with this website, the concept was simple: men and women uploaded their photos and then you rated their attractiveness on a scale from 1-to-10 (10 being the most attractive). As trivial as this may sound, rating on a scale from 1-to-10 actually stressed me out. Thirteen years later, the same thing holds true. I don't like scales from 1-to-10; and, I think the reason for this is that selecting an appropriate rating poses a serious user experience (UX) problem.
On A Scale From 1 To 10
In a complete vacuum, if you looked at a scale from 1-to-10 where 1 is the least attractive and 10 is the most attractive, what would a 5 indicate? Logically speaking, a 5 is in the middle - it is neither attractive nor unattractive. So, if you rated someone a 5, they should neither feel (too) insulted nor (too) complimented.
The problem with this Spock-like analysis is that we don't live in a vacuum. In fact, most of us have at least 12 years of deeply influential experience under our belts: namely, school and scholastic testing. In school, tests were typically graded on a scale from 1 to 100, which can be easily mapped to a scale of 1 to 10 in the mind. And, what did school teach us about this scale?
- 1 - Fail
- 2 - Fail
- 3 - Fail
- 4 - Fail
- 5 - Fail
- 6 - Basically Fail
- 7 - Passing
- 8 - Good
- 9 - Great
- 10 - Excellent
Now, in this context, we have an extremely different picture. 5 is no longer indifferent; 5 is decidedly bad - very bad. And, so is 6 for that matter. Maybe when you get to 7 you can start to feel good; but, really, you should be aiming for an 8 or above.
My point here is not that test-taking is bad; my point is that rating things on a scale of arbitrary size is a shockingly complex user experience. Not only does it require a rich and consistent mental model, but it has no way of taking personal experience into account.
A better user experience (UX) would be to get a user to select an action that he or she would theoretically take. Different users will have different motivations; but at least with a selected action, we add a meaningful abstraction layer between analysis and outcome.
The Likert Scale - How Strongly Do You Agree?
The Likert scale is the most widely used approach to scaling responses in user surveys. The Likert scale starts to move in the right direction in that it is geared more towards Action rather than arbitrary scale. The problem with Likert, however, is that it pairs a set of nuanced choices with a non-nuanced statement.
A typical Likert scale question will pose a statement and then ask you how strongly you agree or disagree with said statement. So, for example, after staying at a hotel, you may be presented with the following survey question:
I would recommend this hotel to my friends and family:
- Strongly Agree
- Agree
- Neither Agree Nor Disagree
- Disagree
- Strongly Disagree
This approach is almost good because it poses the question in terms of an action: recommending the hotel. The downfall here is that the user's relationship to said action is inappropriately nuanced. Actions are not nuanced; they are black and white. As Yoda once said, "Do. Or do not. There is no try."
Decision making is definitely nuanced. Decision making is very complex. Decision making calls on personal history, culture, knowledge, self-esteem, analysis, context, pros, cons, etc. But, once you make a decision, the outcome is simple - yes or no; agree or disagree.
Trying to merge these two different gestures into a single question forces the user to start performing overly complex mental gymnastics. For example, I recently stayed at the Radisson Blu hotel at this year's cf.Objective conference out in Minnesota.
Would I recommend this hotel: Yes.
How strongly do I agree with this decision? Well, now I have to think about it, not just in my context, but in the context of the people to whom I would give the recommendation. The food was great. Everybody loves good food, right? But, they don't stock Monster energy drinks at the bar, only Red Bull. Monster is really important to me because it's my caffeine. So, every night, I had to remember to trek into the Mall of America to buy Monster for the next morning. That kind of sucked. A lot. But, I also know that not everyone drinks Monster... so, should I let that influence my decision? Maybe I don't strongly agree... maybe I only agree. But, now I'm feeling a lot of anxiety. What if I recommend this hotel and the people don't like it; am I going to be judged? What are they going to think about me?
At this point, you either try really hard to provide an answer based on deep thinking and exhaustive analysis; or, you do what I typically do and just say, "Screw it, Strongly agree."
In either case, the answer is probably not desirable; it's either overly simplified or overly complex.
Leverage Actions And The Wisdom Of Crowds
To provide the best user experience (UX) for rating things, you should present the user with a small set of clear-cut actions. While this is easy for the user, it is hard for the User-Experience Engineer. Anyone can throw up an arbitrary scale and get feedback; but, it takes deep analysis to figure out how to map a context onto a set of mutually-exclusive actions.
Take movie ratings, as an example. Typically, movies are rated on a four-star scale. And while this is a relatively small scale, it suffers from all the same shortcomings that the Hot-or-Not 10-point scale presents. Specifically, it doesn't abstract the decision making behind a set of outcomes.
I live this problem all the time when my girlfriend and I are trying to select a movie to rent. She sees a two-star rating and thinks, "Eww, it only got two stars, no way am I watching that!" I, on the other hand, see the same two-star rating and think, "Two stars - it's got to be at least decent, let's give it a shot."
The problem is that the two of us have two different mental models on which to draw. This allows us to consume the same information and make two different decisions. Neither of us is wrong.
If, however, movies were rated based on actions, rather than stars, our two mental models may be more in alignment. So, as an example, instead of stars, what if movies were rated based on three mutually-exclusive actions:
- See it in the theater.
- Wait for the rental.
- Don't ever watch it.
Then, rather than aggregating the wisdom-of-crowds as a star rating, you could present the distribution of choices:
- 54% of users said, "See it in the theater."
- 37% of users said, "Wait for the rental."
- 9% of users said, "Don't ever watch it."
If this result were presented as a star rating, maybe it would get two stars, maybe two-and-a-half (it's hard to tell). But, looking at the results in this format, here's what I can conclude:
91% of all users said, "see it" in one format or another, even if that format meant, wait for the rental.
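To make the arithmetic concrete, here is a minimal Python sketch of how the distribution above could be tallied from individual votes. Everything in it is an assumption made for illustration: the function name action_distribution, the constant names, and the made-up sample of 100 votes chosen to match the percentages in the example.

```python
from collections import Counter

# The three mutually-exclusive actions from the example above.
THEATER = "See it in the theater."
RENTAL = "Wait for the rental."
SKIP = "Don't ever watch it."
ACTIONS = (THEATER, RENTAL, SKIP)


def action_distribution(votes):
    """Return the share of votes per action as whole-number percentages."""
    counts = Counter(votes)
    total = sum(counts[action] for action in ACTIONS)
    if total == 0:
        # No votes yet, so there is nothing to report.
        return {action: 0 for action in ACTIONS}
    return {action: round(100 * counts[action] / total) for action in ACTIONS}


# Hypothetical sample: 100 votes that match the percentages in the example.
votes = [THEATER] * 54 + [RENTAL] * 37 + [SKIP] * 9

distribution = action_distribution(votes)

# Anyone who picked either of the first two actions would watch the movie.
would_watch = distribution[THEATER] + distribution[RENTAL]

for action, percent in distribution.items():
    print(f'{percent}% of users said, "{action}"')
print(f"{would_watch}% of users would see it in one format or another.")
```

Notice that nothing gets averaged here. Each vote stays attached to the action the user chose, which is why the "see it" total falls out of a simple sum instead of having to be guessed from a star count.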
This is powerful information; this is meaningful information; and, it's the kind of information that can be gathered most easily when users are asked to choose actions rather than attempt to explore, analyze, and expose their underlying decision-making process.