Graph all the things

analyzing all the things you forgot to wonder about

Review Doping

interests: online reviews (skim for the results), statistics

If you're like me, you don't usually leave a review online with your exact opinion. Instead, you leave a review to move the average review in the direction you want. For instance, if I see a restaurant on Yelp I think deserves 4 stars but only has 3 stars, I am likely to give a 5 star review to move the average closer to what I believe. Similarly I might give a 1 star review to a restaurant I think is overrated.

I'll call this tactic "review doping". Review doping certainly sounds bad, but how does it actually affect the average review? And is it prevalent?

The answers are that it tends to moderate the average reviews in most cases, but that not terribly many people seem to be doing it. So perhaps I'm just unusual.

How review doping changes the mean

Let's assume:

  • Each person's true opinion about what the rating should be is a random variable drawn from the same (time-invariant) distribution.
  • Everyone review dopes.
  • Every review is on a continuous scale from 0 to 1.
Under these assumptions, the average review will converge to some value which I'll call the "doped mean". Let the nth person's true feeling about the product be x_n, and let y_n be the review they actually leave. The mean review after n reviews is then
\mu_n = \frac1n\sum_{i=1}^ny_i
We would like to find the limit of \mu_n as n\to\infty in terms of the distribution of x's. Since this limit exists, let's consider a very large value of n so that each reviewer is almost certain to review either 0 or 1 to sway the average review as much as possible. In this case, the expected value of \mu_{n+1} is
E(\mu_{n+1}) = P(y_{n+1} = 0)E(\mu_{n+1}|y_{n+1} = 0) + P(y_{n+1} = 1)E(\mu_{n+1}|y_{n+1} = 1)
And y_{n+1} will be 0 if the n+1st person thinks the review should be lower, and 1 otherwise, so
E(\mu_{n+1}) = P(x_{n+1} < \mu_n)\frac{n\mu_n}{n+1} + P(x_{n+1} > \mu_n)\frac{n\mu_n+1}{n+1}
E(\mu_{n+1}) - \mu_n = -P(x_{n+1} < \mu_n)\frac{\mu_n}{n+1} + P(x_{n+1} > \mu_n)\frac{1 - \mu_n}{n+1}
If \mu_n = \mu is the equilibrium doped mean, then this average change between votes must be 0, given our large n assumption:
-P(x_{n+1} < \mu)\frac{\mu}{n+1} + P(x_{n+1} > \mu)\frac{1 - \mu}{n+1} = 0
-P(x_{n+1} < \mu)\mu + P(x_{n+1} > \mu)(1 - \mu) = 0
This is the condition for \mu, given the distribution of x's. We can simplify it slightly by substituting P(x_{n+1} > \mu) = 1 - P(x_{n+1} < \mu) and expanding:
P(x_{n+1} < \mu) = 1-\mu
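As a sanity check on the derivation, here's a quick Monte-Carlo sketch of the one-step change in the mean. It assumes, for concreteness, uniformly distributed opinions (so P(x_{n+1} < \mu_n) = \mu_n); the state n = 100, \mu_n = 0.7 is an arbitrary choice of mine.

```python
import random

# Monte-Carlo check of the expected one-step change in the mean, assuming
# (for concreteness) opinions x uniform on [0, 1], so P(x < mu_n) = mu_n.
rng = random.Random(42)
n, mu_n = 100, 0.7          # hypothetical state: 100 reviews averaging 0.7
trials = 500_000

total = 0.0
for _ in range(trials):
    x = rng.random()                   # reviewer's true opinion
    y = 1.0 if x > mu_n else 0.0       # extreme vote toward their opinion
    total += (n * mu_n + y) / (n + 1)  # the resulting mu_{n+1}

empirical = total / trials - mu_n
predicted = (-mu_n * mu_n + (1 - mu_n) * (1 - mu_n)) / (n + 1)
print(f"empirical drift {empirical:.6f} vs predicted {predicted:.6f}")
```

The drift comes out negative here: with uniform opinions the doped mean is 0.5, so at \mu_n = 0.7 the average is pulled back down.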

But what does that mean actually mean?

This can be read as "a proportion 1-\mu of the reviewers think the average review should be less than \mu". For example, if \mu = 0.8, then the 20th percentile reviewer thinks the review should be 80%. If \mu = 0.5, then the median reviewer thinks the review should be 50%.
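To make this concrete, the condition can be solved numerically for any opinion distribution. Here's a sketch using Beta(5, 2) opinions (an arbitrary choice of mine), whose CDF happens to have a simple closed form:

```python
def beta52_cdf(x):
    """Closed-form CDF of Beta(5, 2): I_x(5, 2) = 6 x^5 (1 - x) + x^6."""
    return 6 * x**5 * (1 - x) + x**6

def doped_mean(cdf, iters=60):
    """Solve cdf(mu) = 1 - mu by bisection; cdf(mu) - (1 - mu) is
    increasing in mu, running from -1 at mu=0 to +1 at mu=1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cdf(mid) - (1 - mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu = doped_mean(beta52_cdf)
print(f"true mean = {5/7:.3f}, doped mean = {mu:.3f}")
```

For this distribution the true mean is 5/7 ≈ 0.714, while the doped mean comes out near 0.66 - closer to 0.5.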

Here are some example probability distributions for the reviewers' true opinions in green, and the true mean and doped mean in blue and red.

[Four plots: the true mean and doped mean marked on four example probability density functions.]

In most cases, \mu is closer to 0.5 than \overline{x}, since the reviews of the contrarian faction move \mu more (by voting 0 when \mu_n > 0.5, for instance). However, in bizarre cases like the bottom right, \mu can be more extreme than \overline{x}.
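The convergence itself can also be checked by direct simulation. Below is a minimal sketch, again assuming Beta(5, 2) opinions (my arbitrary choice), in which every reviewer after the first votes 0 or 1 to pull the running mean toward their own opinion. The condition P(x < \mu) = 1 - \mu puts this distribution's doped mean near 0.66, well below its true mean of 5/7 ≈ 0.714, and the simulation settles there.

```python
import random

def simulate_doping(n_reviews=200_000, seed=0):
    """Every reviewer after the first votes 0 or 1 to pull the running
    mean toward their true opinion, drawn from Beta(5, 2)."""
    rng = random.Random(seed)
    total = rng.betavariate(5, 2)  # first reviewer has no mean to dope
    for i in range(1, n_reviews):
        x = rng.betavariate(5, 2)  # this reviewer's true opinion
        mu = total / i             # the mean they see before voting
        total += 1.0 if x > mu else 0.0
    return total / n_reviews

mu = simulate_doping()
print(f"simulated doped mean = {mu:.3f}  (true mean of opinions = {5/7:.3f})")
```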

But do people actually do this?

Not a whole lot. I analyzed Yelp's academic dataset for two statistics that might show evidence of this trend, but neither did. For both statistics I looked at the first 40 reviews of all businesses with at least 40 reviews.

The first statistic was the correlation between y_{n+1} and \mu_n. If people review like me, they should be more likely to give a low y_{n+1} when the \mu_n they see is high, and vice versa. In other words, the correlation should be negative. Because of regression toward the mean, this correlation should be slightly negative (-\frac{\sigma_x}{(n-1)\sigma_\mu}) on average even without doping, but doping would make the correlation even lower.

However, the correlation was higher than this due to time-based effects: a business's initial reviews are very often higher than its later ones, so both y_{n+1} and \mu_n are decreasing sequences with positive correlation. I did investigate the businesses which seem to have the most negative correlations, and the most extreme in the dataset is possibly the Golden Gate Hotel and Casino in Las Vegas - according to both the ratings and the text of the reviews, many reviewers on both ends of the spectrum are responding to the hotel's rating and want it to be different. Since this is the most extreme case in the dataset, though, it is almost certainly a statistical anomaly.
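To see what this first statistic would show if doping were widespread, here's a sketch on synthetic data (uniform opinions; every parameter here is my own invention, not from the Yelp dataset). For each simulated business it computes the per-business correlation between \mu_n and y_{n+1} over 40 reviews, comparing honest reviewers against universal dopers.

```python
import random

def correlation(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def avg_corr(dope, n_businesses=500, n_reviews=40, seed=1):
    """Average per-business correlation between mu_n and y_{n+1}.
    Opinions are uniform on [0, 1]; dopers vote 0/1, honest voters vote x."""
    rng = random.Random(seed)
    corrs = []
    for _ in range(n_businesses):
        total = rng.random()  # first review: no mean to react to yet
        mus, ys = [], []
        for i in range(1, n_reviews):
            x = rng.random()
            mu = total / i
            y = (1.0 if x > mu else 0.0) if dope else x
            mus.append(mu)
            ys.append(y)
            total += y
        corrs.append(correlation(mus, ys))
    return sum(corrs) / len(corrs)

honest, doped = avg_corr(dope=False), avg_corr(dope=True)
print(f"honest: {honest:+.3f}, doped: {doped:+.3f}")
```

In runs like this the honest correlation comes out only slightly negative (the regression-toward-the-mean bias), while the fully doped one is far lower; the point is the size of the gap, not the exact numbers.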

The second statistic I looked at was the distribution of reviews as the number of reviews increases. If people are review doping, they should be more likely to leave extreme reviews as the number of existing reviews grows. Here's a plot of the review distribution as a function of review count:

[Plot: probability of each star rating given the number n of reviews before it. After around n=10, 1 and 5 star reviews slowly become more likely.]

There is a hint that later reviews are more polarized (more 1 star reviews and fewer 3 and 4 star reviews), but certainly no evidence that more than 5% of a restaurant's first 40 reviews are doped. If everyone were like me, the vast majority (probably about 96% by a back-of-the-envelope calculation) of reviews would be either 1 or 5 by that point. Thank goodness they aren't, though - I get more voting power this way.