Magic: The Gathering

2018-02-18
interests: competitions, unsupervised learning, Bayesian statistics

My friend Allen Wu recently gave me a very interesting dataset - a table of all 1765 matches from a recent Magic: The Gathering pro tour (Ixalan, 2017). It includes the two players in each match, what their Elo ratings were, what the final score was, and what types of decks they used.

Though I know very little about Magic, I looked at these results and tried to answer a question:

What deck types (if any) give players an advantage or disadvantage over others?

I created a Bayesian model that attributes an Elo "boost" to each deck type, controlling for the players' ratings. Ultimately, the evidence was not strong enough to claim that any deck type was credibly better than the others. It's also unlikely that any such Elo boost is huge. The following box plot shows the 2.5th, 25th, 50th, 75th, and 97.5th percentiles of the posterior distribution of each Elo boost.

[Figure: box-and-whisker plots of the possible Elo boosts for the Vampires, Tokens, GPG, Control, Aggro, and Energy deck types]

Magic Elo

Elo ratings work by giving each player a rating such that if players A and B, with ratings $x_A$ and $x_B$, play against each other, player A's chance to win is a logistic function of the difference in ratings:

$P(\text{A wins}) = \frac{1}{1 + 10^{-(x_A - x_B)/r}}$

For other games, $r$ is usually 400, which means that a player with a 400-point higher Elo rating should have about a 91% chance to win a game. The people who make Magic ratings decided to go with $r=1135.77$ instead, which makes a 200-point advantage worth a 60% chance to win. The motivation for that large number is that there is a lot of random chance in Magic, and even the strongest players can't always win against intermediate players. Another important fact about Magic's Elo system is that it predicts the chance to win a "match" - the best of 3 games - rather than the chance to win an individual game. You can read about their system here.
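To make the effect of $r$ concrete, here is a minimal Python sketch of the formula above. The function name `win_probability` and the example ratings are my own illustration, not part of the Magic rating system's code.

```python
def win_probability(rating_a, rating_b, r=1135.77):
    """Chance that player A beats player B under the logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / r))


# A 400-point edge under the common r = 400 convention.
print(round(win_probability(1900, 1500, r=400), 3))  # 0.909
# The same 400-point edge under Magic's r = 1135.77.
print(round(win_probability(1900, 1500), 3))         # 0.692
# A 200-point edge under Magic's r.
print(round(win_probability(1700, 1500), 3))         # 0.6
```

The larger $r$ flattens the curve: the same rating gap translates into a much smaller edge.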

The Model

The Bayesian model I made is remarkably simple and can be summarized by this piece of JAGS code:

model {
  for (j in 1:n_decks) {
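    # JAGS's dt(mu, tau, k) takes a precision tau, so a scale of 200
    # corresponds to tau = 1 / 200^2 = 0.000025.
    boost[j] ~ dt(0, 0.000025, 3)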
  }
  for (i in 1:n_matches) {
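    # Elo win probability for player A, with each deck's boost added to
    # its player's side (the + 1 suggests deck ids are stored 0-based).
    p[i] = 1 / (1 + 10 ^ (-(elo_diffs[i] +
      boost[decks_a[i] + 1] - boost[decks_b[i] + 1])
      / 1135.77))
    # outcomes[i] can be 0.5, so the likelihood is written out by hand
    # and attached to an observed 1.
    likelihood[i] = p[i] ^ outcomes[i] *
      (1 - p[i]) ^ (1 - outcomes[i])
    ones[i] ~ dbern(likelihood[i])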
  }
}

Here, ones is a list full of ones (a silly JAGS pattern) and outcomes is a list containing a 1 for each win by player A, 0.5 for each draw, and 0 for each loss. To put this model into words, we assume that each deck type gives an Elo boost that is drawn from a Student's t-distribution, then update that distribution based on the outcome of each match, using the Elo likelihood.
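For readers who don't use JAGS, the likelihood part of the model can be sketched in plain Python. This is only an illustration of the same formula, not the code I ran; the function names and the tuple layout of `matches` are assumptions made for the example.

```python
import math


def match_likelihood(elo_diff, boost_a, boost_b, outcome, r=1135.77):
    """Likelihood of one outcome (1 = A wins, 0.5 = draw, 0 = A loses)."""
    p = 1.0 / (1.0 + 10.0 ** (-(elo_diff + boost_a - boost_b) / r))
    return p ** outcome * (1.0 - p) ** (1.0 - outcome)


def total_log_likelihood(matches, boosts):
    """Sum of log-likelihoods over (elo_diff, deck_a, deck_b, outcome) tuples.

    deck_a and deck_b are hypothetical 0-based indices into boosts.
    """
    return sum(
        math.log(match_likelihood(diff, boosts[a], boosts[b], outcome))
        for diff, a, b, outcome in matches
    )
```

An MCMC sampler combines this quantity with the prior on each boost to decide which boost values are plausible.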

The assumption that Elo boosts follow a chosen distribution is called the "prior", and keeping it as uninformative as possible (by using a heavy-tailed t-distribution, for instance) is essential when we don't have much pre-existing knowledge about the scenario. I used a scale of 200 for the t-distribution with 3 degrees of freedom, which gives a 95% chance that each Elo boost is between roughly -640 and 640.
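The "roughly -640 to 640" figure can be checked with a few lines of Python. The sketch below uses the closed-form CDF of a t-distribution with 3 degrees of freedom, so it needs only the standard library; the helper names are mine. It also shows the conversion from a scale to the precision that JAGS's `dt` expects.

```python
import math


def t3_cdf(x):
    """Closed-form CDF of a standard Student's t with 3 degrees of freedom."""
    u = x / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def prior_mass_within(bound, scale):
    """Prior probability that a boost lies in [-bound, bound]."""
    return 2.0 * t3_cdf(bound / scale) - 1.0


scale = 200.0
precision = 1.0 / scale ** 2  # JAGS's dt(mu, tau, k) takes a precision tau
print(precision)
print(prior_mass_within(640.0, scale))  # about 0.95
```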

The use of a prior offends some, but it allows us to apply our data in an extremely natural way; given this one assumption, any question about the distribution of Elo boosts can be answered. It's hard to come up with a frequentist solution to answer the original question at all, and other arbitrary decisions would need to be made to do so.

The main downside to using a Bayesian method is the computational challenge of getting results. I used JAGS (which I parallelized on 4 cores!) to run an MCMC simulation that returned a large number of samples from the distribution of Elo boosts. For each sample, I subtracted the average boost for the 6 deck types so that the boosts always sum to 0.
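The centering step is simple enough to sketch in Python. The sample values below are made up for illustration; the real samples come out of the MCMC run.

```python
def center_sample(boosts):
    """Subtract the mean boost so that one sample's boosts sum to 0."""
    mean = sum(boosts) / len(boosts)
    return [b - mean for b in boosts]


# Two made-up posterior samples, one boost per deck type (6 deck types).
samples = [
    [30.0, -10.0, 5.0, -45.0, 0.0, 80.0],
    [12.0, 4.0, -20.0, -60.0, 15.0, 55.0],
]
centered = [center_sample(s) for s in samples]
```

Centering is needed because only differences between boosts affect the likelihood: adding the same constant to every boost leaves every predicted win probability unchanged.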

What I find so wonderful about this approach is that I only needed to tell the model two things:

Each Elo boost has a certain prior distribution.
There is a known formula for the likelihood of a match outcome given the difference in Elo ratings.

Given the data, the model figures out the rest.

Interpretation

At 95% credibility (which is very weak for this many comparisons; I should be using over 99%), no Elo boost is credibly different from 0. We see this even though the data has narrowed the credible intervals substantially; under the prior alone, each interval would be over 1000 Elo points wide. With evidence, some of the intervals shrink to widths as low as 140. So in all likelihood, an Energy deck can't change a player's chance to win an otherwise balanced matchup by more than a few percentage points.
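As a rough check on that last claim: an interval 140 points wide reaches about 70 points either side of its center. Assuming the interval is centered near zero, a 70-point boost in an otherwise even matchup would give

$P(\text{A wins}) = \frac{1}{1 + 10^{-70/1135.77}} \approx 0.535$

which is about 3.5 percentage points above an even 50%.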

Even if we compare each pair of deck types against each other, we can't conclude that one deck type is credibly better than another. Here are the posterior probabilities that the deck on the left gives a higher Elo boost than the deck on top:

	Energy	Aggro	Control	GPG	Tokens	Vampires
Energy		74%	61%	82%	61%	23%
Aggro	26%		45%	71%	49%	17%
Control	39%	55%		71%	52%	20%
GPG	18%	29%	29%		34%	12%
Tokens	39%	51%	48%	66%		21%
Vampires	77%	83%	80%	88%	79%
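Each entry in the table is just the share of posterior samples in which one deck's boost exceeds the other's. A minimal Python sketch, with made-up samples and hypothetical names:

```python
def prob_higher(samples, i, j):
    """Share of posterior samples in which deck i's boost exceeds deck j's."""
    wins = sum(1 for s in samples if s[i] > s[j])
    return wins / len(samples)


# Four made-up samples of (deck 0 boost, deck 1 boost).
toy_samples = [[10.0, -10.0], [5.0, 20.0], [30.0, 0.0], [-5.0, -15.0]]
print(prob_higher(toy_samples, 0, 1))  # 0.75
```

Because ties between continuous samples essentially never happen, the entries for a pair of decks add up to 100%, which is why the table is mirrored across its diagonal.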

I did a separate analysis (not shown here) where I gave an Elo boost to each pairing of deck types (so ${6\choose2} = 15$ parameters), but the dataset was too small to draw any meaningful conclusions from that approach.

I take this to mean that Magic players at the pro tour level have efficiently chosen decks that are almost equally strong. That makes a lot of sense, considering that Magic is probably the most-played trading card game in the world, and that pros compete over real money.
