Skill Ceiling?

Hey Folks!

I’ve got another game idea which stems from Chess (sorry about doubling up), but like the aphantasia article, it’s more about an abstraction derived from Chess than about the game itself.

So this article requires a little background. I attended the Game Developers Conference (GDC) in San Francisco this year, and one of the presentations raised an interesting question. It was hosted by an industry analyst/academic who was interested in the skill/luck ratio of different games when played at professional levels. (I think. There wasn’t a perfectly clear question or methodology, but this was my understanding.) In other words, how ‘skillful’ are different games at a competitive level?

Now, his model incorporated a wide array of variables. One was the number of players globally, the implication being that more global play-hours will improve the meta and drive up the skillfulness of play (though notably this shouldn’t affect the theoretical ‘skill ceiling’ of the game, only the current state of play). Another was the number of ‘interactions’ per time interval; he considered most physical sports more skillful than Chess, for example, because they involve more interactions per second. A third was the stringency of the requirements to attain mastery of the game.

Now, okay, we can probably pick at the basic premise. Is it accurate, or even useful, to model skill and luck as diametrically opposed? Do more interactions per second really decrease the likelihood of a ‘lucky’ win? But it’s still an interesting question to ponder.

I went up to him at the end and asked whether he had considered incorporating Elo ratings into his model, and if not, why not. He wasn’t familiar with the term, and you might not be either, so I’ll give a brief outline.

Elo is a rating system, named after its creator Arpad Elo, primarily associated with Chess, but it can be (and is) applied to almost any zero-sum activity. It pegs its points to win/loss estimates relative to the other ratings in the pool. For example, if I am rated 1568 and my opponent is rated 1468, that 100-point gap means I should have a 64% chance of winning; if we played 1000 games I should, in theory, win about 640 of them. In practice the system corrects itself as we play, to accommodate growth. If my opponent beats me, the system says ‘Okay, maybe he’s a little higher than 1468’ and bumps him up. At the same time it says ‘Okay, maybe Doug isn’t really a 1568, I’ll dock him a few points.’ If I were to lose to a 2400-rated player, though, the adjustment to our ratings would be minimal, because the system more or less expected that result, and our current ratings look accurate.
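If you like to see things concretely, here’s a minimal sketch of the standard Elo formulas in Python, using the numbers from my example above (the K-factor of 32 is just one common choice for how fast ratings move, not anything canonical):

```python
# A minimal sketch of the standard Elo formulas.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (roughly, win probability) for player A against player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating: float, expected: float, actual: float, k: float = 32) -> float:
    """Nudge a rating toward the observed result; K sets the step size."""
    return rating + k * (actual - expected)

# A 100-point gap gives roughly a 64% expected score.
e = expected_score(1568, 1468)
print(round(e, 2))  # 0.64

# If the 1468 player pulls off the upset (they score 1, I score 0),
# they get bumped up and I get bumped down.
print(round(update(1468, 1 - e, 1), 1))  # ~1488.5: 'maybe he's higher than 1468'
print(round(update(1568, e, 0), 1))      # ~1547.5: 'maybe Doug isn't a 1568'
```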

The upshot of all of this is that we have an objective way of measuring a chess player’s skill, and even better, it (in practice) doesn’t have a cap. It’s not as though we declared ‘100 is the best player, 0 is the worst.’ If someone came along, played Magnus Carlsen (the current world champion, with an Elo around 2800) and beat him, we might say it’s a fluke. But if they played 1000 games and the newcomer won 640 of them, we would conclude that the newcomer is rated about 2900, and so on (this actually *does* happen with Chess AIs, which are much better than the world’s best humans).
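That inference is just the Elo curve run in reverse: from an observed win rate you can back out the implied rating gap. Here’s a quick sketch, assuming the standard 400-point logistic form:

```python
import math

def gap_from_win_rate(p: float) -> float:
    """Invert the Elo expected-score curve: what rating gap implies win rate p?"""
    return 400 * math.log10(p / (1 - p))

# A newcomer winning 640 of 1000 games against a 2800-rated player...
print(round(2800 + gap_from_win_rate(0.64)))  # ~2900
```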

Since our Elo ceiling naturally grows as the range of skill in the game increases, the Elo rating of the world’s best player gives us a pretty good idea of how ‘skillful’ the game is. For example, you could line up chess players from a theoretical ‘0’ (an AI which randomly selects legal moves) to Magnus Carlsen, where every player has a 64% chance to beat the player on their left and a 36% chance to beat the player on their right. A 64% expectation corresponds to a 100-point gap, so those parameters give you 28 steps from 0 to 2800: 29 people in your line, each measurably better than the last. That’s awesome!
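And as a sanity check on the line-up (a sketch assuming the same standard curve, 100-point steps, and Carlsen at roughly 2800):

```python
# Build the hypothetical line-up: ratings 0, 100, ..., 2800.
# (Same expected_score helper as in the first sketch.)
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

line = list(range(0, 2801, 100))
print(len(line))                                   # 29 players
print(round(expected_score(line[1], line[0]), 2))  # 0.64 vs the player on the left
```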

I think that gives us a pretty good idea of how ‘skillful’ the game of Chess is, and I don’t see why the same metric couldn’t be applied to other games. We effectively already have a system for determining the ‘skillfulness’ of games (the Elo rating), though I don’t know whether it’s ever been applied in that way before.

The main focus of the presentation at GDC was on eSports like Dota and Overwatch. Both use variations on the Elo rating system for matchmaking and tournament play, so I don’t see any reason why this measure couldn’t be applied…

What do you think? Am I on to something here, or is there something that I’ve missed? It would be very interesting to conduct a game survey to analyse the depth of skill in the competitive games we play today.
