Friday, September 28, 2012

20 thoughts on skill vs. chance in poker, part 19: The German "predominantly chance" study

<-- Part 18: The relationship between level of stakes and degree of chance

I had always imagined that one of the points I'd write about in this essay would be that no academic study had ever concluded that poker is predominantly chance, but the recently published paper out of Germany, Is Poker a Game of Skill or Chance? A Quasi-Experimental Study, changed that.¹

The paper is not freely available anywhere, but the lead author is willing to share it with interested parties who contact him directly, and he has been polite and receptive to constructive feedback. In addition, some summaries of and commentary on the paper have been published openly:
  • Neuroskeptic was the first to break the story to the poker community with a short but mostly-complete summary of the paper.
  • Poker's own Short-Stacked Shamus at Hard-Boiled Poker wrote up his impressions of the paper.
  • Jennifer Ouellette of the Scientific American blog network took a thorough look at this paper, the DiCristina ruling, and a variety of other perspectives on poker from a scientific standpoint. I found this to be a solid read, and it included a great quote for a mainstream article:
    “Good poker requires that you make sound game-theoretic decisions but there is still plenty of freedom to try and outsmart your opponents,” [Vonk] said. “Other casino games miss that second element. All you can do in blackjack or roulette is make the best possible mathematical decisions, and even then, you will still lose in the long run. I have never been attracted to those games. It’s the fact that you play against other people that makes poker so interesting, and that makes it possible to actually be a winner at the game.”
  • In Is skill in poker – and elsewhere – just one great big bluff?, Tom Chivers of The Telegraph uses the study as a basis for extrapolating some broader ideas from behavioral economics, though I find his take on the study to be a bit shallow and tangential... and the headline to be pretty close to unforgivably sensationalist, inaccurate, and unrelated to the content.

First, a brief overview of the study's methods. Three hundred real poker players were recruited and sorted, by self-identification, into two groups: "experts" and "average". Each participant played 60 hands of 6-handed Holdem, split between No Limit and Limit, against other participants via computer in a duplicate poker format in which the deal of each hand was rigged. The rigging was designed to measure the effects of chance by giving each player better-than-average, average, and worse-than-average cards in a predetermined way. The participants were not aware of the rigging and believed that they were playing a typical poker game. After play was complete, the authors computed the winrates of expert and average players in each of the three rigged card conditions. The two core findings were:
  1. The rigged-card conditions for receiving better cards had more of an impact on winnings than the skill of the players, therefore "chance clearly dominates skill; thus, poker should be classified as gambling".
  2. While experts outperformed average players overall and lost less money with worse-than-average cards, average players outperformed expert players in the "better-than-average cards" condition. The authors take this to support the conclusion that the cards are what primarily affect outcomes and that players' strategies are much less impactful.

Before we delve into the flaws in these conclusions, and in the methodology and assumptions that lead to them, it's worth noting that the seemingly very small sample size of 60 hands per player is not really the problem here. Many poker players are inclined to laugh the study away at the first mention of this sample size, but the authors are quite self-aware and forthright about the shortcomings and limitations of their study. The 60-hand cutoff was likely a practical necessity for gathering the needed data from real human volunteers.

However, the authors also argued in favor of the 60-hand timescale based on their interpretation of German law, which attempts to measure whether or not game outcomes depend "solely or principally on chance rather than on the players' abilities... under [the conditions] which the game is typically initiated and played, which depends on the skills and experience of the average player... an individual who is generally interested in playing the game, has learned the fundamental rules and has had some practice playing."

From the paper:
On the one hand, it could plausibly be argued that the influence of strategy and skill would be more prominent in longer poker sessions and would entail a stronger impact on the game’s outcome. On the other hand, it could be assumed that with longer play periods, the difference in players’ level of skill would decrease. This would lead to a greater contribution of chance to the outcome and a need for new, inexperienced players to reduce the effect of chance.
I think this is a dubious, hand-waving justification for such a short time horizon. I'm not sure how to estimate how many hands an average or amateur player must play before closely approximating the skill level of an expert, but I'm sure it's way, way more than a few hundred.

Regardless, even though the sample size is small, I don't think that's the problem with this study. I would expect the same results even with a large sample size; the issue is the rest of the methodology and the biases it introduces. Each of the two core findings results from a problem in the approach taken.

Finding #1
The authors concluded that chance predominates over skill in poker because being dealt into a seat rigged to receive the winning hand more often than usual increases one's winrate by more than the difference in winrates between strong and weak players. This implies a definition of predominance under which a game is predominantly chance whenever there is some nonzero probability that its random elements leave a player unable to win despite superior skill.

This is a theoretically interesting definition of predominance, but it would classify almost any game with random elements as predominantly chance, as long as the deck (or other source of randomness) is fixed to make the probabilities extreme enough. The average player would almost surely win a backgammon game in which his rolls were much more likely to come up 6-6 and the expert’s were much more likely to come up 1-2. The average player would almost surely win a Magic: The Gathering game (or any other card game) in which the expert’s deck was rigged to deliver a very skewed, unplayable mix of card types. An average player would likely win a Scrabble game in which the probabilities were altered so that his expert opponent received very few consonants.

To apply this perspective on predominance properly, a quantitative refinement is needed. If you go far enough into the tails of the random distributions, any game of skill with random components would be classified as a game of chance, so the real question is how extreme (in percentile terms) the randomness must be for the average player to overcome the expert's skill advantage. If the average player beats the expert poker player with the aid of only 51st-percentile random in-game outcomes in his favor, then I think that would be a reasonably convincing argument that the game is predominantly chance. However, if the rigging has to push the cards into the 99th percentile for the average player to beat an expert, that doesn’t really show anything: I expect that no expert at any game beats a weaker player more than 99% of the time, so luck that extreme would overwhelm skill in any game.
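The percentile framing in the paragraph above can be made concrete with a deliberately crude toy model, which is my assumption and not the study's: each hand's result is the expert's fixed skill edge plus symmetric, normally distributed luck, both measured in standard deviations of that luck. Under this model, the luck percentile an average player needs to break even over n hands works out to exactly the expert's probability of winning an unrigged n-hand match, which is why a rig that has to reach the 99th percentile proves little. The function name and the 0.3 edge are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def required_luck_percentile(skill_edge: float, luck_sd: float, n_hands: int = 1) -> float:
    """Percentile of luck an average player needs to offset an expert's edge.

    Toy model (an assumption, not the study's): over n_hands, the expert's net
    result relative to the average player is n_hands * skill_edge plus luck drawn
    from Normal(0, luck_sd * sqrt(n_hands)). The average player breaks even only
    when luck in his favor is at least n_hands * skill_edge.
    """
    total_luck = NormalDist(0.0, luck_sd * sqrt(n_hands))
    # cdf(edge) is the share of luck outcomes that do NOT erase the edge,
    # i.e. the percentile of favorable luck needed to erase it. Under this
    # model it is also the expert's chance of winning an unrigged match.
    return total_luck.cdf(n_hands * skill_edge)


if __name__ == "__main__":
    # Hypothetical expert edge of 0.3 luck-standard-deviations per hand.
    for n in (1, 10, 60):
        p = required_luck_percentile(0.3, 1.0, n)
        print(f"{n:>3} hands: luck must reach percentile {100 * p:.1f}")
```

The same per-hand edge demands more and more extreme luck to overturn as the number of hands grows, so any verdict drawn from a rig depends on both the rig's strength and the session length.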

In my correspondence with the lead author, he acknowledged that this would be a challenging target for future work, but considered it unnecessary for the scope of this study since, however extreme the randomness in the rigging condition, each player received that rigging an equal number of times among the 60 hands. However, the degree of good fortune in this chance-shifting condition will certainly affect the conclusion. If less-extreme randomness were used in the "better-than-average cards" condition of this study, i.e. if it were an "only-very-slightly-better-than-average cards" condition, then the skill edge of the experts would dominate. The specific nature of the rigging suggests a rather extreme perturbation of the randomness in poker.
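Both the lead author's argument (every player got the rigging equally often) and the objection that the verdict depends on how extreme the rig is can be seen in a toy model. In this minimal sketch, which assumes normally distributed per-hand luck and uses hypothetical names and a made-up 0.3 edge, a rig is expressed as the luck percentile it guarantees, converted into a per-hand shift in results, and compared with the expert's per-hand skill edge, which is the comparison behind Finding #1.

```python
from statistics import NormalDist


def rig_shift(rig_percentile: float, luck_sd: float = 1.0) -> float:
    """Per-hand shift in results implied by a rig that guarantees this luck percentile.

    Toy model (an assumption): luck is Normal(0, luck_sd), and the
    "better-than-average cards" condition pins it at the given percentile.
    """
    return NormalDist(0.0, luck_sd).inv_cdf(rig_percentile)


def dominant_factor(rig_percentile: float, skill_edge: float, luck_sd: float = 1.0) -> str:
    """Report which effect is larger: the rigged cards or the expert's edge."""
    return "chance" if rig_shift(rig_percentile, luck_sd) > skill_edge else "skill"


if __name__ == "__main__":
    # Hypothetical expert edge of 0.3 luck-standard-deviations per hand.
    for p in (0.51, 0.55, 0.75, 0.99):
        print(f"rig percentile {p:.2f}: {dominant_factor(p, 0.3)} dominates")
```

With these made-up numbers, the 0.51 and 0.55 rigs leave skill dominant while the 0.75 and 0.99 rigs make chance dominant. Nothing about the underlying game changes between those rows; only the knob the experimenter sets does, and giving every player the same rig equally often does not remove that dependence.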

Finding #2
The study found that weaker players outperformed expert players in the "better-than-average cards" condition, particularly in Limit Holdem. This should be a direct consequence of experts making folds that are correct in real poker but, unbeknownst to them, are really bad folds in rigged poker, where you're artificially more likely to win the hand with whatever cards you happen to be holding. (In case it's not obvious, you shouldn't fold very often in that game.)

The particular nature of the rigging, while still not quite clear to me, favors the naïve tendencies of the novice player. As described in the study:
During the game, one expert player and one average player received (a) the winning hand 15 times and the losing hand 5 times (winner’s box condition), (b) the winning hand 10 times and the losing hand 10 times (neutral box condition) and (c) the winning hand 5 times and the losing hand 15 times (loser’s box condition)
Using this computer-based method of playing, the hands of individual players and the flop, turn and river cards were manipulated to produce a standardized ranking order for each hand in terms of the probabilities of winning (cf. the standardized sequence of play of "duplicate poker"). It was established in advance that the cards of the opening hands, and the associated distribution of chances of winning, were reflected in the river in the same order for the first three places. In contrast, places were allowed to vary with respect to the flop and turn.
So the rigging controls whether or not each player's hand goes on to be the best of the six hands by the time the river is dealt, and it is further arranged so that the best preflop hands end up as the best river hands (for the first three places), even though the flops or turns could be unfavorable.

This means that the typical amateur mistake of continuing with what was once a strong starting hand after a bad flop or turn will be rewarded. For example, an expert may prudently fold 7♦7♣ on a Q♣K♥2♥ flop, or A♦K♦ on a Q♣J♣6♦9♣ turn. These may be correct moves in poker, but they are terrible moves in rigged poker, where the game has already arranged for you to spike your card on the river or for none of your opponents to make their draws. Meanwhile, average players will incorrectly chase their draws and be rewarded on the river far more often than regular poker probabilities would dictate. This effect should be exacerbated in Limit Holdem, where the stronger players will find the right folds on flops and turns with overcards despite high pot odds.
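The pot-odds arithmetic behind those folds can be sketched briefly. The helper below is my own simplified illustration, not anything from the study: it ignores betting after the call and treats outs as clean. It compares calling with folding for a hand with a given chance of winning. A 4-out draw on the turn hits on 4 of the 46 unseen cards in real poker, but in the rigged game a hand that was best preflop is, by design, best on the river, so its equity is effectively 1. The pot and bet sizes are hypothetical limit-style numbers.

```python
def call_ev(equity: float, pot: float, to_call: float) -> float:
    """Chips gained by calling instead of folding (folding is worth 0).

    pot: chips already in the middle, including the opponent's bet.
    Simplification: no further betting after this call.
    """
    return equity * pot - (1.0 - equity) * to_call


def break_even_equity(pot: float, to_call: float) -> float:
    """Minimum equity for a call to be profitable at these pot odds."""
    return to_call / (pot + to_call)


if __name__ == "__main__":
    pot, to_call = 6.0, 1.0   # hypothetical: 6 bets in the pot, 1 bet to call
    real_equity = 4 / 46      # 4 clean outs among 46 unseen cards on the turn
    rigged_equity = 1.0       # rig makes the preflop leader the river winner
    print("equity needed:", round(break_even_equity(pot, to_call), 3))
    print("real poker EV of calling:", round(call_ev(real_equity, pot, to_call), 3))
    print("rigged poker EV of calling:", round(call_ev(rigged_equity, pot, to_call), 3))
```

Even at these generous limit pot odds, the call loses chips in real poker and wins the whole pot in the rigged game, so the expert's correct fold becomes the costly play.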

The study notes that average players call more often than experts and that experts fold more, and it acknowledges that this bias may exist and may affect the results:
It is unclear whether the described advantage of average players with good cards under "fixed limit" conditions, due to their less purposeful [meaning continuing too often with weaker hands] style of play, is an artifact of the applied design or a phenomenon that can also be detected in the reality of poker play.
I expect that this effect, and the bias it introduces, is indeed large: large enough to fully explain the outperformance of average players in the "better-than-average cards" condition.

The only way I can see around introducing some sort of bias is to not rig the deck at all, which would dictate an approach that doesn't gather its own data and instead uses a large real-world database of hands provided by a commercial internet poker site, as some other studies have done. These real hands could be filtered to find which hands involved "good hands" by whatever metric was desirable, and this would prevent manipulated probabilities from favoring one player type over another. The question of what metric to use would still be difficult.
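Here is a minimal sketch of that alternative. It assumes a hypothetical hand-history table with one row per player per hand, where each row carries the player's group, a "card quality" score computed from the dealt cards, and the player's net result. The score is only a placeholder, since choosing it is exactly the hard part. Nothing is rigged; hands are bucketed only after the fact. The field names and the 0.6 cutoff are illustrative and do not come from any study or poker site.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable


@dataclass
class HandRecord:
    """One player's result in one real (unrigged) hand. Hypothetical schema."""
    player_group: str      # e.g. "expert" or "average"
    card_quality: float    # placeholder metric, e.g. preflop equity in [0, 1]
    net_won: float         # chips won or lost in the hand


def winrate_by_group(hands: Iterable[HandRecord], min_quality: float) -> Dict[str, float]:
    """Mean result per player group, using only hands whose cards meet the cutoff."""
    results = defaultdict(list)
    for hand in hands:
        if hand.card_quality >= min_quality:
            results[hand.player_group].append(hand.net_won)
    return {group: mean(values) for group, values in results.items()}


if __name__ == "__main__":
    # Toy rows for illustration only; real use would load a large hand database.
    sample = [
        HandRecord("expert", 0.72, 3.0),
        HandRecord("expert", 0.40, -1.0),
        HandRecord("average", 0.65, 1.0),
        HandRecord("average", 0.81, -2.0),
    ]
    print(winrate_by_group(sample, min_quality=0.6))
```

Because every hand was played under real poker probabilities, any gap between groups among "good hands" reflects how each group actually plays those hands, not how well its habits happen to fit a rig.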

The lead author defended this part of the methodology, again arguing that it was fair because expert and average players faced the same conditions. The fundamental issue, though, is that changing the probabilities of the random elements in poker turns the game into something other than poker.

The subjects were essentially lied to (not maliciously), in that they were not playing the game they thought they were. The strategies for rigged poker are different from the strategies for poker, and if the subjects had known about the methodology and when they were in the rigged conditions, the expert poker players would probably have picked up on the necessary strategic adjustments and continued to outperform the weaker players.

Also, I expect that expert poker players would be more willing to trust that an ostensibly normal poker game in an experimental setting is being run fairly. In contrast, weaker players may be guided by instincts to "play a rush" or to otherwise irrationally distort their assessment of what are supposed to be independent probabilities, which could benefit them in this rigged poker game. Regardless, because the subjects were misled about the probabilities of the game's outcomes, the impact of proper strategy is obscured: the skilled players are applying skills from a different game.

Overall, the authors approach the task of measuring predominance in poker from a reasonably sophisticated scientific perspective, with no evidence of anything but an earnest effort. If one were tasked with producing a formal, science-based argument that poker is predominantly chance, these authors have done so fairly well. Perhaps a lack of practical poker experience led them to overlook or underestimate the impact of these methodological biases on the results. I don't think they deserve the ire of our community, but I also don't think their conclusions have any meaningful validity.

While I don't believe that this was the motivation for the study, it would be wrong to omit the observation that German poker players owe personal income tax on their poker winnings only if poker is considered a game of skill, and owe nothing if poker is considered gambling. This sort of tax rule could certainly foster a cultural and social willingness among poker players in Germany to keep poker treated as gambling: an amusing (or depressing) contrast with the interest of players in the U.S. in having poker seen as a game of skill.

Still, a study in an academic journal has global impact. Even if German poker players would be better off if poker were treated as predominantly chance and gambling, this is not the case in most of the rest of the world, and it's also a classification that I feel to be intellectually dishonest and fundamentally wrong.

Part 20: Summary and assessment of approaches to predominance -->

¹ In fact, the lead author of the German paper brought to my attention two of its cited papers which also contend that poker is predominantly chance, at least under some conditions:
  • Best Hand Wins: How Poker is Governed by Chance, by Vincent Berthet. This study references the Cigital study but goes a step further by considering that the cards dealt to players affect their actions, so a hand that does not end in a showdown is not necessarily a hand where chance played no role at all. It finds that 72.8% of hands that end without a showdown were nonetheless won by the player who held the best hand at the point when the hand ended. The study comes across as fairly well informed about poker and is worth a quick read. It offers a reasonable alternative to the Cigital study's view of what it means for cards to "control" an outcome, but the logical misstep in concluding that poker is predominantly chance comes from assuming that the threshold of predominance lies halfway between 1 and 1/N, where 1 is the probability of the best cards winning if poker were determined entirely by the deal of the cards and 1/N is the probability of the best cards winning if poker were determined entirely by skill (i.e. if the cards didn't matter at all). There is no quantitative basis for placing the threshold at the midpoint, especially since player actions and strategies directly affect this statistic. One could easily design games with any level of this best-hand-winning rate, combining high or low skill with high or low chance. Since this approach admittedly focuses on winners of pots rather than winners of money, and thus ignores betting, all of my observations about the shortcomings of showdown-based studies would apply here.
  • The work of Ingo Fiedler. The work by Fiedler and Rock is commonly cited in support of the predominance of skill in poker, as their critical repetition frequency (CRF), a statistical measure, shows that, by their measure, skill overtakes chance in poker after about 1,000 hands. Meyer and this German study, however, note that a 2011 paper by Fiedler, unrelated to the topic of skill and chance in poker, found that the median online poker account in a large database played less than 1 hour of poker per month, so that these players would not quite reach the CRF threshold within a year. Looking at the median online poker account is close to looking at the median person who has ever played poker even once, a population that will almost certainly include far too many former or very infrequent players. Again, this might be a reasonable interpretation of these particular German statutes, but I think it's unrealistic for other uses.
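The arithmetic in both footnote bullets is short enough to check directly. This sketch computes the midpoint threshold (1 + 1/N)/2 described in the Berthet bullet, along with how many hands a player logs at a given monthly volume compared with the roughly 1,000-hand CRF figure. The function names are hypothetical. Hands per hour is an assumed parameter, and neither the 60 used below nor the list of player counts comes from the cited papers. The code only shows how the threshold moves with N; it says nothing about which N Berthet used.

```python
def midpoint_threshold(n_players: int) -> float:
    """Halfway between 1 (cards decide everything) and 1/N (cards decide nothing)."""
    return (1.0 + 1.0 / n_players) / 2.0


def hands_per_year(hours_per_month: float, hands_per_hour: float) -> float:
    """Hands played in a year at a steady monthly volume."""
    return 12 * hours_per_month * hands_per_hour


def months_to_reach(target_hands: float, hours_per_month: float, hands_per_hour: float) -> float:
    """Months needed to accumulate target_hands at that volume."""
    return target_hands / (hours_per_month * hands_per_hour)


if __name__ == "__main__":
    for n in (2, 6, 9):
        print(f"N={n}: midpoint threshold = {midpoint_threshold(n):.3f}")
    # Assumption: 60 hands/hour; 1 hour/month is the upper bound quoted for the median account.
    print(hands_per_year(1.0, 60.0), "hands per year")
    print(round(months_to_reach(1000, 1.0, 60.0), 1), "months to reach 1,000 hands")
```

The midpoint threshold slides from 0.75 heads-up toward 0.5 as N grows, so a fixed figure like 72.8% can land on either side of the threshold depending on N. Similarly, whether a low-volume player "reaches" the CRF within a year depends on an assumed hands-per-hour rate.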
