Tuesday, February 22, 2011

How far from Gaussian (normal) are poker results?

In a perfect world, as far as mathematical modeling and easy closed-form manipulation are concerned, we'd be thrilled if every random variable we ever dealt with had the Gaussian (a.k.a. Normal) distribution. The most important of its many desirable properties comes from the Central Limit Theorem: the average of many independent, identically-distributed random variables with finite variance converges in distribution to a Gaussian.

Poker results, on a per-hand or per-tournament basis, of course, do not have the Gaussian distribution. For one, the probability distributions of poker results are discrete, rather than continuous. Beyond that, we would generally expect much more weight on the extreme outcomes in a poker result distribution than in the corresponding Gaussian distribution with the same mean and variance.

So, how "close" to Gaussian are the distributions of poker results for different games? If the distribution of the results of a single hand of a certain game of poker are not close to Gaussian, how many hands must be played before the average becomes close to Gaussian via the Central Limit Theorem? And what are the practical implications for bankroll management?

Data

I ran some simulations on some old data from my own play, assuming that the approximate probability distribution of per-hand results in each game was simply the empirical distribution based on my historical data. There are a number of problems that keep this from being anywhere near a perfect assumption, but it's the best we can do. With large enough sample sizes under (hypothetically) constant game conditions, it would be fine.
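As a rough sketch of what this resampling approach looks like in practice (the file name, function name, and trial counts here are hypothetical illustrations, not the actual setup behind these results):

```python
import numpy as np

# Load historical per-hand results (in big blinds), one value per line.
# "hand_results_bb.txt" is a hypothetical export from a tracking program.
hand_results = np.loadtxt("hand_results_bb.txt")

def simulate_totals(n_hands, n_trials, seed=0):
    """Draw n_trials totals, each the sum of n_hands results sampled
    i.i.d. (with replacement) from the empirical distribution."""
    rng = np.random.default_rng(seed)
    samples = rng.choice(hand_results, size=(n_trials, n_hands))
    return samples.sum(axis=1)

# Example: 10,000 simulated 1,000-hand totals.
totals_1k = simulate_totals(n_hands=1000, n_trials=10000)
```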


$1/2 CAP NL play is 6-handed NL Holdem with 30bb (30 big blind) stacks. The betting cap ensures that no hands are ever played deeper than 30bb, though some are played shallower against shortstacked opponents.

$1/2 RUSH NL is exclusively 9-handed NL Holdem at Rush tables, with 100bb starting stacks (auto-reloading to 100bb every hand) and frequent deeper stacks. In the current online poker marketplace, the Rush games seem to be the best opportunity for gathering data on deeper-stacked play due to their high liquidity and speed of play.

$1/$2 PLO is 6-handed PL Omaha, mostly on shallow or cap tables, with around 40bb stacks.

Visualizations of Normality

For the sake of developing an intuition for "how long the long run is" for achieving approximate normality, I looked at histograms of the empirical probability distribution of poker hand outcomes, plotted against the Gaussian distribution of matching parameters (the thin blue curve).

I plotted the behavior of these distributions after 1 hand (top-left), 10 hands (top-right), 1,000 hands (bottom-left), and 100,000 hands (bottom-right). The x-axis is in big blinds, rather than dollars.
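For reference, here is a minimal, self-contained sketch of how such a panel of histograms can be generated, treating the empirical per-hand data as the true distribution (the file name and trial count are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
hand_results = np.loadtxt("hand_results_bb.txt")  # hypothetical file name
mu, sigma = hand_results.mean(), hand_results.std()

def n_hand_totals(n, trials=5000):
    # One trial at a time keeps memory modest even for n = 100,000.
    return np.fromiter(
        (rng.choice(hand_results, size=n).sum() for _ in range(trials)),
        dtype=float, count=trials)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, n in zip(axes.flat, [1, 10, 1000, 100000]):
    totals = n_hand_totals(n)
    ax.hist(totals, bins=60, density=True, alpha=0.7)
    # Gaussian with matching parameters: mean n*mu, std sqrt(n)*sigma.
    x = np.linspace(totals.min(), totals.max(), 400)
    ax.plot(x, norm.pdf(x, n * mu, np.sqrt(n) * sigma), "b-", lw=1)
    ax.set_title(f"{n:,} hands")
plt.tight_layout()
plt.show()
```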

For the $1/2 CAP data:
As will be the case for each of the data sets, after only 1 hand, the distribution of results is not close to normal, as it is much too clustered around points near zero and puts much more weight on tail outcomes (for example, +30 and -30, though it's hard to see in the picture). After 10 hands, there are some interesting small "bumps" in the frequencies around +/- 30bb, probably an artifact of so many 30bb CAP hands involving getting stacks in against one other player.

After 1,000 hands, I was surprised to see just how close to Gaussian the distribution already was. There isn't even that much visual improvement going up to 100,000 hands. The shortstacked nature of the cap games seems to produce fast convergence to approximate normality.

For the deeper-stacked $1/2 RUSH data:
Similar results here. The convergence is smoother, in that there are no "bumps" after 10 hands caused by frequent +/- 30bb pots, as in the $1/2 CAP data. Though it is not easily visible, the fit after 100,000 hands is better in the tails here than after 1,000 hands, unlike in the $1/2 CAP data. Still, the results are not too far from Gaussian after even 1,000 hands... less than an hour of 4-tabling Rush!

For the $1/$2 PLO data:
Not too different from the $1/2 RUSH data, though perhaps a little slower to converge. The distribution of PLO results should have significantly higher variance than comparable NL Holdem data, but here, my only PLO data is from shallow-stacked games. Even so, we can visually observe that the convergence of these 40bb PLO hands is similar to that of 30bb NL Holdem hands. The higher-variance effects of PLO might only really start to matter with deeper stacks.

As a final point of comparison, I thought it would be interesting to look at heads-up tournaments, where the mass of the probability distribution lies only on the two extreme outcomes of -1 and roughly +1 buyins (adjusting for rake). I used some fictitious data based on the rake structure of $55+$2.50 HU SNGs and a winrate of 55%, with the buyin size normalized to 100:
Here it takes at least 100,000 hands (tournaments) before the distribution begins to look anything like a continuous Gaussian. Though I haven't tried anything with multitable tournaments (much higher skew), we'd expect those to be even less Gaussian.
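For concreteness, here is a sketch of how such a fictitious two-point distribution can be constructed; the payout arithmetic assumes a winner-take-all $55+$2.50 structure (prize pool of $110 on a total cost of $57.50), and all names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# $55 + $2.50 HU SNG, winner take all: prize pool is 2 * $55 = $110.
cost, prize, p_win = 57.50, 110.00, 0.55
# Net results, scaled so that the buyin is 100 units: roughly +91 / -100.
outcomes = np.array([prize - cost, -cost]) * (100 / cost)
probs = np.array([p_win, 1 - p_win])

mu = probs @ outcomes                        # about +5.2 units per tournament
sigma = np.sqrt(probs @ (outcomes - mu)**2)  # about 95.2 units

# With only two outcomes, an n-tournament total reduces to a binomial draw
# on the number of wins, so even very long samples are cheap to simulate.
n, trials = 100000, 10000
wins = rng.binomial(n, p_win, size=trials)
totals = wins * outcomes[0] + (n - wins) * outcomes[1]
```

The same histogram-versus-Gaussian comparison as for the cash games can then be applied to these totals.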

So, as we've seen here, and as the Central Limit Theorem guarantees, once we play enough hands or tournaments, our results will be very close to Gaussian. Consistency of yearly results might be the #2 concern for profit-minded players, and as far as this goes, it looks like Gaussian approximations should be great for cash game players putting in any reasonable amount of volume. Live players, however, suffer from both fewer hands per year and higher variance from deeper stacks... my guess would be that a live player would want to be putting in at least 500 hours/year to be able to assume approximate normality of annual results.

Most online cash game players, full-time or part-time, will be reasonably accurate by projecting their annual results to follow a Gaussian distribution with appropriate mean and variance.

However, the #1 concern for a poker player is the ability to sustain one's bankroll. It turns out that a popular and effective bankroll management formula is derived from an assumption of perfect normality — is this assumption a good fit?

Accuracy of Gaussian ruin probabilities

While we've seen that year-end results should be quite close to Gaussian, a poker player who goes broke in the middle of the year due to some perhaps non-Gaussian sudden downswings is not going to be able to reach the end of the year to achieve his nice, nearly-Gaussian result. So, in terms of the probabilities that the path of one's bankroll would cross a certain lower bound (usually zero), does this data behave similarly enough to that of perfectly Gaussian data?

Note that the true Gaussian distribution is unbounded, so a very small percentage of paths will have quick movements of very large magnitude. Actual poker distributions are bounded, so in this way, we would expect the Gaussian paths to fall to zero more often. On the other hand, actual poker distributions have higher kurtosis (a.k.a. fatter tails; that is, more probability mass on extreme results than in the Gaussian distribution), which would counteract this effect. Which of these effects will dominate?

If poker hands were perfectly Gaussian, then we could very closely approximate our (discrete) bankroll path by the (continuous) stochastic process of Brownian Motion, essentially a continuous extension of Gaussian random variables. In this case, the ruin probability, the probability of ever hitting 0 from a given starting point with a winrate of given mean and variance, would follow this formula, widely known from Chen and Ankenman's The Mathematics of Poker but also an easily-derived property of Brownian Motion:

P(ruin) = exp(-2μB / σ²)

where B is the (starting) bankroll, μ is the mean of the per-hand poker results process, and σ is its standard deviation.
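As a purely illustrative example (these numbers are not taken from the data above): a winrate of μ = 0.05bb per hand (5bb/100) with a per-hand standard deviation of σ = 8bb and a starting bankroll of B = 2,000bb gives a ruin probability of exp(-2 · 0.05 · 2,000 / 8²) = exp(-3.125) ≈ 4.4%.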

Of course, we don't have a nice formula for the actual ruin probabilities given that our poker results are not perfectly Gaussian, but we can run Monte Carlo simulations to approximate these ruin probabilities for different starting bankrolls and compare them to the exact result for the Gaussian approximation.
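A minimal sketch of such a Monte Carlo comparison, again resampling from a hypothetical empirical per-hand distribution; the path length, path count, and chunking are arbitrary choices, and the infinite horizon is only approximated by a long finite path:

```python
import numpy as np

rng = np.random.default_rng(0)
hand_results = np.loadtxt("hand_results_bb.txt")  # hypothetical file name
mu, sigma = hand_results.mean(), hand_results.std()

def gaussian_ruin(bankroll):
    """Closed-form ruin probability under the Brownian approximation.
    Only meaningful for a positive winrate (mu > 0)."""
    return np.exp(-2 * mu * bankroll / sigma**2)

def simulated_ruin(bankroll, n_paths=5000, n_hands=1000000, chunk=5000):
    """Fraction of resampled bankroll paths that ever hit 0 or below.
    Simulating in chunks keeps memory modest; with mu > 0, paths that
    survive this many hands almost never bust afterward."""
    ruined = np.zeros(n_paths, dtype=bool)
    balance = np.full(n_paths, float(bankroll))
    for _ in range(n_hands // chunk):
        steps = rng.choice(hand_results, size=(n_paths, chunk))
        # The running minimum within the chunk detects any zero-crossing.
        paths = balance[:, None] + np.cumsum(steps, axis=1)
        ruined |= paths.min(axis=1) <= 0
        balance = paths[:, -1]
        balance[ruined] = np.inf  # freeze busted paths
    return ruined.mean()

for b in [200, 500, 1000, 2000]:
    print(f"B={b}bb: formula={gaussian_ruin(b):.4f}, "
          f"simulated={simulated_ruin(b):.4f}")
```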

The left column is starting bankroll, in big blinds (for the tournament, a value of 100 represents one tournament buyin).

For the $1/2 CAP data:

For the $1/2 RUSH data:

For the $1/$2 PLO data:

For the fictitious HU tournament distribution:

We first notice that the risk of ruin for the $1/$2 PLO data set is 1 in all cases, as of course will always be the case when one's winrate is negative... your author is still working on his PLO game and has a very limited sample size so far. Whoops.

We then notice that, across the board, the Gaussian approximation to the ruin probability is higher than the simulated ruin probability with the empirical distribution. The difference appears to be decreasing in bankroll size, as we would expect from the Central Limit Theorem. The first few lines are for excessively small bankrolls, so they are not of any particular practical interest. The difference appears to be largest for the tournaments, as we would expect.

It looks like the boundedness of the true distribution is a bigger effect than the higher kurtosis, so the result is that actual ruin probabilities are lower than the easy formula suggests, which is great! Moreover, since the errors are small (especially for bankroll sizes large enough to yield practically low risks of ruin), using the easy Gaussian formula simply makes us a little extra-conservative, at least for cash games.

The formula for Gaussian ruin probabilities is a very close approximation to true ruin probabilities for poker results for reasonable bankroll sizes.

Conclusions and Implications

Basic cash game results seem to be close enough to Gaussian for both terminal results and probabilities of hitting sufficiently far-away lower bankroll bounds along the way. Therefore we can rely on the easy Gaussian ruin probability formula for modeling purposes, and we will approximate long-term results (such as when evaluating expected utility over one year) with Gaussian distributions, at least outside of tournament play.

We should be careful about drawing definitive broad conclusions from these simulations. We have treated only a few different poker games, at only one moment in time for the poker economy, and only with one player's particular strategy. We should expect that the results may be different for poker games with deeper stacks, or with looser players. Games with higher variance, or games with less continuous one-hand result distributions (such as limit games), should be further from Gaussian.

2 comments:

  1. Nice post Mike. I know you’re still in the ‘building assumptions’ mode here (and sorry if this is jumping the gun) but:

    - I’m curious how, if you are on a bad losing streak, we could use this to find the optimal number of buyins in your bankroll you would drop to before moving to lower stakes, using our utility function. Your risk of ruin probabilities might be more practically modified to ‘risk-of-dropping-a-level’ probabilities.

    - Conversely, at what point can you justify taking a shot at the next highest level without being accused of recklessness?

    - You mentioned a while ago that you would talk about choices to be made where you won’t be able to reach the Long Run; what kind of effect does that have here? Saying “I’ll get to that later” is fine.

  2. Not jumping the gun at all, these are all things I intend to discuss soon.

    The utility function we've established should make it pretty straightforward to solve problems about moving up and moving down when we make assumptions about our winrates and variances at the two stakes. For each level of bankroll, we can compare the utility of playing the higher stake versus that of the lower stake, and there should be a unique point of indifference. It may take some more subtlety than that, but I'll figure it out soon. Adding uncertainty about winrates (e.g. when moving up to a new stake) into this would be interesting and may be reasonable enough to try.

    The "long run" issues of a unique opportunity might actually be less interesting of a topic than I originally thought, though I still want to say something more about it at some point. You get no normality on a unique event, but if you can approximate its actual distribution, you can just plug in and go with the utility function. The WSOP certainty equivalent analysis basically did this for the case when the unique event is the WSOP ME - you only get one try at it, so you look at the expected utility of (X+Y), where X is all of your other play for the year (approximately Gaussian) and Y is the unique event. A similar approach could be used for any opportunity where you "can't get to the long run". There are more implications of this that could come up that I might discuss later. For example, it might be the case that, for a given player with a given skill level, bankroll, and risk aversion, it is -EV (after tax/util) to play the Main Event, but it would be +EV if he were able to play 2 identically-distributed Main Events in succession. It should be the case that any positive-pure-EV opportunity would be worth taking for *anyone* as long as it could be repeated at least N times for some N.

