Saturday, 12 September 2009

Be Wary of Poker Statistics: Part 2

Yesterday I demonstrated that a poker statistic generated from a small amount of data is unreliable. More data = more reliable statistics.

Just how much data do you need to use the statistics reliably? I found the answer through a Monte Carlo simulation showing how likely it is that a statistic is within +/- 5% of the opponent's real playing style:

[Chart: likelihood that a statistic is within +/- 5% of the opponent's real playing style, by number of hands]


After 275 hands, 90% of stats based on "# of times seen" are within +/- 5%. After 400 hands, 95% of them are.

[Note: Take this as a guide only. I made some simplifying assumptions]
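The post does not show the simulation itself, so the following is only a minimal Python sketch of the general idea, not the code behind the chart. It assumes a single statistic whose true frequency is 50% (the `true_rate` parameter, a hypothetical choice) and treats every hand as an independent coin flip with that probability. The function name and all parameters are illustrative.

```python
import random


def fraction_within_tolerance(hands, true_rate=0.5, tolerance=0.05,
                              trials=2000, seed=1):
    """Estimate how often an observed frequency lands within
    +/- tolerance of the true rate after a given number of hands.

    Assumption: each hand is an independent event that occurs with
    probability true_rate. Real opponents are not this simple.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        # Simulate one opponent: count how often the event occurred.
        count = sum(1 for _ in range(hands) if rng.random() < true_rate)
        # Compare counts rather than ratios to avoid rounding surprises.
        if abs(count - true_rate * hands) <= tolerance * hands + 1e-9:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for hands in (50, 100, 275, 400):
        share = fraction_within_tolerance(hands)
        print(f"{hands:>4} hands: {share:.1%} of simulated stats within +/- 5%")
```

Under these assumptions the results for 275 and 400 hands should come out close to the 90% and 95% figures above. A statistic with a true frequency far from 50%, or one that is only counted in rarer situations, would behave differently.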

Be really, really wary

A 95% chance of pretty good accuracy sounds good, right? Does that mean that if you have played all your current opponents 400 times, then you can trust the statistics? No, for two reasons:

1) This only includes statistics based on "# of times seen". Many statistics are based on much less frequently occurring situations.

2) Poker Copilot has 18 HUD statistics. If you have 8 opponents, that's 18 times 8 = 144 statistics available. If each of these has a 95% chance of being accurate, then it is very likely at any given time that several of these stats do not realistically represent some aspect of an opponent's playing style.
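The arithmetic behind the second point can be sketched in a few lines of Python. This assumes, purely for illustration, that the 144 statistics are independent of each other, which real HUD stats are not; the variable names are made up for this example.

```python
stats_per_opponent = 18
opponents = 8
chance_accurate = 0.95  # chance that any single stat is within +/- 5%

total_stats = stats_per_opponent * opponents

# On average, this many stats are off at any given moment.
expected_inaccurate = total_stats * (1 - chance_accurate)

# Chance that every single stat is accurate at the same time
# (assumption: the stats are independent of each other).
prob_all_accurate = chance_accurate ** total_stats

print(f"Total stats on screen: {total_stats}")
print(f"Expected number of inaccurate stats: {expected_inaccurate:.1f}")
print(f"Chance that all stats are accurate: {prob_all_accurate:.4%}")
```

With these numbers, about 7 of the 144 stats would be expected to be off at any moment, and the chance that all of them are accurate at once is well under 1%.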

To conclude, as I wrote in part 1: Statistics are only reliable if they are based on a large enough sample space. The larger the sample space, the more reliable the stats.

9 comments:

megalofvia said...

1) With this info in mind, do you think you could up the HUD threshold option past 100 hands?
Also, would it be possible to add support for downloaded hand histories? Many sites currently offer the ability to download hundreds of thousands of cash game hands, and both PokerTracker 3 and Hold'em Manager allow the importing of these hands, but Poker Copilot does not.

Terence Kearey said...

Hm, I think Steve you should give your users a little more credit.
Most of us know the fallibility of statistics so let us make our own decisions please and decide when street figures are reliable and when they are not. The warning is fine but let us decide. Why are you assuming that you know best?

Apart from this you have made calculations to +/- 5% and this is far from necessary. As we already know there is a variation with a small sample number we can also make allowances and a more realistic figure would be 25% or even 33%, it is only a guide after all. You don't need to be that accurate to show a tendency.
If someone is playing 10% flops who cares if in reality they are playing 8 or 12%, ie around 20% off this figure. It is not important.

Apart from this you yourself are making one of the biggest and most misleading errors of all, ie calculating percentages on hands when people are sitting out in a tournament. The error here can be far more, exceeding several hundred % and you think it is fine. I question the fact that you think you are in a position to tell us what is right and what is wrong.

It's about time you gave us the correct figures yourself or at least the option to choose correct figures or your incorrect ones.
When you do this I will immediately buy your program but until then I have, unfortunately, to wait for PT3.

You say you listen to customers or in my case prospective customer but in this regard you insist on dishing out the wrong data and justify it with 'people are used to it' or some such. Now let us see if you have the bottle to allow this to be posted.

You know I really do want to buy your program and support your effort but I cannot as long as this situation exists. Change it and I will be shouting how good your program is from the rooftops.

Terence Kearey said...

Further to my post of yesterday I would like to comment on the fact that the HUD reports statistics for a couple of factors to two decimal places, the argument being that there are so few occasions when these occur, (fold to 3bet for example).
In actual fact this is misleading as a figure reported to two decimal places is believed to be more accurate than for example a figure to one or zero decimal places.
I would suggest that here you should report only to one decimal place or better still zero decimal places. People will then know that a figure of 1% for example can mean zero or 2% in reality. Apart from having a built in tolerance the figures would also be in line with the other stats and produce a neater display.

Anonymous said...

I don't believe that Steve thinks he can make better decisions than his customers.
When I read long complaining (and in my opinion harsh) comments I wonder if the reader has understood what Steve wanted to say. I guess the last thing Steve would want is to make decisions for other people but I think he has a good point in saying that stats are more reliable if they are based on big numbers.
If a poker evening does not turn out as successful as expected I wouldn't be surprised if people think 'Why did I lose so much; the stats said something else.'

BTW: On 100% of all Tuesdays it rains when I cycle to work. So I'm going to take the underground again on Tuesdays.


...Nobody has to know that I started cycling to work just three weeks ago. ;-)

Big D said...

Thanks for that. Very interesting post.

KeithX said...

"Lies, damn lies, and Statistics" I can never remember who made that quote lol. Here's why I don't believe sample sizes matter when it comes to predicting human behaviors: we're creatures of habit. Poker players learn to switch things up, mask their tendencies and randomize their play, but it doesn't come that easily for most people. Some consistent losers show tendencies and trends from the word go and are eminently predictable within as few as 100 hands.

This is the statement that I take issue with, in the context of human behavior and poker: "Statistics are only reliable if they are based on a large enough sample space." Events that happen rarely are more likely to evoke a non-random response from a poker player. This is a basic fact of human behavior which has nothing to do with statistical sample sizes or mathematical laws.

Here's a real-world example: The Wife Slap. A man who slaps his wife once in a high-stress situation may or may not repeat the behavior, it's 50/50. However the man who slaps his wife a second time under similar circumstances is highly likely to repeat the behavior. In fact, barring some sort of intervention or training, it's a near certainty. Yes, this is indeed a valid sample size of two events. Human beings are not steel balls being dropped through a pin array.

Terence Kearey said...

Reply to Anonymous (why anonymous?)

Sure I understand what Steve is saying. The accuracy of statistics data is very much dependent on sample size but I question the implication that you need to be so accurate.

Steve is, however, more or less making the decision for us when he says it 'would be doing users a disservice' and so does not plan to provide these stats.

Harsh, I'd agree. The reason, though, is that Steve has ignored or rejected this comment (wrong stats when sitting out in a tournament) from practically the day the program was released.
Look at the competition (or coming competition), no one else counts this way, it is wrong - period. Why persist with it when it is such a simple matter to fix or provide an option?

Steve McLeod said...

@Terence,

That seems to me unnecessarily harsh. I don't ignore/reject suggestions. I make them open for feedback and prioritise my limited time to deal with an ever-increasing list of suggestions.

I blogged your specific request, in order to get views from others, and it received little response. Nor has it been getting many votes up on our Get Satisfaction site compared to other issues.

Meanwhile there are other issues encountered by many Poker Copilot users. I simply need to prioritise.

Terence Kearey said...

I find it quite amazing that so few have reacted over this.
I think you know my standpoint that you, being the first person to provide a decent tracker for the Mac, have my support for that reason alone.
You have also intimated that you could concede that my standpoint is correct but leave it up to users to yay or nay it as they are used to it.
This is the reason I have written in such harsh terms. Whatever it takes to get this option. There is of course the risk that direct confrontation can produce the opposite effect and increase the unwillingness to implement it but that is a risk I'm prepared to take.
There's a right way and a wrong way, I don't care what other people say.
Quite honestly I think your future may be at stake in doing things the current way when PT3 is eventually released.
How can you compete with a company of that size and resources? You have so far succeeded by being virtually the only one to care about Mac users but can it continue in the light of fierce competition? Maybe, maybe not.

You must know, however, that I do want you to continue to be successful, otherwise I would not be so adamant over this point, which surely is a simple thing to implement and should really have a much higher priority regardless of what others think.
Just to reiterate the tournament example of someone sitting out for 90 hands then sitting in and playing ten hands in a row.
They have, according to your way of thinking paid blinds voluntarily 18 times in 90 and now vip ten more times, ie 28 times per 100.
How can you possibly say their VIP is 28% when it is in fact 100% of the time they have been sitting in?
If you have missed this for one reason or another you can make quite the wrong decision.

I wish you the best for the future and now I'm out of here. Enough said. I know the options. Currently none...
