Pleasant Error Handling and Nasty Error Handling

Loyal Poker Copilot user Manuela was having problems getting Poker Copilot to recalculate equity values. It would work for about 11,000 hands, then act as if it was finished. Only it wasn’t.

The culprit was a single hand that Poker Copilot couldn’t process. The code that handled an error during equity recalculation looked something like this:

if error occurred while calculating then
    Stop calculating.
    Pretend everything is okay.
end if

Well, it was a lot more complicated than that, but once I worked out the cause, it could be reduced to naive error handling.
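For illustration, here is roughly how that pattern can look in Java. This is a minimal sketch, not Poker Copilot’s actual code: the class SilentRecalculation, the methods calculateEquity and recalculateAll, and the rule that an empty hand is "bad" are all hypothetical stand-ins.

```java
import java.util.List;

// Hypothetical sketch of the "swallow the error" pattern described above.
// None of these names come from Poker Copilot's real code base.
final class SilentRecalculation {

    /** Stand-in for the real equity calculation; fails on a hand it cannot process. */
    static double calculateEquity(String hand) {
        if (hand.isEmpty()) {
            throw new IllegalArgumentException("Cannot process hand");
        }
        return 0.5; // placeholder value, not a real equity
    }

    /** Returns how many hands were recalculated before the loop stopped. */
    static int recalculateAll(List<String> hands) {
        int processed = 0;
        try {
            for (String hand : hands) {
                calculateEquity(hand);
                processed++;
            }
        } catch (RuntimeException e) {
            // Stop calculating. Pretend everything is okay.
        }
        return processed;
    }
}
```

The caller gets a count back and no sign that anything went wrong. Every hand after the bad one is skipped, and the exception that would have explained why is thrown away.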

There are three ways one can handle such problems in code:

1) Try to recover as silently as possible, don’t bug the user, and keep running. This seems user-friendly, but actually hides the underlying problem, which causes worse, hard-to-diagnose problems later on.

2) Give the user an error message, but continue when they click OK. “There was an AQ-787-34.2 error while processing your data. Please report this to the Poker Copilot team. Click OK to continue.” This just puzzles most users. Even a human-understandable error message is puzzling and perhaps disconcerting. Most users won’t report the problem. Of those who do, most will report something like, “Poker Copilot has a problem. What shall I do?”, which is useless as a bug report.

3) Crash violently, in a way that can’t be ignored. Automatically generate a crash report which the user can send with one click. Because the buggy action stops the program from continuing, and the user probably wants to continue, they’ll most likely click the “Send Crash Report” button. This forces the error to reveal itself as soon as possible. The crash report contains all the information needed to pinpoint the location in the code where the crash occurred. It also contains other useful info, such as the version of Poker Copilot, the Mac OS X version, the Java version, and the user’s Mac OS X language settings.
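The third approach can be sketched in Java with a default uncaught-exception handler. Treat this as an assumption-laden outline and not as Poker Copilot’s real implementation: the class CrashReporter, its methods, and the appVersion parameter are hypothetical, the default locale stands in for the user’s language settings, and printing to standard error stands in for a dialog with a “Send Crash Report” button.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

// Hypothetical sketch of "crash loudly and produce a report".
final class CrashReporter {

    /** Builds a plain-text report from the failure plus basic environment details. */
    static String buildReport(Throwable error, String appVersion) {
        StringWriter out = new StringWriter();
        PrintWriter report = new PrintWriter(out);
        report.println("App version: " + appVersion);
        report.println("OS: " + System.getProperty("os.name")
                + " " + System.getProperty("os.version"));
        report.println("Java version: " + System.getProperty("java.version"));
        report.println("Language: " + Locale.getDefault().toLanguageTag());
        error.printStackTrace(report); // shows where in the code the crash occurred
        report.flush();
        return out.toString();
    }

    /** Makes every uncaught exception produce a report and stop the program. */
    static void install(String appVersion) {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            // A real application would offer to send this report with one click.
            System.err.println(buildReport(error, appVersion));
            System.exit(1); // do not keep running in an unknown state
        });
    }
}
```

Calling CrashReporter.install("1.0") once at startup is enough. After that, code such as the recalculation loop no longer needs a catch block for unexpected errors: the exception propagates, the handler writes the report, and the program stops.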

Somewhat counter-intuitively, the best way to have low-error software is for the software to crash completely and immediately when it encounters an unexpected problem (an “unknown unknown”). Perhaps this wasn’t so when software came on a disk you bought in a bricks-and-mortar shop, and getting updates was hard. But with always-on, high-speed Internet, where software can be updated frequently, crashing early works well: the bug gets reported promptly and the fix can reach users in the next update.

In the next update, Poker Copilot will stop immediately and create a crash report if recalculating equity values fails.