Loyal customer KeithX asks:
Why doesn't Poker Copilot use MySQL or (shudder) Postgres?
There are two main reasons:
- User Experience
- Speed
User Experience
Consider iTunes. iTunes is, fundamentally, a database with lots of data. The data is in the form of thousands of files that encode sound frequencies. In my case, iTunes has 13 Gigabytes of music in 2517 files. Plus 920 Megabytes of podcasts. Plus 14 Gigabytes of TV shows.
iPhoto is, fundamentally, also a database. This time the data is in thousands of files encoding photos.
In neither iTunes nor iPhoto is the end user expected to first install an SQL relational database system. The database is hidden from the user and managed so that the user simply sees what's important.
I aim for Poker Copilot to have the same seamless experience one finds in iTunes and iPhoto. The complicated database management is hidden from the user.
Speed
Using an embedded database engine - which Poker Copilot does - has the potential for greater speed than a separate database server like MySQL or PostgreSQL. To get all technical, communication between a piece of software and a database server involves (in common scenarios) creating a TCP socket connection, encoding the query in a standardised manner, sending the query over the socket, retrieving the results over the socket, decoding the results, and disconnecting. Often the overhead of connecting and communicating is the most time-consuming part of the process. (There are ways to speed things up somewhat, such as connection-pooling and client-side query caching.)
An embedded database engine, if engineered for speed, can operate faster.
Poker Copilot uses a Java-based embedded database engine called H2. It is open source, offers high performance, and uses SQL. It can be configured to offer high concurrency.
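To make the round trip described above concrete, here is a minimal Python sketch. It is not Poker Copilot's Java code, and it does not use H2 or any real database protocol. The names `run_query`, `handle_one_client` and `query_over_socket` are hypothetical, the newline-delimited JSON wire format is an assumption made purely for illustration, and the query result is canned. The same stand-in engine is called once through a loopback TCP socket (connect, encode, send, receive, decode, disconnect) and once directly, the way an embedded engine is called.

```python
import json
import socket
import threading


def run_query(query):
    """Stand-in for a database engine: returns a canned result (hypothetical)."""
    return {"query": query, "rows": [[1, "AhKd"], [2, "7c7d"]]}


def handle_one_client(listener):
    """Server side: accept one connection, answer one query, then hang up."""
    conn, _ = listener.accept()
    with conn, conn.makefile("rwb") as stream:
        query = json.loads(stream.readline())           # receive and decode
        reply = json.dumps(run_query(query)).encode()   # execute and encode
        stream.write(reply + b"\n")                     # send the result
        stream.flush()


def query_over_socket(address, query):
    """Client side: connect, encode, send, receive, decode, disconnect."""
    # Leaving the with-block closes the stream and the socket (the disconnect).
    with socket.create_connection(address) as sock, sock.makefile("rwb") as stream:
        stream.write(json.dumps(query).encode() + b"\n")
        stream.flush()
        return json.loads(stream.readline())


# Separate-server style: listen on a free loopback port picked by the OS.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server = threading.Thread(target=handle_one_client, args=(listener,))
server.start()
remote = query_over_socket(listener.getsockname(), "SELECT * FROM hand")
server.join()
listener.close()

# Embedded style: the engine lives in the same process, so it is just a call.
embedded = run_query("SELECT * FROM hand")

print(remote == embedded)  # prints True
```

Both paths return the same data; the socket path simply does more work per query. How much that extra work matters in practice depends on the engine and the workload, so it needs measuring rather than assuming.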


10 comments:
and because iTunes is a bread and butter piece of software for Apple they will continue to make sure the database (or whatever) continues to get better.
Satisfied Customer:
The V1 HUD works well, and I'm able to easily separate stats for the various games I play by moving hand history file sets into and out of the FT hand history folder. I've been playing ~15K hands a month since early March, and will hit the V1 DB wall for max hand files in another month or two. Another issue is that there is no ROI analysis for specific games, hands, and positions. I don't have access to the V1 database to run queries myself which would mitigate the analysis shortcomings. If I have to I'll just start removing the oldest hands from my V1 data set to stay under the V1 database limit.
Designer / Architect / PM:
The number one goal for V2 should be replacing the V1 database. When V1 was more a concept than a product you could work in a more flexible, freeform mode. Now that you have a successful product you have more limitations on how you develop the next iteration of the product. Each new major release has to fulfill two distinct objectives: (1) to improve the product in terms of features, performance and quality (2) to completely replace the prior release, remove it from the marketplace and end support. As of today's alpha release you have added features but significantly reduced performance. I'll give you a pass on quality because it is an alpha product. Since V2 requires OSX 10.5, everyone with 10.4 will continue to use V1, and you'll have to support two products for at least the next two years.
The Voice of Steve Jobs:
Our products don't succeed because they're not complex, they succeed because we successfully hide that complexity from the end-user. The next time you come to my office and tell me that you want to ship a crippled product because it'll be "easier to install" you'll be fired on the spot.
Are you kidding about the database? You give me someone else's stats on database performance and expect me to believe they apply to this product running on our platform? Where are your metrics? It's impossible for me to believe that a DB running under the POS Java virtual machine we ship is the fastest or most stable solution.
Communications a bottleneck on our workstations? WTF??? We make the best hardware in the business. The number one database bottleneck is always drive access. Always. Batch the writes and reads and figure out how to publish the data with fewer queries.
Because your managers speak highly of you I'm going to give you a second chance, something I almost never do. Here's your assignment:
You have two days to get the app running with a Postgres backend, because that's what our competitors are using. Then you have another two days to get it running with MySQL because I believe that will run faster. Produce metrics on all three systems under maximum load, that shouldn't take more than three days. Take a day or two to make a choice and write up your presentation, I want it on my desk in ten days. You have one week after that to develop a fully scripted software installation process, including the database.
Aren't you glad you don't work for Steve?
Smilies!
-K
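The "batch the writes and reads" and "fewer queries" advice in the comment above can be sketched as follows. This is an illustration only: it uses Python's built-in SQLite module as a stand-in embedded engine (Poker Copilot uses H2 from Java), and the `hand` table, its columns and the sample data are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical hand table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE hand (id INTEGER PRIMARY KEY, player TEXT, won_cents INTEGER)"
    )
    return db


# Made-up sample data: 1000 hands with small wins and losses.
hands = [(i, "hero", (i % 7 - 3) * 100) for i in range(1, 1001)]

# Unbatched: one statement and one commit per hand.
slow_db = make_db()
for hand in hands:
    slow_db.execute("INSERT INTO hand VALUES (?, ?, ?)", hand)
    slow_db.commit()

# Batched: one executemany call inside a single transaction.
fast_db = make_db()
with fast_db:  # commits once when the block exits without an error
    fast_db.executemany("INSERT INTO hand VALUES (?, ?, ?)", hands)

# Fewer queries: fetch every row and add them up in the application...
total_slow = sum(row[0] for row in slow_db.execute("SELECT won_cents FROM hand"))
# ...versus asking the engine for the aggregate in one query.
(total_fast,) = fast_db.execute("SELECT SUM(won_cents) FROM hand").fetchone()

print(total_slow == total_fast)  # prints True
```

With an in-memory database the difference is small. On a disk-backed database each commit typically forces a write to the drive, which is where a commit per hand would be expected to hurt.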
KeithX... I completely disagree with your "voice of Steve Jobs" comment about Postgres.
So here comes a post telling Steve that he's definitely right not to use Postgres (and that both PT and HM are wrong to use Postgres).
(btw I own a PcP licence but I only have OS X 10.4 so I have to live as of now with PcP v1).
Most people use MySQL or PostgreSQL because they don't know better.
Steve Mc Leod obviously does.
It's weird that you wrote "use Postgres because that's what our competitors are using". That's no reason. That has never been the reason. The competitors are not right because they're the competition: you can always outsmart them. I don't consider the competition to be wise at all on that subject. They're using Postgres because they don't know better and it's painful for their end users, just look at their forums (and others) filled with Postgres-related issues. And you ain't fixing those issues "under the hood". They're just, well, unavoidable issues due to the sheer complexity of PostgreSQL. This is server side stuff, not stuff you force on all your userbase!
Then, for starters, Java is one of the most stable and secure technologies ever. You can't make a non-cash payment anywhere in the world without having Java involved. Done correctly, it's very fast too. Java is used by all the world's most demanding Websites (GMail, eBay, FedEx...).
Then having PokerCoPilot sneakily/silently install and configure a bloated server-side piece of software like PostgreSQL would be the most stupid thing ever. This is the kind of silly attitude taken by PT and HM. You do not want PcP to go that route.
Installing a full-blown server-side DB on end-user's systems is the surest way to make their system slow and bloated, to cause firewall issues, permissions issues, etc.
In addition to that a case could be made that in an Object-Oriented world the very use of a relational DB is highly disputable.
The fastest tracker shall be the one that does not use the slow, underperforming, bloated, "object/relational impedance mismatch".
Queries in HM and PT are not fast. They're slow, bloated and complicated, for their DB schema is pure madness. This is not OO, but madness.
I doubt that going the H2 route is the correct way: it's already wwaayy better than MySQL or Postgres, but it's still SQL. And it means doing "object/relational" plumbing. Sure way to kill perfs and to cause maintenance and future upgrade trouble (you're losing all the OO benefits by tying your OO app to a SQL DB).
To me a correct OO tracker would use either a self-made OO DB, hidden from the user or an off-the-shelf OO DB.
It's not because many people use MySQL or PostgreSQL on the server side that it's the correct technology to use on the client side.
It is not.
Please don't go that route. Just look at the sheer number of SQL problems the HM/PT users are having. It's staggering and that's no surprise.
MySQL and PostgreSQL are not meant to be client-side technologies. Especially not if it's to deal with a petty million hand DB or so.
It's not about 'metrics'. A million hands, sometimes two. No legit players have more hands than that in their DB. This is a peanuts amount for computers executing billions of instructions per second in a correctly designed system. PT and HM have serious perfs issues because they're actually very badly designed apps. For one they made the gigantic mistake to tie themselves to their (broken) and maddening DB schema. That's not how correct client-side software ought to be designed.
iTunes is.
Yes, the main bottleneck is HD access. And you'll have plenty of them using a DB like MySQL or Postgres...
Steve, if you go the SQL route (which I don't think is good but hey!), please keep an embedded, lightweight, DB like H2, not the bloated Postgres nor MySQL.
:)
These are interesting comments! Keep them coming.
Regards,
Steve
I am glad to see that we now have a more detailed discussion about some of the technical choices involved in designing the next iteration of Poker Copilot. Tip of the hat to Steve for responding to my original question, and to the technologist "Anonymous" for his comment, which raises several good points. Because there are so many anonymous commenters in the blog universe, I shall refer to this one as BillyG. Tongue firmly in cheek again ;-)
No one can dispute that PCoP V2 needs to be several orders of magnitude faster before it can be released. Two key design decisions were made: new HUD display technology and a new database. At least one of the two is causing severe performance issues. If the V2 HUD slows processing dramatically then the solution is simple, go back to the V1 technology. However I doubt that's the problem.
The core difference I have with BillyG is the classic Kewl Warez vs. Git R Done issue. My personal desire is to have an Insanely Great user experience and I care not a whit what sort of technology gets used to create that result. BillyG suggests "a correct OO tracker would use either a self-made OO DB, hidden from the user or an off-the-shelf OO DB." Fine, test that too.
I would like to know that there are real metrics, clear statistical benchmark reports based on testing this product on a currently shipping Mac, that support a final production database choice. BillyG said "The competitors are not right because they're the competition: you can always outsmart them. " I want to see test results that prove they have been outsmarted.
I agree with BillyG that Postgres is a total POS. Never mind that thousands of pot-smoking gamer boiz are able to make it work on Vista lolz. Postgres tests would provide a performance baseline. As for MySQL, thousands of web hedz have been happily using it for years now, rarely having to resort to forums or anything else. Once installed it just works.
H2 is currently candidate number three. Steve posted on this blog that Apple's current Java VM is a buggy POS, and that fact is undeniable. I don't care at all about how kewl Java warez can be for other applications on other platforms. I only care about results, and right now the performance results with H2 are unacceptable for a commercial release of Poker Copilot V2.
As for the quantity of test data, I'd be satisfied with a test that uses 500K stored hands, with simultaneous action on 8 cash tables. Hand parsing and HUD updating performance should be Insanely Great. I want to see a "Total Cash Won" number for the day that updates every time I win or lose a hand, just like I see with PCoP V1.
Since iTunes was mentioned again, I must say that iTunes V1 came about when Jobs bought MacAmp and pasted a quick and dirty interface skin on it. Nobody said boo about what was kewl or object-oriented under the hood. Steve bought it because it worked, and he said make it pretty and they did. Then Apple shipped it.
Correction: iTunes V1 was a repackaged version of SoundJam, not MacAmp. My bad. It's been a while since I used those programs, and I plain forgot which one was mo bettah ;-) K
Keith,
You mention that EAP build 22 has worse performance than build 20. My performance tests show no significant difference. I re-ran them today over 770,000 hands, to make sure. All hands on my 2009 iMac were added in 2 hours.
What part of build 22 is slow for you? Can you elaborate?
Steve,
The in-game performance for the last few builds is about the same. The performance benchmark for me is versus V1, not a prior build of V2. The first usable version of V2 melted down 500 hands or so into a multi-table session, just ground to a halt. Importing 50K hands took a very long time and sometimes crashed out.
In the last few builds importing 60K hands only took 15 minutes, a huge improvement. As a one-time hit that's acceptable. The last build didn't import all my hands properly, about 8% of them didn't make it into the database, so I didn't even use it. I mentioned that in a comment here on the blog, responding to your post announcing the new EAP release.
I haven't been sharing all my thoughts on the gradual database/HUD display slowdown issue because there seemed to be more important things to communicate. This is, after all, still Alpha warez.
At first launch I see HUD displays on all four tables. They update promptly, as does the Total Won. By 100 hands into the session, numbers update noticeably more slowly. Clicking the "refresh" button no longer gives a prompt Total Won update.
By 200 hands into the session I only see the HUD on two or three of the four tables open. HUD numbers don't update for several seconds after a hand completes. For some reason the HUD just turns off on at least one of the tables. It re-appears when the table becomes active, but at the same time that happens it turns off on one of the other tables.
At about the same 200 hand point, the refresh button stops working properly. If I click it three times in rapid succession it will update within about five seconds. If I only click it once there's no response at all most of the time. By 500 hands I'm ready to shut it down and switch over to V1, which never slows down as play progresses.
I believe that the database is at the heart of the problem. I also tend to think that the engine is causing the problems, not the schema or queries.
Hope this helps!
-K
Hi KeithX and Steve and all,
(here's anonymous 'BillyG' though I'm really in the OS X camp ;)
Two things: OK, Java support on OS X isn't perfect, but it's still pretty darn cool. Eclipse, IntelliJ IDEA, Tomcat. They all just work. I'm a dev, and my Java apps work fine on Linux, Windows, OS X and Solaris. One codebase.
Java on OS X really isn't that bad.
Then, the other thing... My main point of view is that server-side tech should stay server-side. It's not that PostgreSQL is that terrible: it's just that it's really a bad idea to force that to end-users, no matter how much you try to automate the process.
If Steve finds the need to "optimize" PcP by using a SQL DB, then make it an embedded DB -as transparent as possible to the user-, like H2 or HSQL.
Now regarding PcP's perfs issues, I must admit I haven't used it that much with many hands yet, so I don't know how problematic it is.
I remember a would-be tracker author (Phil from PEV) that said once that the DB was the bottleneck, no matter what.
2 hours for 770 K deals is about 100 hands/sec. A badly tuned HM or PT will do less than that and a finely tuned HM or PT will only do slightly better than that.
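For reference, the throughput arithmetic can be checked in a few lines of Python. The figures are the ones quoted in this thread (770,000 hands in 2 hours on Steve's iMac, and KeithX's 60K hands in 15 minutes); the helper name `hands_per_second` is made up for this sketch.

```python
def hands_per_second(hands, seconds):
    """Average import throughput in hands per second."""
    return hands / seconds


steve_rate = hands_per_second(770_000, 2 * 60 * 60)  # about 107 hands/sec
keith_rate = hands_per_second(60_000, 15 * 60)       # about 67 hands/sec

print(f"{steve_rate:.1f} {keith_rate:.1f}")  # prints "106.9 66.7"
```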
I'm pretty sure that for this kind of usage (single machine, unshared DB) a SQL DB engine like H2 should not be an order of magnitude slower than Postgres. It may be a bit slower, but not 10x slower.
Anyway, if the codebase is clean, migrating from H2 to Postgres shouldn't be that hard.
Talk to you all soon,
(I should really open a Google account or have some OpenID).
Hello again BillyG! I figured you must be in the OSX camp, I just wanted to highlight the core philosophical differences. Bill G puts technology first and Steve J puts the beauty and grace of the user experience first. Kewl warez or kewl vibes, and I've chosen to wear the kewl vibes hat lolz.
"Java support on OS X isn't perfect, but it's still pretty darn cool" is only meaningful to a technologist. In terms of the user experience, a coding language choice contributes nothing to the cool factor. Likewise to distinctions such as segregating server-side tech from client-side tech.
And really, I only want a Postgres baseline. Truly! I hope and pray that it never appears in the V2 release lolz. A Postgres test would not only provide a baseline, it might also be a good promotional angle vs HEM and PT, ie. "look how much better our database is than yours."
The current database engine does indeed appear to be a significant bottleneck during multi-table play. I can't say that definitely without doing unit tests on a module level, but it seems likely. Here are some numbers:
Playing on 8 tables would generate approximately 12 hands per minute. The import times Steve and I have generated suggest that in pure parsing mode the app can slurp 70 - 100 hands per second. (his mac is definitely faster than mine)
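Taking those numbers at face value, a quick calculation shows how much headroom there is between live play and bulk import. This is only a sketch of the arithmetic; the constant and function names are made up for illustration.

```python
HANDS_PER_MINUTE_LIVE = 12   # 8 tables, figure quoted above
IMPORT_RATES = (70, 100)     # hands per second in pure parsing mode


def headroom(import_rate_per_sec, live_hands_per_minute):
    """How many times faster import is than the rate live play produces hands."""
    return import_rate_per_sec * 60 / live_hands_per_minute


low, high = (headroom(rate, HANDS_PER_MINUTE_LIVE) for rate in IMPORT_RATES)
print(low, high)  # prints "350.0 500.0"
```

If the figures hold, live play produces hands 350 to 500 times more slowly than the importer can absorb them, which suggests raw parsing throughput is not what slows a session down.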
The import goes smoothly, the hand file parse purrs along in nice linear fashion. During game play performance degrades hand by hand. Without knowing what query processes are running I can't begin to say exactly why that is, but it feels like a DB engine issue: a memory leak or a gradually overloaded cache that's never properly flushed.
However, two users just reported that the new HUD tech isn't working for them at all. So it is possible that the new HUD layer fubars the whole show. It is also possible that both the new HUD layer and the H2 DB engine have issues, and that the two combine to fubar the app. Either way, in its current state it lacks the beauty and grace of V1. It's not even close.
If I owned the company, I would rip the new HUD out and put the V1 HUD back. I'd forget about any tech that won't run on OSX 10.4 for another two years, because I wouldn't want to spread precious support resources between two products. Plenty of G4 systems running 10.4 are going to be around for another year or two at least.
Next I would test three or four DB engines in both mass import and multi-table play modes. Then I'd make a DB engine decision and start optimizing that layer. When the new DB is running Insanely Great, I'd ship it.
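A test of the "multi-table play" mode could look something like the sketch below. It is a rough illustration, not Poker Copilot's code: it uses Python's built-in SQLite module as a stand-in engine, the `hand` table and the "Total Won" query are hypothetical, and `measure_session` is a made-up name. It inserts hands one at a time, re-runs the total after each, and reports the average time per hand for each block of 100 hands, so a slowdown as the session progresses would show up as rising numbers.

```python
import sqlite3
import time


def measure_session(connect, hands=500, window=100):
    """Return the mean seconds per hand for each consecutive window of hands."""
    db = connect()
    cur = db.cursor()
    cur.execute("CREATE TABLE hand (id INTEGER PRIMARY KEY, won_cents INTEGER)")
    means = []
    started = time.perf_counter()
    for i in range(1, hands + 1):
        # One hand finishes: store it, then refresh the running total.
        cur.execute("INSERT INTO hand VALUES (?, ?)", (i, 100 if i % 2 else -100))
        db.commit()
        cur.execute("SELECT SUM(won_cents) FROM hand")
        cur.fetchone()
        if i % window == 0:
            now = time.perf_counter()
            means.append((now - started) / window)
            started = now
    db.close()
    return means


# Stand-in engine; another engine would need its own driver and connect call.
means = measure_session(lambda: sqlite3.connect(":memory:"))
for block, mean in enumerate(means, start=1):
    print(f"hands {(block - 1) * 100 + 1}-{block * 100}: {mean * 1e6:.0f} us per hand")
```

Running the same function against each candidate engine, with a realistic schema and a database already holding a few hundred thousand hands, would give the kind of side-by-side numbers asked for above. The actual timings depend on the machine, so none are shown here.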
After release I'd start incrementally working on the user interfaces for filtering the HUD and analysis screens and gradually improve ROI analysis.
But I don't own the company, even though sometimes I wish I did lolz.
-K