Common Data Models & Interoperability in Poker Data
Hey all. (I also posted this on r/poker, so if it looks familiar that's why.)
I'm working on some problems in poker education. Currently, I'm building a self-study/coached-study platform (beta this week). Along the way, I've run into two challenges. First, standardizing solver outputs across tools is difficult, and it will only get worse as the number of solver integrations grows. Second, encoding multi-variant games (NLHE, PLO, Stud, mixed games) into a single common framework is hard in its own right. Currently, I'm storing ranges in a normalized format, which requires converting and simplifying notations from other providers. I don't think these challenges are unique to me, which is why I'm posting here.
I am actively seeking co-conspirators on developing a shared framework. I think developing this is a true win-win. It simplifies market entry for new tools, opens information markets, and allows for broader integration of specific poker applications and data across the spectrum.
So, specifically: is anyone aware of any work in the poker data ecosystem around common data models, interoperability transport schemas, or domain-specific languages?
If you're not familiar with the terminology, here's a quick primer. A Common Data Model is a shared set of data definitions for storing or transporting data. An interoperability transport schema is a common format for moving information between systems. A domain-specific language (DSL) formalizes the operations and concepts of a given field into computable terms. (Poker already has informal DSL elements: "fold," "three-bet," "check-raise" all collapse complex multi-step operations into shorthand.)
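To make the first two terms concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `RangeEntry` and `RangeDocument` names, the `example-range/0.1` schema tag, and the field layout are not taken from any existing tool or spec. The dataclasses play the role of the common data model, and the JSON envelope plays the role of the transport schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RangeEntry:
    """One hand class and how often it is played (hypothetical model)."""
    hand: str      # e.g. "AJs"
    weight: float  # frequency between 0.0 and 1.0


@dataclass(frozen=True)
class RangeDocument:
    """A range plus the context needed to interpret it (hypothetical model)."""
    variant: str    # e.g. "NLHE"
    source: str     # which tool produced the range
    entries: tuple  # tuple of RangeEntry


def to_transport(doc: RangeDocument) -> str:
    """Serialize the model into a JSON envelope tagged with a schema version."""
    return json.dumps({"schema": "example-range/0.1", **asdict(doc)}, sort_keys=True)


def from_transport(payload: str) -> RangeDocument:
    """Rebuild the model from the JSON envelope."""
    raw = json.loads(payload)
    entries = tuple(RangeEntry(**entry) for entry in raw["entries"])
    return RangeDocument(raw["variant"], raw["source"], entries)


doc = RangeDocument("NLHE", "example-tool", (RangeEntry("AJs", 0.75),))
assert from_transport(to_transport(doc)) == doc  # lossless round trip
```

The point of the split is that two tools only need to agree on the envelope, not on how each one stores ranges internally.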
I work in health and research data in my day job, and this stuff is well-established there. Two examples:
1. OMOP Common Data Model: stores medical data (lab results, diagnoses, patient-reported outcomes, etc.) in a common format. Heavily annotatable, and users can build on the model's shared vocabulary to define new diseases, classifications, and measures.
2. HL7/FHIR: an interoperability framework that allows different Electronic Medical Records and other software to exchange data.
In my own research and development, here's an inventory of what I've found in the poker ecosystem so far:
- Open Hand History (OHH): JSON hand history format by the PokerTracker team. Supported by HM3, PT4, others. Hold'em-centric. Has a companion tournament format (OTS).
- PHH (Poker Hand History): TOML-based academic format from U of T. MIT-licensed, supports 11 variants, 10k+ hand dataset. Modeled after PGN (chess) and SGF (Go).
- PokerKit: Open-source Python library from the same U of T group. Simulation, hand evaluation, stats. Uses PHH natively.
- ProPokerTools PPT Notation: De facto standard range syntax (AK$s, 99@75). Covers range description, not hand histories.
- PioSolver range format: Weights as decimals (AJs:0.75). Widely used, proprietary, Hold'em-only.
- MonkerSolver range format: Enumerated combos with percentage weights (AcAs6s2s@99). One of the few formats handling PLO natively.
- Flopzilla / GTO+ formats: Text-based range notations, minor syntactic differences from Pio. No formal/shared spec.
- Site-specific hand history formats: PokerStars, 888, iPoker, GGPoker, ACR all have their own proprietary text formats. OHH was designed to unify these.
- OpenSpiel / PokerRL: AI research frameworks (DeepMind, academia). Define game state representations for research, not end-user tooling.
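As a rough illustration of the normalization problem, here is a small Python sketch that maps the two weight styles quoted above (a decimal after `:` as in `AJs:0.75`, and a percentage after `@` as in `99@75` or `AcAs6s2s@99`) onto a single hand-to-frequency mapping. It only handles those token shapes. The comma separator, the function names, and the rule that an unweighted hand means 1.0 are my assumptions, and it is nowhere near a full grammar for any of these tools.

```python
def parse_weighted_token(token: str) -> tuple[str, float]:
    """Split one token into (hand, weight), with weight scaled to 0.0-1.0."""
    token = token.strip()
    if ":" in token:
        # Decimal style, e.g. "AJs:0.75"
        hand, raw = token.split(":", 1)
        weight = float(raw)
    elif "@" in token:
        # Percentage style, e.g. "99@75"
        hand, raw = token.split("@", 1)
        weight = float(raw) / 100.0
    else:
        # Assumption: no explicit weight means the hand is always played
        hand, weight = token, 1.0
    if not hand or not 0.0 <= weight <= 1.0:
        raise ValueError(f"cannot normalize token: {token!r}")
    return hand, weight


def normalize_range(text: str) -> dict[str, float]:
    """Parse a comma-separated range string into {hand: weight}."""
    result: dict[str, float] = {}
    for token in text.split(","):
        if token.strip():
            hand, weight = parse_weighted_token(token)
            result[hand] = weight
    return result


print(normalize_range("AJs:0.75, 99@75, KQo"))
```

Even this toy version shows where a shared spec would help: every ambiguity (separator, default weight, percent versus decimal) is a decision each tool currently makes on its own.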
I'd love to know if I'm missing anything, and I'm curious whether anyone else here is interested in discussing or collaborating on this. If there are other threads worth revisiting, I'd be grateful for any direction.