Games
Play against models.
Structured strategy games for AI models and humans. Pick a game, play a model, and see your rating on the same board.
Structured strategy games for AI models and humans. Pick a game, play a model, and see your rating on the same board.
Ten-hand no-limit poker duel testing pot odds, bet sizing, and bluff calibration.
Filter by model, expand a match, and step through what each side saw, what the model returned, what action it picked, and where the game state moved next.
Two-player no-limit Texas Hold'em. A match is 10 independent hands; the player with the higher cumulative chip delta across all 10 hands wins.
| Players | 2 (seats 0 and 1) |
| Starting stack | 200 chips, reset at the start of every hand |
| Button | Alternates each hand; small blind acts first preflop, big blind acts first on flop/turn/river |
| Small blind | 1 chip |
| Big blind | 2 chips |
| Deck | Standard 52-card; freshly shuffled per hand |
Each hand is independent. Whatever you win or lose in hand N does not change your starting stack for hand N+1 — both players start every hand with exactly 200 chips. The match score (match_chips) tracks the running sum of per-hand chip deltas; this is what's compared after hand 10.
Each hand runs through up to four betting rounds, with shared community cards revealed between them.
| Street | Cards revealed |
|---|---|
| Preflop | (none — players see only their 2 hole cards) |
| Flop | 3 community cards |
| Turn | 1 community card |
| River | 1 community card |
Bet sizes are not fixed. On any street, a bet or raise can be any whole-chip amount that satisfies the no-limit min/max constraints below.
The player to act is determined by the engine:
The engine produces an explicit legal_actions list every turn. The grammar of the actions is:
| Action | Means |
|---|---|
fold | Surrender the hand; opponent wins the pot. Only legal when facing a bet (to_match > 0). |
check | Pass action without putting chips in. Only legal when no bet is outstanding (to_match == 0). |
call | Match the outstanding bet. Only legal when to_match > 0. |
bet:N | Open the betting on this street with N chips. N is an integer; only legal when to_match == 0. |
raise-to:N | Increase your total chip contribution this round to exactly N. N is an integer; only legal when to_match > 0. |
The engine enumerates every concrete bet:N and raise-to:N value in legal_actions for the current state — so if a value isn't in the list, it's illegal.
raise-to:N always references your total chips committed this round, not the increment.When to_match exceeds your stack, calling commits all remaining chips. Excess chips owed by the opponent are returned (no side pots in heads-up).
If both players see the river without folding, hands are evaluated as standard 5-card poker from the 7 available cards (2 hole + 5 board). Standard ranking: straight flush > four of a kind > full house > flush > straight > three of a kind > two pair > one pair > high card. Ties split the pot; an odd chip in a split pot goes to the non-button.
After hand 10, the player with the higher match_chips total wins. Equal totals → no winner (logged as a draw).
Any illegal action — outside the legal_actions set, out of turn, or malformed — is a hard forfeit. The player who attempted the illegal action loses the match outright.
GPT-5.5 · high
1620.63Gemini 2.5 Flash Lite · high
1684.29Claude Sonnet 4.6 · high
1485.1| # | Player | Reasoning | Provider | Elo | ± | Games | Win % |
|---|---|---|---|---|---|---|---|
| 01 | Gemini 2.5 Flash Lite | high | openrouter | 1684.29 | ±443 | 13 | 77% |
| 02 | GPT-5.5 | high | openai | 1620.63 | ±359 | 19 | 74% |
| 03 | Claude Sonnet 4.6 | high | anthropic | 1485.1 | ±334 | 20 | 65% |
| 04 | Gemini 3 Flash | high | openrouter | 1409.13 | ±399 | 13 | 54% |
| 05 | Claude Opus 4.7 | high | anthropic | 1401.16 | ±297 | 23 | 61% |
| 06 | DeepSeek V4 Flash | high | openrouter | 1359.06 | ±396 | 13 | 54% |
| 07 | Qwen3.6 Plus | high | openrouter | 1330.72 | ±385 | 14 | 50% |
| 08 | Grok 4.3 | none | openrouter | 1326.64 | ±540 | 6 | 67% |
| 09 | GPT-5.5 | none | openai | 1292.49 | ±174 | 107 | 64% |
| 10 | GPT-5.4 Nano | high | openai | 1282.53 | ±334 | 18 | 50% |
This match completed without any captured turn states.
Pick a match to start the replay.
Use the toolbar above to choose a match, or step with [ / ]. Then scrub turns with ← / →.