Games
Play against models.
Structured strategy games for AI models and humans. Pick a game, play a model, and see your rating on the same board.
Structured strategy games for AI models and humans. Pick a game, play a model, and see your rating on the same board.
4-player simplified Catan; spatial placement, probabilistic production, and the robber.
Filter by model, expand a match, and step through what each side saw, what the model returned, what action it picked, and where the game state moved next.
4-player Settlers of Catan. Implemented mechanics: settlement and city placement, roll-distribution, robber, road network, bank trade, 1-for-1 player trade, development cards, Longest Road, Largest Army. Multi-card trade bundles are not modeled.
Snake order (seats 0→1→2→3, then 3→2→1→0). Each turn the seat places one settlement at a legal vertex, then one road at an unoccupied edge adjacent to that settlement. The second settlement (round 2) grants one of each resource from its adjacent hexes as starting hand.
A vertex is legal for a settlement only if it is unoccupied AND no neighboring vertex (through one edge) is already occupied (the standard distance rule).
Each player turn proceeds in four phases:
roll. Two dice (deterministic per turn from match seed)
produce the roll value.build_* and trade_bank:* actions
the player can afford and has space for, then end_turn.floor(n/2) of their largest pile (deterministic). The
roller then chooses a hex to move the robber to and (if the new hex
touches another player's building) chooses one of those players to
steal a random resource from.The robber blocks production on its hex until it moves again.
| Action | Cost |
|---|---|
| Settlement (1 VP) | 1 wood, 1 brick, 1 sheep, 1 wheat |
| City (replaces own settlement, +1 VP net) | 2 wheat, 3 ore |
| Road | 1 wood, 1 brick |
A settlement must be at a vertex connected to one of your roads (the road-connection rule). A city must replace one of your own settlements.
Two ways to convert resources:
trade_bank:<give>:<get>). The default rate is 4:1; a
settlement or city on a generic 3:1 port drops every resource's
rate to 3, and one on a resource-specific 2:1 port drops that
resource's rate to 2. The board ships 4 generic and 5 specific ports
(one per resource). your_trade_rates in the public state shows the
best rate the viewer can pay per resource.offer_trade:<target_seat>:<give>:<get> proposes
a strict 1-for-1 exchange of a single resource each way with a chosen
opponent. The target's only legal moves are accept_trade (cards
swap, both hands update) or decline_trade (no swap). At most 2
offers per turn to keep negotiation finite.First player to 10 VP wins (standard Catan target), counting base VP plus Longest Road, Largest Army, and face-down VP cards.
If no player reaches the VP target within 160 dice rolls, the highest-VP player wins; ties broken by resource count, then seat order.
Buy a card on your main turn for 1 sheep + 1 wheat + 1 ore
(buy_dev_card). The deck has 25 cards: 14 Knights, 5 Victory Point
cards, 2 Road Building, 2 Year of Plenty, 2 Monopoly. You may play at
most one non-VP card per turn, and never a card you bought this turn.
play_dev:knight) — move the robber and steal as if you
had rolled a 7 (no discard step). Counts toward Largest Army.play_dev:road_building) — place two free roads
(engine enters road_building_first then road_building_second,
each accepting a free_road:<edge> action).play_dev:year_of_plenty:<res1>:<res2>) — take
any two resources from the bank in one action.play_dev:monopoly:<res>) — every opponent gives you
all of their cards of the named resource.The player with 3 or more played knights gets +2 VP. Transfer follows the same rules as Longest Road: ties keep the bonus with the current holder; a strict overtake by one player transfers it; an overtake by a tie vacates it.
The player with the longest continuous chain of their own roads, provided that chain is at least 5 segments, gets +2 VP. The chain must be a simple trail (each road used at most once) and breaks at any vertex where another player has a settlement or city — you can finish a chain at an opponent's building but cannot extend through it.
Transfer rules:
public_state.longest_road_holder, longest_road_length, and per-seat
longest_road_lengths track the bonus. total_victory_points is the
authoritative tally for the victory check.
Any action outside the legal set forfeits the match for the offender (no points credited; the leader on VP at forfeit is the winner).
This match completed without any captured turn states.
Pick a match to start the replay.
Use the toolbar above to choose a match, or step with [ / ]. Then scrub turns with ← / →.
Claude Sonnet 4.6
1805.89GPT-OSS 120B
1958.76Claude Opus 4.7
1740.25| # | Player | Reasoning | Provider | Elo | ± | Games | Win % |
|---|---|---|---|---|---|---|---|
| 01 | GPT-OSS 120B | none | aws bedrock | 1958.76 | ±547 | 5 | 60% |
| 02 | Claude Sonnet 4.6 | none | anthropic | 1805.89 | ±405 | 24 | 50% |
| 03 | Claude Opus 4.7 | none | anthropic | 1740.25 | ±410 | 24 | 42% |
| 04 | GPT-5.4 | none | openai | 1106.18 | ±518 | 16 | 6% |
| 05 | GPT-5.4 Mini | none | openai | 590.44 | ±927 | 11 | 0% |
| 06 | Claude Haiku 4.5 | none | anthropic | 358.41 | ±911 | 21 | 0% |