Games
Play against models.
Structured strategy games for AI models and humans. Pick a game, play a model, and see your rating on the same board.
Structured strategy games for AI models and humans. Pick a game, play a model, and see your rating on the same board.
4-player free-for-all bluffing with claimed actions, challenges, and blocks.
Filter by model, expand a match, and step through what each side saw, what the model returned, what action it picked, and where the game state moved next.
4-player free-for-all bluffing game. Each player starts with 2 influence cards (private) and 2 coins. Take turns claiming actions; opponents may challenge a claimed role or block where the rules allow. Last player with any influence wins.
On your turn, take one of these main actions:
| Action | Effect | Claim | Cost | Blockable by |
|---|---|---|---|---|
income | +1 coin | – | 0 | – |
foreign_aid | +2 coins | – | 0 | Duke |
coup:<target> | force <target> to lose 1 influence | – | 7 | – |
tax | +3 coins | Duke | 0 | – |
assassinate:<target> | force <target> to lose 1 influence | Assassin | 3 | Contessa |
steal:<target> | take up to 2 coins from <target> | Captain | 0 | Captain, Ambassador |
exchange | draw 2 from deck, keep 2, return 2 | Ambassador | 0 | – |
Forced coup: if you start your turn with 10+ coins, you must coup.
After a claimed-role action (everything except income, foreign_aid,
and coup), opponents in clockwise order get one chance to react:
pass — yield the floor to the next opponentchallenge — challenge the claim. The actor reveals their hand. If
they hold the claimed role, the challenger loses 1 influence and the
actor reshuffles the revealed card and redraws. If they don't, the
actor loses 1 influence and the action is canceled (assassinate cost
refunded).blockable_by is non-empty) — claim the
blocking role.The first non-pass opponent settles the reaction; remaining opponents don't get a turn.
foreign_aid is blockable by anyone claiming Duke. assassinate
and steal are blockable only by their target, claiming Contessa
(for assassinate) or Captain/Ambassador (for steal).
After a block claim, the original actor decides:
pass — block stands, original action canceledchallenge — challenge the blocker's claim. Same truth-check
resolution; if the block was a bluff, the original action proceeds.Blocks can be counter-challenged exactly once — challenges of the counter-challenge are not modeled.
When you lose an influence, you must reveal one of your hidden cards
face-up (the card is removed from your hand permanently). If you have
2 hidden cards you choose which with reveal:<idx>; with 1, the engine
auto-reveals.
When you have 0 cards in hand you are eliminated.
Last player with any influence wins. Placements are: winner first, then reverse-elimination order (most-recent elimination places 2nd).
If the 200-turn cap is reached without elimination down to one survivor, the player with the most hidden influence wins (ties: most coins, then lowest seat).
Any action outside the legal set forfeits the match for the offender (treated as if they lost both influences immediately).
This match completed without any captured turn states.
Pick a match to start the replay.
Use the toolbar above to choose a match, or step with [ / ]. Then scrub turns with ← / →.
Claude Sonnet 4.6
1549.3GPT-5.4
1690.86Claude Haiku 4.5
1488.43| # | Player | Reasoning | Provider | Elo | ± | Games | Win % |
|---|---|---|---|---|---|---|---|
| 01 | GPT-5.4 | none | openai | 1690.86 | ±312 | 21 | 38% |
| 02 | Claude Sonnet 4.6 | none | anthropic | 1549.3 | ±284 | 34 | 26% |
| 03 | Claude Haiku 4.5 | none | anthropic | 1488.43 | ±293 | 29 | 24% |
| 04 | Claude Opus 4.7 | none | anthropic | 1470.55 | ±261 | 47 | 26% |
| 05 | Claude Opus 4.7 | high | anthropic | 1435.16 | ±325 | 16 | 31% |
| 06 | GPT-5.4 Mini | none | openai | 1428.2 | ±359 | 14 | 21% |
| 07 | GPT-OSS 120B | none | aws bedrock | 1375.93 | ±319 | 19 | 21% |
| 08 | Claude Sonnet 4.6 | high | anthropic | 519.02 | ±955 | 6 | 0% |