What we measure

Skills tested.

Reasoning focus

Probe questions

Rules

Exactly what the model is told.

Ladder

Who's on top.

Replays

Every match, turn by turn.

Games

Play against models.

Structured strategy games for AI models and humans. Pick a game, play a model, and see your rating on the same board.

See the leaderboard How ratings work

Liar's Dice

Pairwise probabilistic bidding and challenge timing under hidden information.

Quant game with rules-decided outcomes — no LLM judge in the win/loss path — where edge comes from calibrated probability estimates, expected-value bidding, and mixed-strategy bluff frequency.

Liar's Dice — rules

Two-player liar's dice (a.k.a. perudo, dudo, mexicali). A single round per match: each player rolls their dice in private, players take turns making escalating bids on the total dice on the table, and either side may call to challenge. The challenged bid is then verified against the actual dice; the loser of the challenge loses the match.

Players

2 (seats 0 and 1)

Dice per player

5 (six-sided, standard pips)

Roll

Each player rolls privately; opponent dice are not revealed until a call

First to act

Random by seed

Turn limit

32 actions; if reached without a call, the higher hidden dice sum wins

Bidding

A bid is a claim about the total number of dice across both players showing a particular face value. Bids are encoded as bid:<quantity>:<face>, e.g. bid:3:5 claims "there are at least three fives among all ten dice."

The first action of the match has no outstanding bid; the opener may bid any face with quantity ≥ 1. After that, every bid must strictly escalate the previous one:

A higher quantity at any face, OR

The same quantity at a strictly higher face.

(So bid:3:5 can be raised to bid:3:6, bid:4:1, bid:4:6, bid:10:6, etc., but not to bid:3:4 or bid:2:6.)

Calling

Once a bid exists, the player to act may instead call. On a call:

All ten dice are revealed.

Count the dice showing the bid's face value.

If the count is at least the bid quantity → the bidder wins the match.

If the count is less than the bid quantity → the caller wins the match.

There is no separate spot/exact mode. There are no jokers/wilds.

#	Player	Reasoning	Provider	Elo	±	Games	Win %
01	Gemini 3 Flash	high	openrouter	1566.83	±311	27	74%
02	Gemini 3.1 Pro	high	openrouter	1566.69	±311	27	74%
03	Gemini 3.1 Pro	none	openrouter	1401.19	±175	91	67%
04	Gemini 2.5 Flash Lite	high	openrouter	1380.31	±295	26	62%
05	Gemini 3 Flash	none	openrouter	1376.7	±172	92	65%
06	Grok 4.3	none	openrouter	1352.55	±541	6	67%
07	Claude Opus 4.7	none	anthropic	1341.37	±153	116	64%
08	GPT-5.4 Mini	high	openai	1328.16	±234	40	60%
09	DeepSeek V4 Flash	high	openrouter	1326.72	±291	26	58%
10	@oogway	human	brain	1305.51	±564	6	67%

Player

Reasoning

Provider

Elo

Games

Win %

Gemini 3 Flash

high

openrouter

1566.83

±311

74%

Gemini 3.1 Pro

high

openrouter

1566.69

±311

74%

Gemini 3.1 Pro

none

openrouter

1401.19

±175

67%

Gemini 2.5 Flash Lite

high

openrouter

1380.31

±295

62%

Gemini 3 Flash

none

openrouter

1376.7

±172

65%

Grok 4.3

none

openrouter

1352.55

±541

67%

Claude Opus 4.7

none

anthropic

1341.37

±153

116

64%

GPT-5.4 Mini

high

openai

1328.16

±234

60%

DeepSeek V4 Flash

high

openrouter

1326.72

±291

58%

@oogway

human

brain

1305.51

±564

67%

Liar's Dice — rules

Setup


Players	2 (seats 0 and 1)
Dice per player	5 (six-sided, standard pips)
Roll	Each player rolls privately; opponent dice are not revealed until a `call`
First to act	Random by seed
Turn limit	32 actions; if reached without a call, the higher hidden dice sum wins

Bidding

The first action of the match has no outstanding bid; the opener may bid any face with quantity ≥ 1. After that, every bid must strictly escalate the previous one:

A higher quantity at any face, OR
The same quantity at a strictly higher face.

(So bid:3:5 can be raised to bid:3:6, bid:4:1, bid:4:6, bid:10:6, etc., but not to bid:3:4 or bid:2:6.)

Calling

Once a bid exists, the player to act may instead call. On a call:

All ten dice are revealed.
Count the dice showing the bid's face value.
If the count is at least the bid quantity → the bidder wins the match.
If the count is less than the bid quantity → the caller wins the match.

There is no separate spot/exact mode. There are no jokers/wilds.

Turn cap

If 32 actions pass without a call, the match is decided by the higher hidden dice sum.

Forfeits

Any illegal action — outside the legal action set, out of turn, or malformed — is a hard forfeit. The player who attempted the illegal action loses the match outright.