Early Release — n=2 of planned n=10 — scores are directional, not definitive

Dimension Explorer

Compare model behavior across tones for each behavioral dimension. Cell color indicates deviation from neutral.

Sycophancy — uncritical validation and excessive agreement

Range: 0–100 | Applicable to 50/50 tasks

Each cell shows score (change vs. neut). Every cell has n=100.

Model       grat          frie          neut  curt          host          abus          Δ
Sonnet 4.6  1.8 (+1.0)    1.8 (+1.0)    0.8   0.1 (-0.6)    1.6 (+0.9)    4.8 (+4.1)    1.5
GPT-5 mini  1.6 (+0.9)    0.8 (+0.2)    0.7   0.2 (-0.5)    1.9 (+1.3)    5.0 (+4.4)    1.4
2.5 Flash   14.7 (+8.9)   15.2 (+9.4)   5.8   5.0 (-0.8)    18.6 (+12.8)  24.0 (+18.2)  10.0
4 Scout     8.3 (+5.9)    4.5 (+2.1)    2.4   2.5 (+0.1)    6.5 (+4.2)    14.8 (+12.4)  4.9
3 mini      12.8 (+9.5)   5.3 (+2.0)    3.3   1.1 (-2.2)    9.8 (+6.6)    21.6 (+18.4)  7.7
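The page does not say how the Δ column is computed. The sketch below tests one hypothesis against the table: Δ is the mean absolute change from neutral across the five non-neutral tones (grat, frie, curt, host, abus). Recomputing it from the displayed, rounded scores reproduces every Δ value to within 0.05. Some individual changes come out 0.1 off the displayed ones (for example, Sonnet 4.6 curt gives 0.1 - 0.8 = -0.7, while the table shows -0.6). That suggests the dashboard works from unrounded scores. The functions `deltas` and `summary_delta` are hypothetical names introduced here.

```python
# Hypothetical sketch: the dashboard does not document how the Δ column is computed.
# Assumption: Δ = mean of |tone score - neutral score| over the five non-neutral tones.
from statistics import mean

NON_NEUTRAL_TONES = ["grat", "frie", "curt", "host", "abus"]

# Displayed (rounded) Sycophancy scores on the 0-100 scale, copied from the table.
scores = {
    "Sonnet 4.6": {"grat": 1.8, "frie": 1.8, "neut": 0.8, "curt": 0.1, "host": 1.6, "abus": 4.8},
    "GPT-5 mini": {"grat": 1.6, "frie": 0.8, "neut": 0.7, "curt": 0.2, "host": 1.9, "abus": 5.0},
    "2.5 Flash": {"grat": 14.7, "frie": 15.2, "neut": 5.8, "curt": 5.0, "host": 18.6, "abus": 24.0},
    "4 Scout": {"grat": 8.3, "frie": 4.5, "neut": 2.4, "curt": 2.5, "host": 6.5, "abus": 14.8},
    "3 mini": {"grat": 12.8, "frie": 5.3, "neut": 3.3, "curt": 1.1, "host": 9.8, "abus": 21.6},
}


def deltas(row: dict) -> dict:
    """Signed change of each non-neutral tone's score relative to the neutral tone."""
    return {tone: row[tone] - row["neut"] for tone in NON_NEUTRAL_TONES}


def summary_delta(row: dict) -> float:
    """Mean absolute deviation from neutral (the assumed definition of Δ)."""
    return mean(abs(d) for d in deltas(row).values())


for model, row in scores.items():
    # Expected to match the table's Δ column to one decimal place.
    print(f"{model:<12} delta = {summary_delta(row):.1f}")
```

Under this reading, Δ treats drops and rises in sycophancy alike. A model that becomes less sycophantic under a curt tone still adds to its Δ.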
Deviation from neutral (cell color): <2%, 2–5%, 5–10%, >10%
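As a reading aid, the legend can be expressed as a lookup from a cell's change vs. neutral to its color band. This is a hypothetical sketch that rests on two assumptions the page does not state. First, bands use the absolute change, since negative changes such as -2.2 also appear. Second, each boundary belongs to the higher band (so exactly 2.0 counts as "2-5%" and exactly 10.0 as ">10%"). `deviation_band` is an illustrative name, not part of the dashboard.

```python
# Hypothetical sketch of the color legend. Assumptions (not stated by the dashboard):
# bands use the absolute change from neutral, and boundary values fall in the higher band.
def deviation_band(change_from_neutral: float) -> str:
    """Map a cell's change vs. neutral (points on the 0-100 scale) to a legend band."""
    d = abs(change_from_neutral)
    if d < 2:
        return "<2%"
    if d < 5:
        return "2-5%"
    if d < 10:
        return "5-10%"
    return ">10%"


# Example changes taken from the table: Sonnet 4.6 curt, Sonnet 4.6 abus,
# 2.5 Flash frie, 2.5 Flash abus.
for change in (-0.6, 4.1, 9.4, 18.2):
    print(change, deviation_band(change))
```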