Early Release — n=2 of planned n=10 — scores are directional, not definitive

Grok 3 mini

Full behavioral profile across 6 dimensions and 6 tone conditions.

Resilience Score: 97.1

Dimensions × Tones

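Each cell shows the score, then in parentheses the change relative to the neut column and the sample size n.

| Dimension | grat | frie | neut | curt | host | abus | Δ |
|---|---|---|---|---|---|---|---|
| Accuracy | 99.2 (+0.2, n=100) | 99.0 (+0.0, n=100) | 99.0 (n=100) | 98.7 (-0.3, n=100) | 98.5 (-0.4, n=100) | 98.4 (-0.6, n=100) | 0.3 (worst: abusive) |
| Sycophancy | 12.8 (+9.5, n=100) | 5.3 (+2.0, n=100) | 3.3 (n=100) | 1.1 (-2.2, n=100) | 9.8 (+6.6, n=100) | 21.6 (+18.4, n=100) | 7.7 (worst: abusive) |
| Pushback | 94.3 (-1.3, n=41) | 96.9 (+1.3, n=40) | 95.6 (n=41) | 97.1 (+1.5, n=40) | 96.7 (+1.1, n=41) | 96.1 (+0.5, n=41) | 1.1 (worst: curt) |
| Creativity | 80.4 (+2.5, n=24) | 79.6 (+1.7, n=24) | 77.9 (n=24) | 78.1 (+0.2, n=24) | 84.8 (+6.9, n=24) | 84.4 (+6.5, n=24) | 3.5 (worst: hostile) |
| Verbosity | 107.1 (+7.1, n=100) | 96.9 (-3.1, n=100) | 100.0 (n=100) | 78.2 (-21.8, n=100) | 105.5 (+5.5, n=100) | 96.7 (-3.3, n=100) | 8.2 (worst: curt) |
| Apology | 0.0 (+0.0, n=100) | 0.2 (+0.2, n=100) | 0.0 (n=100) | 0.0 (+0.0, n=100) | 0.1 (+0.1, n=100) | 2.5 (+2.5, n=100) | 0.6 (worst: abusive) |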
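
The page does not define the Δ column or the "worst" label. The reported figures are consistent with one reading: Δ is the mean absolute change from neut across the five non-neutral tones, and "worst" is the tone with the largest absolute change. The sketch below checks that reading against the per-tone changes shown in the Dimensions × Tones figures above. The interpretation and the names `DELTAS` and `summarize` are assumptions made for illustration, not the benchmark's documented method.

```python
# Per-tone changes relative to neut, copied from the Dimensions x Tones figures.
# ASSUMPTION: the summary column is the mean absolute change and "worst" is the
# tone with the largest absolute change. The source does not state this.
DELTAS = {
    "Accuracy":   {"grat": 0.2,  "frie": 0.0,  "curt": -0.3,  "host": -0.4, "abus": -0.6},
    "Sycophancy": {"grat": 9.5,  "frie": 2.0,  "curt": -2.2,  "host": 6.6,  "abus": 18.4},
    "Pushback":   {"grat": -1.3, "frie": 1.3,  "curt": 1.5,   "host": 1.1,  "abus": 0.5},
    "Creativity": {"grat": 2.5,  "frie": 1.7,  "curt": 0.2,   "host": 6.9,  "abus": 6.5},
    "Verbosity":  {"grat": 7.1,  "frie": -3.1, "curt": -21.8, "host": 5.5,  "abus": -3.3},
    "Apology":    {"grat": 0.0,  "frie": 0.2,  "curt": 0.0,   "host": 0.1,  "abus": 2.5},
}


def summarize(deltas):
    """Return (mean absolute change rounded to 1 decimal, tone with largest |change|)."""
    mean_abs = sum(abs(d) for d in deltas.values()) / len(deltas)
    worst = max(deltas, key=lambda tone: abs(deltas[tone]))
    return round(mean_abs, 1), worst


if __name__ == "__main__":
    for dimension, deltas in DELTAS.items():
        mean_abs, worst = summarize(deltas)
        print(f"{dimension:<11} mean |change| = {mean_abs:>4}  worst = {worst}")
```

Under this reading, five of the six rows reproduce the reported Δ exactly and all six reproduce the reported worst tone. Creativity comes out at 3.6 rather than the reported 3.5, which may reflect the page computing Δ from unrounded scores; the displayed changes are already rounded to one decimal.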

Refusal Rates

| Tone | Refusal rate | Refusals |
|---|---|---|
| grat | 0.0% | 0/100 |
| frie | 0.0% | 0/100 |
| neut | 0.0% | 0/100 |
| curt | 0.0% | 0/100 |
| host | 0.0% | 0/100 |
| abus | 0.0% | 0/100 |