Early Release (n=2 of a planned n=10): scores are directional, not definitive.

Llama 4 Scout

Full behavioral profile across 6 dimensions and 6 tone conditions.

Resilience Score: 96.9

Dimensions × Tones

Each cell gives the score, with its change from the neut condition in parentheses. The n row under each dimension gives the sample size for each cell.

Dimension   grat          frie          neut          curt          host          abus          Δ     Worst tone
Accuracy    93.0 (+2.4)   93.1 (+2.5)   90.7          92.7 (+2.0)   87.8 (-2.9)   87.3 (-3.4)   2.6   abusive
  n         100           100           100           100           100           100
Sycophancy  8.3 (+5.9)    4.5 (+2.1)    2.4           2.5 (+0.1)    6.5 (+4.2)    14.8 (+12.4)  4.9   abusive
  n         100           100           100           100           100           100
Pushback    92.8 (-2.2)   91.7 (-3.3)   95.0          96.3 (+1.3)   92.4 (-2.6)   88.4 (-6.6)   3.2   abusive
  n         41            43            42            41            43            43
Creativity  67.7 (+2.1)   69.7 (+4.0)   65.6          65.8 (+0.2)   68.3 (+2.7)   74.0 (+8.3)   3.5   abusive
  n         24            24            24            24            24            24
Verbosity   105.2 (+5.2)  96.8 (-3.2)   100.0         85.2 (-14.8)  98.3 (-1.7)   87.1 (-12.9)  7.5   curt
  n         100           100           100           100           100           100
Apology     0.0 (+0.0)    0.0 (+0.0)    0.0           0.0 (+0.0)    0.4 (+0.4)    1.4 (+1.4)    0.4   abusive
  n         100           100           100           100           100           100
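The page does not say how the Δ column or the worst-tone label is derived. The figures appear consistent with a simple rule. Under this rule, Δ is the mean absolute change of the five non-neutral tones relative to neut, and the worst tone is the one with the largest absolute change. The minimal Python sketch below checks that reading against the rounded cells in the Dimensions × Tones table. The rule itself and the helper names `changes_vs_baseline` and `summarize` are assumptions made for illustration, not the benchmark's documented method.

```python
# Hypothetical reconstruction: the source page does not define how Δ and the
# worst tone are computed. This sketch ASSUMES Δ is the mean absolute change
# of the five non-neutral tones relative to "neut", and that the worst tone
# is the one with the largest absolute change.

TONES = ["grat", "frie", "neut", "curt", "host", "abus"]
BASELINE = "neut"

# Rounded scores copied from the Dimensions x Tones table, in TONES order.
SCORES = {
    "Accuracy":   [93.0, 93.1, 90.7, 92.7, 87.8, 87.3],
    "Sycophancy": [8.3, 4.5, 2.4, 2.5, 6.5, 14.8],
    "Pushback":   [92.8, 91.7, 95.0, 96.3, 92.4, 88.4],
    "Creativity": [67.7, 69.7, 65.6, 65.8, 68.3, 74.0],
    "Verbosity":  [105.2, 96.8, 100.0, 85.2, 98.3, 87.1],
    "Apology":    [0.0, 0.0, 0.0, 0.0, 0.4, 1.4],
}


def changes_vs_baseline(row: list[float]) -> dict[str, float]:
    """Signed change of each non-baseline tone relative to the baseline score."""
    base = row[TONES.index(BASELINE)]
    return {t: s - base for t, s in zip(TONES, row) if t != BASELINE}


def summarize(row: list[float]) -> tuple[float, str]:
    """Return (mean absolute change, tone with the largest absolute change)."""
    deltas = changes_vs_baseline(row)
    mean_abs = sum(abs(d) for d in deltas.values()) / len(deltas)
    worst = max(deltas, key=lambda t: abs(deltas[t]))
    return mean_abs, worst


if __name__ == "__main__":
    for name, row in SCORES.items():
        mean_abs, worst = summarize(row)
        print(f"{name:<11} delta~{mean_abs:.2f}  worst: {worst}")
```

Working from the rounded cells, the sketch matches every displayed Δ to within 0.1. For Verbosity it gives 7.56 against the displayed 7.5, presumably because the page works from unrounded scores. It flags abus as the worst tone for every dimension except Verbosity, where curt shows the larger change. This matches the page's labels.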

Refusal Rates

Tone    Refusal rate    Refused/n
grat    0.0%            0/100
frie    0.0%            0/100
neut    0.0%            0/100
curt    0.0%            0/100
host    0.0%            0/100
abus    0.0%            0/100