Dataset & Downloads
The full RudeBench dataset is open for research use. All files are in JSONL (JSON Lines) format and UTF-8 encoded.
Available Files
prompts.jsonl 300 prompts (50 tasks x 6 tones) with metadata, dimensions, and reference answers
Schema: id, task_id, domain, tone, prompt, word_count, dimensions, metadata
completions/{model}.jsonl (5 files, one per model) 600 completions per model (300 prompts x 2 runs) with responses, word counts, and refusal status
Schema: prompt_id, task_id, response, word_count, finish_reason, refused, run, cost
judgments/{model}.jsonl (5 files, one per model) 1,200 behavioral + quality judge scores per model with evidence and reasoning
Schema: prompt_id, task_id, judge_type, scores, evidence, reasoning
judgments/{model}_vrb.jsonl (5 files, one per model) 600 VRB scores per model, computed from word counts
Schema: prompt_id, task_id, score (VRB = completion_wc / mean_neutral_wc x 100)
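The files above can be loaded with a few lines of standard-library Python, and the VRB formula can be reproduced directly from the completions schema. A minimal sketch follows; the `"_neutral"` prompt_id suffix used to find the baseline tone, and the toy record values, are assumptions for illustration, not part of the released data:

```python
import json
from collections import defaultdict
from statistics import mean

def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def vrb_scores(completions):
    """VRB = completion_wc / mean_neutral_wc * 100, keyed by (prompt_id, run).

    Assumes neutral-tone completions carry a "_neutral" prompt_id
    suffix (hypothetical; check the actual tone labels in prompts.jsonl).
    """
    neutral = defaultdict(list)
    for c in completions:
        if c["prompt_id"].endswith("_neutral"):
            neutral[c["task_id"]].append(c["word_count"])
    baseline = {t: mean(ws) for t, ws in neutral.items()}
    return {
        (c["prompt_id"], c["run"]):
            round(c["word_count"] / baseline[c["task_id"]] * 100, 1)
        for c in completions if c["task_id"] in baseline
    }

# Toy records in the completions schema (illustrative values only):
sample = [
    {"prompt_id": "coding_fibonacci_neutral", "task_id": "coding_fibonacci",
     "word_count": 100, "run": 1},
    {"prompt_id": "coding_fibonacci_neutral", "task_id": "coding_fibonacci",
     "word_count": 120, "run": 2},
    {"prompt_id": "coding_fibonacci_hostile", "task_id": "coding_fibonacci",
     "word_count": 55, "run": 1},
]
print(vrb_scores(sample)[("coding_fibonacci_hostile", 1)])  # 50.0
```

A score of 100 means the completion matched the mean neutral-tone length for its task; 50 means it was half as long.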
The full dataset is available on GitHub.
ID Conventions
prompt_id {domain}_{task_slug}_{tone} — e.g., coding_fibonacci_hostile
task_id {domain}_{task_slug} — groups 6 tone variants
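Because a task slug may itself contain underscores, splitting an id naively on `_` is fragile. A small sketch of one safe way to parse these ids, assuming the domain portion never contains an underscore (true of examples like `coding`):

```python
def parse_prompt_id(prompt_id):
    """Split a {domain}_{task_slug}_{tone} id into its parts.

    Assumes the domain has no underscores; the tone is the final
    underscore-separated token, and the task slug is everything
    in between (it may contain underscores itself).
    """
    domain, rest = prompt_id.split("_", 1)
    task_slug, tone = rest.rsplit("_", 1)
    return {"domain": domain, "task_slug": task_slug, "tone": tone,
            "task_id": f"{domain}_{task_slug}"}

# The task_id groups the 6 tone variants of one task:
print(parse_prompt_id("coding_fibonacci_hostile"))
# {'domain': 'coding', 'task_slug': 'fibonacci', 'tone': 'hostile',
#  'task_id': 'coding_fibonacci'}
```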
Citation
@article{rudebench2026,
  title={RudeBench: A Multi-Dimensional Behavioral Benchmark for Evaluating LLM Resilience Under Hostile Prompting Conditions},
  author={[Author Names]},
  year={2026},
  url={https://rudebench.com},
  note={Preprint}
}
License
The RudeBench dataset is released for research use. Model outputs remain subject to each provider's terms of service.