High-Impact Synthetic Reasoning Dataset: +3% GPQA Diamond Lift
Synthetic Data Generation
£2,800
About
Overview
One-pass synthetic DPO preference pairs engineered for sustained rigor and escalating reasoning depth, with no gloss decay and no adversarial filtering required.
This ~1,200-pair dataset fine-tuned Qwen2.5-7B-Instruct to verifiable lifts on GPQA Diamond, a hard graduate-level reasoning benchmark.
Key Results (3 independent seeds, full 198 questions):
- Full GPQA Diamond: +3.2% mean lift (36.53% vs baseline 33.33%, low variance ±0.58%)
- Quantum mechanics subset: +16.02% mean lift (51.92%)
- Neuroscience/BCI transfer: +15.79% mean lift (52.63%)
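The headline lift can be sanity-checked directly from the quoted accuracies. A minimal check, using only the numbers reported above:

```python
# Sanity-check the reported GPQA Diamond lift from the quoted accuracies.
baseline = 33.33   # baseline accuracy (%), as stated above
tuned = 36.53      # fine-tuned mean accuracy (%), as stated above

lift = round(tuned - baseline, 2)
print(lift)  # 3.2, matching the reported +3.2% mean lift
```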
High-contrast pair geometry keeps structural entropy stable during training, making the set well suited to data-efficient reasoning fine-tunes.
Business Case & Value
A small dataset delivering outsized gains in frontier reasoning domains. Fully reproducible (scripts and seeds provided) for LoRA fine-tunes, agents, or alignment experiments. Licensed non-exclusively: test quickly, scale with confidence.
Dataset Features
- prompt: Input question/context for the preference pair.
- chosen: Preferred response (deep, formal, escalated reasoning—expert-grade).
- rejected: Non-preferred response (shallow/underpowered—high-contrast negative).
Distribution
- Data Volume: ~1,200 records (pairs)
- Format: JSONL (standard DPO structure: prompt, chosen, rejected per line)
- Size: ~5-10 MB compressed
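Given the standard DPO structure described above, each JSONL line is one JSON object with `prompt`, `chosen`, and `rejected` keys. A minimal loading-and-validation sketch (the example record contents are hypothetical placeholders, not actual dataset entries):

```python
import json

REQUIRED_KEYS = {"prompt", "chosen", "rejected"}

def load_dpo_pairs(path):
    """Load a DPO-format JSONL file, checking each record has the three fields."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            record = json.loads(line)
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {i} missing keys: {missing}")
            pairs.append(record)
    return pairs

# Hypothetical example record, for illustration only:
example = {
    "prompt": "Why does tunneling probability decay exponentially with barrier width?",
    "chosen": "A formally derived, step-by-step expert answer...",
    "rejected": "A shallow, hand-wavy answer...",
}
assert REQUIRED_KEYS <= example.keys()
```

Records in this shape drop directly into common DPO training pipelines that expect prompt/chosen/rejected triples.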
Download Dataset in ZIP Format