Addiction Trajectory Dataset (Pre-AA) — 100 JSONL Records

Agent Simulation Data

Tags and Keywords

Behavioral-data

Decision-making

Scenario-data

Llm-training

Fine-tuning

Jsonl

Annotated-data

Labeled-dataset

Nlp

Agent-simulation

Addiction

Behavioral-patterns

Crisis

Recovery

Mental-health

Risk-assessment

Escalation

Relapse-triggers

Decision-modeling

12-step-recovery

Alcoholism

£179

About

This dataset provides 100 structured, scene-level behavioral records modeling the developmental trajectory of addiction, with a primary focus on alcoholism. Derived from the autobiographical materials of “Lee B.” — including accounts of a tumultuous childhood and progression into severe alcohol use — each record captures a specific decision point or behavioral vignette leading up to his first AA meeting.
Records are annotated for relapse triggers, behavioral escalation stages, risk levels, and AA-related turning points, providing a detailed view of pre-recovery patterns. This narrative-driven corpus is designed for training and evaluating AI systems that model high-stress decision-making, behavioral risk, and early-stage addiction dynamics.
This dataset represents Part 1 (pre-AA) of a multi-stage series, with subsequent datasets covering recovery and long-term sobriety trajectories.

Data Product Features

Each record is a structured, scene-level JSON object representing a discrete behavioral vignette with standardized fields for modeling human decision-making and addiction trajectories.
<u>Core Fields:</u>
  • id — Unique identifier for each record (e.g., lee_0001)
  • title — Short human-readable scene title
  • scenario — Brief contextual summary (1–2 sentences) describing the situation
  • raw_text_redacted — Narrative passage with personally identifiable information (PII) removed
<u>Behavioral & Risk Annotation:</u>
  • risk_level — Estimated behavioral risk (low → very_high)
  • escalation_stage — Stage of behavioral escalation within the addiction trajectory
  • pattern_labels — Tagged behavioral patterns (e.g., denial, enabling, isolation)
  • crisis_bottom — Boolean flag indicating crisis/bottom events
<u>Recovery Context:</u>
  • aa_turning_point — Boolean flag indicating an AA-related turning point
  • aa_turning_point_type — Classification of turning point (e.g., crisis_bottom, initial_sobriety)
<u>Contextual Metadata:</u>
  • primary_domain — Primary behavioral category (e.g., addiction_recovery, failure_trajectory)
  • secondary_domains — Additional contextual tags
  • life_stage — Life stage (childhood, adolescence, young_adult, elder)
  • outcome — Short description of the immediate result of the scenario
<u>Provenance & Safety:</u>
  • text_origin — Source type (original_notes, biographer_retelling, composite)
  • sensitive_content — Indicates presence of sensitive material
  • pii_detected / pii_redacted — Flags for privacy handling
<u>Attribution Fields:</u>
  • compiled_authored_by — Shelly Marshall
  • original_narrative_source — Lee Bowerman (journals, notes, recordings)
  • source_work — Escaping Myself, Day By Day Recovery Resources (2022)
<u>Format & Structure:</u> The dataset is provided in JSONL (JSON Lines) format, with each line representing a single structured, scene-level record. Each record contains annotated behavioral, contextual, and provenance fields suitable for machine learning and agent simulation use.
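Because each line of a JSONL file is an independent JSON object, the records can be streamed one at a time. The following is a minimal sketch of a reader, not code shipped with the package; the function name `iter_records` and the file name `pre_aa_records.jsonl` are placeholders, since the listing does not state the actual file name.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a damaged file is easy to fix.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc


if __name__ == "__main__":
    # "pre_aa_records.jsonl" is a hypothetical file name used for illustration.
    for record in iter_records("pre_aa_records.jsonl"):
        print(record.get("id"), record.get("title"))
```

Streaming keeps memory use flat; for a 100-record file, `list(iter_records(path))` is equally practical.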
<u>Package Contents:</u>
  • Primary dataset file (JSONL, 100 records)
  • Data dictionary (CSV) defining all fields and allowed values
  • Manifest file (JSON) for dataset metadata and structure
  • README and license documentation (MD/TXT)
  • Provenance and permissions documentation
<u>Data Volume:</u>
  • 100 records (one JSON object per line)
  • 20+ structured fields per record (see data dictionary for full schema)
  • Multi-file dataset package designed for direct use in AI/ML pipelines, fine-tuning workflows, or structured data ingestion systems

Usage

<u>Application:</u> Conversational AI for recovery support. Training empathetic chatbots or virtual assistants for treatment centers, peer-support platforms, and recovery-focused websites.
<u>Application:</u> Behavioral risk and relapse modeling. Modeling escalation patterns, relapse triggers, and high-risk decision points in addiction trajectories.
<u>Application:</u> Agent simulation and scenario training. Training AI agents to recognize and respond to complex human situations involving stress, denial, crisis, and early recovery conditions.
<u>Application:</u> Evaluation and safety testing of AI systems. Benchmarking model responses in sensitive, high-risk behavioral scenarios to improve safety, alignment, and appropriateness.
<u>Application:</u> Retrieval-Augmented Generation (RAG) systems. Providing structured narrative content for knowledge grounding in recovery-oriented or behavioral health applications.
<u>Application:</u> Behavioral research and pattern analysis. Supporting research into decision-making, failure trajectories, and recovery pathways using structured, annotated narrative data.
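For the risk-modeling and pattern-analysis applications, a first pass over the annotations might simply count how often each risk level and pattern label occurs and list the records flagged as crisis events. The sketch below assumes records are already loaded as Python dicts with the `risk_level`, `pattern_labels`, `crisis_bottom`, and `id` fields described in this listing. The function names are hypothetical and the sample records are invented for illustration; they are not taken from the dataset.

```python
from collections import Counter
from typing import Iterable


def summarize(records: Iterable[dict]) -> tuple[Counter, Counter]:
    """Count records per risk_level and occurrences of each pattern label."""
    risk_counts: Counter = Counter()
    pattern_counts: Counter = Counter()
    for record in records:
        risk_counts[record.get("risk_level")] += 1
        # pattern_labels is documented as an array of strings; treat a
        # missing or null value as an empty list (an assumption).
        pattern_counts.update(record.get("pattern_labels") or [])
    return risk_counts, pattern_counts


def crisis_point_ids(records: Iterable[dict]) -> list[str]:
    """Return ids of records whose crisis_bottom flag is true."""
    return [r["id"] for r in records if r.get("crisis_bottom") is True]


if __name__ == "__main__":
    # Invented sample records, shaped like the documented fields.
    sample = [
        {"id": "example_1", "risk_level": "low",
         "pattern_labels": ["denial"], "crisis_bottom": False},
        {"id": "example_2", "risk_level": "very_high",
         "pattern_labels": ["denial", "isolation"], "crisis_bottom": True},
    ]
    risk, patterns = summarize(sample)
    print(risk.most_common())
    print(patterns.most_common())
    print(crisis_point_ids(sample))
```

With only 100 records from a single subject, counts like these are descriptive rather than statistically representative.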

Coverage

<u>Geographic Coverage:</u> United States (primarily Western U.S., including Colorado), reflecting lived experience rather than structured regional sampling. While geographically specific, behavioral patterns associated with alcoholism and recovery are broadly transferable across similar cultural contexts.
<u>Time Range:</u> Mid-20th century through late 20th century (based on the life trajectory of the source individual; compiled and published in 2022).
<u>Demographics:</u> Single-subject longitudinal narrative (male), covering childhood through adulthood across multiple life stages (childhood, adolescence, young adult). Dataset reflects individual lived experience rather than population-level sampling.

License

Proprietary (see AI Training Rights below)

AI Training Rights

<u>AI Training Rights:</u> Licensee is granted a non-exclusive, worldwide, and perpetual right to:
  • Use the Data Product to train, fine-tune, and evaluate machine learning models, including large language models.
  • Incorporate Data Product content into models and commercialize resulting model outputs.
  • Create derivative works (model weights, embeddings, etc.) for any lawful purpose.
<u>Restrictions:</u>
  • No Redistribution: The Data Product itself may not be sold, redistributed, or shared outside of licensed usage.
  • Non-Substitutive Use: The Data Product may not be used to reproduce substantial portions of the source work (Escaping Myself) as a market-replacement product.
  • Compliance: Licensee must comply with all applicable laws, including data protection and privacy regulations.
For enterprise-scale licensing or custom terms, contact Day By Day Recovery Resources.

Who Can Use It

<u>AI/ML Engineers & Data Scientists:</u> For training, fine-tuning, and evaluating models on structured behavioral scenarios, including agent simulation and risk modeling.
<u>Researchers (Behavioral Science, Psychology, AI Safety):</u> For studying decision-making, addiction trajectories, and high-stress behavioral patterns using annotated narrative data.
<u>Healthcare & Recovery Organizations:</u> For developing and testing conversational tools, support systems, and educational resources related to addiction and early recovery.
<u>AI Product Teams & Businesses:</u> For building and validating AI systems that require realistic human scenario modeling, including chatbots, support agents, and decision-aware systems.

Data Dictionary

A complete data dictionary with full field definitions, data types, and allowed values is included in the dataset package and in the free preview version.
Below is a representative subset of key fields:

| Column Name | Data Type | Description | Possible Values/Notes |
|---|---|---|---|
| id | string | Unique record identifier | e.g., lee_0001 |
| scenario | string | Short contextual summary of the scene | 1–2 sentence description |
| raw_text_redacted | string | Narrative passage with PII removed | Free text |
| risk_level | string | Estimated behavioral risk level | low → very_high |
| escalation_stage | string | Stage of behavioral escalation | categorical (see full dictionary) |
| aa_turning_point | boolean | Indicates AA-related turning point | true / false |
| pattern_labels | array[string] | Behavioral pattern tags | e.g., ["denial", "isolation"] |
| life_stage | string | Life stage of subject | childhood, adolescence, young_adult, elder |

See the included data_dictionary.csv or free preview dataset for the complete schema.
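The subset above is enough to sketch a basic sanity check before records enter a training pipeline. The example below is illustrative only: it covers just the eight fields listed here, assumes none of them is nullable (the listing does not say), and checks allowed values only for `life_stage`, because the full value sets for `risk_level` and `escalation_stage` are defined in the complete dictionary rather than in this excerpt. The names `EXPECTED_TYPES`, `LIFE_STAGES`, and `validate` are hypothetical.

```python
# Field -> Python type, taken from the data dictionary subset in this listing.
EXPECTED_TYPES = {
    "id": str,
    "scenario": str,
    "raw_text_redacted": str,
    "risk_level": str,
    "escalation_stage": str,
    "aa_turning_point": bool,
    "pattern_labels": list,
    "life_stage": str,
}

# Allowed life_stage values as listed in the subset.
LIFE_STAGES = {"childhood", "adolescence", "young_adult", "elder"}


def validate(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")

    # pattern_labels is documented as array[string], so check the elements too.
    labels = record.get("pattern_labels")
    if isinstance(labels, list) and not all(isinstance(x, str) for x in labels):
        problems.append("pattern_labels: expected array of strings")

    stage = record.get("life_stage")
    if isinstance(stage, str) and stage not in LIFE_STAGES:
        problems.append(f"life_stage: unexpected value {stage!r}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easy to report every issue in a record at once; the rules should be extended from `data_dictionary.csv` for real use.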

The Lived Experience Advantage

This dataset is not a collection of scraped internet text. It is a curated, human-annotated corpus derived from decades of lived experience and meticulous archival work in the field of addiction recovery. The annotations reflect 56 years of sobriety and a deep understanding of behavioral patterns that are often missed by generic algorithmic labeling.
Series Context (Part 1 of 3): This release focuses exclusively on the "Pre-AA" (active alcoholism) developmental trajectory. It is designed to be paired with forthcoming parts that cover:
  • Part 2: Early Sobriety (initial years) — focus on structural change and crisis management.
  • Part 3: Long-term Recovery (Decades) — focus on mentorship, resilience, and maturity.
<u>Safety & Ethics:</u> This data was prepared in consultation with the family of the subject and adheres to the ethical principles of anonymity and respect for the recovery process.

Listing Stats

VIEWS

2

DELIVERY

INSTANT DOWNLOAD

LISTED

13/05/2026

UPDATED

16/05/2026

REGION

NORTH AMERICA

Universal Data Quality Score (UDQS)

5 / 5
