Pre AA Addiction Trajectory Dataset 100 JSONL Records for AI Training

Day By Day Recovery Resources

Licensed LLM Data Provider

£179

Name: Pre AA Addiction Trajectory Dataset 100 JSONL Records for AI Training
Creator: Day By Day Recovery Resources
Published: 2026-05-13T20:31:51.827Z
License: https://docs.opendatabay.com/ai-training-and-model-development-licenses/commercial-ai-training-and-fine-tuning-data-license

Agent Simulation Data

Tags and Keywords

Behavioral-data, Decision-making, Scenario-data, Llm-training, Fine-tuning, Jsonl, Annotated-data, Labeled-dataset, Nlp, Agent-simulation, Addiction, Behavioral-patterns, Crisis, Recovery, Mental-health, Risk-assessment, Escalation, Relapse-triggers, Decision-modeling, 12-step-recovery, Alcoholism

About

This dataset provides 100 structured, scene-level behavioral records modeling the developmental trajectory of addiction, with a primary focus on alcoholism. The records are derived from the autobiographical materials of “Lee B.”, including accounts of a tumultuous childhood and progression into severe alcohol use. Each record captures a specific decision point or behavioral vignette leading up to his first AA meeting.

Records are annotated for relapse triggers, behavioral escalation stages, risk levels, and AA-related turning points, providing a detailed view of pre-recovery patterns. This narrative-driven corpus is designed for training and evaluating AI systems that model high-stress decision-making, behavioral risk, and early-stage addiction dynamics.

This dataset represents Part 1 (pre-AA) of a multi-stage series, with subsequent datasets covering recovery and long-term sobriety trajectories.

Data Product Features

Each record is a structured, scene-level JSON object representing a discrete behavioral vignette with standardized fields for modeling human decision-making and addiction trajectories.

Core Fields:

id — Unique identifier for each record (e.g., lee_0001)
title — Short human-readable scene title
scenario — Brief contextual summary (1–2 sentences) describing the situation
raw_text_redacted — Narrative passage with personally identifiable information (PII) removed

Behavioral & Risk Annotation:

risk_level — Estimated behavioral risk (low → very_high)
escalation_stage — Stage of behavioral escalation within the addiction trajectory
pattern_labels — Tagged behavioral patterns (e.g., denial, enabling, isolation)
crisis_bottom — Boolean flag indicating crisis/bottom events

Recovery Context:

aa_turning_point — Boolean flag indicating an AA-related turning point
aa_turning_point_type — Classification of turning point (e.g., crisis_bottom, initial_sobriety)

Contextual Metadata:

primary_domain — Primary behavioral category (e.g., addiction_recovery, failure_trajectory)
secondary_domains — Additional contextual tags
life_stage — Life stage (childhood, adolescence, young_adult, elder)
outcome — Short description of the immediate result of the scenario

Provenance & Safety:

text_origin — Source type (original_notes, biographer_retelling, composite)
sensitive_content — Indicates presence of sensitive material
pii_detected / pii_redacted — Flags for privacy handling

Attribution Fields:

compiled_authored_by — Shelly Marshall
original_narrative_source — Lee Bowerman (journals, notes, recordings)
source_work — Escaping Myself, Day By Day Recovery Resources (2022)
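
The annotation fields listed above lend themselves to simple selections. The following is a minimal sketch, assuming records have already been parsed into Python dicts; the function name `select_records` and the sample values are hypothetical, and only the field names (`pattern_labels`, `crisis_bottom`) come from the listing.

```python
from typing import Iterable, Optional


def select_records(
    records: Iterable[dict],
    pattern: Optional[str] = None,
    crisis_bottom: Optional[bool] = None,
) -> list[dict]:
    """Return records matching the given pattern label and/or crisis flag."""
    selected = []
    for record in records:
        # pattern_labels is described as an array of strings; a missing
        # field is treated as an empty list rather than an error.
        labels = record.get("pattern_labels") or []
        if pattern is not None and pattern not in labels:
            continue
        if crisis_bottom is not None and record.get("crisis_bottom") is not crisis_bottom:
            continue
        selected.append(record)
    return selected


# Invented placeholder records; they are not taken from the dataset.
sample = [
    {"id": "lee_0001", "pattern_labels": ["denial"], "crisis_bottom": False},
    {"id": "lee_0002", "pattern_labels": ["denial", "isolation"], "crisis_bottom": True},
    {"id": "lee_0003", "pattern_labels": ["enabling"], "crisis_bottom": True},
]

denial_crises = select_records(sample, pattern="denial", crisis_bottom=True)
print([r["id"] for r in denial_crises])  # prints ['lee_0002']
```

Leaving either argument as `None` disables that filter, so the same helper covers "all crisis events" and "all records tagged with a given pattern".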

Format & Structure: The dataset is provided in JSONL (JSON Lines) format, with each line representing a single structured, scene-level record. Each record contains annotated behavioral, contextual, and provenance fields suitable for machine learning and agent simulation use.
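
Because JSONL stores one JSON object per line, the file can be read line by line with the standard library. This is a sketch rather than a loader shipped with the package: `iter_jsonl` is a hypothetical helper, and the sample lines are invented placeholders that reuse only the field names described above.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of JSONL input."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


# Placeholder lines standing in for the real file; the values are invented.
SAMPLE_LINES = [
    '{"id": "lee_0001", "title": "placeholder title", "risk_level": "low"}',
    "",
    '{"id": "lee_0002", "title": "placeholder title", "risk_level": "very_high"}',
]

records = list(iter_jsonl(SAMPLE_LINES))
print(len(records), records[0]["id"])
```

The function accepts any iterable of strings, so an open file handle (opened with `encoding="utf-8"`) can be passed in place of `SAMPLE_LINES`.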

Package Contents:

Primary dataset file (JSONL, 100 records)
Data dictionary (CSV) defining all fields and allowed values
Manifest file (JSON) for dataset metadata and structure
README and license documentation (MD/TXT)
Provenance and permissions documentation

Data Volume:

100 records (one JSON object per line)
30-field schema with some fields populated conditionally based on record context (see data dictionary for full field definitions and allowed values)
Multi-file dataset package designed for direct use in AI/ML pipelines, fine-tuning workflows, or structured data ingestion systems
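
The stated volume (100 records, with some fields populated only conditionally) can be checked after download. The sketch below assumes the records are already loaded as dicts; `field_coverage` and `check_package` are hypothetical names, and the sample records are invented to show the behavior.

```python
from collections import Counter


def field_coverage(records: list[dict]) -> Counter:
    """Count, per field name, how many records carry a non-null value."""
    coverage = Counter()
    for record in records:
        for name, value in record.items():
            if value is not None:
                coverage[name] += 1
    return coverage


def check_package(records: list[dict], expected_count: int = 100) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(records)}")
    ids = [r["id"] for r in records if r.get("id") is not None]
    if len(ids) != len(records):
        problems.append(f"{len(records) - len(ids)} record(s) have no id")
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")
    return problems


# Invented placeholder records, including one deliberate duplicate id.
sample = [
    {"id": "lee_0001", "aa_turning_point": False, "aa_turning_point_type": None},
    {"id": "lee_0002", "aa_turning_point": True, "aa_turning_point_type": "crisis_bottom"},
    {"id": "lee_0002", "aa_turning_point": False},
]

print(check_package(sample, expected_count=3))
print(field_coverage(sample)["aa_turning_point_type"])
```

A field whose coverage is below the record count is a candidate for conditional population; the data dictionary remains the reference for which fields are expected to be empty and when.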

Usage

Application: Conversational AI for recovery support
Training empathetic chatbots or virtual assistants for treatment centers, peer-support platforms, and recovery-focused websites.

Application: Behavioral risk and relapse modeling
Modeling escalation patterns, relapse triggers, and high-risk decision points in addiction trajectories.

Application: Agent simulation and scenario training
Training AI agents to recognize and respond to complex human situations involving stress, denial, crisis, and early recovery conditions.

Application: Evaluation and safety testing of AI systems
Benchmarking model responses in sensitive, high-risk behavioral scenarios to improve safety, alignment, and appropriateness.

Application: Retrieval-Augmented Generation (RAG) systems
Providing structured narrative content for knowledge grounding in recovery-oriented or behavioral health applications.

Application: Behavioral research and pattern analysis
Supporting research into decision-making, failure trajectories, and recovery pathways using structured, annotated narrative data.
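
For the fine-tuning and risk-modeling uses above, annotated records can be turned into supervised examples. The following is one possible sketch, not a format prescribed by the dataset: the prompt wording and the `prompt`/`completion` keys are assumptions, as are the function names and the placeholder records. Only `raw_text_redacted` and `risk_level` come from the listed schema.

```python
import json
from typing import Optional


def to_classification_example(record: dict) -> Optional[dict]:
    """Map one record to a prompt/completion pair, or None if unusable."""
    text = record.get("raw_text_redacted")
    label = record.get("risk_level")
    if not text or not label:
        return None  # skip records missing the input text or the target label
    prompt = (
        "Classify the behavioral risk level of the following scene.\n\n"
        f"Scene: {text}\n\nRisk level:"
    )
    return {"prompt": prompt, "completion": f" {label}"}


def build_examples(records: list[dict]) -> list[dict]:
    """Convert all usable records, silently dropping incomplete ones."""
    examples = []
    for record in records:
        example = to_classification_example(record)
        if example is not None:
            examples.append(example)
    return examples


# Invented placeholder records; the second one has no usable text.
sample = [
    {"id": "lee_0001", "raw_text_redacted": "Placeholder narrative text.", "risk_level": "low"},
    {"id": "lee_0002", "raw_text_redacted": "", "risk_level": "very_high"},
]

examples = build_examples(sample)
for example in examples:
    print(json.dumps(example))
```

With only 100 records from a single subject, a set like this is better suited to evaluation or small-scale fine-tuning experiments than to training a classifier from scratch.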

Coverage

Geographic Coverage: United States (primarily Western U.S., including Colorado), reflecting lived experience rather than structured regional sampling. While geographically specific, behavioral patterns associated with alcoholism and recovery are broadly transferable across similar cultural contexts.

Time Range: Mid-20th century through late 20th century (based on the life trajectory of the source individual; compiled and published in 2022).

Demographics: Single-subject longitudinal narrative (male), covering childhood through adulthood across multiple life stages (childhood, adolescence, young adult). Dataset reflects individual lived experience rather than population-level sampling.

License

Proprietary. See AI Training Rights below.

AI Training Rights

AI Training Rights: Licensee is granted a non-exclusive, worldwide, and perpetual right to:

Use the Data Product to train, fine-tune, and evaluate machine learning models, including large language models.
Incorporate Data Product content into models and commercialize resulting model outputs.
Create derivative works (model weights, embeddings, etc.) for any lawful purpose.

Restrictions:

No Redistribution: The Data Product itself may not be sold, redistributed, or shared outside of licensed usage.
Non-Substitutive Use: The Data Product may not be used to reproduce substantial portions of the source work (Escaping Myself) as a market-replacement product.
Compliance: Licensee must comply with all applicable laws, including data protection and privacy regulations.

For enterprise-scale licensing or custom terms, contact Day By Day Recovery Resources.

Who Can Use It

AI/ML Engineers & Data Scientists: For training, fine-tuning, and evaluating models on structured behavioral scenarios, including agent simulation and risk modeling.

Researchers (Behavioral Science, Psychology, AI Safety): For studying decision-making, addiction trajectories, and high-stress behavioral patterns using annotated narrative data.

Healthcare & Recovery Organizations: For developing and testing conversational tools, support systems, and educational resources related to addiction and early recovery.

AI Product Teams & Businesses: For building and validating AI systems that require realistic human scenario modeling, including chatbots, support agents, and decision-aware systems.

Data Dictionary

This dataset includes a complete data_dictionary.csv with full field definitions, data types, and allowed values. The schema contains 30 structured fields designed for behavioral annotation, risk mapping, provenance tracking, and AI training use. Below is a representative subset of 8 core fields for quick review.

Column Name	Data Type	Description	Possible Values/Notes
id	string	Unique record identifier	e.g., lee_0001
scenario	string	Short contextual summary of the scene	1–2 sentence description
raw_text_redacted	string	Narrative passage with PII removed	Free text
risk_level	string	Estimated behavioral risk level	low → very_high
escalation_stage	string	Stage of behavioral escalation	categorical (see full dictionary)
aa_turning_point	boolean	Indicates AA-related turning point	true / false
pattern_labels	array[string]	Behavioral pattern tags	e.g., ["denial", "isolation"]
life_stage	string	Life stage of subject	childhood, adolescence, young_adult, elder

See the included data_dictionary.csv or free preview dataset for the complete schema.
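
The eight core fields in the table above can be checked with a small validator. This is a sketch under stated assumptions: it treats all eight fields as required (the full dictionary may mark some as conditional), it validates allowed values only for `life_stage` because that is the only complete value list shown here, and the names `CORE_FIELDS` and `validate_core_fields` are hypothetical. The sample record uses invented placeholder values.

```python
LIFE_STAGES = {"childhood", "adolescence", "young_adult", "elder"}

# Field name -> expected Python type, taken from the table above.
CORE_FIELDS = {
    "id": str,
    "scenario": str,
    "raw_text_redacted": str,
    "risk_level": str,
    "escalation_stage": str,
    "aa_turning_point": bool,
    "pattern_labels": list,
    "life_stage": str,
}


def validate_core_fields(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    errors = []
    for name, expected in CORE_FIELDS.items():
        if name not in record:
            errors.append(f"{name}: missing")
        elif not isinstance(record[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    labels = record.get("pattern_labels")
    if isinstance(labels, list) and not all(isinstance(x, str) for x in labels):
        errors.append("pattern_labels: every item must be a string")
    stage = record.get("life_stage")
    if isinstance(stage, str) and stage not in LIFE_STAGES:
        errors.append(f"life_stage: unexpected value {stage!r}")
    return errors


# Invented placeholder record; "placeholder_stage" is not a real category.
good = {
    "id": "lee_0001",
    "scenario": "Placeholder summary.",
    "raw_text_redacted": "Placeholder narrative text.",
    "risk_level": "low",
    "escalation_stage": "placeholder_stage",
    "aa_turning_point": False,
    "pattern_labels": ["denial", "isolation"],
    "life_stage": "childhood",
}
bad = dict(good, aa_turning_point="yes", life_stage="adult")

print(validate_core_fields(good))
print(validate_core_fields(bad))
```

Extending the check to `risk_level` and `escalation_stage` would require the allowed values from data_dictionary.csv, which are not reproduced in this listing.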

The Lived Experience Advantage

This dataset is not a collection of scraped internet text. It is a curated, human-annotated corpus derived from decades of lived experience and meticulous archival work in the field of addiction recovery. The annotations reflect 56 years of sobriety and a deep understanding of behavioral patterns that are often missed by generic algorithmic labeling.

Series Context (Part 1 of 3): This release focuses exclusively on the "Pre-AA" (active alcoholism) developmental trajectory. It is designed to be paired with forthcoming parts that cover:

Part 2: Early Sobriety (initial years). Focus on structural change and crisis management.
Part 3: Long-term Recovery (decades). Focus on mentorship, resilience, and maturity.

Safety & Ethics: This data was prepared in consultation with the family of the subjects and adheres to the ethical principles of anonymity and respect for the recovery process.

Listing Stats

Delivery: Instant download
Listed: 13/05/2026
Updated: 18/05/2026
Region: North America
Quality: 5 / 5
