Opendatabay APP

Anonymised Reddit Posts for Intervention

Mental Health & Wellness

Tags and Keywords

Mental

Health

Suicidality

Reddit

C-SSRS

Anonymised Reddit Posts for Intervention Dataset on Opendatabay data marketplace


Free

About

This data product supports targeted suicide intervention. It features 500 anonymised posts from Reddit users, each labelled using a modified Columbia Suicide Severity Rating Scale (C-SSRS) designed to assess suicidality, behaviour, and underlying mental health concerns. By clarifying how severe these issues are, the data helps practitioners design interventions that provide sustained help and support, and helps them identify and refer the users at highest risk of self-harm or suicide, which makes analysing it directly useful for prevention efforts.

Columns

The CSV file contains three columns:
  • User: A unique string of characters that serves as an anonymised identifier for each user. This preserves confidentiality while keeping each user distinguishable for tracking.
  • Post: The text submitted by the user. No personal information is included beyond what the user typed themselves.
  • Label: The risk level assigned according to the modified C-SSRS scale. There are five ordered levels, which can be represented as integers from 0 to 4: the lowest level (0) indicates that no suicidal ideation or plans were observed in the post, and the highest (4) indicates the greatest risk of suicidal behaviour. The Distribution section below refers to these levels by name (for example 'Ideation' and 'Supportive'). Entries marked 'NaN' signify that the post showed too few relevant signs to warrant a formal C-SSRS label.
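The three columns described above can be loaded and checked with Python's standard csv module. This is a minimal sketch, not a loader shipped with the dataset: the function name load_posts is hypothetical, and it assumes the header row uses exactly the column names User, Post and Label. Labels are kept as raw strings so the code works whether the file stores level names or integer codes, and 'NaN' or empty labels are mapped to None.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO

# Column names as described in the listing (exact header names are an assumption).
REQUIRED_COLUMNS = ("User", "Post", "Label")


def load_posts(handle: TextIO) -> List[Dict[str, Optional[str]]]:
    """Read User/Post/Label rows; 'NaN' or empty labels become None."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    rows = []
    for record in reader:
        # DictReader fills short rows with None, so guard before stripping.
        raw_label = (record["Label"] or "").strip()
        rows.append(
            {
                "User": record["User"],
                "Post": record["Post"],
                # Keep labels as strings; no numeric encoding is assumed.
                "Label": None if raw_label.lower() in ("", "nan") else raw_label,
            }
        )
    return rows


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from the dataset.
    sample = io.StringIO(
        "User,Post,Label\n"
        "user-a,\"some text, with a comma\",Ideation\n"
        "user-b,other text,NaN\n"
    )
    for row in load_posts(sample):
        print(row)
```

To read the actual file, open it with `open("500_anonymized_Reddit_users_posts_labels.csv", newline="", encoding="utf-8")` and pass the handle to load_posts; the UTF-8 encoding is an assumption.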

Distribution

The data product is distributed as a single CSV file, 500_anonymized_Reddit_users_posts_labels.csv, with a total size of 3.62 MB. It contains 500 rows, one for each of 500 unique anonymised users and their respective posts. All three columns are fully populated, with no missing values apart from the intentional 'NaN' risk labels. The Label column contains 5 unique values; 'Ideation' makes up 34% of the observed risk levels and 'Supportive' 22%.
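Label shares like the 34% and 22% figures above can be re-derived once the labels are loaded. A minimal sketch using only the standard library, assuming labels arrive as strings with missing labels represented as None; the function name label_shares is hypothetical. The listing does not say whether its percentages are taken over all 500 rows or only over labelled rows, so this sketch divides by the labelled rows; change the denominator if you need the other convention.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def label_shares(labels: Iterable[Optional[str]]) -> Dict[str, float]:
    """Return each label's percentage among labelled rows, most common first.

    None (a missing or 'NaN' label) is excluded from the denominator.
    """
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.most_common()}


if __name__ == "__main__":
    # Made-up labels for illustration only.
    demo = ["Ideation", "Ideation", "Supportive", None, "Attempt"]
    for label, share in label_shares(demo).items():
        print(f"{label}: {share:.1f}%")
```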

Usage

Ideal applications for this data product include:
  • Temporal Analysis: Pairing the post data with timestamps (if acquired separately) to analyse the frequency and severity of depression and suicidal behaviour over time within the online community.
  • Keyword Identification: Utilising natural language processing (NLP) techniques to identify specific vocabulary or phrases that correlate highly with severe suicide risk in text responses from anonymous users.
  • Predictive Modelling: Applying machine learning algorithms to identify risk factors and predict the likelihood of suicide attempts or completed suicides from observed characteristics, which can support proactive prevention.
  • Intervention Strategy Development: Developing and refining more targeted interventions for at-risk populations within online social platforms.
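The keyword-identification use case above can be prototyped without external NLP libraries. The sketch below uses one simple, swapped-in technique, an add-alpha smoothed log frequency ratio between two groups of posts, and is not a method prescribed by the listing. The function names, the regex tokenizer and the choice of which labels count as "severe" are all assumptions; for example, you might pass posts at the two highest risk levels as target_posts and all other labelled posts as other_posts.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Crude lowercase word tokenizer (an assumption); swap in a proper NLP tokenizer as needed.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def distinctive_terms(
    target_posts: Iterable[str],
    other_posts: Iterable[str],
    alpha: float = 1.0,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank tokens by add-alpha smoothed log frequency ratio (target vs other).

    Positive scores mean a token is relatively more frequent in target_posts.
    """
    target = Counter(tok for post in target_posts for tok in tokenize(post))
    other = Counter(tok for post in other_posts for tok in tokenize(post))
    vocab = set(target) | set(other)
    v = len(vocab)
    n_target = sum(target.values())
    n_other = sum(other.values())
    scored = []
    for tok in vocab:
        # Smoothing stops a token absent from one group from producing log(0).
        p_target = (target[tok] + alpha) / (n_target + alpha * v)
        p_other = (other[tok] + alpha) / (n_other + alpha * v)
        scored.append((tok, math.log(p_target / p_other)))
    # Highest score first; ties broken alphabetically for reproducible output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

With only 500 posts, rankings like this are noisy, so the output is best treated as a starting point for manual review rather than a validated list of risk markers.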

Coverage

The scope of this data is focused solely on anonymised user-generated text content collected from the Reddit platform. The content reflects posts related to mental health, depression, and suicidality. Geographic and demographic details are not included, as the data is intentionally stripped of personal information to protect user privacy.

License

CC0 - Public Domain
The license permits copying, modification, distribution, and performance of the work, even for commercial purposes, without requiring permission.

Who Can Use It

The data is suitable for a diverse group of users:
  • Academic Researchers: Studying online mental health trends and the manifestations of suicidality in digital communities.
  • AI/ML Developers: Creating predictive models that estimate suicide risk among anonymous users in real time.
  • Mental Health Organisations: Seeking to improve proactive suicide prevention services and target high-risk individuals in online environments.
  • NLP Specialists: Identifying linguistic markers and keywords associated with severe psychological distress.

Dataset Name Suggestions

  • Reddit Suicidality Assessment
  • C-SSRS Labeled Suicide Risk Data
  • Anonymised Reddit Posts for Intervention
  • Online Mental Health Risk Data

Attributes

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

14/10/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
