Anonymised Reddit Posts for Intervention
Mental Health & Wellness
Free
About
This data product contains 500 anonymised posts from Reddit users and is intended to support targeted suicide intervention. Each entry is labelled using a modified Columbia Suicide Severity Rating Scale (C-SSRS), designed to assess suicidality, behaviour, and underlying mental health concerns. The severity labels make it possible to identify and refer the users at highest risk of self-harm or suicide, and to design interventions that provide sustained help and support.
Columns
The data includes three key fields, contained within the CSV file:
- User: A unique string of characters serving as an anonymised identifier for each user. This ensures confidentiality while maintaining the distinct identity required for tracking.
- Post: The textual content submitted by the user. No personal information is included beyond what the user wrote in the post itself.
- Label: An integer assigned according to the modified C-SSRS scale. The score ranges from 0 to 4, where 4 indicates the highest risk of suicidal behaviour and 0 indicates that no suicidal ideation or plans were observed in the post. Entries marked 'NaN' signify that the post showed too few relevant signs to warrant a formal C-SSRS label.
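The column layout above can be checked with a short loading routine. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the names User, Post, and Label, and it keeps label values as strings so that it works whether the file stores them as integers or as category names. The helper names load_posts and label_shares are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Column names as given in the listing; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = ["User", "Post", "Label"]


def load_posts(source):
    """Read rows from an open text file and normalise the Label field."""
    reader = csv.DictReader(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        label = (row["Label"] or "").strip()
        # Treat empty cells and the literal 'NaN' as "no C-SSRS label assigned".
        row["Label"] = None if label == "" or label.lower() == "nan" else label
        rows.append(row)
    return rows


def label_shares(rows):
    """Return each label's share of all rows; unlabelled rows are keyed by None."""
    counts = Counter(row["Label"] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "User,Post,Label\n"
    "user-1,first example post,0\n"
    "user-2,second example post,2\n"
    "user-3,third example post,NaN\n"
    "user-4,fourth example post,2\n"
)
rows = load_posts(sample)
shares = label_shares(rows)
```

To run this against the real file, pass an open file handle instead of the sample, for example open("500_anonymized_Reddit_users_posts_labels.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption, not something the listing states.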
Distribution
The data product is distributed as a single CSV file, 500_anonymized_Reddit_users_posts_labels.csv, with a total size of 3.62 MB. It contains 500 rows, corresponding to 500 unique anonymised users and their respective posts. All three columns are fully populated, with no missing values reported apart from the intentional 'NaN' risk labels. The Label column has 5 unique values, with 'Ideation' making up 34% of the observed risk levels and 'Supportive' making up 22%.
Usage
Ideal applications for this data product include:
- Temporal Analysis: Pairing the post data with timestamps (if acquired separately) to analyse the frequency and severity of depression and suicidal behaviour over time within the online community.
- Keyword Identification: Utilising natural language processing (NLP) techniques to identify specific vocabulary or phrases that correlate highly with severe suicide risk in text responses from anonymous users.
- Predictive Modelling: Applying machine learning algorithms to identify risk factors and predict the likelihood of suicide completions or attempts based on observed characteristics, which is invaluable for proactive prevention.
- Intervention Strategy Development: Developing and refining more targeted interventions for at-risk populations within online social platforms.
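As an illustration of the keyword identification use case, the sketch below ranks tokens by how much more often they appear in higher-risk posts than in the rest. It is one simple approach among many, not a method prescribed by the dataset: it uses an add-one-smoothed log-ratio of relative token frequencies, a deliberately crude regex tokenizer, and an assumed definition of "high risk" as labels 3 and 4. The function names and the placeholder posts are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def keyword_log_ratio(posts, labels, high_risk, min_count=2):
    """Rank tokens by smoothed log-ratio of frequency in high-risk posts vs. the rest."""
    high, rest = Counter(), Counter()
    for text, label in zip(posts, labels):
        (high if label in high_risk else rest).update(tokenize(text))
    vocab = set(high) | set(rest)
    # Add-one smoothing: every vocabulary token gets one extra count per group.
    n_high = sum(high.values()) + len(vocab)
    n_rest = sum(rest.values()) + len(vocab)
    scores = {}
    for tok in vocab:
        if high[tok] + rest[tok] < min_count:
            continue  # skip tokens too rare to score reliably
        p_high = (high[tok] + 1) / n_high
        p_rest = (rest[tok] + 1) / n_rest
        scores[tok] = math.log(p_high / p_rest)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder posts with neutral tokens; real input would be the Post column.
posts = ["alpha beta alpha", "beta gamma", "alpha alpha delta", "gamma gamma beta"]
labels = ["4", "0", "3", "1"]
ranked = keyword_log_ratio(posts, labels, high_risk={"3", "4"})
```

Positive scores mark tokens over-represented in the high-risk group and negative scores mark tokens over-represented elsewhere. With only 500 posts, rankings of this kind are likely to be noisy, so a higher min_count and manual review of the top tokens are advisable.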
Coverage
The scope of this data is focused solely on anonymised user-generated text content collected from the Reddit platform. The content reflects posts related to mental health, depression, and suicidality. Geographic and demographic details are not included, as the data is intentionally stripped of personal information to protect user privacy.
License
CC0 - Public Domain
The license permits copying, modification, distribution, and performance of the work, even for commercial purposes, without requiring permission.
Who Can Use It
The data is suitable for a diverse group of users:
- Academic Researchers: Studying online mental health trends and the manifestations of suicidality in digital communities.
- AI/ML Developers: Creating predictive models to assess suicide risk among anonymous users in real time.
- Mental Health Organisations: Seeking to improve proactive suicide prevention services and target high-risk individuals in online environments.
- NLP Specialists: Identifying linguistic markers and keywords associated with severe psychological distress.
Dataset Name Suggestions
- Reddit Suicidality Assessment
- C-SSRS Labeled Suicide Risk Data
- Anonymised Reddit Posts for Intervention
- Online Mental Health Risk Data
Attributes
Original Data Source: Anonymised Reddit Posts for Intervention
