Suicide and Depression Text Classifier Data
Mental Health & Wellness
Free
About
This dataset is designed for detecting suicidal ideation and depression in text. It was created to address the scarcity of public datasets for suicide-detection text classification and is intended as a resource for researchers and developers. The data comprises posts collected from the Reddit subreddits "SuicideWatch" and "depression", as well as non-suicidal posts from r/teenagers.
Columns
- text: This column contains the textual content extracted from user posts, serving as the primary input for text classification models.
- class: This is the target variable, indicating the classification of the post. The current version includes 'suicide' and 'non-suicide' labels, while a previous version (V13) also contained 'depression' labels.
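As a minimal sketch of how the two columns might be read, the snippet below uses only the Python standard library. The helper names `read_posts` and `label_counts` are hypothetical and chosen for illustration. The sketch assumes the file has a header row containing `text` and `class`; any other columns are ignored.

```python
import csv
from collections import Counter


def read_posts(handle):
    """Yield (text, label) pairs from an open CSV file object.

    Assumes a header row that includes 'text' and 'class' columns.
    """
    reader = csv.DictReader(handle)
    missing = {"text", "class"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    for row in reader:
        yield row["text"], row["class"]


def label_counts(pairs):
    """Count how many posts carry each class label."""
    return Counter(label for _, label in pairs)


# Example usage (assumes the CSV has been downloaded locally):
# with open("Suicide_Detection.csv", newline="", encoding="utf-8") as f:
#     print(label_counts(read_posts(f)))
```

Reading rows lazily with a generator avoids holding the whole file in memory while counting labels.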
Distribution
The dataset is provided as a CSV file named Suicide_Detection.csv, with a size of 166.9 MB. It contains approximately 232,074 records, with both the 'text' and 'class' columns having 100% valid entries. The 'text' column features 232,074 unique values, indicating a diverse range of content. The 'class' column has 2 unique values, reflecting the 'suicide' and 'non-suicide' classifications, with 'suicide' being the most common.
Usage
This dataset is suited to building and training text classifiers that detect suicidal ideation and depression. It can be used in machine learning applications to analyse social media text, identify patterns indicative of mental distress, and develop predictive models for early intervention or support systems.
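To illustrate the kind of classifier described here, the following is a rough baseline sketch: a bag-of-words multinomial naive Bayes model with add-one smoothing, written with the standard library only. It is not the method used by the dataset's authors. The class name `NaiveBayes`, the `tokenize` helper, and the toy training sentences are all assumptions made for illustration; a real experiment would train on the 'text' and 'class' columns and evaluate on a held-out split.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        labels = list(labels)
        self.doc_counts = Counter(labels)       # posts per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {
            label: sum(self.word_counts[label].values())
            for label in self.doc_counts
        }
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / n_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data for illustration only; not taken from the dataset.
toy_texts = [
    "i feel hopeless and alone",
    "i cannot go on anymore",
    "homework is due tomorrow",
    "anyone playing games tonight",
]
toy_labels = ["suicide", "suicide", "non-suicide", "non-suicide"]
model = NaiveBayes().fit(toy_texts, toy_labels)
```

A word-count baseline like this is mainly useful as a reference point against which stronger models can be compared.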
Coverage
The dataset's content is sourced from Reddit posts. Posts from the "SuicideWatch" subreddit were collected from its inception on 16 December 2008 until 2 January 2021. Posts from the "depression" subreddit were collected from 1 January 2009 to 2 January 2021. Non-suicide posts were obtained from the r/teenagers subreddit. All "SuicideWatch" posts are labelled as 'suicide', and "depression" posts are labelled as 'depression'. The dataset's current version features 'suicide' and 'non-suicide' labels, while an earlier version (V13) included 'suicide', 'depression', and 'teenagers' (representing normal conversations).
License
CC BY-SA 4.0
Who Can Use It
This dataset is particularly useful for:
- Researchers studying mental health, natural language processing, and social media analysis.
- Developers creating AI-powered tools for mental health support, content moderation, or risk assessment.
- Academics interested in text classification, sentiment analysis, or identifying markers of distress in online communities.
Dataset Name Suggestions
- Suicide and Depression Text Classifier Data
- Reddit Mental Health Post Dataset
- Textual Suicide Ideation Detection Data
- Online Depression & Suicide Detection Dataset
Attributes
Original Data Source: Suicide and Depression Text Classifier Data