Suicide and Depression Text Classifier Data

Mental Health & Wellness

Tags and Keywords

Suicide, Depression, Mental, Text

Free

About

This dataset is designed for detecting suicidal ideation and depression in text. It was created to address the scarcity of public datasets for suicide-detection text classification and is intended for researchers and developers. The data comprises posts collected from the Reddit subreddits "SuicideWatch" and "depression", together with non-suicidal posts from r/teenagers.

Columns

text: This column contains the textual content extracted from user posts, serving as the primary input for text classification models.
class: This is the target variable, indicating the classification of the post. The current version includes 'suicide' and 'non-suicide' labels, while a previous version (V13) also contained 'depression' labels.

Distribution

The dataset is provided as a single CSV file, Suicide_Detection.csv, with a size of 166.9 MB. It contains 232,074 records, and both the 'text' and 'class' columns have 100% valid entries. The 'text' column has 232,074 unique values, so no post text is repeated. The 'class' column has 2 unique values, 'suicide' and 'non-suicide', with 'suicide' being the most common.
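The figures above can be checked after download with a short script. The following is a minimal sketch, not part of the listing: it assumes the file is UTF-8 encoded and has a header row containing the 'text' and 'class' columns described above, and the function name `summarize_labels` is a hypothetical helper introduced for illustration.

```python
import csv
from collections import Counter


def summarize_labels(path):
    """Count rows per 'class' label and count rows missing 'text' or 'class'.

    Assumes a UTF-8 CSV with a header row that includes 'text' and 'class'
    (an assumption based on the column description, not verified here).
    """
    counts = Counter()
    incomplete_rows = 0
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            text = (row.get("text") or "").strip()
            label = (row.get("class") or "").strip()
            if not text or not label:
                # Row has an empty or missing value in one of the two columns.
                incomplete_rows += 1
                continue
            counts[label] += 1
    return counts, incomplete_rows
```

Calling `summarize_labels("Suicide_Detection.csv")` returns the per-label row counts and the number of incomplete rows, which can be compared with the record count and validity figures stated above.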

Usage

This dataset is suited to building and training text classifiers that detect suicidal ideation and depression. It can be used in machine learning applications to analyse social media text, identify patterns indicative of mental distress, and develop predictive models for early intervention or support systems. Because the current version carries only 'suicide' and 'non-suicide' labels, a separate depression class is available only in the earlier version (V13).
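As an illustration of the classifier use case, the sketch below trains a small multinomial Naive Bayes baseline on (text, label) pairs shaped like the 'text' and 'class' columns. The listing does not prescribe any model, so Naive Bayes is chosen here only as a simple starting point. The class name `NaiveBayesBaseline`, the tokenizer, and the four toy posts are hypothetical placeholders written for this example and are not taken from the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter(labels)        # label -> number of posts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_tokens = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                # Smoothed log likelihood of the token under this label.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy placeholder posts (invented for illustration, not dataset rows).
toy_texts = [
    "i feel hopeless and cannot go on",
    "nothing matters anymore i want it to end",
    "anyone want to play games after school",
    "my math homework is due tomorrow lol",
]
toy_labels = ["suicide", "suicide", "non-suicide", "non-suicide"]

model = NaiveBayesBaseline().fit(toy_texts, toy_labels)
prediction = model.predict("i feel hopeless")
```

On the real data, the same `fit` call would take the 'text' and 'class' columns, with a held-out split for evaluation. A baseline like this is useful mainly as a reference point for stronger models.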

Coverage

The dataset's content is sourced from Reddit posts. Posts from the "SuicideWatch" subreddit were collected from its inception on 16 December 2008 until 2 January 2021. Posts from the "depression" subreddit were collected from 1 January 2009 to 2 January 2021. Non-suicide posts were obtained from the r/teenagers subreddit. All "SuicideWatch" posts are labelled 'suicide'. In the earlier version (V13), "depression" posts were labelled 'depression' and r/teenagers posts were labelled 'teenagers' (representing normal conversations); the current version uses only the 'suicide' and 'non-suicide' labels.

License

CC BY-SA 4.0

Who Can Use It

This dataset is particularly useful for:

Researchers studying mental health, natural language processing, and social media analysis.
Developers creating AI-powered tools for mental health support, content moderation, or risk assessment.
Academics interested in text classification, sentiment analysis, or identifying markers of distress in online communities.

Dataset Name Suggestions

Suicide and Depression Text Classifier Data
Reddit Mental Health Post Dataset
Textual Suicide Ideation Detection Data
Online Depression & Suicide Detection Dataset

Attributes

Original Data Source: Suicide and Depression Text Classifier Data

Listing Stats

Listed: 14/07/2025
Region: Global
Quality: 5 / 5
Version: 1.0
