Suicide and Depression Text Classifier Data

Mental Health & Wellness

Tags and Keywords

  • Suicide
  • Depression
  • Mental
  • Reddit
  • Text

Suicide and Depression Text Classifier Data Dataset on Opendatabay data marketplace

Free

About

This dataset is designed for detecting suicidal ideation and depression in text. It was created to address the scarcity of public text-classification datasets for suicide detection and is intended as a resource for researchers and developers. The data consists of posts collected from the Reddit subreddits r/SuicideWatch and r/depression, together with non-suicidal posts from r/teenagers. This makes it directly relevant to mental health text analysis.

Columns

  • text: This column contains the textual content extracted from user posts, serving as the primary input for text classification models.
  • class: This is the target variable, indicating the classification of the post. The current version includes 'suicide' and 'non-suicide' labels, while a previous version (V13) also contained 'depression' labels.
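A minimal sketch of reading these two columns with Python's standard `csv` module is shown below. It assumes the CSV has a header row naming the columns exactly `text` and `class`, and that the file is UTF-8 encoded. Any other columns the file may carry are ignored. The function names and the label sets are our own illustrative choices and are not part of the dataset.

```python
import csv
from typing import Iterable, Iterator, Tuple

# Labels documented for the current version. Pass a different set, e.g.
# {"suicide", "depression", "teenagers"}, when reading the older V13 file.
CURRENT_LABELS = frozenset({"suicide", "non-suicide"})


def parse_labelled_posts(
    lines: Iterable[str], allowed_labels: frozenset = CURRENT_LABELS
) -> Iterator[Tuple[str, str]]:
    """Yield (text, label) pairs from CSV lines with 'text' and 'class' columns.

    Assumes a header row; extra columns are ignored. Raises ValueError on a
    missing column, an empty post, or a label outside allowed_labels.
    """
    reader = csv.DictReader(lines)
    missing = {"text", "class"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    for record_no, row in enumerate(reader, start=1):
        text = row["text"] or ""
        label = (row["class"] or "").strip()
        if not text.strip():
            raise ValueError(f"record {record_no}: empty text")
        if label not in allowed_labels:
            raise ValueError(f"record {record_no}: unexpected label {label!r}")
        yield text, label


def load_labelled_posts(
    path: str, allowed_labels: frozenset = CURRENT_LABELS
) -> Iterator[Tuple[str, str]]:
    """Stream validated rows from a file such as Suicide_Detection.csv."""
    # newline="" lets the csv module handle line breaks inside quoted posts;
    # UTF-8 is an assumption about the file's encoding.
    with open(path, newline="", encoding="utf-8") as f:
        yield from parse_labelled_posts(f, allowed_labels)
```

Both functions are generators, so validation happens lazily while iterating. This keeps memory flat on a file of this size.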

Distribution

The dataset is provided as a single CSV file, Suicide_Detection.csv, 166.9 MB in size. It contains 232,074 records, and both the 'text' and 'class' columns are reported as having 100% valid entries. The 'text' column has 232,074 unique values, one per record, so no post is duplicated. The 'class' column has two unique values, 'suicide' and 'non-suicide', with 'suicide' listed as the most common.
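A short profiling pass can recount these figures against a downloaded copy: records, empty posts, distinct texts, and the label distribution. The sketch below is our own, and the function name `profile_posts` is hypothetical. It assumes the columns are named `text` and `class`. Note that `majority_label` breaks ties by whichever label was seen first, so compare `label_counts` directly when the counts are close.

```python
import csv
import hashlib
from collections import Counter
from typing import Dict, Iterable, Optional


def profile_posts(lines: Iterable[str]) -> Dict[str, object]:
    """Count records, empty posts, distinct posts and labels in a text/class CSV.

    Assumes the header names the columns 'text' and 'class'.
    """
    labels: Counter = Counter()
    # Keep a 16-byte digest per post rather than the post itself, so memory
    # stays modest on a ~167 MB file; collisions are vanishingly unlikely.
    seen = set()
    rows = empty = 0
    for row in csv.DictReader(lines):
        rows += 1
        text = row.get("text") or ""
        if not text.strip():
            empty += 1
        seen.add(hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest())
        labels[(row.get("class") or "").strip()] += 1
    majority: Optional[str] = labels.most_common(1)[0][0] if labels else None
    return {
        "rows": rows,
        "empty_text": empty,
        "unique_texts": len(seen),
        "label_counts": dict(labels),
        "majority_label": majority,
    }
```

To run it on a local copy, open the file with `open("Suicide_Detection.csv", newline="", encoding="utf-8")` and pass the file object to `profile_posts`. If the copy matches this version, `rows` and `unique_texts` should both equal the 232,074 quoted above.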

Usage

This dataset is well suited to building and training text classifiers that detect suicidal ideation. Detecting depression as a separate class requires the 'depression' labels from the earlier V13 version. The dataset can be used in machine learning applications to analyse social media text, identify patterns indicative of mental distress, and develop predictive models for early-intervention or support systems.
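A bag-of-words multinomial Naive Bayes model makes a cheap first baseline for such a classifier. The sketch below is plain Python with no dependencies. The tokenizer, the class name `NaiveBayesBaseline`, and the smoothing value `alpha=1.0` are illustrative choices; they are not the dataset authors' method and nothing in the dataset prescribes them. The model trains on `(text, label)` pairs, so it works with either the current binary labels or the three V13 labels.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Deliberately crude tokenizer: lower-cased runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    Illustrative baseline only, not the dataset authors' method.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_docs: Counter = Counter()  # documents per label
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.total_words: Counter = Counter()  # tokens per label
        self.vocab: set = set()

    def fit(self, examples: Iterable[Tuple[str, str]]) -> "NaiveBayesBaseline":
        for text, label in examples:
            tokens = tokenize(text)
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.total_words[label] += len(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, docs in self.class_docs.items():
            # Log prior plus the sum of smoothed log-likelihoods of the tokens.
            score = math.log(docs / n_docs)
            denom = self.total_words[label] + self.alpha * v
            counts = self.word_counts[label]
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model: NaiveBayesBaseline, examples: Iterable[Tuple[str, str]]) -> float:
    pairs = list(examples)
    correct = sum(model.predict(t) == y for t, y in pairs)
    return correct / len(pairs) if pairs else 0.0
```

In real use, hold out a stratified test split before fitting. Per-class precision and recall are more informative than accuracy alone, because missed 'suicide' posts and false alarms carry different costs.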

Coverage

The dataset's content is sourced from Reddit. Posts from r/SuicideWatch were collected from the subreddit's inception on 16 December 2008 until 2 January 2021, and posts from r/depression from 1 January 2009 to 2 January 2021. Non-suicide posts were obtained from r/teenagers. Under the original labelling scheme, all r/SuicideWatch posts are labelled 'suicide' and all r/depression posts 'depression'. The current version uses only the 'suicide' and 'non-suicide' labels. An earlier version (V13) used three labels: 'suicide', 'depression', and 'teenagers' (representing normal conversations).

License

CC BY-SA 4.0

Who Can Use It

This dataset is particularly useful for:
  • Researchers studying mental health, natural language processing, and social media analysis.
  • Developers creating AI-powered tools for mental health support, content moderation, or risk assessment.
  • Academics interested in text classification, sentiment analysis, or identifying markers of distress in online communities.

Dataset Name Suggestions

  • Suicide and Depression Text Classifier Data
  • Reddit Mental Health Post Dataset
  • Textual Suicide Ideation Detection Data
  • Online Depression & Suicide Detection Dataset

Listing Stats

  • Views: 6
  • Downloads: 1
  • Listed: 14/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
