Opendatabay APP

Twitter Mental Health Classification Data

Mental Health & Wellness

Tags and Keywords

Text, NLP, Healthcare, Binary, Depression, Twitter, Mental

Free

About

This dataset provides uncleaned, English-only Twitter data for mental health classification at the tweet level. It includes the raw tweet text and associated user metrics, and can be used to develop and evaluate models that identify mental health indicators in social media text. It is also suited to practising data cleaning and feature extraction, for example Topic Modelling Features that use Latent Dirichlet Allocation (LDA) to summarise tweets into the top k topics, and Emoji Sentiment Features that count the positive, negative, and neutral emojis present in each tweet.
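The Emoji Sentiment Features mentioned above can be sketched as follows. This is a minimal illustration, not the listing's own method: the `EMOJI_SENTIMENT` lexicon and the `emoji_sentiment_counts` function are hypothetical names, and the tiny hand-picked lexicon is an assumption standing in for a properly curated emoji sentiment resource.

```python
# Hypothetical, deliberately tiny emoji lexicon for illustration only.
# A real project would replace it with a curated emoji sentiment resource.
EMOJI_SENTIMENT = {
    "😀": "positive",
    "😊": "positive",
    "😢": "negative",
    "😭": "negative",
    "😡": "negative",
    "😐": "neutral",
    "🤔": "neutral",
}


def emoji_sentiment_counts(text):
    """Count the positive, negative, and neutral emojis in one tweet."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for char in text:
        polarity = EMOJI_SENTIMENT.get(char)
        if polarity is not None:
            counts[polarity] += 1
    return counts


print(emoji_sentiment_counts("so tired 😢😭 but ok 😊"))
```

The three counts can be appended to each row as extra numeric features. Emojis made of several code points (skin-tone or joined sequences) would need sequence-aware matching, which this sketch does not attempt.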

Columns

  • post_id: The unique identification number for each Twitter post.
  • post_created: The timestamp indicating when the post was created.
  • post_text: The raw, uncleaned text content of the tweet.
  • user_id: The unique identification number for the user who posted the tweet.
  • followers: The number of followers the user had at the time of the post.
  • friends: The number of accounts the user was following (Twitter "friends") at the time of the post.
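  • favourites: The user's favourites (likes) count. In Twitter's user metadata this field normally counts the tweets the user has liked, not the likes their tweets have received, so verify the meaning before using it as a feature.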
  • statuses: The total count of statuses (tweets) posted by the user.
  • retweets: The total number of retweets received by the current tweet.
  • Label: The classification label for mental health, intended for binary classification tasks.
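As a rough sketch of how these columns might be read, the following uses only Python's standard `csv` module. The `read_tweets` helper and the sample rows are hypothetical: the rows are synthetic placeholders rather than real records, and the assumption that the metric columns always hold integers should be checked against the actual file.

```python
import csv
import io

# Numeric user and tweet metrics named in the column list above.
NUMERIC_COLUMNS = ("followers", "friends", "favourites", "statuses", "retweets")


def read_tweets(handle):
    """Yield one dict per CSV row, with the metric columns converted to int."""
    for row in csv.DictReader(handle):
        for name in NUMERIC_COLUMNS:
            row[name] = int(row[name])
        # IDs stay as strings so large values are never rounded or reformatted.
        # post_created and Label also stay as strings, because the listing does
        # not document the timestamp format or the label encoding.
        yield row


# Synthetic sample rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """post_id,post_created,post_text,user_id,followers,friends,favourites,statuses,retweets,Label
1001,placeholder-time,"feeling low today, again",501,84,211,4548,26925,0,1
1002,placeholder-time,great run this morning,502,120,95,310,1804,2,0
"""

rows = list(read_tweets(io.StringIO(SAMPLE_CSV)))
print(rows[0]["post_text"], rows[0]["followers"], rows[0]["Label"])
```

With the real file, the `io.StringIO` object would be replaced by something like `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded CSV was saved.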

Distribution

The data files are typically provided in CSV format and are uncleaned. The listing does not state a total row count, but it reports approximately 19,102 unique post IDs and 19,488 unique user IDs. The dataset's meta-information gives the distribution of followers, friends, favourites, statuses, and retweets as value ranges with their corresponding counts.
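Because the total row count is not stated, it may be worth recomputing these figures after download. The sketch below shows one way to do so; `summarise` is a hypothetical helper, and the three sample rows are synthetic stand-ins for rows read from the CSV.

```python
def summarise(rows, metric="followers"):
    """Return the row count, unique ID counts, and the range of one metric."""
    rows = list(rows)
    values = [int(row[metric]) for row in rows]
    return {
        "rows": len(rows),
        "unique_post_ids": len({row["post_id"] for row in rows}),
        "unique_user_ids": len({row["user_id"] for row in rows}),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


# Synthetic rows for illustration only; not taken from the dataset.
sample = [
    {"post_id": "1001", "user_id": "501", "followers": "84"},
    {"post_id": "1002", "user_id": "501", "followers": "120"},
    {"post_id": "1003", "user_id": "502", "followers": "15"},
]
print(summarise(sample))
```

Comparing `rows` with `unique_post_ids` is a quick check for duplicated posts, and the same call with `metric="retweets"` or another metric column gives that column's range.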

Usage

This dataset is ideal for:
  • Developing and testing mental health classification models using social media data.
  • Practising and demonstrating Natural Language Processing (NLP) techniques, including text analysis and feature engineering.
  • Exploring and applying data cleaning methodologies on raw social media text.
  • Implementing and evaluating Topic Modelling using algorithms like LDA.
  • Conducting sentiment analysis based on emoji usage in tweets.
  • Research in social media analytics, public health, and digital epidemiology.
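As an illustration of the data cleaning use case, the sketch below normalises one raw tweet with regular expressions. The specific steps (lowercasing, removing URLs and mentions, keeping hashtag words, dropping punctuation and emojis) are assumptions about what a reasonable first pass might look like, and `clean_tweet` and `tokenise` are hypothetical names.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Anything that is not a lowercase letter, digit, apostrophe, or whitespace.
NON_WORD_RE = re.compile(r"[^a-z0-9'\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, punctuation, and emojis."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")      # keep the hashtag word, drop the '#'
    text = NON_WORD_RE.sub(" ", text)  # also removes emojis and non-ASCII text
    return " ".join(text.split())      # collapse repeated whitespace


def tokenise(text):
    """Split a cleaned tweet into whitespace-separated tokens."""
    return clean_tweet(text).split()


print(clean_tweet("RT @user: Can't sleep again... #insomnia https://t.co/abc123 😢"))
```

Because this pass discards emojis, any emoji-based features should be extracted from the raw `post_text` first. The token lists from `tokenise` are one possible input for topic modelling or a bag-of-words classifier.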

Coverage

The dataset's coverage is global, with tweets filtered to English content only. No collection period is given for the tweets, but the dataset was listed on 05/06/2025.

License

CC0

Who Can Use It

This dataset is suitable for:
  • Data scientists and machine learning engineers working on text classification and NLP projects.
  • Researchers in mental health, social sciences, and computational linguistics.
  • Students and academics learning about social media data analysis, feature engineering, and model development for health applications.
  • Healthcare professionals interested in leveraging social media for insights into mental wellness trends.

Dataset Name Suggestions

  • Twitter Mental Health Classification Data
  • English Tweets Depression Classifier
  • Social Media Mental Health Indicators
  • Tweet-Level Mental Well-being Dataset
  • Depression Prediction from Twitter

Attributes

Listing Stats

  • Views: 1
  • Downloads: 2
  • Listed: 05/06/2025
  • Region: Global
  • UDQS Quality (Universal Data Quality Score): 5 / 5
  • Version: 1.0
