Opendatabay APP

Social Media NSFW/SFW Posts Dataset

Social Media and Networking

Tags and Keywords

  • Online
  • Communities
  • Text
  • NLP
  • Categorical
  • Binary
  • Classification



Free

About

This dataset comprises Reddit post data collected via the Reddit API. Its primary purpose is to support classifying Reddit posts as either Not Safe For Work (NSFW) or Safe For Work (SFW). It is also useful for moderating online communities, for natural language processing tasks, and for analysing social media content.

Columns

  • title: The full text of the Reddit post's title.
  • subreddit: The subreddit where the post was originally published.
  • is_nsfw: A boolean flag, True if the post is categorised as NSFW and False if it is SFW. The dataset contains approximately 100,477 posts tagged 'true' (NSFW) and 517,475 posts tagged 'false' (SFW).
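The three columns above can be read with Python's standard csv module. The following is a minimal sketch under stated assumptions: the sample rows are invented for illustration, and the exact spelling of the boolean values in the real file ('True' versus 'true') is not confirmed by the listing, so the parser accepts either.

```python
import csv
import io

# Hypothetical sample rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """title,subreddit,is_nsfw
"Look at this sunset, taken last night",pics,False
Example adult post title,example_nsfw_sub,true
"""


def parse_bool(value: str) -> bool:
    """Map 'True'/'true'/'False'/'false' to a bool; reject anything else."""
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"unexpected is_nsfw value: {value!r}")


def read_posts(handle):
    """Yield (title, subreddit, is_nsfw) tuples from an open CSV handle."""
    for row in csv.DictReader(handle):
        yield row["title"], row["subreddit"], parse_bool(row["is_nsfw"])


# With the real file, replace io.StringIO(...) with
# open(path, newline="", encoding="utf-8"); the path is up to you.
posts = list(read_posts(io.StringIO(SAMPLE_CSV)))
```

Rejecting unexpected values, rather than treating them as False, keeps mislabelled or truncated rows from silently inflating the SFW class.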

Distribution

The data is provided in CSV format. It has three columns and approximately 617,952 records, a total obtained by adding the two is_nsfw tag counts (100,477 + 517,475).
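The record total and the class balance follow directly from the two tag counts quoted in this listing. The short sketch below works through that arithmetic; the variable names are illustrative.

```python
# Tag counts as quoted in the listing (described there as approximate).
NSFW_COUNT = 100_477
SFW_COUNT = 517_475

total = NSFW_COUNT + SFW_COUNT      # 617,952 records
nsfw_share = NSFW_COUNT / total     # about 0.163
sfw_share = SFW_COUNT / total       # about 0.837

print(f"total={total:,} nsfw={nsfw_share:.1%} sfw={sfw_share:.1%}")
```

Because about 84% of posts are SFW, a model that always predicts SFW would already reach about 84% accuracy, so precision and recall on the NSFW class are likely more informative than raw accuracy.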

Usage

This dataset is ideally suited for a variety of applications, including:
  • Developing and training binary or categorical text classification models to identify NSFW content.
  • Conducting Natural Language Processing (NLP) research on social media text.
  • Building automated content moderation and filtering systems for online platforms.
  • Analysing trends in user-generated content and community behaviour on Reddit.
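As an illustration of the first use case, the sketch below trains a small multinomial Naive Bayes classifier on post titles using only the standard library. It is one possible baseline, not a method prescribed by the dataset. The class name TitleNaiveBayes, the tokenizer, and the four training titles are all hypothetical, and a real experiment would train on the title and is_nsfw columns.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", title.lower())


class TitleNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, titles, labels):
        # Assumes both classes appear at least once in the training data.
        for title, label in zip(titles, labels):
            tokens = tokenize(title)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        # Log prior plus smoothed log likelihood of each known token.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, title):
        """Return True if the title scores higher as NSFW than as SFW."""
        tokens = tokenize(title)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hypothetical toy data standing in for the title and is_nsfw columns.
train_titles = [
    "cute puppy learns a new trick",
    "explicit adult content warning",
    "my garden tomatoes this summer",
    "adult only explicit photos",
]
train_labels = [False, True, False, True]

model = TitleNaiveBayes().fit(train_titles, train_labels)
```

On the full dataset the classes are imbalanced, so it would be worth holding out a stratified test split and reporting per-class metrics rather than accuracy alone.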

Coverage

The dataset is global in scope, drawing posts from across the Reddit platform rather than from any one region. It was listed on 26 June 2025. The data is limited to Reddit post content.

License

CC-BY

Who Can Use It

This dataset is beneficial for:
  • Data Scientists and Machine Learning Engineers: For constructing and refining content classification algorithms.
  • Natural Language Processing Researchers: For studies on text analysis, content tagging, and social media language.
  • Social Media Analysts: For gaining insights into content patterns, community standards, and user-generated data on Reddit.
  • Developers: For integrating content filtering capabilities into applications or services.

Dataset Name Suggestions

  • Reddit Content Classifier Data
  • Social Media NSFW/SFW Posts
  • Reddit Post Safety Tagging
  • Online Community Content Moderation Dataset

Attributes

Original Data Source: Reddit Post Title - NSFW or SFW

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 26/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
