Social Media NSFW/SFW Posts Dataset
Social Media and Networking
Free
About
This dataset comprises Reddit post data collected via the Reddit API. Its primary purpose is to enable the classification of Reddit posts as either Not Safe For Work (NSFW) or Safe For Work (SFW). It supports moderating online communities, natural language processing tasks, and analysing social media content.
Columns
- title: Represents the full text of the Reddit post's title.
- subreddit: Indicates the specific subreddit where the post was originally published.
- is_nsfw: A boolean tag specifying whether the post is categorised as NSFW (True) or SFW (False). The dataset contains approximately 100,477 posts tagged True (NSFW) and 517,475 posts tagged False (SFW).
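A minimal sketch of reading the three columns with Python's standard library. It assumes the CSV has a header row with exactly the names listed above and that is_nsfw is serialised as text such as "True"/"False"; neither detail is confirmed by the listing. The sample rows and the helper names `parse_bool` and `load_posts` are hypothetical.

```python
import csv
import io

# Column names taken from the listing; their order in the file is an assumption.
EXPECTED_COLUMNS = ["title", "subreddit", "is_nsfw"]


def parse_bool(value):
    """Convert a textual flag to bool (assumed encodings: true/false or 1/0)."""
    text = value.strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"unexpected is_nsfw value: {value!r}")


def load_posts(fileobj):
    """Read posts from an open CSV file object into a list of dicts."""
    reader = csv.DictReader(fileobj)
    if set(reader.fieldnames or []) != set(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    posts = []
    for row in reader:
        row["is_nsfw"] = parse_bool(row["is_nsfw"])
        posts.append(row)
    return posts


# Hypothetical rows standing in for the real file.
sample = io.StringIO(
    "title,subreddit,is_nsfw\n"
    '"Sunset over the bay, taken last night",pics,False\n'
    "Example title flagged by moderators,example_subreddit,True\n"
)
posts = load_posts(sample)
```

With the real file, `load_posts(open(path, newline="", encoding="utf-8"))` would be the equivalent call, where `path` is wherever the CSV was saved.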
Distribution
Data files are typically provided in CSV format. The dataset has three columns and contains approximately 617,952 records, the sum of the two tag counts (100,477 NSFW and 517,475 SFW).
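The tag counts imply a noticeable class imbalance. The arithmetic below is a sketch using only the counts stated in this listing; the inverse-frequency weighting formula is one common convention, not something the dataset prescribes, and the variable names are illustrative.

```python
# Tag counts as stated in the listing.
NSFW_COUNT = 100_477
SFW_COUNT = 517_475

total = NSFW_COUNT + SFW_COUNT

# Fraction of posts tagged NSFW (the minority class).
nsfw_share = NSFW_COUNT / total

# Accuracy of a trivial model that always predicts SFW.
majority_baseline_accuracy = SFW_COUNT / total

# Inverse-frequency class weights: total / (number_of_classes * class_count).
class_weights = {
    True: total / (2 * NSFW_COUNT),
    False: total / (2 * SFW_COUNT),
}

print(f"total records: {total:,}")
print(f"NSFW share: {nsfw_share:.1%}")
print(f"always-SFW baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Because a model that always predicts SFW already scores about 84% accuracy, precision and recall on the NSFW class are more informative than raw accuracy when evaluating classifiers on this data.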
Usage
This dataset is ideally suited for a variety of applications, including:
- Developing and training binary or categorical text classification models to identify NSFW content.
- Conducting Natural Language Processing (NLP) research on social media text.
- Building automated content moderation and filtering systems for online platforms.
- Analysing trends in user-generated content and community behaviour on Reddit.
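As an illustration of the first use case, here is a small multinomial Naive Bayes classifier over post titles, written with the standard library only. It is a baseline sketch, not a recommended production approach: the tokeniser, the smoothing value, the class name `NaiveBayesTitleClassifier`, and the six training titles are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", title.lower())


class NaiveBayesTitleClassifier:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, titles, labels):
        for title, label in zip(titles, labels):
            tokens = tokenize(title)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        # Log prior for the class plus smoothed log likelihood of each token.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, title):
        """Return True if the title scores higher as NSFW than as SFW."""
        tokens = tokenize(title)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hypothetical toy data; real training would use the title and is_nsfw columns.
train_titles = [
    "Explicit adult photo set (18+)",
    "Adult content warning explicit",
    "My cat learned a new trick",
    "Best recipe for sourdough bread",
    "New trick for baking bread at home",
    "Photo of my cat in the garden",
]
train_labels = [True, True, False, False, False, False]

model = NaiveBayesTitleClassifier().fit(train_titles, train_labels)
flagged = model.predict("explicit adult photo")
safe = model.predict("my cat baking bread")
```

On the full dataset, a held-out test split and per-class metrics would be needed before drawing any conclusions about how well such a baseline performs.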
Coverage
The dataset's geographic scope is global, encompassing posts from across the Reddit platform. The data is limited to Reddit post content, specifically post titles, their subreddits, and the NSFW tag. The dataset was listed on 26/06/2025.
License
CC-BY
Who Can Use It
This dataset is beneficial for:
- Data Scientists and Machine Learning Engineers: For constructing and refining content classification algorithms.
- Natural Language Processing Researchers: For studies on text analysis, content tagging, and social media language.
- Social Media Analysts: For gaining insights into content patterns, community standards, and user-generated data on Reddit.
- Developers: For integrating content filtering capabilities into applications or services.
Dataset Name Suggestions
- Reddit Content Classifier Data
- Social Media NSFW/SFW Posts
- Reddit Post Safety Tagging
- Online Community Content Moderation Dataset
Attributes
Original Data Source: Reddit Post Title - NSFW or SFW