Reddit Data Share Insights

Social Media and Networking

Tags and Keywords

  • Social
  • Networks
  • NLP
  • Reddit
  • Data
  • Posts

Reddit Data Share Insights Dataset on Opendatabay data marketplace

Free

About

This dataset is a collection of posts and comments from Reddit's /r/datasets board. It forms a meta-corpus: a dataset about the datasets people share and discuss. It includes all posts and comments from the subreddit's inception up to 1st March 2022. The data was procured by SocialGrep and anonymised by excluding usernames, to protect user privacy and prevent targeted harassment. It can be used to analyse trends in the datasets shared on Reddit and to explore engagement metrics such as post scores. It also supports sentiment analysis on post titles, to study how sentiment correlates with user reactions such as upvotes and downvotes.

Columns

The dataset is structured across two main files: the-reddit-dataset-dataset-comments.csv and the-reddit-dataset-dataset-posts.csv, with columns as follows:
Common Columns across Posts and Comments:
  • type: The record type, i.e. whether the row is a post or a comment (String).
  • subreddit.name: The name of the subreddit (String).
  • subreddit.nsfw: Indicates whether the subreddit is marked NSFW, i.e. not safe for work (Boolean).
  • created_utc: The time the post or comment was created, in UTC (Timestamp).
  • permalink: The permanent link for the post or comment (String).
  • sentiment: The sentiment expressed in the post or comment (String).
  • score: The score of the post or comment (Integer).
Columns Specific to Comments:
  • body: The body text of the comment (String).
Columns Specific to Posts:
  • domain: The domain of the linked content in the post (String).
  • url: The URL of the post (String).
  • selftext: The self-text of the post (String).
  • title: The title of the post (String).
Other columns observed in the files:
  • index: A row index.
  • id: A unique identifier for the post or comment.
  • subreddit.id: The identifier of the subreddit.
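Python's csv module reads every field as a string, so the typed columns listed above need converting before analysis. The following is a minimal sketch and is not part of the dataset's own tooling. `parse_row` and `read_typed` are hypothetical helper names. The sketch also assumes, without confirmation from this listing, that `subreddit.nsfw` is stored as `true`/`false` and `created_utc` as Unix epoch seconds. The sample row is made up.

```python
import csv
import io
from datetime import datetime, timezone


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed values.

    ASSUMPTIONS: subreddit.nsfw is stored as "true"/"false" and created_utc
    as Unix epoch seconds; neither format is confirmed by the listing.
    """
    typed = dict(row)  # keep every column; overwrite only the typed ones
    typed["subreddit.nsfw"] = row["subreddit.nsfw"].strip().lower() == "true"
    typed["created_utc"] = datetime.fromtimestamp(int(row["created_utc"]), tz=timezone.utc)
    typed["score"] = int(row["score"])
    return typed


def read_typed(lines):
    """Parse an iterable of CSV lines (e.g. an open file handle) into typed rows."""
    return [parse_row(r) for r in csv.DictReader(lines)]


# Made-up single-row sample using the comment-file columns listed above.
sample = io.StringIO(
    "type,id,subreddit.id,subreddit.name,subreddit.nsfw,created_utc,"
    "permalink,body,sentiment,score\n"
    "comment,abc123,xyz,datasets,false,1646092800,"
    "https://example.invalid/c/abc123,Nice data!,0.5,3\n"
)
rows = read_typed(sample)
print(rows[0]["score"], rows[0]["subreddit.nsfw"], rows[0]["created_utc"].date())
```

Untyped columns such as `body` and `permalink` pass through unchanged, so the same helper works on both files as long as the shared columns are present.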

Distribution

The dataset is provided in CSV format, which can be opened in standard text editors or data analysis tools. It consists of two primary files: the-reddit-dataset-dataset-posts.csv and the-reddit-dataset-dataset-comments.csv. Total record counts are not stated explicitly. The observed column data suggests approximately 54,850 records in at least one of the files, possibly the comments file. The data spans the period from the subreddit's creation until 1st March 2022.
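The listing gives no exact record counts. One way to check them, along with the covered date range, is to scan each file once. The following is a minimal sketch using only Python's standard library. `summarize` is a hypothetical helper, and the sketch assumes `created_utc` holds Unix epoch seconds. The in-memory sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def summarize(lines):
    """Count rows per `type` value and report the earliest and latest created_utc.

    `lines` is any iterable of CSV lines, such as an open file handle.
    ASSUMPTION: created_utc holds Unix epoch seconds.
    """
    counts = Counter()
    first = last = None
    for row in csv.DictReader(lines):
        counts[row["type"]] += 1
        ts = int(row["created_utc"])
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)

    def to_dt(ts):
        # Convert epoch seconds to an aware UTC datetime; None if no rows were read.
        return None if ts is None else datetime.fromtimestamp(ts, tz=timezone.utc)

    return {"counts": dict(counts), "first": to_dt(first), "last": to_dt(last)}


# Made-up rows; the real files carry many more columns.
sample = io.StringIO(
    "type,id,created_utc,score\n"
    "comment,a1,1262304000,2\n"
    "comment,a2,1646092800,5\n"
    "post,p1,1400000000,10\n"
)
print(summarize(sample))
```

For the real files, pass an open handle instead, e.g. `with open("the-reddit-dataset-dataset-comments.csv", newline="", encoding="utf-8") as f: print(summarize(f))`. The UTF-8 encoding is an assumption.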

Usage

This dataset is ideally suited for a variety of analytical and research applications, including:
  • Analysing long-term trends in datasets shared and discussed on Reddit's /r/datasets board.
  • Calculating and comparing average scores for posts across different subreddits or over time.
  • Performing sentiment analysis on post titles to investigate the relationship between sentiment and user engagement (upvotes/downvotes).
  • Identifying correlations between various types of datasets shared on the platform.
  • Determining which datasets gain the most popularity on Reddit.
  • Analysing the overall sentiment of posts and comments within the /r/datasets community.
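Several of the uses above reduce to simple aggregations. The following sketch illustrates three of them:

- mean score per month;
- the Pearson correlation between `sentiment` and `score`;
- linked domains ranked by total post score, as a rough popularity proxy.

The function names are hypothetical. The sketch assumes `created_utc` is Unix epoch seconds and that `sentiment` holds a numeric polarity stored as text; the listing only says String. The sample rows are made up.

```python
import csv
import io
import math
from collections import Counter, defaultdict
from datetime import datetime, timezone


def mean_score_by_month(rows):
    """Average `score` per UTC calendar month, keyed "YYYY-MM".

    ASSUMPTION: created_utc holds Unix epoch seconds.
    """
    totals = defaultdict(lambda: [0, 0])  # month -> [score sum, row count]
    for r in rows:
        created = datetime.fromtimestamp(int(r["created_utc"]), tz=timezone.utc)
        month = created.strftime("%Y-%m")
        totals[month][0] += int(r["score"])
        totals[month][1] += 1
    return {m: s / n for m, (s, n) in sorted(totals.items())}


def sentiment_score_correlation(rows):
    """Pearson correlation between `sentiment` and `score`, or None if undefined.

    ASSUMPTION: sentiment is a numeric polarity stored as text; rows whose
    sentiment is missing, empty or non-numeric are skipped.
    """
    pairs = []
    for r in rows:
        try:
            pairs.append((float(r["sentiment"]), float(r["score"])))
        except (KeyError, TypeError, ValueError):
            continue
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    if sd_x == 0 or sd_y == 0:
        return None
    return cov / (sd_x * sd_y)


def top_domains(post_rows, k=5):
    """Linked-content domains ranked by summed post score (posts file only)."""
    totals = Counter()
    for r in post_rows:
        totals[r["domain"]] += int(r["score"])
    return totals.most_common(k)


# Made-up sample rows standing in for the two CSV files.
comments = list(csv.DictReader(io.StringIO(
    "type,created_utc,sentiment,score\n"
    "comment,1640995200,0.8,10\n"
    "comment,1641081600,0.1,2\n"
    "comment,1643673600,-0.5,-1\n"
    "comment,1643760000,,4\n"
)))
posts = list(csv.DictReader(io.StringIO(
    "type,domain,score\n"
    "post,kaggle.com,50\n"
    "post,github.com,20\n"
    "post,kaggle.com,5\n"
)))

print(mean_score_by_month(comments))
print(sentiment_score_correlation(comments))
print(top_domains(posts))
```

The correlation is written out by hand rather than using `statistics.correlation`, which requires Python 3.10 or later. A correlation alone does not show that sentiment drives upvotes.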

Coverage

The dataset's scope is global, covering content from Reddit's /r/datasets subreddit from its inception up to 1st March 2022. No demographic details are included. Usernames have been removed to preserve anonymity and prevent targeted harassment, so the data reflects only the content of posts and comments.

License

CC0

Who Can Use It

This dataset is valuable for:
  • Social media analysts aiming to understand community behaviour and content trends on platforms like Reddit.
  • Data scientists and researchers interested in meta-analysis of shared datasets, data popularity, or sentiment within technical communities.
  • Natural Language Processing (NLP) practitioners looking for real-world text data for sentiment analysis models or other text-based research.
  • Academics and students conducting studies on online communities, data sharing practices, or digital humanities.

Dataset Name Suggestions

  • Reddit /r/datasets Posts & Comments
  • SocialGrep Reddit Datasets Archive
  • Reddit Data Share Insights
  • Subreddit Data Trends
  • Reddit /r/datasets Collection

Attributes

Original Data Source: Reddit /r/datasets Dataset

Listing Stats

LISTED

27/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0