Reddit Data Share Insights
Social Media and Networking
Free
About
This dataset presents a collection of posts and comments from Reddit's /r/datasets board, forming a meta-corpus: a dataset about datasets. It includes all posts and comments from the subreddit's inception up to 1st March 2022. The data was procured by SocialGrep and has been anonymised by excluding usernames to protect user privacy and prevent targeted harassment. It is useful for analysing trends in the datasets shared on Reddit, exploring engagement metrics such as post scores, and performing sentiment analysis on post titles to study how sentiment relates to user reactions such as upvotes and downvotes.
Columns
The dataset is split across two files, the-reddit-dataset-dataset-comments.csv and the-reddit-dataset-dataset-posts.csv, with the following columns.
Common Columns across Posts and Comments:
- type: The record type, i.e. whether the row is a post or a comment (String).
- subreddit.name: The name of the subreddit (String).
- subreddit.nsfw: Indicates whether the subreddit is NSFW (Boolean).
- created_utc: The time the post or comment was created, in UTC (Timestamp).
- permalink: The permanent link for the post or comment (String).
- sentiment: The sentiment expressed in the post or comment (String).
- score: The score of the post or comment (Integer).
Columns Specific to Comments:
- body: The body text of the comment (String).
Columns Specific to Posts:
- domain: The domain of the linked content in the post (String).
- url: The URL the post links to (String).
- selftext: The body text of a self (text) post (String).
- title: The title of the post (String).
Other related columns observed:
- index: An index identifier.
- id: A unique identifier.
- subreddit.id: The identifier for the subreddit.
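Because the column list above is descriptive rather than a formal schema, it can be worth checking it against the actual file headers before analysis. The sketch below uses only Python's standard `csv` module; the set names `COMMON`, `POST_ONLY` and `COMMENT_ONLY` and the helper functions are hypothetical, and it assumes the header spellings match the names listed above exactly.

```python
import csv
import io
from typing import Iterable, TextIO

# Column names as listed in this description; that they match the CSV
# header spelling exactly is an assumption worth checking.
COMMON = {"type", "subreddit.name", "subreddit.nsfw", "created_utc",
          "permalink", "sentiment", "score"}
POST_ONLY = {"domain", "url", "selftext", "title"}
COMMENT_ONLY = {"body"}


def read_header(f: TextIO) -> list[str]:
    """Return the header row of a CSV file object (empty list if the file is empty)."""
    return next(csv.reader(f), [])


def missing_columns(header: Iterable[str], expected: set[str]) -> set[str]:
    """Return the expected column names that do not appear in the header."""
    return expected - set(header)


# Hypothetical in-memory sample standing in for a real file.
sample = io.StringIO("type,id,title,score\npost,abc,Example,3\n")
print(sorted(missing_columns(read_header(sample), {"type", "title", "score", "url"})))
```

For a real file, open it with `newline=""` as the `csv` module documentation recommends, then compare against `COMMON | POST_ONLY` for the posts file or `COMMON | COMMENT_ONLY` for the comments file. Any names returned indicate a mismatch between this description and the file.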
Distribution
The dataset is provided in CSV format, suitable for processing with standard text editors or data analysis tools. It includes two primary files, the-reddit-dataset-dataset-posts.csv and the-reddit-dataset-dataset-comments.csv. Total record counts are not stated explicitly, but the observed column data suggests approximately 54,850 records in at least one of the files, possibly the comments file. The data spans the period from the subreddit's beginning until 1st March 2022.
Usage
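Since the record counts are not stated, a reader may want to count them directly. Comment bodies and post self-text can contain line breaks inside quoted fields, so a raw line count (for example `wc -l`) may overstate the number of records. A minimal sketch, assuming standard CSV quoting, that counts records with a CSV parser instead (`count_records` is a hypothetical helper name):

```python
import csv
import io
from typing import TextIO


def count_records(f: TextIO) -> int:
    """Count data rows with a CSV parser, excluding the header row.

    Quoted fields (e.g. comment bodies) may contain newlines, so a raw
    line count can overstate the number of records.
    """
    reader = csv.reader(f)
    next(reader, None)  # skip the header row, if present
    return sum(1 for _ in reader)


# Hypothetical two-record sample where one body spans two lines.
sample = 'id,body\nc1,"first line\nsecond line"\nc2,short\n'
print(count_records(io.StringIO(sample)))  # parser-based count
print(len(sample.splitlines()) - 1)        # naive line-based count, overstated
```

Applied to the real files (opened with `newline=""`), this gives a record count that can be compared with the approximate figure above.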
This dataset is ideally suited for a variety of analytical and research applications, including:
- Analysing long-term trends in datasets shared and discussed on Reddit's /r/datasets board.
- Calculating and comparing average scores for posts across different subreddits or over time.
- Performing sentiment analysis on post titles to investigate the relationship between sentiment and user engagement (upvotes/downvotes).
- Identifying correlations between various types of datasets shared on the platform.
- Determining which datasets gain the most popularity on Reddit.
- Analysing the overall sentiment of posts and comments within the /r/datasets community.
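Two of the uses above, tracking scores over time and finding the most popular posts, reduce to simple aggregations over the posts file. The sketch below assumes that `created_utc` holds Unix epoch seconds and `score` holds integers, both stored as text in rows produced by `csv.DictReader`; the function names are hypothetical, and ranking by score is only one possible proxy for popularity.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_key(created_utc: str) -> str:
    """Map a created_utc value to 'YYYY-MM' (assumes Unix epoch seconds, UTC)."""
    ts = datetime.fromtimestamp(int(float(created_utc)), tz=timezone.utc)
    return ts.strftime("%Y-%m")


def mean_score_by_month(rows: list[dict[str, str]]) -> dict[str, float]:
    """Average 'score' per calendar month, keyed by 'YYYY-MM' in sorted order."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [sum, count]
    for row in rows:
        acc = totals[month_key(row["created_utc"])]
        acc[0] += int(row["score"])
        acc[1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}


def top_posts(rows: list[dict[str, str]], n: int = 5) -> list[tuple[int, str]]:
    """The n highest-scoring rows as (score, title) pairs."""
    ranked = sorted(rows, key=lambda r: int(r["score"]), reverse=True)
    return [(int(r["score"]), r.get("title", "")) for r in ranked[:n]]


# Hypothetical rows shaped like csv.DictReader output from the posts file.
demo = [
    {"created_utc": "1640995200", "score": "10", "title": "A"},
    {"created_utc": "1641081600", "score": "2", "title": "B"},
    {"created_utc": "1643673600", "score": "7", "title": "C"},
]
print(mean_score_by_month(demo))
print(top_posts(demo, n=2))
```

If `created_utc` turns out to be stored as a formatted date string rather than epoch seconds, only `month_key` needs to change.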
Coverage
The dataset offers a global scope, encompassing content from Reddit's /r/datasets subreddit. It covers a substantial time range, from the inception of the subreddit up to 1st March 2022. There are no specific demographic details included, as usernames have been removed to preserve user anonymity and prevent any form of targeted harassment, focusing purely on the content of posts and comments.
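The stated coverage, from the subreddit's inception up to 1st March 2022, can be checked against the files themselves. This sketch again assumes `created_utc` is Unix epoch seconds; treating 1st March 2022 as inclusive (so the exclusive upper bound is 2 March 2022, 00:00 UTC) is an assumption, and `time_span` and `count_after_cutoff` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Iterable

# Exclusive upper bound, treating 1st March 2022 as inclusive (assumption).
CUTOFF_END = datetime(2022, 3, 2, tzinfo=timezone.utc)


def to_datetime(created_utc: str) -> datetime:
    """Convert a created_utc value to an aware UTC datetime (assumes epoch seconds)."""
    return datetime.fromtimestamp(int(float(created_utc)), tz=timezone.utc)


def time_span(values: Iterable[str]) -> tuple[datetime, datetime]:
    """Earliest and latest creation times among the given created_utc values."""
    stamps = [to_datetime(v) for v in values]
    if not stamps:
        raise ValueError("no created_utc values supplied")
    return min(stamps), max(stamps)


def count_after_cutoff(values: Iterable[str], cutoff: datetime = CUTOFF_END) -> int:
    """Number of records created at or after the cutoff, i.e. outside the stated range."""
    return sum(1 for v in values if to_datetime(v) >= cutoff)


# Hypothetical created_utc values: 1 Jan 2022 and 1 Feb 2022.
earliest, latest = time_span(["1643673600", "1640995200"])
print(earliest.date(), latest.date())
```

A non-zero result from `count_after_cutoff` on the real files would suggest that the stated end date needs revisiting.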
License
CC0
Who Can Use It
This dataset is valuable for:
- Social media analysts aiming to understand community behaviour and content trends on platforms like Reddit.
- Data scientists and researchers interested in meta-analysis of shared datasets, data popularity, or sentiment within technical communities.
- Natural Language Processing (NLP) practitioners looking for real-world text data for sentiment analysis models or other text-based research.
- Academics and students conducting studies on online communities, data sharing practices, or digital humanities.
Dataset Name Suggestions
- Reddit /r/datasets Posts & Comments
- SocialGrep Reddit Datasets Archive
- Reddit Data Share Insights
- Subreddit Data Trends
- Reddit /r/datasets Collection
Attributes
Original Data Source: Reddit /r/datasets Dataset