Reddit Place Discussion Archive
Reddit & Forum Data
Free
About
This dataset captures the recorded history of, and discussion around, Reddit's /r/Place event. It starts in early 2022 and focuses on the period in April 2022 when the collaborative pixel canvas was active, including all posts and comments related to the event. It offers insight into social dynamics, collaborative efforts, and sentiment shifts as millions of users placed pixels and created digital art.
Columns
The dataset contains 10 columns, primarily relating to comment data:
- type: Indicates the nature of the data entry (e.g., 'comment').
- id: The unique Base-36 identifier assigned to the specific comment.
- subreddit.id: The unique Base-36 identifier for the subreddit where the comment was posted.
- subreddit.name: The human-readable name of the originating subreddit, which is consistently 'place'.
- subreddit.nsfw: A boolean indicator noting whether the subreddit is marked as Not Safe For Work (NSFW).
- created_utc: The timestamp indicating when the comment was created, in Coordinated Universal Time.
- permalink: The direct link to the comment on the Reddit platform.
- body: The text content of the comment itself, which may include entries marked as '[removed]' or '[deleted]'.
- sentiment: An analyzed sentiment score associated with the comment, though a significant portion of values may be missing. The mean sentiment value is approximately 0.08.
- score: The comment's score as reported by Reddit (upvotes minus downvotes), with a mean of 6.46.
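The column list above can be turned into a small typed loader. The following is a minimal sketch using only the Python standard library, not an official reader for this file. It assumes that created_utc holds Unix epoch seconds, that subreddit.nsfw is serialized as the text "true" or "false", and that a missing sentiment appears as an empty cell; none of these encodings is confirmed by the listing. The function names and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime, timezone

# The 10 column names as given in the Columns section.
EXPECTED_COLUMNS = [
    "type", "id", "subreddit.id", "subreddit.name", "subreddit.nsfw",
    "created_utc", "permalink", "body", "sentiment", "score",
]


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed Python values."""
    sentiment = row["sentiment"].strip()
    return {
        "id": row["id"],
        # Assumption: created_utc holds Unix epoch seconds.
        "created": datetime.fromtimestamp(int(row["created_utc"]), tz=timezone.utc),
        "body": row["body"],
        # Sentiment is often missing, so empty cells become None.
        "sentiment": float(sentiment) if sentiment else None,
        "score": int(row["score"]),
        # Assumption: the flag is stored as the text "true" or "false".
        "nsfw": row["subreddit.nsfw"].strip().lower() == "true",
    }


def read_comments(handle):
    """Read every row from an open CSV handle, checking the header first."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [parse_row(row) for row in reader]


# Two made-up rows in the documented column order (not real data).
SAMPLE = """type,id,subreddit.id,subreddit.name,subreddit.nsfw,created_utc,permalink,body,sentiment,score
comment,aaa111,zzzzz,place,false,1648857600,https://example.invalid/1,nice flag,0.5,3
comment,aaa112,zzzzz,place,false,1648861200,https://example.invalid/2,[removed],,1
"""

comments = read_comments(io.StringIO(SAMPLE))
```

For the real file, the same function could be called as `read_comments(open("the-reddit-place-dataset-comments.csv", newline="", encoding="utf-8"))`. At roughly 181 MB, iterating over the reader row by row instead of building a list may be the more memory-friendly choice.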
Distribution
The primary data is available in a single CSV file, named the-reddit-place-dataset-comments.csv, which has a file size of 181.14 MB. It includes approximately 931,022 total records, which correspond to the total values across essential identification fields such as id and created_utc. The data structure is tailored towards structured analysis, with 10 distinct attributes for each entry.
Usage
This data is well suited to researchers studying mass digital collaboration, social network analysis, and cultural trend detection. Use cases include:
- Investigating accusations of botting and determining if behavioural data aligns with these claims.
- Tracking how the popularity and longevity of minor, non-intrusive contributions (like the "amogus" phenomenon) change over time.
- Applying Natural Language Processing (NLP) techniques to understand player mood shifts and sentiment evolution.
- Identifying specific areas or art pieces on the canvas that generated the most discussion.
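Two of the use cases above, sentiment evolution and finding the most-discussed topics, can be approximated with simple aggregations. The sketch below is one possible approach under stated assumptions, not a method prescribed by the dataset: it assumes created_utc is Unix epoch seconds, treats missing sentiment as None, and uses plain substring matching for keywords. The function names, keyword list, and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Bodies that carry no usable text, as noted in the column description.
PLACEHOLDERS = {"[removed]", "[deleted]"}


def daily_mean_sentiment(rows):
    """Average sentiment per UTC day.

    rows: iterable of (created_utc_seconds, sentiment_or_None) pairs.
    Rows without a sentiment value are skipped rather than counted as 0.
    """
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum, count]
    for created_utc, sentiment in rows:
        if sentiment is None:
            continue
        day = datetime.fromtimestamp(created_utc, tz=timezone.utc).date()
        totals[day][0] += sentiment
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in sorted(totals.items())}


def keyword_mentions(bodies, keywords):
    """Count comments mentioning each keyword (case-insensitive substring)."""
    counts = {keyword: 0 for keyword in keywords}
    for body in bodies:
        if body.strip() in PLACEHOLDERS:
            continue
        text = body.lower()
        for keyword in keywords:
            if keyword.lower() in text:
                counts[keyword] += 1
    return counts


# Made-up values for illustration only.
daily = daily_mean_sentiment([
    (1648857600, 0.5),   # 2022-04-02 00:00 UTC
    (1648861200, None),  # missing sentiment, skipped
    (1648864800, -0.1),  # 2022-04-02 02:00 UTC
    (1648944000, 0.2),   # 2022-04-03 00:00 UTC
])
mentions = keyword_mentions(
    ["amogus everywhere", "[removed]", "The flag got bigger, Amogus too"],
    ["amogus", "flag"],
)
```

Substring matching is deliberately crude: it will also count a keyword inside longer words, so a tokenizer or regular expression with word boundaries may be preferable for real analysis.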
Coverage
The dataset spans a specific time frame, covering posts and comments made from January 1, 2022, up until April 4, 2022. The scope is centred around the Reddit community discussion related to the /r/Place canvas. Note that user anonymity has been preserved; therefore, individual usernames are not included to prevent targeted harassment.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Data Scientists: For developing models related to text classification and social scoring.
- NLP Specialists: To conduct sentiment analysis and topic modelling on high-volume, event-driven community text.
- Digital Historians and Sociologists: To study ephemeral online communities, collective behaviour, and internet culture phenomena.
- Students: Ideal for projects involving social network dataset exploration and analysis.
Dataset Name Suggestions
- Reddit Place Discussion Archive
- Collaborative Pixel Canvas Comments
- r/Place Social History
- 2022 Reddit Art Movement Data
Attributes
Original Data Source: Reddit Place Discussion Archive