Data Is Beautiful Reddit Posts Archive
Free
About
This dataset contains post submissions from the "Data is Beautiful" community on Reddit, a subreddit dedicated to data visualisations. It provides a foundation for analysing content, engagement, and community dynamics within this specific online community. The data was extracted using the Pushshift API for Reddit.
Columns
- id: A unique identifier for each Reddit post submission.
- title: The title of the Reddit post.
- score: The post's score, which reflects its upvotes net of downvotes.
- author: The username of the author who submitted the post.
- author_flair_text: The flair text associated with the author's profile.
- removed_by: Indicates if and by whom the post was removed (e.g., a moderator).
- total_awards_received: The total number of awards received by the post.
- awarders: Details about the awards received.
- created_utc: The UTC timestamp indicating when the post was created.
- full_link: The direct link to the Reddit post.
- num_comments: The total number of comments on the post.
- over_18: A boolean value indicating if the content of the post is flagged as NSFW (Not Safe For Work).
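As a minimal sketch of how these columns might be read into typed values, the snippet below uses Python's standard `csv` module. It assumes that `created_utc` is stored as Unix epoch seconds and that `over_18` is stored as the text `True` or `False`; neither is confirmed by this listing. The helper names `parse_post` and `read_posts` are hypothetical, and the sample row is invented for illustration.

```python
import csv
import io
from datetime import datetime, timezone


def to_int(value):
    """Return an int, or None for an empty cell."""
    return int(float(value)) if value not in ("", None) else None


def parse_post(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "id": row["id"],
        "title": row["title"],
        "score": to_int(row["score"]),
        "author": row["author"],
        # An empty removed_by cell is treated as "not removed" (assumption).
        "removed_by": row["removed_by"] or None,
        "num_comments": to_int(row["num_comments"]),
        # Assumes epoch seconds; converted to a timezone-aware UTC datetime.
        "created_utc": datetime.fromtimestamp(
            int(float(row["created_utc"])), tz=timezone.utc
        ),
        # Assumes the flag is serialised as the text "True" or "False".
        "over_18": row["over_18"].strip().lower() == "true",
    }


def read_posts(handle):
    """Read every row from an open CSV file or file-like object."""
    return [parse_post(row) for row in csv.DictReader(handle)]


# Invented sample row, not taken from the dataset.
sample = io.StringIO(
    "id,title,score,author,removed_by,created_utc,num_comments,over_18\n"
    "abc123,Example chart [OC],42,example_user,,1613347200,7,False\n"
)
posts = read_posts(sample)
print(posts[0]["score"], posts[0]["created_utc"].date())
```

For the real file, the same function could be called with `open("r_dataisbeautiful_posts.csv", newline="", encoding="utf-8")`; passing `newline=""` lets the `csv` module handle titles that contain line breaks.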
Distribution
The dataset is provided as a CSV file, named r_dataisbeautiful_posts.csv, with a size of 40.95 MB. It comprises 12 columns and approximately 191,000 unique records or rows, each representing a distinct Reddit post submission.
Usage
This dataset suits researchers, data scientists, and analysts interested in social media trends, online community behaviour, and the performance of data visualisations. Potential applications include analysing popular content, identifying engagement patterns, studying author contributions, or examining which posts were removed and by whom within the "Data is Beautiful" subreddit.
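One way to start on the engagement and removal questions mentioned above is a single pass over the file that tallies `removed_by` values and collects scores. This is only a sketch: the function name `summarise` is hypothetical, the label `not removed` for empty cells is an assumption about how non-removed posts are encoded, and the three sample rows are invented.

```python
import csv
import io
from collections import Counter
from statistics import median


def summarise(handle):
    """Count posts per removed_by value and compute the median score."""
    removed = Counter()
    scores = []
    for row in csv.DictReader(handle):
        # Empty removed_by cells are grouped under "not removed" (assumption).
        removed[row["removed_by"] or "not removed"] += 1
        if row["score"]:
            scores.append(int(row["score"]))
    return {
        "removed_by": dict(removed),
        "median_score": median(scores) if scores else None,
    }


# Invented rows for illustration only.
sample = io.StringIO(
    "id,score,removed_by\n"
    "a1,10,\n"
    "a2,1,moderator\n"
    "a3,300,\n"
)
summary = summarise(sample)
print(summary)
```

The median is used here rather than the mean because post scores on a large subreddit are typically dominated by a few very popular posts; that choice is a suggestion, not something this listing prescribes.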
Coverage
The dataset covers posts from the Reddit "Data is Beautiful" community over a time range from February 2012 to February 2021. There are no specific geographic or demographic details beyond the general scope of Reddit's global user base and the characteristics of the "Data is Beautiful" community members.
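The stated time range can be checked against the file by grouping posts by creation year. The sketch below assumes `created_utc` holds Unix epoch seconds, which this listing does not confirm; `posts_per_year` is a hypothetical helper and the sample timestamps are invented.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def posts_per_year(handle):
    """Return a dict mapping calendar year (UTC) to number of posts."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # Assumes epoch seconds; interpreted in UTC to match the column name.
        created = datetime.fromtimestamp(int(row["created_utc"]), tz=timezone.utc)
        counts[created.year] += 1
    return dict(sorted(counts.items()))


# Invented timestamps: one in February 2012 and two in February 2021.
sample = io.StringIO(
    "id,created_utc\n"
    "x1,1328054400\n"
    "x2,1613347200\n"
    "x3,1612137600\n"
)
yearly = posts_per_year(sample)
print(yearly)
```

On the real file, the smallest and largest keys of the result should fall in 2012 and 2021 if the coverage described above holds.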
Licence
CC BY-SA 4.0
Who Can Use It
- Data Analysts: To understand trends in data visualisation content.
- Social Scientists: To study online community interactions and content moderation.
- Researchers: For academic studies on user engagement and information dissemination on social platforms.
- Developers: To build applications that analyse or categorise Reddit post data.
- Marketing Professionals: To identify popular content formats and topics within data-focused communities.
Dataset Name Suggestions
- Reddit DataIsBeautiful Posts
- DataIsBeautiful Community Submissions
- Reddit Visualisation Post Metrics
- DataIsBeautiful Reddit Posts Archive
Attributes
Original Data Source: Data Is Beautiful Reddit Posts Archive