Social Media Analytics for Information Design and Data Viz
Reddit & Forum Data
Free
About
Analysing the intersection of public engagement and information design, these records capture a decade of discourse from the Reddit subreddits r/DataIsBeautiful and r/DataIsUgly. The collection documents how users react to various visualisations, ranging from expertly crafted charts to those deemed confusing or misleading. By tracking upvote scores, comment volumes, and specific categorisation flairs, the data provides a foundation for studying the viral nature of data-driven storytelling and the community-led critique of visual communication.
Columns
- post_id: A unique alphanumeric identifier for each specific Reddit post.
- created_at: The precise date and time the entry was published to the platform.
- title: The descriptive headline provided by the author of the post.
- link_flair_text: Subreddit-specific labels used to categorise the type or source of the data visualisation.
- score: The net upvote count reflecting community approval at the time of collection.
- num_comments: The total number of discussion replies associated with the post.
- posted_by: The unique Reddit user identifier for the author of the submission.
- image_url: A direct link to the hosted image or graphic featured in the post.
- full_link: The complete URL leading to the original discussion thread on Reddit.
- nsfw: A boolean flag indicating whether the content is marked as "Not Safe For Work" or restricted to adults.
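As a rough illustration of how the columns above might be read into typed records, here is a minimal sketch using only the Python standard library. The `Post` class, the `parse_bool` and `read_posts` helpers, and the sample row are hypothetical and made up for this example. The listing does not document how `created_at` or `nsfw` are serialised, so the timestamp is kept as raw text and the boolean parsing is an assumption.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    created_at: str  # kept as raw text: the timestamp format is not documented
    title: str
    link_flair_text: str
    score: int
    num_comments: int
    posted_by: str
    image_url: str
    full_link: str
    nsfw: bool


def parse_bool(text):
    # Assumption: the flag is serialised as text such as "True"/"False" or "1"/"0".
    return text.strip().lower() in {"true", "1", "yes"}


def read_posts(handle):
    """Yield one Post per CSV row from an open text file or file-like object."""
    for row in csv.DictReader(handle):
        yield Post(
            post_id=row["post_id"],
            created_at=row["created_at"],
            title=row["title"],
            link_flair_text=row["link_flair_text"],
            score=int(row["score"] or 0),  # empty cells fall back to 0
            num_comments=int(row["num_comments"] or 0),
            posted_by=row["posted_by"],
            image_url=row["image_url"],
            full_link=row["full_link"],
            nsfw=parse_bool(row["nsfw"]),
        )


# Made-up row for illustration only; it is not taken from the dataset.
sample = io.StringIO(
    "post_id,created_at,title,link_flair_text,score,num_comments,"
    "posted_by,image_url,full_link,nsfw\n"
    "abc123,2020-01-06 14:30:00,Example chart [OC],OC,42,7,"
    "example_user,https://example.com/chart.png,"
    "https://www.reddit.com/r/dataisbeautiful/comments/abc123/example_chart/,False\n"
)
posts = list(read_posts(sample))
print(posts[0].score, posts[0].nsfw)
```

To read the real file, the same function could be given a handle from `open("data_is_beautiful.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption.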
Distribution
The records are provided in a tabular CSV format within a file named data_is_beautiful.csv, totalling approximately 56.41 MB. The collection contains 204,000 records with exceptionally high data integrity, showing 100% validity across nearly all primary fields. It is a structured archive that receives quarterly updates and maintains a usability rating of 10.00.
Usage
This resource is ideal for training machine learning models to predict social media engagement based on post titles and timing. It is well-suited for sentiment analysis to understand the communal reception of different visualisation styles. Additionally, researchers can use the archive to identify historical trends in data design or to build automated classification systems for "good" versus "bad" charts using the subreddit source as a ground truth.
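The column list does not include a dedicated subreddit field, so one possible way to obtain the "good" versus "bad" label is to read the subreddit name out of full_link. The sketch below assumes that full_link follows the usual Reddit permalink layout (`/r/<subreddit>/comments/...`). The `LABELS` mapping and the helper names are hypothetical, and the title features are only a small illustrative starting point for engagement prediction, not a recommended feature set.

```python
from urllib.parse import urlparse

# Hypothetical label mapping, following the "good" versus "bad" framing above.
LABELS = {"dataisbeautiful": "good", "dataisugly": "bad"}


def subreddit_from_link(full_link):
    """Return the lowercased subreddit name from a Reddit permalink, or None."""
    parts = [p for p in urlparse(full_link).path.split("/") if p]
    # Assumed layout: /r/<subreddit>/comments/<id>/<slug>/
    if len(parts) >= 2 and parts[0] == "r":
        return parts[1].lower()
    return None


def label_for(full_link):
    """Map a permalink to "good", "bad", or None when the subreddit is unknown."""
    return LABELS.get(subreddit_from_link(full_link))


def title_features(title):
    """Compute a few simple, illustrative features from a post title."""
    words = title.split()
    return {
        "char_count": len(title),
        "word_count": len(words),
        "has_digit": any(ch.isdigit() for ch in title),
        "is_oc": "[oc]" in title.lower(),  # "[OC]" commonly marks original content
    }
```

For example, `label_for("https://www.reddit.com/r/DataIsUgly/comments/xyz/example/")` would yield `"bad"` under this mapping, while a link outside the two subreddits yields `None`.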
Coverage
The scope is focused on digital interactions within two specific online communities over a ten-year period. Temporally, the data spans from 15 February 2012 to 13 September 2022. The demographic coverage consists of Reddit users, primarily those interested in statistics and design, with approximately 99% of the content being classified as safe for general viewing.
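The coverage figures above can be turned into simple sanity checks on a loaded copy of the data. The following is a minimal sketch: the date bounds come from the text above, while the function names are hypothetical and the inputs are assumed to be already parsed into `date` objects and booleans, since the serialised formats are not documented.

```python
from datetime import date

# Bounds taken from the stated coverage period.
COVERAGE_START = date(2012, 2, 15)
COVERAGE_END = date(2022, 9, 13)


def within_coverage(day):
    """Return True if a post date falls inside the stated coverage period."""
    return COVERAGE_START <= day <= COVERAGE_END


def safe_share(flags):
    """Fraction of posts not flagged NSFW; flags is an iterable of booleans."""
    flags = list(flags)
    if not flags:
        return 0.0
    return sum(1 for flagged in flags if not flagged) / len(flags)
```

On the full file, `safe_share` would be expected to come out near 0.99 if the stated proportion holds.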
License
CC BY-SA 4.0
Who Can Use It
Data scientists can leverage these records to study patterns in community-driven feedback and engagement metrics. Sociologists may utilise the titles and comment counts to research the evolution of online critique and subcultural norms. Furthermore, UI/UX designers can use the curated links to find examples of highly rated or heavily criticised visualisations to inform better design practices.
Dataset Name Suggestions
- Reddit Data Visualisation: Community Sentiment and Engagement Archive
- The Beautiful and Ugly of Data: A Ten-Year Reddit Registry
- Reddit Engagement Metrics for Data Storytelling (2012–2022)
- Visualisation Critique Log: Posts from r/DataIsBeautiful and r/DataIsUgly
- Social Media Analytics for Information Design and Data Viz
Attributes
Original Data Source: Social Media Analytics for Information Design and Data Viz
