Social Media Analytics for Information Design and Data Viz

Free dataset (£0) from a verified data provider.

Category: Reddit & Forum Data

Tags and Keywords

Visualisation, Engagement, Analytics, Social

Listed on the Opendatabay data marketplace.

About

These records capture a decade of discourse from the subreddits r/DataIsBeautiful and r/DataIsUgly, where public engagement meets information design. The collection documents how users react to visualisations ranging from expertly crafted charts to those deemed confusing or misleading. By tracking upvote scores, comment counts, and the link flairs that categorise each post, the data provides a foundation for studying how data-driven stories spread and how the community critiques visual communication.

Columns

post_id: A unique alphanumeric identifier for each Reddit post.
created_at: The date and time at which the post was published.
title: The descriptive headline provided by the author of the post.
link_flair_text: Subreddit-specific labels used to categorise the type or source of the data visualisation.
score: The net upvote count reflecting community approval at the time of collection.
num_comments: The total number of discussion replies associated with the post.
posted_by: The unique Reddit user identifier for the author of the submission.
image_url: A direct link to the hosted image or graphic featured in the post.
full_link: The complete URL leading to the original discussion thread on Reddit.
nsfw: A boolean flag indicating whether the content is marked as "Not Safe For Work" or restricted to adults.
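The column list above maps naturally onto a typed record. The following Python sketch parses one CSV row into such a record using only the standard library. The listing does not say how created_at or nsfw are serialised, so the parsing rules below (Unix epoch seconds or an ISO 8601 string for created_at; True/False, true/false or 1/0 for nsfw) are assumptions, as are the names RedditPost, parse_row and load_posts.

```python
import csv
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class RedditPost:
    """One row of data_is_beautiful.csv, mirroring the column list (hypothetical type)."""
    post_id: str
    created_at: datetime
    title: str
    link_flair_text: Optional[str]  # empty flairs become None
    score: int
    num_comments: int
    posted_by: str
    image_url: Optional[str]  # empty links become None
    full_link: str
    nsfw: bool


def _parse_timestamp(value: str) -> datetime:
    # Assumption: created_at is either Unix epoch seconds or an ISO 8601 string.
    # Epoch values become UTC-aware; ISO strings without an offset stay naive.
    value = value.strip()
    try:
        return datetime.fromtimestamp(float(value), tz=timezone.utc)
    except ValueError:
        return datetime.fromisoformat(value)


def _parse_bool(value: str) -> bool:
    # Assumption: nsfw is written as True/False, true/false or 1/0.
    return value.strip().lower() in {"true", "1"}


def parse_row(row: dict) -> RedditPost:
    """Convert one csv.DictReader row (all strings) into a typed RedditPost."""
    return RedditPost(
        post_id=row["post_id"],
        created_at=_parse_timestamp(row["created_at"]),
        title=row["title"],
        link_flair_text=row.get("link_flair_text") or None,
        # Going through float() first tolerates counts exported as "42.0".
        score=int(float(row["score"])),
        num_comments=int(float(row["num_comments"])),
        posted_by=row["posted_by"],
        image_url=row.get("image_url") or None,
        full_link=row["full_link"],
        nsfw=_parse_bool(row["nsfw"]),
    )


def load_posts(path: str = "data_is_beautiful.csv") -> List[RedditPost]:
    """Read the whole file into memory; manageable for a CSV of this size."""
    with open(path, newline="", encoding="utf-8") as f:
        return [parse_row(row) for row in csv.DictReader(f)]
```

If your copy of the file uses different encodings for these fields, only the two private helpers need to change.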

Distribution

The records are provided as a single CSV file, data_is_beautiful.csv, of approximately 56.41 MB. The collection contains 204,000 records, and nearly all primary fields show 100% validity. The archive is updated quarterly and carries a usability rating of 10.00.
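The listing does not define what "validity" means. As a rough proxy, the sketch below streams the CSV and reports the record count and the share of non-empty values per column, so the stated size and completeness can be checked against a local copy. Treating "non-empty" as "valid", the function names and the UTF-8 encoding are all assumptions.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Tuple


def column_completeness(rows: Iterable[dict]) -> Tuple[int, Dict[str, float]]:
    """Return (row_count, {column: share of rows with a non-empty value})."""
    total = 0
    filled: Counter = Counter()
    columns: list = []
    for row in rows:
        if total == 0:
            # Take column names from the first row; skip DictReader's None key for overflow fields.
            columns = [c for c in row if c is not None]
        total += 1
        for col in columns:
            value = row.get(col)
            # DictReader fills short rows with None, so check the type before stripping.
            if isinstance(value, str) and value.strip():
                filled[col] += 1
    if total == 0:
        return 0, {}
    return total, {col: filled[col] / total for col in columns}


def profile_csv(path: str = "data_is_beautiful.csv") -> Tuple[int, Dict[str, float]]:
    """Stream the CSV row by row instead of loading it all into memory."""
    with open(path, newline="", encoding="utf-8") as f:
        return column_completeness(csv.DictReader(f))
```

Comparing the returned count with the stated 204,000 records, and listing any column whose share is below 1.0, gives a quick sanity check before further analysis.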

Usage

This resource is well suited to training machine learning models that predict engagement (score and comment count) from post titles and posting times. It also supports sentiment analysis of post titles which, combined with scores and comment counts, can show how the community receives different visualisation styles. Researchers can additionally use the archive to identify historical trends in data design, or to build automated classifiers for "good" versus "bad" charts, using the source subreddit as the label.
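The column list has no explicit subreddit field. Assuming full_link follows Reddit's usual https://www.reddit.com/r/<subreddit>/comments/... pattern, the subreddit, and hence a "good" versus "bad" label, can be recovered from the URL, as in the sketch below. It also computes a few simple title and timing features for an engagement model. The label encoding (1 for r/DataIsBeautiful, 0 for r/DataIsUgly), the [OC] tag check, the assumption that created_at is in UTC and all function names are illustrative choices, not part of the dataset.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlparse

# Hypothetical label encoding for a "good" vs "bad" chart classifier.
LABELS = {"dataisbeautiful": 1, "dataisugly": 0}


def subreddit_from_link(full_link: str) -> Optional[str]:
    """Return the lower-cased subreddit in a URL like https://www.reddit.com/r/<name>/comments/..."""
    parts = [p for p in urlparse(full_link).path.split("/") if p]
    if len(parts) >= 2 and parts[0].lower() == "r":
        return parts[1].lower()
    return None


def label_post(full_link: str) -> Optional[int]:
    """1 for r/DataIsBeautiful, 0 for r/DataIsUgly, None for anything else."""
    return LABELS.get(subreddit_from_link(full_link))


def title_and_timing_features(title: str, created_at: datetime) -> dict:
    """A few simple engagement-model features; created_at is assumed to be UTC."""
    return {
        "title_length": len(title),
        "title_words": len(title.split()),
        # Assumption: original-content posts carry an "[OC]" tag in the title.
        "has_oc_tag": "[oc]" in title.lower(),
        "hour_utc": created_at.hour,
        "weekday": created_at.weekday(),  # Monday == 0
    }
```

Subreddit membership is a noisy proxy for chart quality: it records where a chart was posted, not an independent judgement of its design, so a classifier trained on it learns community selection as much as visual quality.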

Coverage

The scope is limited to two online communities over a period of roughly ten years: the data spans 15 February 2012 to 13 September 2022. The population covered is Reddit users, primarily those interested in statistics and design. Approximately 99% of posts are not flagged as NSFW and are therefore classified as safe for general viewing.
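To check the stated coverage against a local copy, the sketch below takes (created_at, nsfw) pairs, parsed however you prefer, and reports the earliest and latest post, the span in years and the share of posts flagged NSFW. The function name is hypothetical. All timestamps must be consistently timezone-aware or consistently naive, because Python refuses to compare the two.

```python
from datetime import datetime
from typing import Iterable, Tuple


def coverage_summary(posts: Iterable[Tuple[datetime, bool]]) -> dict:
    """Summarise (created_at, nsfw) pairs: first/last timestamp, span and NSFW share."""
    first = last = None
    total = flagged = 0
    for created_at, nsfw in posts:
        total += 1
        flagged += int(nsfw)
        if first is None or created_at < first:
            first = created_at
        if last is None or created_at > last:
            last = created_at
    return {
        "first_post": first,
        "last_post": last,
        # Average Gregorian year length, good enough for a coverage check.
        "span_years": (last - first).days / 365.25 if total else 0.0,
        "nsfw_share": flagged / total if total else 0.0,
    }
```

For the stated range, the span comes out at roughly 10.6 years, and an NSFW share near 0.01 would match the listing's "approximately 99% safe" figure.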

License

CC BY-SA 4.0

Who Can Use It

Data scientists can use these records to study patterns in community feedback and engagement metrics. Sociologists can use the titles and comment counts to research the evolution of online critique and subcultural norms. UI/UX designers can use the image and thread links to find highly rated or heavily criticised visualisations and draw on them to inform better design practice.

Dataset Name Suggestions

Reddit Data Visualisation: Community Sentiment and Engagement Archive
The Beautiful and Ugly of Data: A Ten-Year Reddit Registry
Reddit Engagement Metrics for Data Storytelling (2012–2022)
Visualisation Critique Log: Posts from r/DataIsBeautiful and r/DataIsUgly
Social Media Analytics for Information Design and Data Viz

Attributes

Original Data Source: Social Media Analytics for Information Design and Data Viz

Listing Stats

LISTED

27/12/2025

REGION

GLOBAL

QUALITY

5 / 5

VERSION

1.0

DOWNLOAD FORMAT

ZIP