Shadow Sightings Historical Data
Data Science and Analytics
Free
About
Explore the beloved annual tradition of Groundhog Day predictions. This dataset collects yearly weather prognostications made by groundhogs and various groundhog substitutes from locations across North America. It provides rich detail on the predictors themselves and their historical prediction records. Users can analyse prediction accuracy, examine regional differences in the custom, and track historical outcomes, each of which forecasts either an early spring or six more weeks of winter.
Columns
The data is divided into two primary files:
groundhogs.csv
- id: A unique identifier for the predictor.
- slug: A simplified, kebab-case version of the predictor's name.
- shortname / name: The short and full names of the groundhog or substitute.
- city / region / country: Geographical identifiers for the prediction location (e.g., USA or Canada).
- latitude / longitude: The geographical coordinates of the city.
- source / current_prediction: URLs providing information about the predictor or its most recent prediction.
- is_groundhog: A logical indicator specifying if the predictor is a living groundhog.
- type: A description of the animal or entity making the prediction (e.g., Groundhog, Taxidermied groundhog).
- active: A logical value indicating if the predictor was active as of 2023.
- description: Free-text detailing the history or background of the predictor.
- image: A URL linking to an image of the predictor.
- predictions_count: The total number of predictions available for this individual predictor.
predictions.csv
- id: The identifier of the predictor that made the prediction, matching id in groundhogs.csv.
- year: The year the prediction was made.
- shadow: A logical indicator of whether the groundhog saw its shadow (meaning six more weeks of winter).
- details: Free text offering more specific information about the prediction event.
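The link between the two files can be sketched as a simple join on id. This is a minimal illustration, not an official loader: the inline sample rows and names are hypothetical, only a subset of columns is shown, and the TRUE/FALSE/NA text encoding of logical values is an assumption about how the CSV files store them.

```python
import csv
import io

# Hypothetical sample rows standing in for the real files (subset of columns).
GROUNDHOGS_CSV = """id,slug,name,region,country,is_groundhog,active
1,example-phil,Example Phil,Pennsylvania,USA,TRUE,TRUE
2,example-sam,Example Sam,Nova Scotia,Canada,TRUE,TRUE
"""

PREDICTIONS_CSV = """id,year,shadow,details
1,2022,TRUE,Saw shadow
1,2023,FALSE,No shadow
2,2023,NA,No record
"""


def parse_logical(value):
    """Map TRUE/FALSE text to bool; anything else (e.g. NA, empty) to None.

    The TRUE/FALSE/NA encoding is an assumption about the files.
    """
    text = value.strip().upper()
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    return None


def join_predictions(groundhogs_file, predictions_file):
    """Attach predictor name and location to each prediction row via id."""
    predictors = {row["id"]: row for row in csv.DictReader(groundhogs_file)}
    joined = []
    for row in csv.DictReader(predictions_file):
        predictor = predictors.get(row["id"])
        if predictor is None:
            continue  # skip predictions with no matching predictor
        joined.append({
            "name": predictor["name"],
            "region": predictor["region"],
            "country": predictor["country"],
            "year": int(row["year"]),
            "shadow": parse_logical(row["shadow"]),
        })
    return joined


rows = join_predictions(io.StringIO(GROUNDHOGS_CSV), io.StringIO(PREDICTIONS_CSV))
```

To run this against the real files, pass open file handles for groundhogs.csv and predictions.csv in place of the io.StringIO objects.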
Distribution
The data is provided in a relational structure across two files, primarily CSV format. The groundhogs.csv file contains 75 distinct records detailing the characteristics of the predictors. The information is expected to be updated annually, corresponding to the yearly Groundhog Day tradition.
Usage
This data is ideal for analysing patterns in cultural predictions and weather folklore. Applications include performing geographical statistical analysis to compare prediction efficacy by region, creating visualisations of prediction outcomes over time, and studying the popularity and distribution of the tradition across countries. It can also be used for educational purposes to demonstrate logical and categorical data handling.
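As one possible starting point for a regional comparison, the share of predictions in which a shadow was seen can be tallied per group. This is a rough sketch: shadow_rate_by_group and the sample rows are hypothetical, and each row is assumed to already carry a grouping field (such as country) alongside a shadow value of True, False, or None for missing. It measures how often a shadow was reported, not whether the forecast proved correct; the columns listed above include no observed weather outcomes, so an efficacy comparison would need an outside source.

```python
from collections import defaultdict


def shadow_rate_by_group(rows, key):
    """Return the fraction of known predictions with shadow=True per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [shadow_seen, total_known]
    for row in rows:
        if row["shadow"] is None:
            continue  # ignore predictions with no recorded outcome
        tally = counts[row[key]]
        if row["shadow"]:
            tally[0] += 1
        tally[1] += 1
    return {group: seen / total for group, (seen, total) in counts.items()}


# Hypothetical rows, as they might look after joining the two files.
sample = [
    {"country": "USA", "shadow": True},
    {"country": "USA", "shadow": False},
    {"country": "USA", "shadow": True},
    {"country": "Canada", "shadow": False},
    {"country": "Canada", "shadow": None},
]

rates = shadow_rate_by_group(sample, "country")
```

Passing "region" instead of "country" as the key would give the same tally at state or province level, provided the rows carry that field.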
Coverage
The geographic scope of the dataset is centred on North America, covering prediction locations in the USA (81% of records) and Canada (19% of records). The state of Pennsylvania is notably represented (20%). The data includes historical depth, with some predictors having up to 128 recorded predictions. It covers various types of predictors, with 96% classified as active as of 2023.
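Percentages like the ones above can be recomputed from groundhogs.csv by counting records per value of a column. The following is an illustrative sketch only: share_by is a hypothetical helper, and the five sample records are made up to keep the arithmetic easy to follow, so they do not reproduce the real figures.

```python
from collections import Counter


def share_by(records, key):
    """Return each value's share of records as a whole-number percentage."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: round(100 * n / total) for value, n in counts.items()}


# Hypothetical records: four from the USA, one from Canada.
sample = [{"country": "USA"}] * 4 + [{"country": "Canada"}]

shares = share_by(sample, "country")
```

The same helper applies to any categorical column, for example region for state-level shares or active for the proportion of active predictors.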
License
CC0: Public Domain
Who Can Use It
- Data scientists interested in time-series and geographic analysis of cultural phenomena.
- Folklorists and cultural researchers studying annual traditions and popular culture.
- Academics seeking real-world examples for statistical or data visualisation coursework.
- Weather enthusiasts tracking long-term trends in meteorological folklore.
Dataset Name Suggestions
- Annual Prognostications Register
- Shadow Sightings Historical Data
- North American Groundhog Predictor Registry
- Early Spring Forecasters
Attributes
Original Data Source: Shadow Sightings Historical Data
