Opendatabay APP

Competition Shake Analytics

Data Science and Analytics

Tags and Keywords

Kaggle

Leaderboard

Ranking

Overfitting

Prediction

Free

About

This dataset provides detailed data on shifts in contestant performance across major Kaggle machine learning competitions. It focuses on the phenomenon known as "Shake": the re-ranking of participants that occurs when final scores are computed on the private test set, which is hidden during the competition. The data lets analysts evaluate how severely models overfit to the public leaderboard and investigate competitive dynamics such as submission timing and risk management.

Columns

  • Team Name: The name identifying the participant or group entry.
  • Rank-private: The final achieved rank after the scores are re-calculated using the private test set.
  • Rank-public: The position of the team on the leaderboard during the main competition phase, based on the public test set score.
  • Shake: A calculated metric representing the shift in ranking, derived from the difference between the public rank and the private rank.
  • Score-private: The score obtained by the model using the final, private test set data.
  • Score-public: The score obtained by the model using the publicly visible test set data during the competition.
  • Entries: The total number of submissions made by the team, as recorded in the related data records.
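
The relationship between the two rank columns and Shake can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the listing says Shake is derived from the difference between the public and private ranks but does not give the sign or any normalisation, so the convention below (public minus private, positive meaning the team climbed) is assumed, and the sample rows and the `rank_shift` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample rows; team names and values are invented for illustration.
SAMPLE_CSV = """Team Name,Rank-private,Rank-public,Score-private,Score-public,Entries
alpha,1,5,3.60,3.65,40
beta,2,1,3.61,3.62,120
gamma,3,3,3.63,3.66,15
"""


def rank_shift(rank_public: int, rank_private: int) -> int:
    """Assumed convention: positive means the team climbed on the private leaderboard."""
    return rank_public - rank_private


def load_shifts(text: str) -> list:
    """Parse CSV text and attach the assumed rank shift to each team."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        pub = int(rec["Rank-public"])
        priv = int(rec["Rank-private"])
        rows.append({"team": rec["Team Name"], "shift": rank_shift(pub, priv)})
    return rows


rows = load_shifts(SAMPLE_CSV)
for row in rows:
    print(row["team"], row["shift"])
```

Comparing values computed this way against the supplied Shake column is a quick way to learn which sign and scaling the files actually use.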

Distribution

The data consists of several independent datasets, with files generally supplied in CSV format. For instance, the data frame for the Elo Merchant Category Recommendation competition contains 8 distinct columns and 4,063 valid records. Updates are expected quarterly, following the close of high-profile competitions.
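
Before analysing a file, it can help to confirm its shape against the column list above. The following is a minimal sketch, assuming the CSV headers are spelled exactly as in the Columns section (not confirmed by the listing); the `inspect_csv` helper and the in-memory sample are hypothetical stand-ins for a real file.

```python
import csv
import io

# Column names as listed in this description; actual CSV headers may differ.
EXPECTED = [
    "Team Name", "Rank-private", "Rank-public", "Shake",
    "Score-private", "Score-public", "Entries",
]


def inspect_csv(handle) -> dict:
    """Report the column count, record count, and any header mismatches."""
    reader = csv.reader(handle)
    header = next(reader)
    n_records = sum(1 for row in reader if row)  # skip empty lines
    return {
        "columns": len(header),
        "records": n_records,
        "missing": [c for c in EXPECTED if c not in header],
        "extra": [c for c in header if c not in EXPECTED],
    }


# Invented in-memory sample; for a real file, pass open(path, newline="").
sample = io.StringIO(
    "Team Name,Rank-private,Rank-public,Shake,Score-private,Score-public,Entries,Extra\n"
    "a,1,2,1,0.5,0.6,3,x\n"
    "b,2,1,-1,0.4,0.7,9,y\n"
)
report = inspect_csv(sample)
print(report)
```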

Usage

The collection suits several analytical applications:
  • Overfitting Assessment: Quantifying the extent to which contestants' models overfit the intermediate public leaderboard results.
  • Strategy Analysis: Studying the approaches teams use to mitigate the risk of a rank shake-up.
  • Performance Investigation: Exploring the per-competition results for notable patterns in competition outcomes.
  • Educational Case Studies: Demonstrating the importance of model generalization over mere memorisation in data science contests.
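
As one possible starting point for the overfitting assessment described above, the sketch below summarises how much a leaderboard moved. The two measures (mean absolute rank shift, and the share of public top-k teams that stay in the private top-k) are illustrative choices rather than metrics defined by the dataset, and the rank pairs are invented.

```python
def mean_abs_shift(pairs) -> float:
    """Average absolute difference between public and private rank."""
    return sum(abs(pub - priv) for pub, priv in pairs) / len(pairs)


def top_k_retention(pairs, k: int) -> float:
    """Fraction of teams in the public top-k that remain in the private top-k."""
    public_top = [(pub, priv) for pub, priv in pairs if pub <= k]
    if not public_top:
        return 0.0
    kept = sum(1 for _, priv in public_top if priv <= k)
    return kept / len(public_top)


# Invented (rank_public, rank_private) pairs for six teams.
pairs = [(1, 2), (2, 6), (3, 1), (4, 3), (5, 4), (6, 5)]

print(round(mean_abs_shift(pairs), 3))
print(round(top_k_retention(pairs, k=3), 3))
```

A low retention value together with a large mean shift would suggest that public leaderboard positions were a poor guide to final standing in that competition.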

Coverage

The collection includes results scraped from specific, high-stakes events held on Kaggle. The scope comprises seven competitions:
  • Elo Merchant Category Recommendation
  • Human Protein Atlas Image Classification
  • Humpback Whale Identification
  • Microsoft Malware Prediction
  • Quora Insincere Questions Classification
  • TGS Salt Identification Challenge
  • VSB Power Line Fault Detection
The data covers competitive results from these past events, providing thematic coverage focused on machine learning contest dynamics.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists and Statisticians: For studying patterns in model generalization and competitive performance metrics.
  • Kaggle Participants: To inform future submission strategies and methods for avoiding severe rank volatility.
  • Academics: Researchers interested in quantitative analysis of competitive environments and risk management in data science.

Dataset Name Suggestions

  • Kaggle Rank Shift Records
  • Competition Shake Analytics
  • Private Leaderboard Dynamics
  • ML Competition Volatility Tracker

Attributes

Original Data Source: Competition Shake Analytics

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

11/11/2025

REGION

GLOBAL

UDQS QUALITY

5 / 5

VERSION

1.0
