Player Guessing Behavior and Retention

Data Science and Analytics

Tags and Keywords

Gameplay, Alexa, Guessing, Behavior, Prediction

Free

About

This is a curated, real-world collection of gameplay interactions, described as the largest known dataset derived from an Alexa number guessing game. It captures player behaviour and decision-making in a simple 'Higher or Lower' task in which users try to guess a hidden number between 1 and 100. The data covers 6,800 hours of actual gameplay: approximately 380,000 individual decisions made by 14,000 players across more than 50,000 games. It is suited to analysing user interaction dynamics, identifying behavioural patterns, and developing predictive machine learning models, for beginners and experienced practitioners alike.

Columns

The primary file detailing the games (games.csv) includes the following key fields:
  • gameId: A unique identifier assigned to each game played.
  • user: An anonymised unique identifier for the player.
  • startTime: The timestamp indicating when the game session began.
  • finishTime: The timestamp indicating when the game session concluded.
  • duration: The length of the game session, measured in seconds.
  • targetNum: The randomly selected number, ranging from 1 to 100, which the player aimed to guess.
  • numGuesses: The total number of guesses the player required to identify the target number.
  • guess1 through guess15: Individual recorded guesses made by the player during the game.
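
As a rough illustration of how these fields relate to one another, the Python sketch below parses a couple of made-up rows in the games.csv layout and runs basic consistency checks. The sample rows, the ISO 8601 timestamp format, the convention that unused guess columns are left empty, and the helper names load_games and check_game are all assumptions made for this example; the real file may differ.

```python
import csv
import io
from datetime import datetime

# Made-up rows in the documented layout. Only guess1..guess4 are shown for
# brevity; the listing describes columns guess1 through guess15.
SAMPLE = """\
gameId,user,startTime,finishTime,duration,targetNum,numGuesses,guess1,guess2,guess3,guess4
1,u001,2019-09-21T10:00:00,2019-09-21T10:00:42,42,37,3,50,25,37,
2,u002,2019-09-21T11:15:00,2019-09-21T11:16:05,65,80,4,50,75,90,80
"""


def load_games(text):
    """Parse CSV text into dicts with typed fields and a list of guesses."""
    games = []
    for row in csv.DictReader(io.StringIO(text)):
        # Order guess columns numerically (guess1, guess2, ..., guess15).
        guess_cols = sorted(
            (c for c in row if c.startswith("guess")),
            key=lambda c: int(c[len("guess"):]),
        )
        games.append({
            "gameId": row["gameId"],
            "user": row["user"],
            "startTime": datetime.fromisoformat(row["startTime"]),
            "finishTime": datetime.fromisoformat(row["finishTime"]),
            "duration": int(row["duration"]),
            "targetNum": int(row["targetNum"]),
            "numGuesses": int(row["numGuesses"]),
            # Assumption: unused guess columns are empty strings.
            "guesses": [int(row[c]) for c in guess_cols if row[c]],
        })
    return games


def check_game(game):
    """Return a list of consistency problems (empty if none were found)."""
    problems = []
    if len(game["guesses"]) != game["numGuesses"]:
        problems.append("numGuesses does not match populated guess columns")
    # Only completed games are included, so the last guess should hit the target.
    if not game["guesses"] or game["guesses"][-1] != game["targetNum"]:
        problems.append("final guess is not the target")
    elapsed = (game["finishTime"] - game["startTime"]).total_seconds()
    if elapsed != game["duration"]:
        problems.append("duration differs from finishTime - startTime")
    return problems


games = load_games(SAMPLE)
for game in games:
    print(game["gameId"], check_game(game) or "ok")
```

Running checks like these on the real file before modelling is one way to find out whether the assumptions above actually hold.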

Distribution

The data is supplied as three CSV files:
  • games.csv: details of every game (date, player, outcome).
  • guesses.csv: every player decision, along with statistical calculations for analysis.
  • repeat-player-prediction.csv: each player's first game and whether they returned to play again.

Together, the files cover 50,881 unique game sessions.
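
To make the relationship between games.csv and repeat-player-prediction.csv concrete, here is a minimal Python sketch that derives a "first game plus returned flag" record per player from game-level rows. The sample records, the output keys firstGameId and returned, and the function name are hypothetical; the actual columns of repeat-player-prediction.csv are not listed here and may be defined differently.

```python
from collections import defaultdict

# Hypothetical (gameId, user, startTime) records taken from games.csv-style rows.
GAMES = [
    ("g1", "u001", "2019-09-21T10:00:00"),
    ("g2", "u002", "2019-09-22T09:30:00"),
    ("g3", "u001", "2019-10-02T18:45:00"),
    ("g4", "u003", "2019-11-05T12:00:00"),
]


def first_game_and_return_flag(games):
    """Map each user to their earliest game and whether they played again."""
    sessions_by_user = defaultdict(list)
    for game_id, user, start_time in games:
        sessions_by_user[user].append((start_time, game_id))

    labels = {}
    for user, sessions in sessions_by_user.items():
        # ISO 8601 strings in a single format sort chronologically.
        sessions.sort()
        _, first_game_id = sessions[0]
        labels[user] = {
            "firstGameId": first_game_id,
            "returned": len(sessions) > 1,
        }
    return labels


labels = first_game_and_return_flag(GAMES)
for user, label in sorted(labels.items()):
    print(user, label)
```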

Usage

This dataset is ideal for training models designed to predict player actions and persistence. Specific applications include:
  • Behavioural Modelling: Analysing user psychology and the efficiency of guessing strategies.
  • Retention Prediction (Beginner Challenge): Determining if a new user is likely to return for subsequent sessions.
  • Performance Analysis (Beginner Challenge): Forecasting the number of guesses a user will require to succeed.
  • Session Management (Intermediate Challenge): Predicting if a user will actively continue or prematurely end a session.
  • Next Action Prediction (Intermediate Challenge): Modelling a user's subsequent numerical guess based on history and feedback.
  • Scheduling Prediction (Expert Challenge): Forecasting the timing of the next session for regular players.
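
When judging the efficiency of guessing strategies or forecasting numGuesses, a reference point is useful. The sketch below computes how many guesses a plain bisection strategy needs for every target from 1 to 100. This is a generic baseline, not something taken from the dataset, and the function name is made up for illustration.

```python
def bisection_guess_count(target, low=1, high=100):
    """Number of guesses needed when always guessing the midpoint of the range."""
    guesses = 0
    while low <= high:
        guess = (low + high) // 2
        guesses += 1
        if guess == target:
            return guesses
        if guess < target:
            low = guess + 1   # feedback would be "higher"
        else:
            high = guess - 1  # feedback would be "lower"
    raise ValueError("target is outside the given range")


counts = [bisection_guess_count(t) for t in range(1, 101)]
worst_case = max(counts)
mean_count = sum(counts) / len(counts)
print(worst_case, mean_count)
```

Bisection never needs more than 7 guesses on this range and averages 5.8 over uniformly chosen targets, so a player's numGuesses can be compared against those figures.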

Coverage

The data captures user actions from September 21, 2019, to January 21, 2020. The scope is limited to interactions with the voice assistant game developed by the dataset author. Only completed games with at most 15 guesses are included. Repeated guesses and guesses that conflicted with previous hints were retained, but invalid guesses (outside the range 1 to 100, or misunderstood by the platform) were excluded. User identities are anonymised.
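
Because repeated and hint-conflicting guesses are retained, an analysis may want to flag them. The following Python sketch labels each guess in a game as consistent, a repeat, or in conflict with earlier "higher"/"lower" feedback. The label names and the function are illustrative assumptions; guesses.csv may already contain comparable fields under different definitions.

```python
def classify_guesses(guesses, target, low=1, high=100):
    """Label each guess as 'consistent', 'repeat', or 'conflict'.

    A guess conflicts with previous hints when it falls outside the interval
    still possible given the higher/lower feedback received so far.
    """
    seen = set()
    labels = []
    for guess in guesses:
        if guess in seen:
            labels.append("repeat")
        elif guess < low or guess > high:
            labels.append("conflict")
        else:
            labels.append("consistent")
        seen.add(guess)
        # Narrow the feasible interval using the feedback for this guess.
        if guess < target:
            low = max(low, guess + 1)
        elif guess > target:
            high = min(high, guess - 1)
    return labels


example = classify_guesses([50, 25, 60, 25, 37], target=37)
print(example)
# ['consistent', 'consistent', 'conflict', 'repeat', 'consistent']
```

In the example, 60 conflicts with the earlier "lower" hint given after 50, and the second 25 repeats an earlier guess.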

License

CC BY-SA 4.0

Who Can Use It

The primary intended users are machine learning practitioners, from academics and students who need simple projects to experienced data scientists interested in advanced behavioural analytics. Anyone working on digital gameplay metrics, voice assistant usage, or user retention modelling should also find this data useful.

Dataset Name Suggestions

  • Alexa Higher or Lower Game Data
  • Player Guessing Behavior and Retention
  • Voice Game Player Decision Model
  • 380K Gameplay Decisions Dataset

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 15/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
