FREE DATASET LIBRARY

Verified Data Provider

£0

Content Preference Recommendation Set

Tags: Social Media and Posts, Tags and Keywords, Movies, Recommendation, Content, Sony, Challenge

About

This collection originates from a competitive challenge organised by Sony to encourage the development of advanced recommendation systems. It provides information for developing a system that suggests the top 10 movies for each user, adapted to the user's location and preferences. The columns describe content characteristics (type, language, genre, duration, release date, rating, and episode and season counts) rather than users, and support detailed analysis of the catalogue when crafting tailored entertainment experiences.

Columns

content_id: A unique identifier for each piece of content, featuring 48,645 distinct values. All entries are valid.
content_type: Specifies the kind of content, with 'series' accounting for 93% and 'sports' for 7%. There are 4 unique content types in total. All entries are valid.
language: Indicates the content's language, predominantly 'hindi' (49%) and 'english' (19%). The dataset includes content in 11 different languages. All entries are valid.
genre: Describes the content category, with 'drama' (47%) and 'comedy' (20%) being the most frequent. There are 22 distinct genres present. All entries are valid.
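duration: The length of the content, ranging from 60,000 to 11,100,000 with a mean of 3.53 million. The unit is not stated; the values are consistent with milliseconds (60,000 ms is 1 minute and 11,100,000 ms is 185 minutes), but this is an inference, not documented. All entries are valid.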
release_date: The date when the content was made available, ranging from 11 October 1990 to 31 December 2020. All entries are valid.
rating: The numerical rating given to the content, with values from 0 to 10 and an average rating of 5.04. All entries are valid.
episode_count: The number of episodes for the content, from 0 to 60, with a mean of 16.2 episodes. All entries are valid.
season_count: The number of seasons for the content, from 0 to 44, with a mean of 6.61 seasons. All entries are valid.
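Loading the columns above into typed records makes the stated ranges easy to check. This is a minimal sketch under two assumptions the listing does not confirm: duration is in milliseconds, and release_date is stored as an ISO 8601 string (YYYY-MM-DD). The names `ContentItem`, `parse_row`, and `load_content` are hypothetical.

```python
import csv
from dataclasses import dataclass
from datetime import date


@dataclass
class ContentItem:
    """One row of content.csv, using the nine column names from the listing."""
    content_id: str
    content_type: str
    language: str
    genre: str
    duration: int
    release_date: date
    rating: float
    episode_count: int
    season_count: int

    @property
    def duration_minutes(self) -> float:
        # ASSUMPTION: duration is in milliseconds; the listing does not state a unit.
        return self.duration / 60_000


def parse_row(row: dict[str, str]) -> ContentItem:
    """Convert one csv.DictReader row (all strings) into typed fields."""
    return ContentItem(
        content_id=row["content_id"],
        content_type=row["content_type"],
        language=row["language"],
        genre=row["genre"],
        # int(float(...)) also accepts values written as "60000.0"
        duration=int(float(row["duration"])),
        # ASSUMPTION: ISO 8601 dates such as "2010-05-01"
        release_date=date.fromisoformat(row["release_date"]),
        rating=float(row["rating"]),
        episode_count=int(float(row["episode_count"])),
        season_count=int(float(row["season_count"])),
    )


def load_content(path: str) -> list[ContentItem]:
    """Read the whole file (ASSUMPTION: UTF-8 encoded) into typed records."""
    with open(path, newline="", encoding="utf-8") as f:
        return [parse_row(row) for row in csv.DictReader(f)]
```

If the assumptions hold, `max(i.season_count for i in load_content("content.csv"))` should reproduce the listed maximum of 44; a `ValueError` from `date.fromisoformat` would indicate a different date format.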

Distribution

The data is provided as a CSV file, 'content.csv'. This primary data file is 2.98 MB in size and has 9 columns. The listing reports all of its roughly 48.6 thousand records as valid, with no missing or mismatched entries.
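The listing's quality claims (9 columns, about 48.6 thousand records, no missing or mismatched entries) can be re-checked locally before analysis. A minimal standard-library sketch, assuming the header uses exactly the nine column names listed above; the function name `check_quality` is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable

# ASSUMPTION: the header contains exactly these names, in any order.
EXPECTED_COLUMNS = {
    "content_id", "content_type", "language", "genre", "duration",
    "release_date", "rating", "episode_count", "season_count",
}


def check_quality(lines: Iterable[str]) -> dict[str, int]:
    """Count rows, distinct/duplicate content_ids, empty cells and malformed rows."""
    reader = csv.DictReader(lines)
    header = set(reader.fieldnames or [])
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {sorted(header ^ EXPECTED_COLUMNS)}")

    rows = missing = mismatched = 0
    id_counts = Counter()
    for row in reader:
        rows += 1
        # DictReader stores surplus fields under the key None and fills absent
        # trailing fields with None; either case marks a mismatched row.
        extra = row.pop(None, None)
        if extra or any(v is None for v in row.values()):
            mismatched += 1
        # Present but blank cells count as missing values.
        missing += sum(1 for v in row.values() if v is not None and not v.strip())
        id_counts[row["content_id"]] += 1

    return {
        "rows": rows,
        "distinct_ids": len(id_counts),
        "duplicate_ids": sum(1 for n in id_counts.values() if n > 1),
        "missing_cells": missing,
        "mismatched_rows": mismatched,
    }
```

Because `csv.DictReader` accepts any iterable of lines, the same function works on an open file: `with open("content.csv", newline="", encoding="utf-8") as f: print(check_quality(f))`. If the listing is accurate, `missing_cells` and `mismatched_rows` should both be 0.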

Usage

This data is well-suited for:

Developing and training recommendation systems for movies and other media content.
Building models to suggest the top 10 movies specifically tailored to user locations and preferences.
Conducting exploratory data analysis to discover patterns and relationships within media content attributes.
Machine learning research focused on content discovery and user personalisation.
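The listing does not prescribe a modelling approach. As one illustration of the first two uses, a simple content-based ranker can score titles against a user's preferred languages and genres and use the 0 to 10 rating as a tie-breaker. This is a sketch under stated assumptions: the `UserProfile` class, the weights, and the `seen` filter are hypothetical, and because the columns above carry no user or location data, the profile would have to come from another source.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """HYPOTHETICAL user profile: the dataset itself holds no user or location data."""
    languages: set[str]
    genres: set[str]
    seen: set[str] = field(default_factory=set)


def recommend_top_k(items: list[dict[str, str]], profile: UserProfile, k: int = 10) -> list[str]:
    """Rank unseen items by a hand-weighted content match and return the top-k content_ids."""

    def score(item: dict[str, str]) -> float:
        s = 0.0
        # Illustrative weights only; a real system would learn them from user data.
        if item["language"] in profile.languages:
            s += 2.0
        if item["genre"] in profile.genres:
            s += 1.0
        # rating runs from 0 to 10 per the listing; scaled to 0-1 so it mostly breaks ties.
        s += float(item["rating"]) / 10
        return s

    candidates = [item for item in items if item["content_id"] not in profile.seen]
    # sorted() is stable, so equal scores keep their input order.
    ranked = sorted(candidates, key=score, reverse=True)
    return [item["content_id"] for item in ranked[:k]]
```

The `items` argument matches the dictionaries produced by `csv.DictReader`, so rows can be passed in directly; restricting candidates to movies would need the exact `content_type` label, which the listing does not give.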

Coverage

The data's geographic focus is identified by the 'India' tag, suggesting that much of the content or its target audience is associated with that region. Release dates span 11 October 1990 to 31 December 2020, roughly three decades. No demographic details are provided, but the content covers 11 languages, led by Hindi (49%) and English (19%), so recommendations can serve users with different language preferences.

License

Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Who Can Use It

Data Scientists and Machine Learning Engineers: To design and implement advanced content recommendation algorithms.
Researchers: For taking part in data challenges, particularly in the entertainment and media sectors.
Data Analysts: For exploring trends, patterns, and insights within movie and series content.
Product Developers: Building applications that require personalised media suggestions.

Dataset Name Suggestions

Sony RISE Movie Challenge Data
Content Preference Recommendation Set
Entertainment Media Attributes Data
User-Centric Movie Recommender Data
Global Content Insights Data

Attributes

Original Data Source: Content Preference Recommendation Set

Listing Stats

LISTED: 12/09/2025
REGION: GLOBAL
QUALITY: 5 / 5
VERSION: 1.0
