Opendatabay APP

Library User Engagement Dataset

Knowledge & Research Collections

Tags and Keywords

MTS

Books

Users

Interactions

Reading


Free

About

This dataset captures detailed user-book interactions from the MTS Library, a digital service within the MTS ecosystem that offers e-books, audiobooks, and press. MTS is a prominent mobile network operator in Russia and the CIS region. The collection covers 150,000 unique users, 60,000 distinct books, and about 1.5 million user-book interactions. The data spans a two-year period, from 1 January 2018 to 31 December 2019, and contains intentionally added random noise. All user and book identifiers have been anonymised to protect privacy. The dataset is suited to studying reading habits and user engagement, and to developing personalised digital content experiences.

Columns

The dataset is structured across three primary files:
users.csv
  • user_id: An integer representing an anonymised user identifier.
  • age: A string indicating the user's age group (e.g., "18_24", "25_34", "65_inf"). "NaN" signifies an unknown age. This feature is derived from a model prediction.
  • sex: An integer indicating the user's sex (1 for male, 0 for female). "NaN" signifies an unknown sex. This feature is also derived from a model prediction.
items.csv
  • item_id: An integer representing an anonymised book identifier.
  • title: A string containing the title of the book.
  • genres: A string listing the genres of the book, with multiple genres separated by commas.
  • authors: A string listing the authors of the book, with multiple authors separated by commas.
  • year: A string representing the year of publication. This column is stored as a string due to the presence of uncommon values that cannot be automatically converted to an integer.
interactions.csv
  • user_id: An integer representing the user identifier involved in the interaction.
  • item_id: An integer representing the book identifier involved in the interaction.
  • progress: An integer (int8) indicating the percentage of the book read by the user.
  • rating: An integer (from 1 to 5) reflecting the rating given to the book by the user. A significant number of values in this column are missing.
  • start_date: The date when the user initiated reading the book.
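
The column descriptions above can be turned into a loading routine. The following is a minimal sketch using pandas, not an official loader: the helper name load_library_tables is hypothetical, the sample rows are invented purely for illustration, and the date format of start_date is assumed to be ISO (YYYY-MM-DD), which the listing does not confirm.

```python
import io

import pandas as pd


def load_library_tables(users_src, items_src, interactions_src):
    """Read the three CSV files into DataFrames (hypothetical helper)."""
    # age is a group label such as "18_24"; "NaN" is read as missing by default.
    users = pd.read_csv(users_src, dtype={"age": str})
    # sex is 1/0 with unknown values, so use a nullable integer type.
    users["sex"] = users["sex"].astype("Int8")

    # year is kept as a string because some values are not plain integers.
    items = pd.read_csv(items_src, dtype={"year": str})
    # Optional numeric view; values that cannot be parsed become NaN.
    items["year_num"] = pd.to_numeric(items["year"], errors="coerce")

    # progress is a 0-100 percentage and fits in int8; rating stays float
    # because most of its values are missing.
    interactions = pd.read_csv(
        interactions_src,
        dtype={"progress": "int8"},
        parse_dates=["start_date"],  # assumes ISO-formatted dates
    )
    return users, items, interactions


# Invented sample rows that mimic the documented columns.
users_csv = "user_id,age,sex\n1,18_24,1\n2,NaN,0\n3,65_inf,\n"
items_csv = (
    "item_id,title,genres,authors,year\n"
    '10,Example A,"Fiction,Drama",Author One,1999\n'
    '11,Example B,Poetry,"Author Two,Author Three",18th century\n'
)
interactions_csv = (
    "user_id,item_id,progress,rating,start_date\n"
    "1,10,100,5,2018-01-01\n"
    "2,11,40,,2019-12-31\n"
)

users, items, interactions = load_library_tables(
    io.StringIO(users_csv), io.StringIO(items_csv), io.StringIO(interactions_csv)
)
```

With the real files, the io.StringIO objects would be replaced by paths to users.csv, items.csv, and interactions.csv. Keeping the original year string next to the numeric view preserves the uncommon values for manual inspection.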

Distribution

The dataset comprises 150,000 users, 60,000 books, and roughly 1.5 million user-book interactions (1.53 million records in interactions.csv). About 285,000 of these interactions include an explicit user-provided rating, so the rating column is missing in 81% of rows. The user_id, item_id, and progress columns in interactions.csv each contain 1.53 million valid records, with progress values ranging from 0 to 100 percent. Data files are provided in CSV format; interactions.csv has a size of 43.53 MB.
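
The reported figures can be cross-checked with a few lines of arithmetic. This sketch uses only the rounded counts quoted in the listing (1.53 million records, 285,000 ratings, 150,000 users, 60,000 books), so the derived values are approximate; the variable names are illustrative.

```python
# Rounded counts as quoted in the listing.
total_interactions = 1_530_000
rated_interactions = 285_000
n_users = 150_000
n_items = 60_000

# Share of interactions without an explicit rating.
missing_share = 1 - rated_interactions / total_interactions

# Average interactions per user and per book.
avg_per_user = total_interactions / n_users
avg_per_item = total_interactions / n_items

# Fraction of all possible user-book pairs that were observed.
density = total_interactions / (n_users * n_items)

print(f"missing ratings: {missing_share:.1%}")   # 81.4%
print(f"interactions per user: {avg_per_user}")  # 10.2
print(f"interactions per book: {avg_per_item}")  # 25.5
print(f"matrix density: {density:.3%}")          # 0.017%
```

The missing-rating share agrees with the stated 81%, and the very low density is typical of implicit-feedback recommendation data.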

Usage

This dataset is well-suited for various applications, including:
  • Developing and evaluating recommender systems for digital content, specifically books and media.
  • Analysing user behaviour patterns in digital reading environments.
  • Researching book popularity trends and demographic influences on reading choices.
  • Studying the dynamics of user engagement with digital content.
  • Building predictive models for reading progress and content consumption.
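
As one illustration of the recommender-system use case, interactions can be split chronologically on start_date and scored against a simple popularity baseline. This is a minimal sketch under stated assumptions: the function names, the cutoff date, and the toy rows are all hypothetical, and a most-popular baseline is only a common reference point, not a method prescribed by the dataset.

```python
from collections import Counter
from datetime import date


def split_by_date(interactions, cutoff):
    """Put interactions before the cutoff in train, the rest in test."""
    train = [row for row in interactions if row[2] < cutoff]
    test = [row for row in interactions if row[2] >= cutoff]
    return train, test


def top_k_popular(train, k):
    """Return the k item ids with the most training interactions."""
    counts = Counter(item_id for _, item_id, _ in train)
    return [item_id for item_id, _ in counts.most_common(k)]


def hit_rate(test, recommended):
    """Fraction of test interactions whose item is in the recommended set."""
    if not test:
        return 0.0
    hits = sum(1 for _, item_id, _ in test if item_id in recommended)
    return hits / len(test)


# Invented (user_id, item_id, start_date) rows for illustration only.
toy = [
    (1, 10, date(2018, 3, 1)),
    (2, 10, date(2018, 6, 15)),
    (3, 11, date(2019, 1, 20)),
    (1, 12, date(2019, 10, 5)),
    (2, 11, date(2019, 11, 2)),
    (3, 10, date(2019, 12, 30)),
]

train, test = split_by_date(toy, cutoff=date(2019, 10, 1))
top = top_k_popular(train, k=1)
rate = hit_rate(test, set(top))
```

A chronological split avoids training on interactions that happen after the ones being predicted, which a random split would allow.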

Coverage

The dataset's scope encompasses user and book activity from the MTS Library service.
  • Geographic Scope: The data originates from a service primarily serving users in Russia and the Commonwealth of Independent States (CIS).
  • Time Range: Interactions were recorded over a two-year period, specifically from 1 January 2018 to 31 December 2019.
  • Demographic Scope: User demographics include age groups (e.g., 18-24, 25-34, 65 and older) and sex (male/female), though these attributes are model-derived and may contain unknown values. All user and book IDs are anonymised.

License

CC BY-NC-SA 4.0

Who Can Use It

This dataset is valuable for:
  • Data scientists and machine learning engineers focused on building and improving recommendation engines.
  • Academic researchers in fields like social science, human-computer interaction, and library science, studying digital reading habits and user psychology.
  • Business analysts and product managers interested in understanding customer engagement, content performance, and market trends within the digital publishing industry.
  • Developers creating features related to user-generated content, content discovery, and personalised experiences in digital platforms.

Dataset Name Suggestions

  • MTS Digital Reading Habits
  • Library User Engagement Dataset
  • Book Recommendation Interaction Data
  • MTS E-Reading Activity Log
  • Digital Content User Interaction

Attributes

Original Data Source: Library User Engagement Dataset

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 30/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
