Library User Engagement Dataset
Knowledge & Research Collections
Free
About
This dataset captures detailed user-book interactions from the MTS Library, a digital service offering e-books, audiobooks, and press as part of the MTS ecosystem. MTS is a major mobile network operator in Russia and the CIS region. The collection covers 150,000 unique users, 60,000 distinct books, and roughly 1.5 million user-book interactions recorded over a two-year period, from 1 January 2018 to 31 December 2019. Random noise has been intentionally added to the data, and all user and book identifiers have been anonymised to protect privacy. The dataset is suited to studying reading habits and user engagement, and to developing personalised digital content experiences.
Columns
The dataset is structured across three primary files:
users.csv
- user_id: An integer representing an anonymised user identifier.
- age: A string indicating the user's age group (e.g., "18_24", "25_34", "65_inf"). "NaN" signifies an unknown age. This feature is derived from a model prediction.
- sex: An integer indicating the user's sex (1 for male, 0 for female). "NaN" signifies an unknown sex. This feature is also derived from a model prediction.
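As a minimal sketch of how users.csv might be read with pandas, the example below keeps unknown age and sex values as missing rather than forcing them into a category. The sample rows are invented for illustration, and the helper name load_users is hypothetical; only the column names come from the description above.

```python
import io

import pandas as pd

# Made-up rows standing in for users.csv; only the column names are from the description.
SAMPLE_USERS = """user_id,age,sex
1,18_24,1
2,NaN,0
3,65_inf,NaN
"""


def load_users(source):
    """Read users.csv and keep unknown age/sex as missing values (hypothetical helper)."""
    # pandas treats the literal text "NaN" as a missing value by default.
    users = pd.read_csv(source)
    # Age groups are labels such as "18_24", so keep them as strings.
    users["age"] = users["age"].astype("string")
    # Nullable integer dtype: 1 = male, 0 = female, <NA> = unknown.
    users["sex"] = users["sex"].astype("Int8")
    return users


users = load_users(io.StringIO(SAMPLE_USERS))
print(users.dtypes)
print(users.isna().sum())
```

With the real file, a path such as "users.csv" would be passed in place of the in-memory sample.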
items.csv
- item_id: An integer representing an anonymised book identifier.
- title: A string containing the title of the book.
- genres: A string listing the genres of the book, with multiple genres separated by commas.
- authors: A string listing the authors of the book, with multiple authors separated by commas.
- year: A string representing the year of publication. This column is stored as a string due to the presence of uncommon values that cannot be automatically converted to an integer.
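Because year is stored as a string and genres is comma-separated, some light parsing is usually needed before analysis. The sketch below shows one possible approach with pandas, assuming that unparseable years should simply become missing values. The sample rows (including the "1890-1891" value) and the names load_items, year_num and genre_list are invented for illustration.

```python
import io

import pandas as pd

# Made-up rows standing in for items.csv; only the column names are from the description.
SAMPLE_ITEMS = """item_id,title,genres,authors,year
10,Book A,"Fantasy,Adventure",Author One,2005
11,Book B,Drama,"Author Two,Author Three",1890-1891
12,Book C,,Author Four,
"""


def split_genres(value):
    """Turn a comma-separated genre string into a list of trimmed names."""
    return [g.strip() for g in value.split(",") if g.strip()]


def load_items(source):
    """Read items.csv, adding a numeric year and a list of genres (hypothetical helper)."""
    # Keep the raw year as text, as the dataset description suggests.
    items = pd.read_csv(source, dtype={"year": str})
    # Values that are not a plain number become missing instead of raising an error.
    items["year_num"] = pd.to_numeric(items["year"], errors="coerce").astype("Int64")
    # Missing genres become an empty list.
    items["genre_list"] = items["genres"].fillna("").apply(split_genres)
    return items


items = load_items(io.StringIO(SAMPLE_ITEMS))
print(items[["item_id", "year", "year_num", "genre_list"]])
```

The original year column is left untouched, so unusual values remain available for manual inspection.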
interactions.csv
- user_id: An integer representing the user identifier involved in the interaction.
- item_id: An integer representing the book identifier involved in the interaction.
- progress: An integer (int8) indicating the percentage of the book read by the user.
- rating: An integer (from 1 to 5) reflecting the rating given to the book by the user. A significant number of values in this column are missing.
- start_date: The date when the user initiated reading the book.
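A possible way to read interactions.csv with compact dtypes and to measure how many ratings are missing is sketched below. It assumes start_date is written in ISO format (YYYY-MM-DD), which the description does not confirm. The sample rows and the helper name load_interactions are invented for illustration.

```python
import io

import pandas as pd

# Made-up rows standing in for interactions.csv; the date format is an assumption.
SAMPLE_INTERACTIONS = """user_id,item_id,progress,rating,start_date
1,10,80,5,2018-01-01
1,11,0,,2018-03-15
2,10,100,,2019-12-31
"""


def load_interactions(source):
    """Read interactions.csv with small dtypes and parsed dates (hypothetical helper)."""
    df = pd.read_csv(source, parse_dates=["start_date"])
    # Progress is a percentage from 0 to 100, which fits in int8.
    df["progress"] = df["progress"].astype("int8")
    # Ratings run from 1 to 5 and are often missing, so use a nullable integer dtype.
    df["rating"] = df["rating"].astype("Int8")
    return df


interactions = load_interactions(io.StringIO(SAMPLE_INTERACTIONS))
missing_share = interactions["rating"].isna().mean()
print(f"Share of interactions without a rating: {missing_share:.0%}")
```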
Distribution
The dataset comprises 150,000 users, 60,000 books, and a total of 1.5 million user-book interactions. Out of these interactions, 285,000 include explicit user-provided ratings. The interactions.csv file has a size of 43.53 MB. Data files are typically provided in CSV format. The rating column has a high percentage of missing values, at 81%. The user_id and item_id columns in the interactions.csv file each contain 1.53 million valid records. The progress column also has 1.53 million valid records, with values ranging from 0 to 100 percent.
Usage
This dataset is well-suited for various applications, including:
- Developing and evaluating recommender systems for digital content, specifically books and media.
- Analysing user behaviour patterns in digital reading environments.
- Researching book popularity trends and demographic influences on reading choices.
- Studying the dynamics of user engagement with digital content.
- Building predictive models for reading progress and content consumption.
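For the recommender-system use case, one common evaluation setup is to split interactions by time and compare models against a popularity baseline. The sketch below illustrates that idea on invented rows; the cutoff date, the function names time_split and top_items, and the sample data are all assumptions made for illustration, not part of the dataset.

```python
import pandas as pd

# Made-up interactions using the column names from interactions.csv.
interactions = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3],
        "item_id": [10, 11, 10, 12, 11],
        "start_date": pd.to_datetime(
            ["2018-02-01", "2019-11-20", "2018-06-10", "2019-12-05", "2019-03-01"]
        ),
    }
)


def time_split(df, cutoff):
    """Put interactions before the cutoff in train and the rest in test."""
    cutoff = pd.Timestamp(cutoff)
    train = df[df["start_date"] < cutoff]
    test = df[df["start_date"] >= cutoff]
    return train, test


def top_items(train, k):
    """Return the k most frequently started books in the training period."""
    return train["item_id"].value_counts().head(k).index.tolist()


# Hypothetical cutoff: hold out the last three months of 2019 for testing.
train, test = time_split(interactions, "2019-10-01")
baseline = top_items(train, k=1)
print(len(train), len(test), baseline)
```

Splitting by time rather than at random keeps later interactions out of the training set, which is closer to how a deployed recommender would be used.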
Coverage
The dataset's scope encompasses user and book activity from the MTS Library service.
- Geographic Scope: The data originates from a service primarily serving users in Russia and the Commonwealth of Independent States (CIS).
- Time Range: Interactions were recorded over a two-year period, specifically from 1 January 2018 to 31 December 2019.
- Demographic Scope: User demographics include age groups (e.g., 18-24, 25-34, 65 and older) and sex (male/female), though these attributes are model-derived and may contain unknown values. All user and book IDs are anonymised.
License
CC BY-NC-SA 4.0
Who Can Use It
This dataset is valuable for:
- Data scientists and machine learning engineers focused on building and improving recommendation engines.
- Academic researchers in fields like social science, human-computer interaction, and library science, studying digital reading habits and user psychology.
- Business analysts and product managers interested in understanding customer engagement, content performance, and market trends within the digital publishing industry.
- Developers creating features related to user-generated content, content discovery, and personalised experiences in digital platforms.
Dataset Name Suggestions
- MTS Digital Reading Habits
- Library User Engagement Dataset
- Book Recommendation Interaction Data
- MTS E-Reading Activity Log
- Digital Content User Interaction
Attributes
Original Data Source: Library User Engagement Dataset