Russian Media Plot Summaries Dataset
Entertainment & Media Consumption
Free
About
This dataset provides Russian-language plot descriptions of films, animated films, animated series, and television series. It is intended for semantic search and other applications that depend on understanding media content through its narrative.
Columns
- title: The primary title of the film, animated film, animated series, or television series.
- type: Categorises the media content. Common types include 'film', 'animated film', 'animated series', and 'tv series'. Approximately 74% of entries are 'film' and 9% are 'tv series'.
- genre: Specifies the genre of the content. Examples include 'драма' (drama) and 'комедия' (comedy), with drama accounting for around 11% and comedy for 6% of entries.
- imdb_rating: The IMDb rating associated with the content. Ratings range from 0.00 to 9.60, with a notable concentration between 5.76 and 7.20.
- summary: A concise summary of the plot. Approximately 8% of entries have no summary.
- plot: A more detailed description of the plot. This field contains unique textual narratives for each entry. For example, one plot describes a classic love triangle based on D. H. Lawrence's novel "Lady Chatterley's Lover".
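The column layout above can be sketched as a small typed loader. This is a minimal sketch under stated assumptions: the sample rows are invented for illustration, missing values are assumed to appear as empty cells, and the file is assumed to be UTF-8 with a header row matching the column names listed here. None of this is confirmed by the listing.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample in the column layout described above. The rows are
# invented for illustration and are not taken from the dataset.
# ("Пример фильма" = "Example film", "Пример сериала" = "Example series")
SAMPLE_CSV = """title,type,genre,imdb_rating,summary,plot
Пример фильма,film,драма,7.1,Краткое описание.,Подробное описание сюжета.
Пример сериала,tv series,комедия,,,Сюжет без краткого описания.
"""


@dataclass
class MediaRecord:
    title: str
    type: str
    genre: str
    imdb_rating: Optional[float]  # None when the cell is empty (assumption)
    summary: Optional[str]        # None when the cell is empty (assumption)
    plot: str


def parse_records(handle):
    """Parse CSV rows into MediaRecord objects, mapping empty cells to None."""
    records = []
    for row in csv.DictReader(handle):
        rating = row["imdb_rating"].strip()
        summary = row["summary"].strip()
        records.append(
            MediaRecord(
                title=row["title"],
                type=row["type"],
                genre=row["genre"],
                imdb_rating=float(rating) if rating else None,
                summary=summary or None,
                plot=row["plot"],
            )
        )
    return records


records = parse_records(io.StringIO(SAMPLE_CSV))
```

For a real file, the same function could be called on a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is whatever name the downloaded CSV has.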
Distribution
The dataset is typically provided as a CSV file. It contains approximately 44,105 records, each describing a single media item, with a matching number of unique titles and plot descriptions. The file size is not specified.
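The figures quoted in this listing (record count, unique titles, share of each type, share of missing summaries) could be checked against a downloaded copy with a short profiling pass. The sketch below is illustrative only: the sample rows are invented, the `profile` helper is hypothetical, and an empty cell is assumed to mean a missing summary.

```python
import csv
import io
from collections import Counter

# Invented rows in the documented column layout, used only to exercise the code.
SAMPLE_CSV = """title,type,genre,imdb_rating,summary,plot
A,film,драма,6.5,s1,p1
B,film,комедия,7.0,,p2
C,tv series,драма,5.9,s3,p3
D,animated film,комедия,8.1,s4,p4
"""


def profile(handle):
    """Return basic counts and shares for a CSV in the documented layout."""
    rows = list(csv.DictReader(handle))
    total = len(rows)
    if total == 0:
        return {"total": 0, "unique_titles": 0, "type_share": {}, "missing_summary_share": 0.0}
    type_counts = Counter(row["type"] for row in rows)
    # Assumption: a missing summary is stored as an empty (or whitespace-only) cell.
    missing = sum(1 for row in rows if not row["summary"].strip())
    return {
        "total": total,
        "unique_titles": len({row["title"] for row in rows}),
        "type_share": {name: count / total for name, count in type_counts.items()},
        "missing_summary_share": missing / total,
    }


stats = profile(io.StringIO(SAMPLE_CSV))
```

On the full file, `stats["type_share"]` and `stats["missing_summary_share"]` would be the values to compare with the approximate percentages given in the Columns section.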
Usage
This dataset is ideal for various applications, including:
- Developing and training natural language processing (NLP) models.
- Enabling semantic search functionalities for movies, TV shows, and other media.
- Building recommendation systems based on content plot similarity.
- Conducting research and analysis on narrative structures and thematic elements in Russian media.
- Creating AI and machine learning applications that require textual understanding of plots.
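As a rough illustration of the plot-similarity use cases listed above, the sketch below ranks plots against a query using TF-IDF weights and cosine similarity, written with the standard library only. It is a lexical stand-in, not true semantic search: it matches exact word forms, which is a real limitation for an inflected language like Russian, and a practical system would more likely use lemmatisation or sentence embeddings. The plots, the query, and all function names are invented for illustration.

```python
import math
import re
from collections import Counter

# Invented example plots (not taken from the dataset). Rough glosses:
# 0: a young detective investigates a mysterious murder in a small town
# 1: two friends go on a trip across the country and get into funny situations
# 2: a detective and a journalist look for a missing painting
PLOTS = [
    "Молодой детектив расследует загадочное убийство в маленьком городе.",
    "Двое друзей отправляются в путешествие по стране и попадают в смешные ситуации.",
    "Детектив и журналистка ищут пропавшую картину.",
]


def tokenize(text):
    # In Python 3, \w is Unicode-aware for str patterns, so Cyrillic letters match.
    return re.findall(r"\w+", text.lower())


def build_index(plots):
    """Return one TF-IDF vector (token -> weight) per plot, plus the IDF table."""
    tokenized = [tokenize(plot) for plot in plots]
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    n_docs = len(plots)
    # Smoothed IDF, so tokens present in every plot still get a positive weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    vectors = [
        {tok: count * idf[tok] for tok, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, plots, top_k=3):
    """Return up to top_k (score, plot_index) pairs, best match first."""
    vectors, idf = build_index(plots)
    # Query tokens that never occur in the corpus are ignored.
    query_vec = {
        tok: count * idf[tok]
        for tok, count in Counter(tokenize(query)).items()
        if tok in idf
    }
    scored = [(cosine(query_vec, vec), index) for index, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


results = search("детектив убийство", PLOTS)  # query: "detective murder"
```

With these three plots, the first plot shares both query words and ranks first, the third shares one and ranks second, and the second shares none and scores zero. The index is rebuilt on every call for brevity; for tens of thousands of records it would be built once and reused.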
Coverage
Coverage is global in terms of the media included, but all plot descriptions are in Russian. The content spans media types such as films, cartoons, anime, and TV series. Not every entry is guaranteed to contain a plot description. The dataset was listed on 24/06/2025.
License
CC-BY-SA
Who Can Use It
This dataset is suitable for:
- Data Scientists and Machine Learning Engineers: For training models related to text analysis, content classification, and recommendation systems.
- Researchers: Studying linguistic patterns, narrative structures, or cultural representation in Russian media.
- Developers: Building applications that require semantic search, content understanding, or automated summarisation.
- Content Creators: Seeking inspiration or insights from a large corpus of plot summaries.
Dataset Name Suggestions
- Russian Film and TV Plot Descriptions
- Wikipedia Movie Plots (Russian Language)
- Russian Media Plot Summaries Dataset
- Cyrillic Film & Series Plot Collection
Attributes
Original Data Source: Movie plots from Russian-language Wikipedia