IMDb Lightweight Movie & Crew Archive
About
Tracking the intersection of cinematic history and the professionals who shape it is simplified through this streamlined version of the Internet Movie Database (IMDb). By merging and refining original tables, including basics and principals, it provides a high-performance resource for exploring film titles and their associated crew members. This collection is specifically curated to include only movies and TV movies, ensuring a focused archive for academic research and media application development.
Columns
- ID_title: The unique identifier for each film title, also known as tconst.
- titleType: The classification of the entry, primarily categorised as movie or tvMovie.
- primaryTitle: The most commonly known title of the film.
- originalTitle: The title in its original language.
- startYear: The year the film was released, ranging from the late 19th century to 2020.
- runtimeMinutes: The total duration of the film in minutes.
- genres: The genre categories associated with the film, such as Drama.
- averageRating: The weighted average score provided by viewers.
- numVotes: The total number of user votes that contribute to the rating.
- ID_crew: Unique identifiers for the principal crew members, also known as nconst.
- category: The general professional role of the crew member, such as actor or director.
- job: Specific professional titles for certain crew roles where applicable.
- characters: The specific names of characters portrayed by actors in the film.
- director: The identifier for the individual who directed the film.
- writer: The identifier for the individual who wrote the film.
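As a rough illustration of how the columns listed above might be read into typed values, here is a minimal sketch using only the Python standard library. The sample rows, the identifiers in them, the column order, and the missing-value markers (an empty string or `\N`, the convention of the raw IMDb files) are all assumptions for illustration. The description does not state how this file encodes them. The comma-separated layout of genres is likewise assumed.

```python
import csv
import io

# Hypothetical sample rows; the identifiers and values are made up.
SAMPLE = r"""ID_title,titleType,primaryTitle,startYear,runtimeMinutes,genres,averageRating,numVotes,ID_crew,category,job
tt_demo1,movie,Example Film,1950.0,90,"Drama,Romance",7.1,1200,nm_demo1,actor,\N
tt_demo1,movie,Example Film,1950.0,90,"Drama,Romance",7.1,1200,nm_demo2,director,\N
"""

# Assumed markers for missing values (not confirmed by the dataset description).
MISSING = {"", r"\N"}


def to_int(value):
    """Return an int, or None for a missing value; tolerates '1950.0'."""
    return None if value in MISSING else int(float(value))


def to_float(value):
    """Return a float, or None for a missing value."""
    return None if value in MISSING else float(value)


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "ID_title": row["ID_title"],
        "titleType": row["titleType"],
        "primaryTitle": row["primaryTitle"],
        "startYear": to_int(row["startYear"]),
        "runtimeMinutes": to_int(row["runtimeMinutes"]),
        # Assumes genres are stored as one comma-separated string.
        "genres": [] if row["genres"] in MISSING else row["genres"].split(","),
        "averageRating": to_float(row["averageRating"]),
        "numVotes": to_int(row["numVotes"]),
        "ID_crew": row["ID_crew"],
        "category": row["category"],
        "job": None if row["job"] in MISSING else row["job"],
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

For the real file, the `io.StringIO(SAMPLE)` argument could be replaced by an open file handle; it is worth inspecting the header and a few rows first to confirm the actual encoding of missing values.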
Distribution
The data is delivered across two primary CSV files. The movie-centric table, df_movies.csv, has a file size of 788.81 MB and contains approximately 5.68 million valid records. The accompanying df_names.csv file provides a trimmed registry of professionals involved in these specific productions. The dataset maintains a usability score of 10.00 and is provided as a static historical record with no future updates expected.
Usage
This resource is ideal for building film recommendation engines and performing network analysis on the collaborations between directors, writers, and actors. It is well-suited for longitudinal studies in the cinema industry, such as tracking the evolution of film runtimes or genre popularity over the last century. Developers can also use the data to create movie trivia applications or searchable registries for web-based media platforms.
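A longitudinal question such as how runtimes have changed over time could be approached by streaming the file row by row rather than loading it whole. The sketch below computes the mean runtime per decade from a small hypothetical sample. It assumes that the merged table repeats each title once per crew member (so titles are deduplicated by ID_title) and that missing values appear as an empty string or `\N`; neither assumption is confirmed by the description.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical sample: t1 appears twice because it has two crew rows.
SAMPLE = r"""ID_title,startYear,runtimeMinutes,ID_crew
t1,1951,90,n1
t1,1951,90,n2
t2,1958,110,n3
t3,1994,120,n1
t4,\N,100,n4
"""


def mean_runtime_by_decade(lines):
    """Stream CSV rows and return {decade: mean runtime in minutes}."""
    seen = set()                          # title IDs already counted
    totals = defaultdict(lambda: [0.0, 0])  # decade -> [runtime sum, title count]
    for row in csv.DictReader(lines):
        title_id = row["ID_title"]
        if title_id in seen:
            continue                      # count each title only once
        seen.add(title_id)
        try:
            year = int(float(row["startYear"]))
            runtime = float(row["runtimeMinutes"])
        except ValueError:
            continue                      # skip rows with missing values
        if math.isnan(runtime):
            continue
        decade = year // 10 * 10
        totals[decade][0] += runtime
        totals[decade][1] += 1
    return {d: s / n for d, (s, n) in sorted(totals.items())}


result = mean_runtime_by_decade(io.StringIO(SAMPLE))
print(result)
```

On the full file, the same function could be given a handle such as `open("df_movies.csv", newline="", encoding="utf-8")`, with the encoding being a guess. The set of seen IDs grows with the number of distinct titles, which should be far smaller than the row count if the repetition assumption holds.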
Coverage
The temporal scope is extensive, starting in 1894 and extending through to 2020. Geographically, it reflects global film production as captured by the IMDb registry. While the dataset is broad, it is specifically restricted to entries classified as movies and TV movies, excluding other media formats. It contains approximately 5.68 million records, covering titles that have received user ratings and votes.
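The stated coverage (only movie and tvMovie entries, years from 1894 to 2020) can be checked against a local copy before relying on it. The following is a minimal sketch on hypothetical rows; the function name `coverage_violations` and the treatment of an empty string or `\N` as a missing year are assumptions made for illustration.

```python
import csv
import io

# Hypothetical rows, including deliberate violations for demonstration.
SAMPLE = r"""ID_title,titleType,startYear
t1,movie,1894
t2,tvMovie,2020
t3,tvSeries,2005
t4,movie,2021
t5,movie,\N
"""

ALLOWED_TYPES = {"movie", "tvMovie"}
FIRST_YEAR, LAST_YEAR = 1894, 2020


def coverage_violations(lines):
    """Count rows that fall outside the documented type and year coverage."""
    counts = {"bad_type": 0, "bad_year": 0, "missing_year": 0}
    for row in csv.DictReader(lines):
        if row["titleType"] not in ALLOWED_TYPES:
            counts["bad_type"] += 1
        try:
            year = int(float(row["startYear"]))
        except ValueError:
            counts["missing_year"] += 1   # unparseable or missing year
            continue
        if not FIRST_YEAR <= year <= LAST_YEAR:
            counts["bad_year"] += 1
    return counts


report = coverage_violations(io.StringIO(SAMPLE))
print(report)
```

If all three counts are zero on the real file, the documented restrictions hold for that copy; non-zero counts would indicate rows worth inspecting before analysis.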
License
CC0: Public Domain
Who Can Use It
Data scientists can leverage these records to train machine learning models for popularity prediction and genre classification. Film historians may utilise the release dates and crew metadata to map the careers of specific industry professionals. Additionally, software developers can integrate the linked tables to populate databases for media-centric applications and websites.
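One way to populate an application database from the merged rows is to split them into a titles table and a credits table, which also makes career queries straightforward. The sketch below uses Python's built-in sqlite3 module with an in-memory database and hypothetical sample rows. The two-table schema and the function names are illustrative choices and are not part of the dataset.

```python
import csv
import io
import sqlite3

# Hypothetical merged rows: one row per title and crew member.
SAMPLE = """ID_title,primaryTitle,startYear,ID_crew,category
t1,First Film,1951,n1,actor
t1,First Film,1951,n2,director
t2,Second Film,1947,n1,director
"""


def parse_year(value):
    """Return the year as an int, or None if it cannot be parsed."""
    try:
        return int(float(value))
    except ValueError:
        return None


def load_into_sqlite(lines, conn):
    """Split merged rows into a titles table and a credits table."""
    conn.execute(
        "CREATE TABLE titles (ID_title TEXT PRIMARY KEY, "
        "primaryTitle TEXT, startYear INTEGER)"
    )
    conn.execute(
        "CREATE TABLE credits (ID_title TEXT, ID_crew TEXT, category TEXT)"
    )
    for row in csv.DictReader(lines):
        # Each title repeats across crew rows, so later duplicates are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO titles VALUES (?, ?, ?)",
            (row["ID_title"], row["primaryTitle"], parse_year(row["startYear"])),
        )
        conn.execute(
            "INSERT INTO credits VALUES (?, ?, ?)",
            (row["ID_title"], row["ID_crew"], row["category"]),
        )
    conn.commit()


def career(conn, crew_id):
    """Return (year, title, role) tuples for one person, oldest first."""
    query = (
        "SELECT t.startYear, t.primaryTitle, c.category "
        "FROM credits c JOIN titles t ON t.ID_title = c.ID_title "
        "WHERE c.ID_crew = ? ORDER BY t.startYear"
    )
    return conn.execute(query, (crew_id,)).fetchall()


conn = sqlite3.connect(":memory:")
load_into_sqlite(io.StringIO(SAMPLE), conn)
timeline = career(conn, "n1")
print(timeline)
```

For the full file, inserting in batches with `executemany` and adding an index on `credits(ID_crew)` would likely be worthwhile, although the right trade-offs depend on the application.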
Dataset Name Suggestions
- IMDb Lightweight Movie & Crew Archive
- Streamlined Cinema History: Titles and Talent
- Global Film and TV Movie Registry (1894–2020)
- IMDb Merged Titles and Crew Metadata
- Lighter IMDb Movie Database for Researchers
Attributes
Original Data Source: IMDb Lightweight Movie & Crew Archive
