IMDb Co-Starring Network Data
Free
About
Explore the professional relationships among actors in the Bollywood film industry using an undirected graph model. The data maps the collaboration network of actors based on their shared appearances in Bollywood movies released from 2005 onwards. Connections are weighted by the number of films in which two actors co-starred, which gives a measure of the strength of each professional tie. The resource is aimed at analysts studying industry sociology and running social graph simulations.
Columns
The dataset is structured as an edge list detailing actor pairings and the bond between them:
- actor1 id (From): The unique identifier for the originating actor in the pair. This field contains 23,814 unique actor identifiers.
- actor2 id (To): The unique identifier for the connecting actor in the pair. This field contains 30,170 unique actor identifiers.
- number of common movies (Strength): The metric quantifying the relationship, representing the exact count of films shared by the two listed actors. The minimum strength observed is 1, and the maximum is 15.
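As a minimal sketch of how the three columns map onto a graph, the edge list can be read into a symmetric adjacency map using only the Python standard library. The header names (`actor1`, `actor2`, `common_movies`) and the sample rows are assumptions made for illustration; the actual header in imdb_edgelist.csv may differ and should be checked first.

```python
import csv
import io
from collections import defaultdict

# Assumed header names (hypothetical); adjust to match the real CSV header.
FROM_COL, TO_COL, WEIGHT_COL = "actor1", "actor2", "common_movies"


def load_edges(handle):
    """Read an edge list into a symmetric weighted adjacency map."""
    adjacency = defaultdict(dict)
    for row in csv.DictReader(handle):
        a, b = row[FROM_COL], row[TO_COL]
        weight = int(row[WEIGHT_COL])
        # The graph is undirected, so store each edge in both directions.
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency


# Made-up sample rows standing in for the contents of imdb_edgelist.csv.
SAMPLE = "actor1,actor2,common_movies\nA,B,3\nA,C,1\nB,C,15\n"
graph = load_edges(io.StringIO(SAMPLE))

# Collect every stored weight to check the observed strength range.
weights = [w for neighbours in graph.values() for w in neighbours.values()]
min_strength, max_strength = min(weights), max(weights)
```

To run this against the real file, pass `open("imdb_edgelist.csv", newline="")` to `load_edges` in place of the in-memory sample. Comparing `min_strength` and `max_strength` against the stated range of 1 to 15 is a quick sanity check on the load.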
Distribution
The data is supplied in a CSV file named imdb_edgelist.csv, with a file size of 2.28 MB. It consists of 102,000 valid records (edges) detailing the connections between actors. The structure is based on an undirected graph, and all records are valid with no missing or mismatched entries observed.
Usage
This dataset is perfect for applying social network analysis (SNA) techniques to a real-world cultural domain. Ideal applications include:
- Creating visualisations of central actors and collaboration clusters within Bollywood cinema.
- Developing simulations and models of professional networks in the arts and entertainment sectors.
- Analysing network density and connectivity changes within the Indian film industry since 2005.
- Supporting academic research into collaboration patterns and the structure of professional relationships.
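The applications listed above can be approximated with a few basic graph measures. The sketch below, using only the standard library, computes weighted degree (the total co-starring count per actor) as a simple proxy for centrality, graph density, and connected components as a coarse stand-in for collaboration clusters. The edge tuples are made-up sample data, and dedicated community detection in a graph library would likely give finer clusters than connected components.

```python
from collections import defaultdict, deque

# Made-up (actor1, actor2, common movies) rows; real rows would come
# from imdb_edgelist.csv.
EDGES = [("A", "B", 3), ("A", "C", 1), ("B", "C", 15), ("D", "E", 2)]


def build_adjacency(edges):
    """Build a symmetric weighted adjacency map from edge tuples."""
    adjacency = defaultdict(dict)
    for a, b, weight in edges:
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency


def weighted_degree(adjacency):
    """Sum of co-starring counts over each actor's neighbours."""
    return {node: sum(nbrs.values()) for node, nbrs in adjacency.items()}


def density(adjacency):
    """Fraction of possible undirected edges that are present."""
    n = len(adjacency)
    m = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def connected_components(adjacency):
    """Group actors reachable from one another, via breadth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


adjacency = build_adjacency(EDGES)
strength = weighted_degree(adjacency)
top_actor = max(strength, key=strength.get)
components = connected_components(adjacency)
```

On the full edge list, sorting `strength` in descending order gives a shortlist of the most connected actors, which is a reasonable starting point for a visualisation.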
Coverage
The data scope is limited to Bollywood films. The time period covers movies released in 2005 or later. Actor identifiers are sourced from IMDb datasets. The expected update frequency for this specific release is 'Never'.
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For testing graph algorithms and creating network visualisations.
- Researchers and Academics: Those studying sociology, film industry trends, or applied social graphs.
- Students: Ideal for projects related to network theory, arts and entertainment, and data visualisation (a sample Kumu visualisation is available for the first 1,000 edges).
Dataset Name Suggestions
- Bollywood Actor Collaboration Network (2005+)
- IMDb Co-Starring Network Data
- Indian Cinema Actor Relationship Edgelist
Attributes
Original Data Source: IMDb Co-Starring Network Data
