IMDb Co-Starring Network Data

Tags and Keywords

Bollywood

Actors

Network

IMDb

Movies

Free

About

Explore the professional relationships among actors in the Bollywood film industry using an undirected graph model. The data maps the collaboration network of actors based on their shared appearances in Bollywood movies released from 2005 onwards. Each connection is weighted by the number of films the two actors share, which indicates how close and how strong their professional tie is. The resource is suited to analysts studying industry sociology or running social graph simulations.

Columns

The dataset is structured as an edge list of actor pairings and the strength of the bond between them. Because the graph is undirected, the From/To order within a pair carries no meaning:
  • actor1 id (From): The unique identifier for the originating actor in the pair. This field contains 23,814 unique actor identifiers.
  • actor2 id (To): The unique identifier for the connecting actor in the pair. This field contains 30,170 unique actor identifiers.
  • number of common movies (Strength): The metric quantifying the relationship, representing the exact count of films shared by the two listed actors. The minimum strength observed is 1, and the maximum is 15.
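As a minimal sketch of how the three columns map onto an undirected weighted graph, the snippet below reads an edge list into a symmetric adjacency map using only the Python standard library. The header names (`actor1_id`, `actor2_id`, `common_movies`) and the identifiers (`a1`, `a2`, ...) are made-up placeholders, not values from the real file; check the actual header of imdb_edgelist.csv before reusing it.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample. Header names and IDs are assumptions, not real data.
SAMPLE_CSV = """actor1_id,actor2_id,common_movies
a1,a2,3
a2,a3,1
a1,a3,2
a4,a1,1
"""

def load_edges(handle):
    """Read (actor1, actor2, strength) rows into an undirected weighted adjacency map."""
    adjacency = defaultdict(dict)
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        actor1, actor2, strength = row
        weight = int(strength)
        # Store the edge in both directions because the graph is undirected.
        adjacency[actor1][actor2] = weight
        adjacency[actor2][actor1] = weight
    return adjacency

adjacency = load_edges(io.StringIO(SAMPLE_CSV))
```

For the real file, the same function could be called with `open("imdb_edgelist.csv", newline="")` in place of the in-memory sample.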

Distribution

The data is supplied as a CSV file named imdb_edgelist.csv, with a file size of 2.28 MB. It contains 102,000 records, each representing one edge (a connection between two actors) in an undirected graph. All records are valid, with no missing or mismatched entries observed.
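A reader who wants to confirm these quality claims independently could run a few basic checks on the edge list. The sketch below counts missing fields, self-loops, repeated undirected pairs, and strengths outside the stated 1 to 15 range. The sample rows are deliberately broken, invented data used only to exercise each check, and the header names are assumptions.

```python
import csv
import io

# Invented rows, each chosen to trigger one check. Not real data.
SAMPLE = (
    "actor1_id,actor2_id,common_movies\n"
    "a1,a2,3\n"
    "a2,a1,3\n"   # same undirected pair as the row above
    "a3,a3,1\n"   # self-loop
    "a4,,2\n"     # missing actor2
    "a5,a6,20\n"  # strength above the stated maximum of 15
)

def check_edges(handle, min_strength=1, max_strength=15):
    """Return (row_count, issues) for a three-column edge list with a header."""
    issues = {"missing": 0, "self_loop": 0, "duplicate_pair": 0, "out_of_range": 0}
    seen = set()
    rows = 0
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    for row in reader:
        rows += 1
        if len(row) != 3 or any(not field.strip() for field in row):
            issues["missing"] += 1
            continue
        actor1, actor2, strength = (field.strip() for field in row)
        if actor1 == actor2:
            issues["self_loop"] += 1
        pair = frozenset((actor1, actor2))  # order-independent key
        if pair in seen:
            issues["duplicate_pair"] += 1
        seen.add(pair)
        if not strength.isdigit() or not min_strength <= int(strength) <= max_strength:
            issues["out_of_range"] += 1
    return rows, issues

rows, issues = check_edges(io.StringIO(SAMPLE))
```

On a clean file, every counter in `issues` should be zero and `rows` should match the record count given in the listing.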

Usage

This dataset is well suited to applying social network analysis (SNA) techniques to a real-world cultural domain. Typical applications include:
  • Creating visualisations of central actors and collaboration clusters within Bollywood cinema.
  • Developing simulations and models of professional networks in the arts and entertainment sectors.
  • Analysing the density and connectivity of the Indian film industry's collaboration network for films released since 2005.
  • Supporting academic research into collaboration patterns and the structure of professional relationships.
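The applications above rest on a handful of standard graph measures. The following sketch computes three of them on a small invented edge list: weighted degree (total co-starring count per actor), graph density, and connected components as a rough stand-in for collaboration clusters. The actor IDs and weights are illustrative only. A graph library such as NetworkX offers the same measures, but the standard library is enough to show the idea.

```python
from collections import defaultdict, deque

# Invented edges: (actor, actor, number of common movies). Not real data.
EDGES = [("a1", "a2", 3), ("a2", "a3", 1), ("a1", "a3", 2), ("a4", "a5", 1)]

def build_adjacency(edges):
    """Build a symmetric adjacency map for an undirected weighted graph."""
    adjacency = defaultdict(dict)
    for a, b, weight in edges:
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency

def weighted_degree(adjacency):
    """Sum of edge weights per actor: total shared film appearances."""
    return {actor: sum(nbrs.values()) for actor, nbrs in adjacency.items()}

def density(adjacency):
    """Fraction of possible actor pairs that are connected: 2E / (N(N-1))."""
    n = len(adjacency)
    m = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))

def connected_components(adjacency):
    """Group actors into components using breadth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components

adjacency = build_adjacency(EDGES)
strength = weighted_degree(adjacency)
most_connected = max(strength, key=strength.get)
graph_density = density(adjacency)
components = connected_components(adjacency)
```

Weighted degree is only one notion of a "central" actor; betweenness or eigenvector centrality may rank actors differently and could be more appropriate depending on the question.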

Coverage

The data covers Bollywood films only. The time period includes all relevant movies released in or after 2005. Actor identifiers are sourced from IMDb datasets. The expected update frequency for this release is 'Never'.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For testing graph algorithms and creating network visualisations.
  • Researchers and Academics: Those studying sociology, film industry trends, or applied social graphs.
  • Students: For projects related to network theory, arts and entertainment, and data visualisation (a sample Kumu visualisation is available for the first 1,000 edges).
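For a visualisation on the scale of the sample mentioned above, a reader might first cut the edge list down to its first 1,000 rows. The sketch below does this with the standard library and relabels the columns From, To, and Strength, the labels used in the column descriptions. Whether a particular visualisation tool expects exactly these headers is an assumption to verify against that tool's import documentation. The sample rows are invented.

```python
import csv
import io
from itertools import islice

# Invented sample with assumed header names. Not real data.
SOURCE = (
    "actor1_id,actor2_id,common_movies\n"
    "a1,a2,3\n"
    "a2,a3,1\n"
    "a1,a3,2\n"
)

def take_first_edges(src, dst, limit=1000):
    """Copy the first `limit` edges from src to dst under From/To/Strength headers."""
    reader = csv.reader(src)
    next(reader)  # drop the original header row
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["From", "To", "Strength"])
    written = 0
    for row in islice(reader, limit):
        writer.writerow(row)
        written += 1
    return written

subset = io.StringIO()
written = take_first_edges(io.StringIO(SOURCE), subset, limit=2)
```

With real files, `src` and `dst` would be file handles opened with `newline=""`, and `limit` would stay at its default of 1,000.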

Dataset Name Suggestions

  1. Bollywood Actor Collaboration Network (2005+)
  2. IMDb Co-Starring Network Data
  3. Indian Cinema Actor Relationship Edgelist

Attributes

Original Data Source: IMDb Co-Starring Network Data

Listing Stats

VIEWS

4

DOWNLOADS

3

LISTED

07/11/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0