Flight Delay Prediction Data

Aerospace & Aviation

Tags and Keywords

Flight, Delay, Aviation, Prediction, Airlines

Flight Delay Prediction Data Dataset on Opendatabay data marketplace

Free

About

This dataset supports the prediction of flight delays. It describes scheduled departures so that users can build models that forecast whether a given flight will be delayed. It contains 539,383 instances, each with a serial id plus eight fields: seven descriptive attributes and the binary Delay label that is the prediction target.

Columns

The dataset includes the following columns:
  • id: A serial number for each entry.
  • Airline: Specifies the operating airline (IATA/ICAO codes in parentheses), such as Alaska Airlines (AS/ASA), American Airlines (AA/AAL), Air Canada (AC/ACA), Aeromexico (AM/AMX), Continental Airlines (CO/COA), Delta Airlines (DL/DAL), FedEx (FX/FDX), Hawaiian Airlines (HA/HAL), Northwest Airlines (NW/NWA), Polar Air Cargo (PO/PAC), Southwest Airlines (WN/SWA), United Airlines (UA/UAL), United Parcel Service (5X/UPS), Virgin Atlantic (VS/VIR), VivaAerobús (VB/VIV), and WestJet (WS/WJA).
  • Flight: Describes the type of aircraft or flight number.
  • AirportFrom: The source (departure) airport for the flight. Examples include Hartsfield-Jackson Atlanta International Airport (ATL), Austin-Bergstrom International Airport (AUS), and Boston Logan International Airport (BOS).
  • AirportTo: The destination airport for the flight, drawn from a similar set of airports as AirportFrom.
  • DayOfWeek: The scheduled day of the week for the flight.
  • Time: The scheduled departure time of the flight.
  • Length: The length (duration) of the flight.
  • Delay: A binary indicator showing whether the flight experienced a delay.
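A schema like the one above maps naturally onto a typed record. The following is a minimal sketch using only Python's standard `csv` and `dataclasses` modules. It assumes the CSV header spells the columns exactly as `id, Airline, Flight, AirportFrom, AirportTo, DayOfWeek, Time, Length, Delay`, that DayOfWeek is an integer code, and that Time and Length are numeric; the listing confirms none of these encodings, and the names `FlightRecord` and `read_flights` are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# Assumed header spelling, taken from the column list; adjust if the file differs.
EXPECTED_COLUMNS = [
    "id", "Airline", "Flight", "AirportFrom", "AirportTo",
    "DayOfWeek", "Time", "Length", "Delay",
]


@dataclass(frozen=True)
class FlightRecord:
    id: int
    airline: str
    flight: str          # kept as text: the listing says aircraft type or flight number
    airport_from: str
    airport_to: str
    day_of_week: int     # assumption: an integer day code
    time: float          # scheduled time; units are not stated in the listing
    length: float        # length/duration; units are not stated in the listing
    delay: int           # binary label: 1 = delayed, 0 = not delayed


def read_flights(lines: Iterable[str]) -> List[FlightRecord]:
    """Parse CSV lines into typed records, rejecting an unexpected header or label."""
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    records = []
    for row in reader:
        delay = int(row["Delay"])
        if delay not in (0, 1):
            raise ValueError(f"non-binary Delay in row {row['id']}: {delay}")
        records.append(FlightRecord(
            id=int(row["id"]),
            airline=row["Airline"],
            flight=row["Flight"],
            airport_from=row["AirportFrom"],
            airport_to=row["AirportTo"],
            day_of_week=int(row["DayOfWeek"]),
            time=float(row["Time"]),
            length=float(row["Length"]),
            delay=delay,
        ))
    return records
```

Because `csv.DictReader` accepts any iterable of lines, an open file works directly, e.g. `read_flights(open(path, newline=""))` where `path` (hypothetical) points at the downloaded CSV. Failing fast on the header catches spelling differences such as "Airport From" versus "AirportFrom".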

Distribution

The dataset is provided in CSV format and comprises 539,383 records across 9 columns (the id plus the eight fields above). All fields are 100% valid, with no mismatched or missing entries.
Key distributions include:
  • Airline: Southwest Airlines (WN) accounts for 17% of entries, while Delta Airlines (DL) represents 11%.
  • AirportFrom and AirportTo: Hartsfield-Jackson Atlanta International Airport (ATL) is the most common for both source and destination airports, each accounting for 6% of entries, followed by Chicago O'Hare International Airport (ORD) at 5%.
  • Delay: Approximately 45% of flights in the dataset are marked as delayed.
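The figures above are simple frequency counts over the rows. A minimal sketch that recomputes them follows, assuming each row is a dict of strings as produced by Python's `csv.DictReader` and that the header uses the names `Airline`, `AirportFrom`, `AirportTo`, and `Delay`; the function name `summarize` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize(rows: Iterable[Dict[str, str]]) -> Dict[str, object]:
    """Recompute airline shares, top airports, delay rate, and empty cells."""
    rows = list(rows)
    n = len(rows)
    if n == 0:
        raise ValueError("no rows to summarize")
    # Count cells that are absent (None), blank, or not plain strings
    # (csv.DictReader collects surplus fields into a list).
    missing = sum(
        1 for r in rows for v in r.values()
        if not (isinstance(v, str) and v.strip())
    )
    airline_share = {k: c / n for k, c in Counter(r["Airline"] for r in rows).items()}
    top_from = Counter(r["AirportFrom"] for r in rows).most_common(1)[0]
    top_to = Counter(r["AirportTo"] for r in rows).most_common(1)[0]
    delay_rate = sum(int(r["Delay"]) for r in rows) / n
    return {
        "rows": n,
        "missing_cells": missing,
        "airline_share": airline_share,
        "top_airport_from": (top_from[0], top_from[1] / n),
        "top_airport_to": (top_to[0], top_to[1] / n),
        "delay_rate": delay_rate,
    }
```

On the full file, `summarize(csv.DictReader(f))` should report a `missing_cells` of 0 and a `delay_rate` near 0.45 if the listing's figures hold.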

Usage

This dataset is ideal for:
  • Developing and training machine learning models to predict flight delays.
  • Conducting analytical studies on factors influencing airline punctuality.
  • Exploring patterns and trends in flight operations and delays.
  • Building predictive applications for travel planning and operational management.
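Any delay model needs a reference point. Since roughly 45% of flights are marked as delayed, always predicting "not delayed" should score about 55% accuracy, and a useful model has to beat that. The sketch below is a deliberately naive per-airline frequency baseline (not a method described by the listing): it learns each airline's delay rate on a training split and predicts "delayed" when that rate is at least 0.5. The names `fit_airline_rates`, `predict`, and `evaluate`, and the use of Airline as the only feature, are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical minimal row: (Airline code, Delay label 0/1).
Row = Tuple[str, int]


def fit_airline_rates(train: Sequence[Row]) -> Tuple[Dict[str, float], float]:
    """Estimate each airline's delay rate, plus the overall rate as a fallback."""
    delayed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for airline, label in train:
        delayed[airline] += label
        total[airline] += 1
    overall = sum(label for _, label in train) / len(train)
    return {a: delayed[a] / total[a] for a in total}, overall


def predict(rates: Dict[str, float], overall: float, airline: str) -> int:
    """Predict 1 (delayed) when the airline's training delay rate is at least 0.5."""
    return int(rates.get(airline, overall) >= 0.5)


def evaluate(rows: List[Row], test_fraction: float = 0.2, seed: int = 0) -> Dict[str, float]:
    """Shuffle, split, fit on the training part, and score both baselines on the rest."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]
    if not train or not test:
        raise ValueError("need at least one row in each split")
    rates, overall = fit_airline_rates(train)
    majority = int(overall >= 0.5)  # most frequent label in the training split
    airline_acc = sum(predict(rates, overall, a) == y for a, y in test) / len(test)
    majority_acc = sum(majority == y for _, y in test) / len(test)
    return {"airline_baseline": airline_acc, "majority_class": majority_acc}
```

Rows can be built from the CSV as `[(r["Airline"], int(r["Delay"])) for r in csv.DictReader(f)]`. A real model would add Time, Length, DayOfWeek, and the airports as features; keeping this baseline to one feature makes it easy to tell whether those additions help.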

Coverage

The dataset covers flights to and from numerous airports across the United States, including major hubs in Georgia, Texas, Massachusetts, Washington, North Carolina, Colorado, Michigan, New Jersey, Florida, Hawaii, New York, Nevada, California, Illinois, Minnesota, Louisiana, Oregon, Pennsylvania, Arizona, and Missouri. The listing does not state the time period the records cover.

License

CC0: Public Domain

Who Can Use It

This dataset is particularly useful for:
  • Data scientists and machine learning engineers looking to build and evaluate predictive models.
  • Aviation industry analysts studying operational efficiencies and causes of delays.
  • Researchers in transportation logistics and predictive analytics.
  • Students and academics working on data science projects related to real-world applications.

Dataset Name Suggestions

  • Flight Delay Prediction Data
  • Aviation Punctuality Dataset
  • Airline Operations Data
  • Scheduled Flight Delay Records

Attributes

Original Data Source: Flight Delay Prediction Data

Listing Stats

Listed: 14/07/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0