Flight Delay Prediction Data
Aerospace & Aviation
Free
About
This dataset is designed to support flight delay prediction. It describes scheduled departures so that users can build models that forecast whether a given flight will be delayed. The dataset contains 539,383 instances, each with a serial id and eight attributes, one of which is the delay label to be predicted.
Columns
The dataset includes the following columns:
- id: A serial number for each entry.
- Airline: Specifies the commercial airline, given here with IATA/ICAO codes, such as Alaska Airlines (AS/ASA), American Airlines (AA/AAL), Air Canada (AC/ACA), Aeromexico (AM/AMX), Continental Airlines (CO/COA), Delta Air Lines (DL/DAL), FedEx (FX/FDX), Hawaiian Airlines (HA/HAL), Northwest Airlines (NW/NWA), Polar Air Cargo (PO/PAC), Southwest Airlines (WN/SWA), United Airlines (UA/UAL), United Parcel Service (5X/UPS), Virgin Atlantic (VS/VIR), VivaAerobús (VB/VIV), and WestJet (WS/WJA).
- Flight: Describes the type of aircraft or flight number.
- AirportFrom: Denotes the source airport for the flight. Examples include Hartsfield-Jackson Atlanta International Airport (ATL), Austin-Bergstrom International Airport (AUS), and Boston Logan International Airport (BOS).
- AirportTo: Indicates the destination airport for the flight, drawn from the same set of airports as the source.
- DayOfWeek: The scheduled day of the week for the flight.
- Time: The scheduled time of the flight.
- Length: The length (duration) of the flight.
- Delay: A binary indicator showing whether the flight experienced a delay.
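The column list above can be turned into a simple schema check before any modelling. The following is a minimal sketch using only the Python standard library. The exact header spellings (`AirportFrom`, `AirportTo`), the 0/1 encoding of Delay, the helper name `check_rows`, and the sample rows are all assumptions made for illustration; none of them are taken from the dataset itself.

```python
import csv
import io

# Header names assumed from the column list; the real CSV header may differ.
EXPECTED_COLUMNS = ["id", "Airline", "Flight", "AirportFrom", "AirportTo",
                    "DayOfWeek", "Time", "Length", "Delay"]

# Made-up rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """id,Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length,Delay
1,WN,100,ATL,ORD,1,600,120,0
2,DL,200,ATL,BOS,,615,150,1
"""

def check_rows(text):
    """Parse CSV text and return (rows, problems) for the assumed schema."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {reader.fieldnames}")
    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        missing = [c for c in EXPECTED_COLUMNS if not row.get(c)]
        if missing:
            problems.append(f"line {line_no}: missing {missing}")
        elif row["Delay"] not in {"0", "1"}:
            # Assumes the binary indicator is stored as 0 or 1.
            problems.append(f"line {line_no}: Delay is not binary")
        rows.append(row)
    return rows, problems

rows, problems = check_rows(SAMPLE_CSV)
for problem in problems:
    print(problem)
```

To run the same check on the real file, the contents of the downloaded CSV would be passed to `check_rows` in place of `SAMPLE_CSV`.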
Distribution
The dataset is provided in CSV format and comprises 539,383 records in 9 columns: the id column plus the eight attributes listed above. All data fields are reported as 100% valid, with no mismatched or missing entries.
Key distributions include:
- Airline: Southwest Airlines (WN) accounts for 17% of entries, while Delta Air Lines (DL) represents 11%.
- AirportFrom and AirportTo: Hartsfield-Jackson Atlanta International Airport (ATL) is the most common for both source and destination airports, each accounting for 6% of entries, followed by Chicago O'Hare International Airport (ORD) at 5%.
- Delay: Approximately 45% of flights in the dataset are marked as delayed.
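Shares like the ones listed above can be recomputed from the raw records with a frequency count. The sketch below shows one way to do this with the standard library. The `records` list is made-up sample data and `shares` is a hypothetical helper name, so the resulting numbers do not reflect the real dataset.

```python
from collections import Counter

# Made-up (Airline, AirportFrom, Delay) records for illustration only.
records = [
    ("WN", "ATL", 1),
    ("WN", "ORD", 0),
    ("DL", "ATL", 0),
    ("DL", "ATL", 1),
    ("AA", "ORD", 0),
]

def shares(values):
    """Map each distinct value to its fraction of all entries, most common first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}

airline_share = shares(r[0] for r in records)
origin_share = shares(r[1] for r in records)
# With a 0/1 Delay column, the mean is the fraction of delayed flights.
delay_rate = sum(r[2] for r in records) / len(records)

print(airline_share)
print(origin_share)
print(f"delayed: {delay_rate:.0%}")
```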
Usage
This dataset is ideal for:
- Developing and training machine learning models to predict flight delays.
- Conducting analytical studies on factors influencing airline punctuality.
- Exploring patterns and trends in flight operations and delays.
- Building predictive applications for travel planning and operational management.
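Before training a full machine learning model, it can help to establish a simple baseline that any model should beat. The sketch below predicts a delay whenever the airline's historical delay rate reaches a threshold, falling back to the overall rate for airlines not seen in training. It is an illustrative baseline, not a method prescribed by the dataset. The `train` and `test` pairs are made up, and `fit_group_rates` and `predict` are hypothetical helper names.

```python
from collections import defaultdict

# Made-up (Airline, Delay) pairs for illustration only.
train = [("WN", 1), ("WN", 1), ("WN", 0), ("DL", 0), ("DL", 0), ("AA", 1), ("AA", 0)]
test = [("WN", 1), ("DL", 0), ("DL", 1), ("UA", 0)]

def fit_group_rates(pairs):
    """Return (per-airline delay rate, overall delay rate) from training pairs."""
    totals = defaultdict(lambda: [0, 0])  # airline -> [delayed count, flights seen]
    for airline, delayed in pairs:
        totals[airline][0] += delayed
        totals[airline][1] += 1
    overall = sum(delayed for _, delayed in pairs) / len(pairs)
    rates = {airline: d / n for airline, (d, n) in totals.items()}
    return rates, overall

def predict(airline, rates, overall, threshold=0.5):
    """Predict 1 (delayed) if the airline's rate reaches the threshold."""
    # Airlines not seen in training fall back to the overall delay rate.
    return int(rates.get(airline, overall) >= threshold)

rates, overall = fit_group_rates(train)
predictions = [predict(airline, rates, overall) for airline, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(f"baseline accuracy: {accuracy:.2f}")
```

Because roughly 45% of flights are marked as delayed, always predicting "not delayed" would already score about 55% accuracy on the full data, so a trained model is only useful if it clearly exceeds that kind of baseline.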
Coverage
The dataset focuses on flights to and from numerous airports across the United States, including major hubs in Georgia, Texas, Massachusetts, Washington, North Carolina, Colorado, Michigan, New Jersey, Florida, Hawaii, New York, Nevada, California, Illinois, Minnesota, Louisiana, Oregon, Pennsylvania, Arizona, and Missouri. There is no specific time range or demographic scope detailed in the available information.
License
CC0: Public Domain
Who Can Use It
This dataset is particularly useful for:
- Data scientists and machine learning engineers looking to build and evaluate predictive models.
- Aviation industry analysts studying operational efficiencies and causes of delays.
- Researchers in transportation logistics and predictive analytics.
- Students and academics working on data science projects related to real-world applications.
Dataset Name Suggestions
- Flight Delay Prediction Data
- Aviation Punctuality Dataset
- Airline Operations Data
- Scheduled Flight Delay Records
Attributes
Original Data Source: Flight Delay Prediction Data