Flight Delay Predictor Dataset
Aerospace & Aviation
Free
About
This dataset provides on-time airline performance data for US airports, taken from Data Expo 2009. It lets users investigate the factors behind flight delays and cancellations and gives insight into airline punctuality. It was uploaded mainly so that students can practice data analysis on a substantial body of information, which makes it useful for teaching and for research on flight predictability.
Columns
The airports.csv file contains the following seven columns, which describe each airport and its location:
- iata: A unique three-letter International Air Transport Association (IATA) code for each airport, with 3375 unique values.
- airport: The full name of the airport, featuring 3245 unique airport names.
- city: The city where the airport is situated, containing 2676 unique city entries.
- state: The US state where the airport is located, with 57 unique state codes. The count exceeds 50, so the codes presumably also cover areas such as the District of Columbia and US territories.
- country: The country where the airport is located. Nearly all entries are USA (about 100% after rounding); a few other countries appear in a small number of records.
- lat: The geographical latitude coordinate of the airport, ranging from approximately 7.37 to 71.3 degrees, with a mean of 40.1.
- long: The geographical longitude coordinate of the airport, ranging from approximately -177 to 146 degrees, with a mean of -98.7.
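The column list above can be checked when the file is loaded. The following is a minimal sketch using only the Python standard library. It assumes the header row uses exactly the seven names listed above, in that order, which is not confirmed by the listing. The sample rows are invented for illustration and are not taken from the dataset; `load_airports` is a hypothetical helper name.

```python
import csv
import io

# Column names as given in the Columns section; the order is an assumption.
EXPECTED_COLUMNS = ["iata", "airport", "city", "state", "country", "lat", "long"]

# Invented sample rows for illustration only; not real dataset records.
SAMPLE_CSV = """iata,airport,city,state,country,lat,long
AAA,Example Field,Sampletown,TX,USA,30.5,-97.5
BBB,Demo Regional,Testville,CA,USA,36.25,-119.75
"""


def load_airports(handle):
    """Read airport rows from an open text handle and convert coordinates to float."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        row["lat"] = float(row["lat"])
        row["long"] = float(row["long"])
        rows.append(row)
    return rows


airports = load_airports(io.StringIO(SAMPLE_CSV))
print(len(airports), airports[0]["iata"])
```

To read the real file instead of the sample, the same function could be called as `load_airports(open("airports.csv", newline=""))`, ideally inside a `with` block.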
Distribution
The data is typically provided in a CSV format, with a sample file available on the platform. The airports.csv file itself is approximately 214.3 kB in size and contains 7 columns. It includes 3379 valid records, providing a robust tabular structure for analysis.
Usage
This dataset is ideal for various analytical applications and use cases. It can be used to explore and predict flight delays and cancellations, allowing users to understand the underlying causes of disrupted air travel. It serves as an excellent resource for practicing data analysis, particularly for students in college or university settings. Furthermore, it supports exploratory data analysis, statistical analysis, and could potentially be integrated into BigQuery for large-scale data processing. The dataset has been used in discussions at events like the JSM poster session to address flight delays.
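As one example of the exploratory analysis mentioned above, the sketch below counts airports per state with `collections.Counter`. The rows are invented placeholders, not dataset records, and treating an empty `state` value as missing is an assumption about how gaps are encoded. `airports_per_state` is a hypothetical helper name.

```python
from collections import Counter

# Invented rows for illustration only; real rows would come from airports.csv.
airports = [
    {"iata": "AAA", "state": "TX"},
    {"iata": "BBB", "state": "CA"},
    {"iata": "CCC", "state": "TX"},
    {"iata": "DDD", "state": ""},  # assumption: missing state stored as empty string
]


def airports_per_state(rows):
    """Return (state, count) pairs, most frequent first; blanks grouped as UNKNOWN."""
    counts = Counter(row["state"] or "UNKNOWN" for row in rows)
    return counts.most_common()


ranking = airports_per_state(airports)
print(ranking)
```

The same pattern applies to the `city` or `country` columns by changing the key that is counted.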
Coverage
The dataset's geographic scope is primarily US airports, with the country column predominantly listing 'USA'. The latitude and longitude values cover the geographical spread of these airports. The data comes from Data Expo 2009, so it reflects that period rather than current conditions; it may not be up to date, but it remains a valuable historical resource for data practice. No demographic scope applies, as the data concerns airports and general airline performance.
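The coverage claims above (share of 'USA' entries and the latitude and longitude ranges) can be recomputed from the file. The sketch below shows one way to do so, assuming rows have already been parsed into dictionaries with numeric `lat` and `long` values. The rows shown are invented for illustration, and `coverage_summary` is a hypothetical helper name.

```python
# Invented rows for illustration only; not real dataset records.
rows = [
    {"country": "USA", "lat": 30.5, "long": -97.5},
    {"country": "USA", "lat": 61.0, "long": -150.0},
    {"country": "USA", "lat": 21.25, "long": -157.75},
    {"country": "Elsewhere", "lat": 9.5, "long": 138.0},
]


def coverage_summary(records):
    """Summarise the USA share and the coordinate bounding box of the records."""
    total = len(records)
    usa = sum(1 for r in records if r["country"] == "USA")
    lats = [r["lat"] for r in records]
    longs = [r["long"] for r in records]
    return {
        "usa_share": usa / total,
        "lat_range": (min(lats), max(lats)),
        "long_range": (min(longs), max(longs)),
    }


summary = coverage_summary(rows)
print(summary)
```

Run against the full file, the ranges should be comparable to those reported in the Columns section.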
License
The dataset is available under two licenses:
- CC BY 4.0
- CC0: Public Domain
Who Can Use It
This dataset is particularly suitable for:
- Students: To learn and practice data analysis techniques on a substantial dataset.
- Researchers: For studies on airline performance, predictability of delays, and transportation logistics.
- Data Analysts: To perform exploratory data analysis, statistical modelling, and generate insights into airport operations and flight punctuality.
- Anyone interested in aviation data: To understand historical flight patterns and potential causes of delays.
Dataset Name Suggestions
- US Airport Performance 2009
- Airline On-Time Flight Data
- Flight Delay Predictor Dataset
- Data Expo 2009 US Airports
- American Airport Punctuality Data
Attributes
Original Data Source: Flight Delay Predictor Dataset