Opendatabay APP

NYC Taxi Trip Duration Prediction Data

Data Science and Analytics

Tags and Keywords

Nyc

Taxi

Duration

Trip

Transportation

Free

About

This dataset provides structured data based on the 2016 NYC Yellow Cab trip records. The sampled and cleaned data product is designed for machine learning practitioners and analysts who want to predict the duration of taxi journeys from trip attributes such as timestamps, location coordinates, and vendor details.

Columns

  • id: A unique identifier assigned to each individual trip record.
  • vendor_id: A coded value identifying the provider associated with the trip.
  • pickup_datetime: The date and time when the taxi meter was initially engaged.
  • dropoff_datetime: The date and time when the taxi meter was disengaged.
  • passenger_count: The number of passengers occupying the vehicle, as entered by the driver.
  • pickup_longitude: The geographical longitude where the meter was engaged.
  • pickup_latitude: The geographical latitude where the meter was engaged.
  • dropoff_longitude: The geographical longitude where the meter was disengaged.
  • dropoff_latitude: The geographical latitude where the meter was disengaged.
  • store_and_fwd_flag: A flag (Y or N) indicating whether the trip record was stored in the vehicle's memory before transmission due to a temporary lack of server connection.
  • trip_duration: The duration of the trip, measured in seconds (this is the key target variable).
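As a rough illustration of how these columns might be turned into model inputs, the sketch below parses rows with the standard library and derives a pickup hour, a weekday, and a straight-line (haversine) distance. The two sample rows are synthetic values made up for this example, the timestamp layout `%Y-%m-%d %H:%M:%S` is an assumption, and the helper names `haversine_km` and `parse_trip` are hypothetical.

```python
import csv
import io
import math
from datetime import datetime

# Two synthetic rows in the listed column order; the values are made up for illustration.
SAMPLE_CSV = """id,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,store_and_fwd_flag,trip_duration
id0000001,2,2016-03-14 17:24:55,2016-03-14 17:32:30,1,-73.9822,40.7679,-73.9646,40.7656,N,455
id0000002,1,2016-06-12 00:43:35,2016-06-12 00:54:38,1,-73.9804,40.7386,-73.9995,40.7312,N,663
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def parse_trip(row):
    """Convert one CSV row (a dict of strings) into typed, derived fields."""
    pickup = datetime.strptime(row["pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(row["dropoff_datetime"], TIME_FORMAT)
    return {
        "id": row["id"],
        "pickup_hour": pickup.hour,
        "pickup_weekday": pickup.weekday(),  # Monday is 0
        "distance_km": haversine_km(
            float(row["pickup_latitude"]), float(row["pickup_longitude"]),
            float(row["dropoff_latitude"]), float(row["dropoff_longitude"]),
        ),
        # Meter-on to meter-off time, for comparison with trip_duration.
        "elapsed_seconds": int((dropoff - pickup).total_seconds()),
        "trip_duration": int(row["trip_duration"]),
    }


trips = [parse_trip(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for trip in trips:
    print(trip["id"], trip["pickup_hour"], round(trip["distance_km"], 2), trip["trip_duration"])
```

For the real file, the same `parse_trip` function could be applied to rows read from the downloaded CSV instead of the in-memory sample.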

Distribution

The data is tabular and distributed as a CSV file with an estimated size of 200.59 MB. It contains 1,458,644 individual records. All key fields are 100% valid, with no missing values.
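The completeness claim can be checked after download with a short script. The following is a minimal sketch, not a validation tool supplied with the listing: it counts empty cells per column using the column names listed above. The function name `count_missing` is hypothetical, and the sample rows (one with a deliberately blank `passenger_count`) are synthetic.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "id", "vendor_id", "pickup_datetime", "dropoff_datetime", "passenger_count",
    "pickup_longitude", "pickup_latitude", "dropoff_longitude", "dropoff_latitude",
    "store_and_fwd_flag", "trip_duration",
]


def count_missing(source):
    """Return (row_count, {column: number_of_empty_cells}) for a CSV text stream."""
    reader = csv.DictReader(source)
    absent = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if absent:
        raise ValueError(f"missing columns: {absent}")
    missing = {column: 0 for column in EXPECTED_COLUMNS}
    rows = 0
    for row in reader:
        rows += 1
        for column in EXPECTED_COLUMNS:
            # Short rows yield None for trailing fields; treat those as empty too.
            if (row[column] or "").strip() == "":
                missing[column] += 1
    return rows, missing


# Synthetic sample: the second row has a blank passenger_count on purpose.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "id0000001,2,2016-03-14 17:24:55,2016-03-14 17:32:30,1,-73.9822,40.7679,-73.9646,40.7656,N,455\n"
    "id0000002,1,2016-06-12 00:43:35,2016-06-12 00:54:38,,-73.9804,40.7386,-73.9995,40.7312,N,663\n"
)
row_count, missing_counts = count_missing(sample)
print(row_count, {c: n for c, n in missing_counts.items() if n})
```

On the actual file, passing an open file handle (opened with `newline=""`) in place of the sample should report the full row count and, if the listing's claim holds, no non-zero entries.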

Usage

The dataset is well suited to building and evaluating regression models for travel time prediction. It can also be used for deep learning exercises in urban mobility, for optimising routing algorithms, and for understanding how spatio-temporal variables affect journey times. Model performance can be evaluated with metrics such as R² and RMSE.
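For readers unfamiliar with the two metrics, here is a small sketch of how RMSE and R² are computed for duration predictions in seconds. The actual and predicted values are made-up numbers, not results from any model trained on this dataset, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (seconds here)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up trip durations (seconds) and hypothetical model predictions.
actual = [455, 663, 2124, 429]
predicted = [500, 600, 2000, 450]

# Baseline that always predicts the mean duration; its R² is 0 by construction.
mean_duration = sum(actual) / len(actual)
baseline = [mean_duration] * len(actual)

print("model    RMSE:", rmse(actual, predicted), "R2:", r2_score(actual, predicted))
print("baseline RMSE:", rmse(actual, baseline), "R2:", r2_score(actual, baseline))
```

Comparing against a mean-only baseline is one common way to read these numbers: a model is only useful if its RMSE is lower and its R² is clearly above zero.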

Coverage

The data covers yellow taxi trips within New York City over a six-month period from 1 January 2016 to 1 July 2016, capturing roughly 1.46 million passenger journeys.
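A quick way to confirm that records fall inside the stated window is to test each pickup timestamp against the two boundary dates. This sketch assumes the end date of 1 July 2016 is an exclusive upper bound (consistent with a January to June range) and assumes the timestamp layout shown. The function name `in_coverage` is hypothetical.

```python
from datetime import datetime

WINDOW_START = datetime(2016, 1, 1)
WINDOW_END = datetime(2016, 7, 1)  # assumed to be an exclusive upper bound


def in_coverage(pickup_text, time_format="%Y-%m-%d %H:%M:%S"):
    """Return True if the pickup timestamp lies within the six-month window."""
    pickup = datetime.strptime(pickup_text, time_format)
    return WINDOW_START <= pickup < WINDOW_END


# Synthetic timestamps for illustration only.
examples = ["2016-01-01 00:00:17", "2016-06-30 23:59:39", "2016-07-01 00:00:00"]
flags = [in_coverage(text) for text in examples]
print(list(zip(examples, flags)))
```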

License

CC0: Public Domain

Who Can Use It

The dataset suits data scientists, machine learning engineers, and students working on predictive analytics. It is particularly valuable for those honing intermediate skills in regression modelling and time series analysis with geolocation data, especially on transportation and logistics problems.

Dataset Name Suggestions

  • NYC Taxi Trip Duration Prediction Data
  • 2016 NYC Yellow Cab Trip Records
  • NYC Taxi Trip Duration (January–June 2016)
  • Yellow Cab Journey Time Predictor

Attributes

Listing Stats

  • Views: 3
  • Downloads: 0
  • Listed: 13/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
