New York Cab Ride Fare Forecasting

Data Science and Analytics

Tags and Keywords

Taxi, Fare, Prediction, BigQuery, Regression

New York Cab Ride Fare Forecasting Dataset on Opendatabay data marketplace

Free

About

This dataset contains New York City yellow taxi cab trip records and is designed for building machine learning models that predict taxi fares. It leverages Google's BigQuery, a fully managed, low-cost analytics database that lets users query and analyse terabytes of data without managing infrastructure. The dataset is suited to creating, training, and evaluating forecasting models with BigQuery Machine Learning (BQML) using minimal code. Users can explore trip data, select important features, and build models that help cab drivers identify profitable trips and reach customers more efficiently.

Columns

  • trip_duration: The total duration of the taxi journey, measured in seconds.
  • distance_traveled: The distance covered during the trip, measured in kilometres.
  • num_of_passengers: The number of passengers in the taxi.
  • fare: The base fare for the journey, in Indian Rupees (INR).
  • tip: The amount the driver received in tips, in Indian Rupees (INR).
  • miscellaneous_fees: Additional charges such as tolls, convenience fees, and GST, in Indian Rupees (INR).
  • total_fare: The grand total for the ride, which is the target variable for prediction, in Indian Rupees (INR).
  • surge_applied: A binary indicator ('Yes' or 'No') showing if surge pricing was applied to the fare.
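
The column descriptions above can be turned into a typed record for loading the CSV files. The following is a minimal sketch using only the Python standard library. It assumes the CSV header names match the column names listed above exactly, and that surge_applied is stored as the text 'Yes' or 'No'. The names Trip, parse_trip, load_trips, and SAMPLE_CSV are hypothetical, and the two sample rows are made up for illustration; they are not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Trip:
    trip_duration: float       # seconds
    distance_traveled: float   # kilometres
    num_of_passengers: int
    fare: float
    tip: float
    miscellaneous_fees: float
    total_fare: float          # prediction target
    surge_applied: bool        # True when the file says 'Yes'


def parse_trip(row: dict) -> Trip:
    """Convert one CSV row (all values are strings) into a typed Trip."""
    return Trip(
        trip_duration=float(row["trip_duration"]),
        distance_traveled=float(row["distance_traveled"]),
        # Assumption: counts may be written as "1" or "1.0" in the file.
        num_of_passengers=int(float(row["num_of_passengers"])),
        fare=float(row["fare"]),
        tip=float(row["tip"]),
        miscellaneous_fees=float(row["miscellaneous_fees"]),
        total_fare=float(row["total_fare"]),
        surge_applied=row["surge_applied"].strip().lower() == "yes",
    )


def load_trips(handle) -> list[Trip]:
    """Read every row from an open CSV file-like object."""
    return [parse_trip(row) for row in csv.DictReader(handle)]


# Made-up sample rows, for illustration only.
SAMPLE_CSV = """trip_duration,distance_traveled,num_of_passengers,fare,tip,miscellaneous_fees,total_fare,surge_applied
600,3.5,1,90.0,10.0,5.0,105.0,No
1200,8.0,2,200.0,0.0,20.0,220.0,Yes
"""

trips = load_trips(io.StringIO(SAMPLE_CSV))
```

To read the real training file instead of the sample, the same function could be called as load_trips(open("train.csv", newline="")), provided the header assumption holds.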

Distribution

The dataset is distributed across three files: train.csv, test.csv, and submission.csv. The data is updated daily.

Usage

This dataset is suitable for a variety of machine learning tasks, including:
  • Fare Prediction: Building a linear regression model to forecast the total fare of a taxi ride.
  • Data Exploration: Querying and visualising millions of taxi trips to understand travel patterns.
  • Feature Engineering: Using AutoML to automatically select important features from the dataset for model training.
  • Customer Location Optimisation: Creating models that help ride-sharing services such as Uber and Rapido optimise driver routes so that drivers reach customers sooner.
  • Clustering Analysis: Applying k-Means or spectral clustering algorithms to identify and visualise fare patterns across different locations.

Coverage

The data covers taxi trips within New York City. The time range is not specified.

License

CC0: Public Domain

Who Can Use It

  • Data Analysts: Can use BQML to create, train, and evaluate machine learning models with minimal coding.
  • Machine Learning Engineers: Can build and test forecasting models for applications like fare prediction and operational efficiency.
  • Students and Researchers: Can explore a large-scale public dataset to practise machine learning techniques like linear regression, clustering, and feature engineering.

Dataset Name Suggestions

  • NYC Taxi Fare Prediction Challenge
  • BigQuery ML Taxi Trip Analysis
  • New York Cab Ride Fare Forecasting
  • Predictive Modelling for Taxi Fares

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 17/09/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0