NYC Taxi Fare Prediction Dataset

Data Science and Analytics

Tags and Keywords: Taxi, Fare, Prediction, BigQuery, ML

Free

About

This dataset is designed for predicting taxi trip fares in New York City. It contains information on millions of New York City yellow taxi cab trips, allowing data analysts to create, train, and evaluate machine learning models, and to generate predictions from them, using BigQuery Machine Learning (BQML) with minimal coding. The ultimate goal is to help cab drivers easily identify trip fares in their respective locations and reach customers efficiently.

Columns

trip_duration: How long the journey lasted, in seconds.
distance_traveled: How far the taxi travelled, in km.
num_of_passengers: The number of passengers in the taxi.
fare: The base fare for the journey, in INR.
tip: How much the driver received in tips, in INR.
miscellaneous_fees: Any additional charges during the trip, such as tolls, convenience fees, or GST, in INR.
total_fare: The grand total for the ride, in INR. This is the prediction target for models.
surge_applied: A boolean indicator (Yes or No) of whether surge pricing was applied.
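
As an illustration of the schema above, the following is a minimal Python sketch that reads rows with these eight columns into typed records. The column names come from the listing; the sample rows are made up for the example, the TaxiTrip class and helper functions are hypothetical, and the accepted spellings of the Yes/No indicator are an assumption about how the CSV encodes it.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TaxiTrip:
    trip_duration: float       # seconds
    distance_traveled: float   # km
    num_of_passengers: int
    fare: float
    tip: float
    miscellaneous_fees: float
    total_fare: float          # prediction target
    surge_applied: bool


def parse_surge(value: str) -> bool:
    """Convert the Yes/No indicator to a bool (accepted spellings are assumed)."""
    text = value.strip().lower()
    if text in {"yes", "true", "1"}:
        return True
    if text in {"no", "false", "0"}:
        return False
    raise ValueError(f"unexpected surge_applied value: {value!r}")


def read_trips(handle) -> list[TaxiTrip]:
    """Read CSV rows that carry the listing's column names as a header."""
    trips = []
    for row in csv.DictReader(handle):
        trips.append(
            TaxiTrip(
                trip_duration=float(row["trip_duration"]),
                distance_traveled=float(row["distance_traveled"]),
                # int(float(...)) tolerates values written as "1.0"
                num_of_passengers=int(float(row["num_of_passengers"])),
                fare=float(row["fare"]),
                tip=float(row["tip"]),
                miscellaneous_fees=float(row["miscellaneous_fees"]),
                total_fare=float(row["total_fare"]),
                surge_applied=parse_surge(row["surge_applied"]),
            )
        )
    return trips


# Made-up rows for illustration only; they are not taken from train.csv.
SAMPLE = """trip_duration,distance_traveled,num_of_passengers,fare,tip,miscellaneous_fees,total_fare,surge_applied
600,3.2,1,80,10,15.5,105.5,No
1200,8.0,2,200,0,30,230,Yes
"""

trips = read_trips(io.StringIO(SAMPLE))
print(trips[0])
```

To read the real training file, the same function could be called with open("train.csv", newline="") in place of the in-memory sample, provided the file's header matches the column names listed above.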

Distribution

The dataset is typically provided in CSV format, with data files including submission.csv, test.csv, and train.csv. The sizes of these files are approximately:

submission.csv: 359.46 kB
test.csv: 3.05 MB
train.csv: 9.02 MB

The total size for Version 2 is around 12.43 MB. While specific row counts are not detailed, the dataset contains millions of New York City yellow taxi cab trips and is structured for training and testing machine learning models. The dataset is expected to be updated daily.
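
The stated total can be checked against the three file sizes with a few lines of Python. This is only a quick arithmetic sketch, and it assumes decimal units (1 MB = 1000 kB), which the listing does not specify.

```python
# File sizes as given in the listing, converted to MB (assuming 1 MB = 1000 kB).
SIZES_MB = {
    "submission.csv": 359.46 / 1000,
    "test.csv": 3.05,
    "train.csv": 9.02,
}

total_mb = sum(SIZES_MB.values())
print(f"{total_mb:.2f} MB")  # prints "12.43 MB"
```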

Usage

This dataset is ideal for:

Creating, training, evaluating, and making predictions with machine learning models.
Forecasting numeric values such as taxi fares using Linear Regression (linear_reg) models within BQML.
Binary or Multiclass Classification tasks (e.g., spam detection) using Logistic Regression (logistic_reg), though classification is not the primary focus of fare prediction.
Unsupervised learning for exploration, utilising k-Means Clustering (kmeans).
Querying and exploring large public taxi cab datasets efficiently.
Building forecasting models that can assist cab drivers (e.g., Uber, Rapido) in identifying trip fares and optimising routes for quicker customer reach.
Visualising trip fares for better insights.
Applying AutoML to select important features and models automatically.
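
To make the linear regression use case above more concrete, here is a hedged Python sketch that assembles a BQML CREATE MODEL statement with model_type set to linear_reg and total_fare as the label. The project, dataset, table, and model names are hypothetical placeholders, and the choice of feature columns is an assumption made for the example (fare, tip, and miscellaneous_fees are left out because they are components of the total). The sketch only builds the SQL text; it does not run it against BigQuery.

```python
# Feature columns chosen for illustration; this selection is an assumption.
FEATURES = [
    "trip_duration",
    "distance_traveled",
    "num_of_passengers",
    "surge_applied",
]


def build_create_model_sql(
    model_name: str,
    source_table: str,
    label: str = "total_fare",
    features: list[str] = FEATURES,
) -> str:
    """Return a BQML statement that trains a linear regression model."""
    columns = ",\n  ".join([*features, label])
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = 'linear_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT\n  {columns}\n"
        f"FROM `{source_table}`"
    )


# Hypothetical names; replace them with a real project, dataset, and table.
sql = build_create_model_sql(
    model_name="my_project.taxi.fare_model",
    source_table="my_project.taxi.train",
)
print(sql)
```

The generated statement can be pasted into the BigQuery console. Changing model_type to logistic_reg or kmeans would target the classification and clustering options mentioned in the list, although kmeans is unsupervised and would not take a label column.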

Coverage

The dataset focuses on New York City and comprises trips from yellow taxi cabs. It includes millions of trip records. The data is part of a BigQuery Public Dataset.

License

CC0: Public Domain

Who Can Use It

Data analysts looking to build and deploy machine learning models with minimal coding.
Machine learning engineers and data scientists interested in regression, classification, or clustering tasks.
Cab drivers or ride-sharing companies (e.g., Uber, Rapido) seeking insights into fare predictions and operational efficiency.
Anyone interested in data analytics and exploring large-scale public datasets.

Dataset Name Suggestions

NYC Taxi Fare Prediction Dataset
BigQuery ML Yellow Cab Trips
New York City Taxi Fare Analytics
Taxi Ride Pricing Model Data
BQML Taxi Fare Forecast

Attributes

Original Data Source: NYC Taxi Fare Prediction Dataset

Listing Stats

Listed: 13/08/2025
Region: Global
Quality: 5 / 5
Version: 1.0
