NYC Taxi Fare Prediction Dataset

Category: Data Science and Analytics

Tags and Keywords: Taxi, Fare, Prediction, BigQuery, ML

NYC Taxi Fare Prediction Dataset on Opendatabay data marketplace

Free

About

This dataset is designed for predicting taxi trip fares in New York City. It contains information on millions of New York City yellow taxi cab trips, so data analysts can create, train, and evaluate machine learning models, and generate predictions from them, using BigQuery Machine Learning (BQML) with minimal coding. The ultimate goal is to help cab drivers estimate trip fares in their respective locations and reach customers efficiently.

Columns

  • trip_duration: Indicates how long the journey lasted, measured in seconds.
  • distance_traveled: Shows how far the taxi travelled, measured in km.
  • num_of_passengers: Records the number of passengers in the taxi.
  • fare: Represents the base fare for the journey, in INR.
  • tip: Details how much the driver received in tips, in INR.
  • miscellaneous_fees: Accounts for any additional charges during the trip, such as tolls, convenience fees, or GST, in INR.
  • total_fare: The grand total for the ride, in INR, which is the prediction target for models.
  • surge_applied: A boolean indicator (Yes or No) of whether surge pricing was applied.
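The column list above can be sketched as a typed record plus a small CSV parser. This is a minimal illustration, not the publisher's loader: the `Trip` class, `parse_trip`, and `load_trips` are hypothetical names, the two sample rows are made up for demonstration, and the code assumes the CSV header uses exactly the column names listed. It also assumes `surge_applied` may be encoded either as Yes/No or as 1/0, since the listing does not show the raw encoding.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Trip:
    trip_duration: float       # seconds
    distance_traveled: float   # km
    num_of_passengers: int
    fare: float                # base fare
    tip: float
    miscellaneous_fees: float  # tolls, convenience fees, taxes
    total_fare: float          # prediction target
    surge_applied: bool


def parse_trip(row: dict) -> Trip:
    """Convert one CSV row (a dict of strings) into a typed Trip."""
    return Trip(
        trip_duration=float(row["trip_duration"]),
        distance_traveled=float(row["distance_traveled"]),
        # Go through float first so values such as "1.0" also parse.
        num_of_passengers=int(float(row["num_of_passengers"])),
        fare=float(row["fare"]),
        tip=float(row["tip"]),
        miscellaneous_fees=float(row["miscellaneous_fees"]),
        total_fare=float(row["total_fare"]),
        # Assumption: accept either Yes/No or 1/0 style flags.
        surge_applied=row["surge_applied"].strip().lower() in {"yes", "1", "true"},
    )


def load_trips(handle) -> List[Trip]:
    """Read every row from an open CSV file-like object."""
    return [parse_trip(row) for row in csv.DictReader(handle)]


# Made-up sample rows, for illustration only (not taken from the dataset).
SAMPLE = """trip_duration,distance_traveled,num_of_passengers,fare,tip,miscellaneous_fees,total_fare,surge_applied
748,2.75,1,75.0,24,6.3,105.3,No
1187,3.43,1,105.0,24,13.2,142.2,Yes
"""

trips = load_trips(io.StringIO(SAMPLE))
```

To read the real file, the same function could be called as `load_trips(open("train.csv", newline=""))`, assuming train.csv carries all eight columns.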

Distribution

The dataset is typically provided in CSV format, with data files including submission.csv, test.csv, and train.csv. The sizes of these files are approximately:
  • submission.csv: 359.46 kB
  • test.csv: 3.05 MB
  • train.csv: 9.02 MB

The total size for Version 2 is around 12.43 MB. While specific row counts are not detailed, the dataset contains millions of New York City yellow taxi cab trips and is structured for training and testing machine learning models. The dataset is expected to be updated daily.
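As a quick check on the figures above, the three file sizes can be summed to confirm the stated total. This sketch assumes decimal units (1 MB = 1000 kB), which the listing does not state explicitly; the variable names are illustrative.

```python
# File sizes as listed, converted to MB (assumption: 1 MB = 1000 kB).
SIZES_MB = {
    "submission.csv": 359.46 / 1000,
    "test.csv": 3.05,
    "train.csv": 9.02,
}

total_mb = sum(SIZES_MB.values())
print(f"Total: {total_mb:.2f} MB")
```

Under that assumption the sum rounds to the 12.43 MB quoted for Version 2.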

Usage

This dataset is ideal for:
  • Creating, training, evaluating, and making predictions with machine learning models.
  • Forecasting numeric values such as taxi fares using Linear Regression (linear_reg) models within BQML.
  • Binary or multiclass classification tasks using Logistic Regression (logistic_reg); this model type is not the primary focus for fare prediction.
  • Unsupervised learning for exploration, utilising k-Means Clustering (kmeans).
  • Querying and exploring large public taxi cab datasets efficiently.
  • Building forecasting models that can help cab drivers (e.g., Uber, Rapido) estimate trip fares and optimise routes to reach customers more quickly.
  • Visualising trip fares for better insights.
  • Applying AutoML for automatically selecting important features and models.
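The regression use case above can be illustrated outside BigQuery with a small ordinary least squares fit. This is a simplified stand-in, not BQML's linear_reg: it is plain Python, uses a single feature (distance_traveled) to predict total_fare, and runs on made-up numbers chosen so the result is easy to verify. The function names `fit_line`, `predict`, and `rmse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, xs: Sequence[float]) -> List[float]:
    """Apply the fitted line to new feature values."""
    return [slope * x + intercept for x in xs]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Made-up training data: total_fare = 30 + 25 * distance_traveled, no noise.
distance_traveled = [1.0, 2.0, 3.0, 4.0, 5.0]
total_fare = [30.0 + 25.0 * d for d in distance_traveled]

slope, intercept = fit_line(distance_traveled, total_fare)
fitted = predict(slope, intercept, distance_traveled)
error = rmse(total_fare, fitted)
```

On real data the fit would not be exact, and a multi-feature model (duration, passengers, surge flag) would normally be preferred; the single-feature version is used here only to keep the arithmetic transparent.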

Coverage

The dataset focuses on New York City and comprises trips from yellow taxi cabs. It includes millions of trip records. The data is part of a BigQuery Public Dataset.

License

CC0: Public Domain

Who Can Use It

  • Data analysts looking to build and deploy machine learning models with minimal coding.
  • Machine learning engineers and data scientists interested in regression, classification, or clustering tasks.
  • Cab drivers or ride-sharing companies (e.g., Uber, Rapido) seeking insights into fare predictions and operational efficiency.
  • Anyone interested in data analytics and exploring large-scale public datasets.

Dataset Name Suggestions

  • NYC Taxi Fare Prediction Dataset
  • BigQuery ML Yellow Cab Trips
  • New York City Taxi Fare Analytics
  • Taxi Ride Pricing Model Data
  • BQML Taxi Fare Forecast

Attributes

Original Data Source: NYC Taxi Fare Prediction Dataset

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 13/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0