New York Cab Ride Fare Forecasting
Data Science and Analytics
Free
About
This dataset contains New York City yellow taxi cab trip records and is designed for building machine learning models that predict taxi fares. It is intended for use with Google's BigQuery, a fully managed, low-cost analytics database that lets users query and analyse terabytes of data without managing infrastructure. The dataset is suited to creating, training, and evaluating forecasting models using BigQuery Machine Learning (BQML) with minimal coding. Users can explore trip data, select important features, and build models that help cab drivers identify profitable trips and reach customers more efficiently.
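As a rough illustration of the BQML workflow described above, the following Python sketch only assembles the SQL statements that would train and evaluate a linear regression model; it does not connect to BigQuery. The model and table names (`my_dataset.fare_model`, `my_dataset.taxi_train`) are hypothetical placeholders, and the choice of feature columns is an assumption rather than something the listing prescribes.

```python
def build_bqml_statements(model, table):
    """Return (training SQL, evaluation SQL) for a BQML linear regression.

    `model` and `table` are caller-supplied BigQuery identifiers; the
    values used below are hypothetical placeholders.
    """
    # Assumed feature set: fare, tip and miscellaneous_fees are left out
    # because they are components of the total_fare target.
    features = [
        "trip_duration",
        "distance_traveled",
        "num_of_passengers",
        "surge_applied",
    ]
    columns = ",\n  ".join(features + ["total_fare"])
    train_sql = (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (model_type = 'linear_reg', input_label_cols = ['total_fare']) AS\n"
        f"SELECT\n  {columns}\nFROM `{table}`"
    )
    eval_sql = f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"
    return train_sql, eval_sql


train_sql, eval_sql = build_bqml_statements(
    "my_dataset.fare_model",   # hypothetical model name
    "my_dataset.taxi_train",   # hypothetical table holding train.csv
)
print(train_sql)
print(eval_sql)
```

The generated statements could then be pasted into the BigQuery console or submitted through a client library once the CSV data has been loaded into a table.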
Columns
- trip_duration: The total duration of the taxi journey, measured in seconds.
- distance_traveled: The distance covered during the trip, measured in kilometres.
- num_of_passengers: The number of passengers in the taxi.
- fare: The base fare for the journey, in Indian Rupees (INR).
- tip: The amount the driver received in tips, in Indian Rupees (INR).
- miscellaneous_fees: Additional charges such as tolls, convenience fees, and GST, in Indian Rupees (INR).
- total_fare: The grand total for the ride, which is the target variable for prediction, in Indian Rupees (INR).
- surge_applied: A binary indicator ('Yes' or 'No') showing if surge pricing was applied to the fare.
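The column layout above can be read with the Python standard library, as in the minimal sketch below. The two sample rows are made up for illustration and are not taken from train.csv. The check that total_fare equals fare + tip + miscellaneous_fees is an assumption based on the "grand total" description and is worth verifying against the real files.

```python
import csv
import io

# Illustrative rows only: these values are invented, not from train.csv.
SAMPLE_CSV = """trip_duration,distance_traveled,num_of_passengers,fare,tip,miscellaneous_fees,total_fare,surge_applied
600,4.0,1,80.0,10.0,10.0,100.0,No
1200,9.5,2,190.0,0.0,35.0,225.0,Yes
"""

NUMERIC_COLUMNS = [
    "trip_duration",
    "distance_traveled",
    "num_of_passengers",
    "fare",
    "tip",
    "miscellaneous_fees",
    "total_fare",
]


def load_trips(handle):
    """Parse trip rows, converting numeric columns to float and
    surge_applied ('Yes'/'No') to a boolean."""
    trips = []
    for raw in csv.DictReader(handle):
        trip = {name: float(raw[name]) for name in NUMERIC_COLUMNS}
        trip["surge_applied"] = raw["surge_applied"].strip().lower() == "yes"
        trips.append(trip)
    return trips


def components_match(trip, tolerance=0.01):
    """Assumed relationship: total_fare = fare + tip + miscellaneous_fees."""
    expected = trip["fare"] + trip["tip"] + trip["miscellaneous_fees"]
    return abs(trip["total_fare"] - expected) <= tolerance


trips = load_trips(io.StringIO(SAMPLE_CSV))
for trip in trips:
    print(trip["total_fare"], trip["surge_applied"], components_match(trip))
```

To run the same checks on the real data, replace the `io.StringIO` handle with `open("train.csv", newline="")`.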
Distribution
The dataset is distributed across three files: train.csv, test.csv, and submission.csv. The data is updated daily.
Usage
This dataset is suitable for a variety of machine learning tasks, including:
- Fare Prediction: Building a linear regression model to forecast the total fare of a taxi ride.
- Data Exploration: Querying and visualising millions of taxi trips to understand travel patterns.
- Feature Engineering: Using AutoML to automatically select important features from the dataset for model training.
- Customer Location Optimisation: Creating models that help ride-sharing services such as Uber and Rapido optimise driver routes and reach customers quickly.
- Clustering Analysis: Applying k-Means or spectral clustering algorithms to identify and visualise fare patterns across different locations.
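The fare prediction task in the list above can be sketched with a one-feature least-squares fit in plain Python. This is a simplified stand-in for a full model: it uses only distance_traveled as a predictor, and the sample values are synthetic numbers chosen to lie exactly on a line, not real trips.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(distance, slope, intercept):
    """Predicted total_fare for a trip of the given distance."""
    return slope * distance + intercept


# Synthetic data: total_fare = 25 + 20 * distance_traveled, with no noise.
distances = [1.0, 2.0, 3.0, 4.0]
total_fares = [45.0, 65.0, 85.0, 105.0]

slope, intercept = fit_line(distances, total_fares)
print(slope, intercept)
print(predict(5.0, slope, intercept))
```

On real data the fit will not be exact, and adding features such as trip_duration or surge_applied would call for a multi-variable regression, which BQML or a library such as scikit-learn provides.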
Coverage
The data covers taxi trips within New York City. The time range is not explicitly specified in the sources.
License
CC0: Public Domain
Who Can Use It
- Data Analysts: Can use BQML to create, train, and evaluate machine learning models with minimal coding.
- Machine Learning Engineers: Can build and test forecasting models for applications like fare prediction and operational efficiency.
- Students and Researchers: Can explore a large-scale public dataset to practise machine learning techniques like linear regression, clustering, and feature engineering.
Dataset Name Suggestions
- NYC Taxi Fare Prediction Challenge
- BigQuery ML Taxi Trip Analysis
- New York Cab Ride Fare Forecasting
- Predictive Modelling for Taxi Fares
Attributes
Original Data Source: New York Cab Ride Fare Forecasting