Taxi Fares Regression Dataset
Data Science and Analytics
Free
About
This dataset records taxi trips within New York City and supports analysis of the factors that influence trip fares. It is suited to exploring the relationship between fare, distance, and geographical variables, and is designed for predicting trip fares with machine learning models such as regression trees and random forests.
Columns
This listing describes 10 of the 11 original columns, each with 100% valid values:
- medallion: Contains 12,744 unique values across 50.0k valid records.
- hack_license: Contains 25,532 unique values.
- vendor_id: Categorical data with 2 unique values, predominantly CMT (51%) and VTS (49%).
- pickup_datetime: Time-series data spanning from 2013-01-01 to 2014-01-01, with a mean date of 2013-06-29.
- payment_type: Categorical data, with CRD (Credit Card) representing 54% and CSH (Cash) representing 45% of transactions.
- fare_amount: The calculated fare, with a mean value of 12.4 and a maximum recorded value of 2.07k.
- surcharge: The applied surcharge amount, with a mean of 0.32 and a maximum of 1.5.
- mta_tax: The Metropolitan Transportation Authority tax, consistently near 0.5.
- tip_amount: The tip left by the passenger, with a mean of 1.38 and a maximum of 62.
- tolls_amount: The amount paid in tolls, averaging 0.26 and reaching a maximum of 17.3.
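As a rough illustration of how the columns above might be checked after download, the sketch below reads CSV text with Python's standard csv module and reports the row count plus the mean and maximum of each numeric column. The exact header spelling, the timestamp format, and the two sample rows are assumptions made for illustration; they are not taken from fares.csv.

```python
import csv
import io
from statistics import mean

# Column names as listed above; the exact header spelling in fares.csv is an assumption.
EXPECTED_COLUMNS = [
    "medallion", "hack_license", "vendor_id", "pickup_datetime",
    "payment_type", "fare_amount", "surcharge", "mta_tax",
    "tip_amount", "tolls_amount",
]
NUMERIC_COLUMNS = ["fare_amount", "surcharge", "mta_tax", "tip_amount", "tolls_amount"]


def summarize(handle):
    """Read CSV rows from an open text handle and return basic per-column checks."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    if not rows:
        raise ValueError("no data rows found")
    summary = {"n_rows": len(rows)}
    for col in NUMERIC_COLUMNS:
        values = [float(r[col]) for r in rows]  # raises ValueError on a non-numeric cell
        summary[col] = {"mean": mean(values), "max": max(values)}
    summary["vendor_ids"] = sorted({r["vendor_id"] for r in rows})
    return summary


# Two made-up rows in the assumed layout, used only to exercise the function.
SAMPLE = """medallion,hack_license,vendor_id,pickup_datetime,payment_type,fare_amount,surcharge,mta_tax,tip_amount,tolls_amount
A1,H1,CMT,2013-01-01 00:05:00,CSH,6.5,0.5,0.5,0.0,0.0
B2,H2,VTS,2013-06-29 14:30:00,CRD,13.5,0.0,0.5,2.0,0.0
"""
result = summarize(io.StringIO(SAMPLE))
print(result["n_rows"], result["fare_amount"], result["vendor_ids"])
```

To run the same check on the real file, pass an open handle instead, for example open("fares.csv", newline="").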
Distribution
The information is stored in a CSV file named fares.csv, which has a size of 5.7 MB. The dataset contains 50.0k valid records across all listed columns, which show 100% data validity.
Usage
This resource is ideal for fitting regression trees and random forests to predict trip fares. Users can analyze how variables such as pickup location, time of day, day of the week, and month influence pricing. It supports comparing the performance of different prediction methods and identifying the most important fare predictors. It can also be used to visualize predicted fares and explore how they vary across the city.
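The sketch below illustrates two pieces of that workflow under stated assumptions: deriving hour, weekday, and month from pickup_datetime, and the single split search that a regression tree repeats at every node (choosing the threshold that minimises squared error in fare_amount). It is a depth-1 illustration, not a full tree or forest; a random forest averages many deeper trees fit on bootstrap samples, and libraries such as scikit-learn provide complete implementations. The timestamp format and the five sample trips are hypothetical.

```python
from datetime import datetime
from statistics import mean


def time_features(pickup_datetime):
    """Derive hour, weekday (Monday=0) and month from a timestamp string."""
    # The "%Y-%m-%d %H:%M:%S" layout is an assumption about fares.csv.
    ts = datetime.strptime(pickup_datetime, "%Y-%m-%d %H:%M:%S")
    return {"hour": ts.hour, "weekday": ts.weekday(), "month": ts.month}


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, feature, target="fare_amount"):
    """Find the threshold on one feature that minimises total squared error.

    Returns (threshold, left_mean, right_mean), or None if the feature
    has fewer than two distinct values.
    """
    best = None
    # Dropping the largest value keeps the right-hand side non-empty.
    for threshold in sorted({r[feature] for r in rows})[:-1]:
        left = [r[target] for r in rows if r[feature] <= threshold]
        right = [r[target] for r in rows if r[feature] > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    return None if best is None else best[1:]


# Made-up illustrative trips, not values from fares.csv.
trips = [
    ("2013-03-04 02:10:00", 18.0),
    ("2013-03-04 03:40:00", 20.0),
    ("2013-03-04 09:15:00", 9.0),
    ("2013-03-04 13:00:00", 10.0),
    ("2013-03-04 18:45:00", 11.0),
]
rows = [{**time_features(ts), "fare_amount": fare} for ts, fare in trips]
split = best_split(rows, "hour")
print(split)
```

On this toy sample the search separates the two late-night trips from the daytime ones. Comparing the squared-error reduction achieved by each feature is one simple way to judge which predictors matter most.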
Coverage
The data captures taxi trips specifically within New York City (NYC). The time range extends across the full calendar year 2013, running from 1 January 2013 to 1 January 2014. The material is static and is not expected to receive future updates.
License
CC0: Public Domain
Who Can Use It
The dataset is intended for analysts and data scientists studying urban mobility and pricing models. It is suitable for those looking to visualize the spatial distribution of trip origins and explore how fares vary across the city, as well as the relationship between fare and distance. The material has a high usability rating of 10.00.
Dataset Name Suggestions
- NYC Taxi Fare Prediction Data
- New York City Trip Fares Analysis
- Taxi Fares Regression Dataset
Attributes
Original Data Source: Taxi Fares Regression Dataset