Taxi Fares Regression Dataset

Data Science and Analytics

Tags and Keywords

Travel

Fares

Taxi

Prediction

NYC

Free

About

This dataset covers taxi trips within New York City and the factors that influence their fares. It supports exploring the relationship between fare, distance, and geographical variables, and it is designed for predicting trip fares with machine learning models such as regression trees and random forests.
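To make the regression-tree idea concrete, the sketch below shows one common way a tree picks a single split: it tries each midpoint between sorted feature values and keeps the threshold that minimises the summed squared error of the two resulting groups. The function names (sse, best_split) and the sample hours and fares are invented for illustration and are not taken from fares.csv. In practice a library implementation, such as scikit-learn's DecisionTreeRegressor or RandomForestRegressor, would normally be used instead.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best 'x <= threshold' split.

    If no split improves on the unsplit error, threshold is None.
    """
    best = (None, sse(ys))
    candidates = sorted(set(xs))
    # Candidate thresholds are midpoints between neighbouring feature values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Made-up illustration values (pickup hour vs. fare), NOT real dataset rows.
hours = [1, 2, 3, 20, 21, 22]
fares = [8.0, 9.0, 10.0, 14.0, 15.0, 16.0]
threshold, error = best_split(hours, fares)
```

A full regression tree repeats this search recursively inside each group, and a random forest averages many such trees fitted on resampled data.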

Columns

The listing describes 10 of the dataset's 11 original columns, each with 100% valid values:
  • medallion: Contains 12,744 unique values across 50.0k valid records.
  • hack_license: Contains 25,532 unique values.
  • vendor_id: Categorical data with 2 unique values, predominantly CMT (51%) and VTS (49%).
  • pickup_datetime: Timestamps spanning 2013-01-01 to 2014-01-01, with a mean date of 2013-06-29.
  • payment_type: Categorical data, with CRD (Credit Card) representing 54% and CSH (Cash) representing 45% of transactions.
  • fare_amount: The calculated fare, with a mean value of 12.4 and a maximum recorded value of 2.07k.
  • surcharge: The applied surcharge amount, with a mean of 0.32 and a maximum of 1.5.
  • mta_tax: The Metropolitan Transportation Authority tax, consistently near 0.5.
  • tip_amount: The tip left by the passenger, with a mean of 1.38 and a maximum of 62.
  • tolls_amount: The amount paid in tolls, averaging 0.26 and reaching a maximum of 17.3.
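The summary figures above (means, maxima, and category shares) can be recomputed along the following lines. This is a minimal sketch using only the Python standard library. It assumes each record is a dict keyed by the column names listed above. The helper names and the sample rows are hypothetical placeholders, not values from fares.csv.

```python
import statistics
from collections import Counter


def summarize_numeric(rows, column):
    """Mean and maximum of a numeric column stored as strings or numbers."""
    values = [float(row[column]) for row in rows]
    return {"mean": statistics.fmean(values), "max": max(values)}


def category_shares(rows, column):
    """Fraction of rows falling into each category of a column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Made-up placeholder rows, NOT taken from fares.csv.
sample_rows = [
    {"payment_type": "CRD", "fare_amount": "10.0", "tip_amount": "2.0"},
    {"payment_type": "CSH", "fare_amount": "6.5", "tip_amount": "0.0"},
    {"payment_type": "CRD", "fare_amount": "13.5", "tip_amount": "2.5"},
    {"payment_type": "CSH", "fare_amount": "10.0", "tip_amount": "0.0"},
]

fare_summary = summarize_numeric(sample_rows, "fare_amount")
payment_shares = category_shares(sample_rows, "payment_type")
```

Run against the full file, the same helpers should give figures comparable to those quoted in the list, for example the CRD and CSH shares of payment_type.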

Distribution

The data is stored in a single CSV file, fares.csv (5.7 MB), containing 50.0k records. All listed columns show 100% validity.
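A minimal sketch of reading the file with Python's csv module follows. It assumes the CSV header uses exactly the column names shown in the Columns section and that the monetary columns parse as plain decimals. If the real header differs (for example, through extra whitespace), the names would need adjusting. The in-memory sample text is a placeholder so the example runs without the download. Its rows are invented.

```python
import csv
import io

# Monetary columns named in the listing; assumed to be plain decimals.
NUMERIC_COLUMNS = ("fare_amount", "surcharge", "mta_tax", "tip_amount", "tolls_amount")


def read_fares(handle):
    """Parse fare rows from an open text handle, converting money columns to float."""
    rows = []
    for row in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            row[column] = float(row[column])
        rows.append(row)
    return rows


# Placeholder text standing in for fares.csv (invented rows).
sample_csv = io.StringIO(
    "vendor_id,payment_type,fare_amount,surcharge,mta_tax,tip_amount,tolls_amount\n"
    "CMT,CSH,6.5,0.0,0.5,0.0,0.0\n"
    "VTS,CRD,12.0,0.5,0.5,2.5,0.0\n"
)
fares = read_fares(sample_csv)
```

With the downloaded file, the same function could be called as read_fares(f) inside "with open('fares.csv', newline='') as f:".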

Usage

This resource is ideal for fitting regression trees and random forests to predict trip fares. Users can analyze how variables such as pickup location, time of day, day of the week, and month influence pricing. It supports comparing the performance of different prediction methods and identifying the most important fare predictors. It can also be used to visualize predicted fares and explore how they vary across the city.
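Time of day, day of the week, and month are not separate columns in the listing, so they would need to be derived from pickup_datetime before modelling. The sketch below shows one way to do that. It assumes the timestamps are ISO-like strings such as "2013-06-29 18:45:00". The actual format in fares.csv is not stated in the listing, and the function name time_features is hypothetical.

```python
from datetime import datetime


def time_features(pickup_datetime):
    """Derive hour, weekday (Monday=0 ... Sunday=6), and month from a timestamp string."""
    ts = datetime.fromisoformat(pickup_datetime)
    return {"hour": ts.hour, "weekday": ts.weekday(), "month": ts.month}


# Illustrative timestamp, not a row from the dataset.
features = time_features("2013-06-29 18:45:00")
# features == {"hour": 18, "weekday": 5, "month": 6}  (a Saturday evening in June)
```

The resulting values can be used as predictor columns alongside the other variables when fitting a regression tree or random forest.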

Coverage

The data covers taxi trips within New York City (NYC) over the full calendar year 2013, with pickup times running from 1 January 2013 to 1 January 2014. The dataset is static and is not expected to receive future updates.

License

CC0: Public Domain

Who Can Use It

The dataset is intended for analysts and data scientists studying urban mobility and pricing models. It is suitable for those looking to visualize the spatial distribution of trip origins and explore how fares vary across the city, as well as the relationship between fare and distance. The material has a high usability rating of 10.00.

Dataset Name Suggestions

  • NYC Taxi Fare Prediction Data
  • New York City Trip Fares Analysis
  • Taxi Fares Regression Dataset

Attributes

Original Data Source: Taxi Fares Regression Dataset

Listing Stats

  • Views: 6
  • Downloads: 2
  • Listed: 17/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
