Uber Fare Prediction Data

NLP / Natural Language Processing

Tags and Keywords

Fare, Uber, Prediction, Regression, Travel

Free

About

This dataset is designed for predicting the fare of Uber rides, a classic regression problem. Uber, one of the world's largest ride-hailing companies, handles millions of transactions daily. Managing that volume of data is central to developing new business strategies and giving customers accurate fare estimates. The dataset provides the information needed to build and evaluate fare prediction models.

Columns

  • key: A unique identifier for each trip.
  • fare_amount: The cost of each trip, denominated in USD.
  • pickup_datetime: The specific date and time when the taxi meter was engaged for the journey.
  • passenger_count: The number of passengers in the vehicle, recorded by the driver.
  • pickup_longitude: The geographical longitude coordinate where the taxi meter was engaged.
  • pickup_latitude: The geographical latitude coordinate where the taxi meter was engaged.
  • dropoff_longitude: The geographical longitude coordinate where the taxi meter was disengaged.
  • dropoff_latitude: The geographical latitude coordinate where the taxi meter was disengaged.
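
The four coordinate columns do not give a trip length directly, so a common derived feature for fare models is the straight-line distance between pickup and dropoff. The sketch below is one way to compute it with the haversine formula; it is not part of the dataset or the listing. The function names `haversine_km` and `trip_distance_km` and the Earth-radius constant are assumptions made for illustration, and the result is a great-circle distance, not the distance actually driven.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def trip_distance_km(row):
    """Distance for one record, given as a dict keyed by the column names above."""
    return haversine_km(
        float(row["pickup_latitude"]), float(row["pickup_longitude"]),
        float(row["dropoff_latitude"]), float(row["dropoff_longitude"]),
    )
```

A record read with `csv.DictReader` can be passed straight to `trip_distance_km`, since the values are converted from strings inside the function.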

Distribution

The dataset is provided as a CSV file of approximately 23.46 MB. It comprises 9 columns, eight of which are described above. While the key column indicates a range up to 55.4 million, several other columns, including fare_amount, have approximately 200,000 valid records. Most columns have no missing values; dropoff_longitude and dropoff_latitude each have one missing entry, a share that rounds to 0% of the total records. Fare amounts range from -52.00 to 499.00 USD, with a mean of 11.4 USD. Passenger counts vary from 0 to 208, with an average of 1.68 passengers per trip. Negative fares and a 208-passenger maximum are implausible for real trips, so some records likely need filtering before modelling.
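
Given the value ranges reported above, a basic validity filter is a reasonable first step before any modelling. The following is a minimal sketch using only the standard library. The thresholds (a positive fare, one to six passengers, coordinates within valid latitude and longitude bounds) are assumptions rather than rules stated by the listing, and the sample rows, including their timestamp format, are made up for illustration.

```python
import csv
import io

# Illustrative rows only: these values are invented, not taken from the dataset.
SAMPLE_CSV = """key,fare_amount,pickup_datetime,passenger_count,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude
a,7.5,2015-05-07 19:52:06,1,-73.99,40.73,-73.98,40.75
b,-52.0,2012-01-01 10:00:00,1,-73.99,40.73,-73.98,40.75
c,12.0,2013-06-01 08:30:00,2,-73.99,40.73,,
d,9.0,2014-03-03 12:00:00,208,-73.99,40.73,-73.98,40.75
"""

MAX_PASSENGERS = 6  # assumed upper bound for a single vehicle
LON_COLUMNS = ("pickup_longitude", "dropoff_longitude")
LAT_COLUMNS = ("pickup_latitude", "dropoff_latitude")


def is_plausible(row):
    """Return True if a record passes the assumed sanity checks."""
    try:
        fare = float(row["fare_amount"])
        passengers = int(row["passenger_count"])
        lons = [float(row[c]) for c in LON_COLUMNS]
        lats = [float(row[c]) for c in LAT_COLUMNS]
    except (TypeError, ValueError):
        return False  # missing or non-numeric field
    return (
        fare > 0
        and 1 <= passengers <= MAX_PASSENGERS
        and all(-180.0 <= v <= 180.0 for v in lons)
        and all(-90.0 <= v <= 90.0 for v in lats)
    )


def load_clean(handle):
    """Read CSV records from an open file-like object, keeping plausible rows."""
    return [row for row in csv.DictReader(handle) if is_plausible(row)]


clean_rows = load_clean(io.StringIO(SAMPLE_CSV))
```

With the sample above, only the first row survives: the others fail on a negative fare, missing dropoff coordinates, and an excessive passenger count respectively. For the real file, an open file handle can be passed in place of the `io.StringIO` object.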

Usage

This dataset is ideal for:
  • Developing and training regression models to accurately predict Uber ride fares.
  • Evaluating the performance of machine learning models using metrics such as R² (coefficient of determination) and RMSE (root mean squared error).
  • Gaining insights into Uber's transactional data to inform new business ideas and operational efficiencies.
  • Researching factors that influence taxi fare pricing.
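
To make the evaluation metrics mentioned above concrete, here is a small sketch that fits a one-variable least-squares line and scores it with RMSE and R². It is a baseline illustration, not a recommended model: the helper names `fit_line`, `rmse`, and `r2_score` are defined here for the example, and the distance and fare values are invented, not drawn from the dataset.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y ≈ slope * x + intercept (single feature)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx  # undefined if all x values are identical
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (USD here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot  # undefined if y_true is constant


# Invented example values: trip distance in km and fare in USD.
distance_km = [1.0, 2.0, 3.0, 4.0]
fare_usd = [5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(distance_km, fare_usd)
predicted = [slope * d + intercept for d in distance_km]
```

In practice the metrics would be computed on a held-out test split rather than on the training rows, since scoring on the data used for fitting overstates accuracy.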

Coverage

The dataset spans 1 January 2009 to 1 July 2015, capturing several years of transactional data. Specific geographic boundaries are not stated; the longitude and latitude fields cover a broad range of coordinates, and the listing gives the region as global. Passenger counts are recorded, but no other demographic information is provided.

License

CC0: Public Domain

Who Can Use It

This dataset is suitable for:
  • Data Scientists and Machine Learning Engineers focusing on predictive modelling and regression tasks.
  • Business Intelligence Analysts seeking to understand pricing dynamics and operational data within the ride-sharing industry.
  • Academics and Researchers interested in urban mobility, transportation economics, or large-scale data analysis.
  • Developers creating applications that require fare estimation functionalities.

Dataset Name Suggestions

  • Uber Fare Prediction Data
  • Ride-Sharing Fare Estimation Dataset
  • Taxi Trip Fare Dataset
  • Uber Historical Fare Data

Attributes

Original Data Source: Uber Fare Prediction Data

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 14/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
