Opendatabay APP

California Housing Prices for Regression


Tags and Keywords

Housing, Real Estate, Regression, California, Modelling


Free

About

This resource provides a fundamental housing dataset specifically designed for testing, fine-tuning, and benchmarking regression models using machine learning techniques. It contains many useful features that give insight into the dynamics of real estate data.

Columns

The dataset contains 10 columns:
  1. longitude: The east-west position of a block, in degrees. All values are negative (approximately -124 to -114), so a more negative value indicates a location further west.
  2. latitude: A measure of how far north a house is located. A higher value indicates a location further north.
  3. housing_median_age: The median age of a house within a block. Lower numbers typically suggest a newer building, with values ranging from 1 to 52 years. The mean age is approximately 28.6 years.
  4. total_rooms: The total count of rooms within a block. Values range from 2 up to 39.3 thousand, with an average of 2.64 thousand.
  5. total_bedrooms: The total count of bedrooms within a block. Values range from 1 up to 6.45 thousand, with an average of 538.
  6. population: The total number of people residing within a block, ranging from 3 to 35.7 thousand people. The average population is 1.43 thousand.
  7. households: The total number of households within a block, where a household is a group of people residing in one home unit. The average is 500 households.
  8. median_income: The median income for households within a block, measured in tens of thousands of US Dollars. The range spans from $5,000 (0.5) to $150,000 (15), with a mean of $38,700 (3.87).
  9. median_house_value: The target variable, representing the median house value for households within a block, measured in US Dollars. Values range from $15,000 to $500,001. The average value is $207,000.
  10. ocean_proximity: A categorical feature indicating a block's proximity to the ocean. The most common category is "<1H OCEAN" (44%), followed by "INLAND" (32%). There are 5 unique proximity values recorded.
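
As a rough illustration of the column list above, the following sketch reads rows with Python's standard csv module and converts the numeric fields. It assumes the CSV header uses exactly the column names listed above and that missing values appear as empty fields. The two sample rows are invented placeholders, not records from housing.csv, and the helper names (parse_row, load_rows, income_in_usd) are hypothetical.

```python
import csv
import io

# Column names as given in the listing; assumed to match the CSV header.
EXPECTED_COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value", "ocean_proximity",
]
NUMERIC_COLUMNS = EXPECTED_COLUMNS[:-1]  # everything except ocean_proximity


def parse_row(row):
    """Convert one CSV row (dict of strings) into typed values.

    Empty numeric fields become None so missing data stays visible.
    """
    parsed = {}
    for name in NUMERIC_COLUMNS:
        text = (row[name] or "").strip()
        parsed[name] = float(text) if text else None
    parsed["ocean_proximity"] = (row["ocean_proximity"] or "").strip()
    return parsed


def load_rows(handle):
    """Read all rows from an open CSV handle, checking the header first."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [parse_row(row) for row in reader]


def income_in_usd(median_income):
    """median_income is stored in tens of thousands of US dollars."""
    return median_income * 10_000


# Invented placeholder rows for illustration only (not real records).
SAMPLE_CSV = """\
longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-118.3,34.1,30.0,2000.0,400.0,1100.0,380.0,3.5,200000.0,<1H OCEAN
-119.8,36.7,15.0,1500.0,,800.0,300.0,2.5,90000.0,INLAND
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["ocean_proximity"])
print(rows[1]["total_bedrooms"])
print(income_in_usd(rows[0]["median_income"]))
```

To read the actual file instead of the inline sample, the same load_rows function could be given a handle opened with open("housing.csv", newline="").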

Distribution

The data is provided in a single CSV file, housing.csv, with a size of 1.42 MB. It contains 10 attributes and about 20.6 thousand records. The total_bedrooms field is the only column with missing data, affecting about 1% of the records. The usability rating is high (10.00), and the data is expected to be updated on an annual basis.
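
Because total_bedrooms is the only column with gaps, one common option is to fill the missing entries with the column median before modelling. This is an assumption for illustration, not something the listing prescribes; dropping the affected rows is an equally valid choice. The sketch below uses only the standard library, represents missing values as None, and runs on invented placeholder rows. The function name impute_median is hypothetical.

```python
from statistics import median


def impute_median(rows, column="total_bedrooms"):
    """Return (new_rows, fill_value) with None entries replaced by the median.

    The input rows are not modified; each output row is a shallow copy.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"no observed values in column {column!r}")
    fill_value = median(observed)
    filled = [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]
    return filled, fill_value


# Invented placeholder rows (not real records); None marks a missing value.
sample_rows = [
    {"total_rooms": 1000.0, "total_bedrooms": 100.0},
    {"total_rooms": 1500.0, "total_bedrooms": None},
    {"total_rooms": 2000.0, "total_bedrooms": 300.0},
]

filled_rows, fill_value = impute_median(sample_rows)
missing_share = sum(r["total_bedrooms"] is None for r in sample_rows) / len(sample_rows)
print(fill_value, missing_share)
```

In a train/test workflow, the median would normally be computed on the training rows only and then reused for the test rows, so that no information leaks from the held-out data.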

Usage

This data product is ideally suited for academic study and practical application in regression modelling. It can be used to experiment with different machine learning algorithms, conduct model validation, and perform feature engineering to predict housing values based on socioeconomic and geographic factors.
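
A minimal sketch of the model-validation workflow mentioned above: hold out part of the data, fit a one-feature least-squares line (for example median_house_value against median_income), and score it with RMSE on the held-out part. The data here is synthetic and exactly linear, invented only so the example runs on its own; it is not drawn from the dataset. The function names and the 80/20 split are assumptions.

```python
import random
from math import sqrt


def train_test_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle (x, y) pairs reproducibly and split off a test portion."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on (xs, ys)."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(squared) / len(squared))


# Synthetic placeholder data: x plays the role of median_income (tens of
# thousands of USD), y the role of median_house_value (USD).
pairs = [(i / 2, 40_000 * (i / 2) + 50_000) for i in range(1, 21)]

train, test = train_test_split(pairs)
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

slope, intercept = fit_simple_ols(train_x, train_y)
test_error = rmse(test_x, test_y, slope, intercept)
print(slope, intercept, test_error)
```

On real data the relationship will not be exactly linear, so the held-out RMSE gives a baseline against which richer models and engineered features can be compared.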

Coverage

The geographical scope of the data covers California, defined by coordinates for longitude (approximately -124 to -114) and latitude (approximately 32.5 to 42). The dataset provides block-level statistics, capturing measures of demographics (population, households) and economic factors (median income) tied to housing characteristics (age, total rooms, bedrooms, and median value).
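
Since the counts are block-level totals, per-household ratios are often easier to compare across blocks than the raw totals. The sketch below derives three such ratios and adds a simple check that a row falls inside the approximate coordinate ranges given above. The ratio names, the function names, and the 0.5 degree margin are assumptions chosen for illustration; the margin exists only because the stated ranges are approximate. The sample row is an invented placeholder.

```python
# Approximate coverage taken from the ranges stated in the listing.
LON_RANGE = (-124.0, -114.0)
LAT_RANGE = (32.5, 42.0)


def in_coverage(row, margin=0.5):
    """True if the row's coordinates fall within the stated ranges.

    margin (in degrees) loosens the bounds, which are only approximate.
    """
    lon_ok = LON_RANGE[0] - margin <= row["longitude"] <= LON_RANGE[1] + margin
    lat_ok = LAT_RANGE[0] - margin <= row["latitude"] <= LAT_RANGE[1] + margin
    return lon_ok and lat_ok


def add_ratios(row):
    """Return a copy of the row with per-household and per-room ratios.

    A ratio is None when its denominator is zero or an input is missing.
    """
    households = row["households"]
    rooms = row["total_rooms"]
    bedrooms = row["total_bedrooms"]
    out = dict(row)
    out["rooms_per_household"] = rooms / households if households else None
    out["population_per_household"] = (
        row["population"] / households if households else None
    )
    out["bedrooms_per_room"] = (
        bedrooms / rooms if bedrooms is not None and rooms else None
    )
    return out


# Invented placeholder row (not a real record).
sample_row = {
    "longitude": -120.0, "latitude": 36.0,
    "total_rooms": 1000.0, "total_bedrooms": 200.0,
    "population": 600.0, "households": 250.0,
}

enriched = add_ratios(sample_row)
print(in_coverage(sample_row), enriched["rooms_per_household"])
```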

License

CC0: Public Domain

Who Can Use It

The dataset is suited to machine learning engineers, data scientists building predictive price models, students learning fundamental regression techniques, and researchers interested in urban planning or real estate economics.

Dataset Name Suggestions

  • California Housing Prices for Regression
  • California Real Estate Metrics
  • Housing Block Value Prediction Data

Attributes

Listing Stats

  • Views: 5
  • Downloads: 0
  • Listed: 15/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
