California 1990 Housing Census Data
Data Science and Analytics
Free
About
This dataset contains California housing metrics derived from the 1990 U.S. Census, originally sourced from the Statlib repository. The statistics are aggregated at the block group level, the smallest geographical unit for which the Census Bureau releases sample data, typically encompassing 600 to 3,000 residents. The dataset is well suited to studying socio-economic drivers in real estate, and captures variables such as median income, population size, and median housing values across Californian locations.
Columns
The data file contains 10 columns of quantitative and categorical information:
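- longitude: The east-west position of the block group, measured in degrees (more negative values lie farther west).
- latitude: The north-south position of the block group, measured in degrees (higher values lie farther north).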
- housing_median_age: The median age of houses within the block group.
- total_rooms: The total count of rooms across all residences in the block group.
- total_bedrooms: The total number of bedrooms across all residences in the block group (note: approximately 1% of values are missing in this column).
- population: The total number of individuals residing in the block group.
- households: The total number of households in the block group.
- median_income: The median household income recorded for the block group.
- median_house_value: The target variable, representing the median house value in the block group.
- ocean_proximity: A categorical feature describing the proximity of the block group to the ocean, with categories like <1H OCEAN (the most frequent) and INLAND.
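As a minimal sketch of how these columns might be read and checked, the snippet below parses a CSV with the column names listed above and counts missing total_bedrooms values. It uses only the Python standard library. The three sample rows are synthetic and exist purely for illustration, the helper names (load_rows, count_missing) are hypothetical, and the code assumes that missing values appear as empty fields, which the listing does not confirm.

```python
import csv
import io

# Numeric columns as named in the column list above.
NUMERIC_COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value",
]

# Synthetic sample in the same layout; the values are made up, not real records.
SAMPLE_CSV = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.0,37.5,30,1000,200,500,180,5.0,300000,NEAR BAY
-118.0,34.0,20,2000,,1200,400,3.0,200000,<1H OCEAN
-120.0,36.5,15,1500,300,900,280,2.5,100000,INLAND
"""


def load_rows(handle):
    """Parse rows, converting numeric fields to float and blank fields to None.

    Assumption: a missing value is stored as an empty field.
    """
    rows = []
    for raw in csv.DictReader(handle):
        row = {"ocean_proximity": raw["ocean_proximity"]}
        for name in NUMERIC_COLUMNS:
            value = raw[name].strip()
            row[name] = float(value) if value else None
        rows.append(row)
    return rows


def count_missing(rows, column):
    """Count the rows in which the given column has no value."""
    return sum(1 for row in rows if row[column] is None)


rows = load_rows(io.StringIO(SAMPLE_CSV))
missing_bedrooms = count_missing(rows, "total_bedrooms")
print(len(rows), "rows;", missing_bedrooms, "missing total_bedrooms")
```

To run the same check on the actual file, the in-memory sample could be swapped for `open("housing.csv", newline="")`, assuming the file sits in the working directory.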
Distribution
The dataset is presented in a tabular structure, contained within a single CSV file (housing.csv) measuring 1.42 MB. It consists of 10 columns and contains approximately 20,600 valid records (rows) across most metrics. This is a static dataset with an expected update frequency of 'Never'. The usability rating is high.
Usage
This data product is ideally suited for machine learning applications, specifically for building predictive models to forecast median house values based on location, age, and demographic statistics. It is also excellent for exploratory data analysis concerning housing supply and demand dynamics, understanding regional income disparity effects on property pricing, and performing geographical clustering or segmentation.
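The predictive use case can be illustrated with a deliberately simple baseline: an ordinary least-squares fit of median_house_value on median_income alone. This is only a sketch, not a recommended modelling approach. The four data points are synthetic and exactly linear so the result is easy to verify, and the function name fit_line is hypothetical. A realistic model would use more of the columns and a held-out test set.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, exactly linear values for illustration only (not real records).
median_income = [2.0, 3.0, 4.0, 5.0]
median_house_value = [120000.0, 170000.0, 220000.0, 270000.0]

slope, intercept = fit_line(median_income, median_house_value)


def predict(income):
    """Predict a median house value from a median income with the fitted line."""
    return slope * income + intercept


print(f"slope={slope:.1f}, intercept={intercept:.1f}, predict(3.5)={predict(3.5):.1f}")
```

On real data the points will not fall on a line, so the fit should be judged with an error measure such as root mean squared error on rows that were not used for fitting.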
Coverage
The dataset focuses exclusively on the geographical domain of California, United States, providing detailed statistics for block groups within the state. The temporal coverage is fixed, based on data collected during the 1990 U.S. Census. Block groups represent a fine-grained spatial measurement, allowing for localised analysis.
License
CC0: Public Domain
Who Can Use It
- Students and beginners in data science who require a well-known, accessible dataset for learning regression modelling techniques.
- Real estate analysts interested in historical property value trends and geographical influences.
- Economists and social scientists researching the interplay between demographics, income, and housing affordability in California.
- Machine learning engineers focused on building and benchmarking models for continuous variable prediction.
Dataset Name Suggestions
- California 1990 Housing Census Data
- Statlib California Property Metrics
- 1990 California Housing and Demographic Data
Attributes
Original Data Source: California 1990 Housing Census Data