Utrecht Housing / Dutch housing market
Urban Planning & Infrastructure
Free
About
The Utrecht Housing Dataset is a synthetic dataset designed for students and practitioners learning data science and machine learning. Derived from the Dutch housing market, it is clean and noise-free, which makes it suitable for a range of algorithms such as decision trees, linear regression, logistic regression, and neural networks. The dataset was created specifically for educational purposes and supports responsible AI teaching by being accessible to learners with diverse academic backgrounds.
Dataset Features:
- id: Unique identifier for each house, ranging from 0 to 100,000 (not used in algorithms).
- zipcode: Zip code of the house's location, indicating its area. Possible values: 3520, 3525, 3800.
- lot-len: Length of the house plot in meters, ranging from 5.0 to 100.0.
- lot-width: Width of the house plot in meters, ranging from 5.0 to 100.0.
- lot-area: Total area of the house plot in square meters, derived from lot-len * lot-width.
- house-area: The living area of the house in square meters (e.g., 30.0 for small houses, 200.0 for mansions).
- garden-size: The size of the garden in square meters, with larger gardens being desirable.
- balcony: Number of balconies (common values: 0, 1, 3).
- x-coor: X-coordinate of the house's location (range: 2000 to 3000).
- y-coor: Y-coordinate of the house's location (range: 5000 to 6000).
- buildyear: The year the house was built (from as early as 1100 to modern times).
- bathrooms: Number of bathrooms (common values: 1, 2, or 3).
Output/Target Features:
- tax value: Estimated value of the house for taxation, ranging from 50,000 to 1,000,000 euros.
- retail value: The market value of the house, also ranging from 50,000 to 1,000,000 euros.
- energy-eff: Binary indicator (0 or 1) of whether the house is energy-efficient.
- monument: Binary indicator (0 or 1) of whether the house has architectural or historical monumental value.
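The documented ranges lend themselves to a quick sanity check before modelling. The following is a minimal sketch, assuming each record is available as a Python dict keyed by the feature names used in this description; the actual column headers in the file may differ (for example in spacing or capitalisation), and `validate_row` and the example record are hypothetical.

```python
# Ranges copied from the feature descriptions. The column names are assumed
# to match the names used in this description and may differ in the file.
ALLOWED_ZIPCODES = {3520, 3525, 3800}
RANGES = {
    "lot-len": (5.0, 100.0),
    "lot-width": (5.0, 100.0),
    "x-coor": (2000, 3000),
    "y-coor": (5000, 6000),
    "tax value": (50_000, 1_000_000),
    "retail value": (50_000, 1_000_000),
}
BINARY_COLUMNS = ("energy-eff", "monument")


def validate_row(row, tolerance=1e-6):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if row["zipcode"] not in ALLOWED_ZIPCODES:
        problems.append(f"zipcode {row['zipcode']} is not a documented value")
    for column, (low, high) in RANGES.items():
        if not low <= row[column] <= high:
            problems.append(f"{column}={row[column]} is outside [{low}, {high}]")
    for column in BINARY_COLUMNS:
        if row[column] not in (0, 1):
            problems.append(f"{column}={row[column]} is not 0 or 1")
    # lot-area is described as lot-len * lot-width.
    expected_area = row["lot-len"] * row["lot-width"]
    if abs(row["lot-area"] - expected_area) > tolerance:
        problems.append(f"lot-area={row['lot-area']} does not equal {expected_area}")
    return problems


# Hypothetical record for illustration only; not taken from the dataset.
example = {
    "zipcode": 3520, "lot-len": 20.0, "lot-width": 10.0, "lot-area": 200.0,
    "x-coor": 2500, "y-coor": 5500, "tax value": 300_000,
    "retail value": 320_000, "energy-eff": 1, "monument": 0,
}
print(validate_row(example))
```

An empty list means the record is consistent with the documented ranges; each string in a non-empty list names one violated constraint.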
Usage:
The dataset is ideal for:
- Machine Learning Applications: Training and testing predictive models for tax valuation, market value, and energy efficiency.
- Feature Analysis: Exploring the relationships between housing attributes and target values.
- Educational Purposes: Teaching students about regression, classification, and feature engineering.
- Visualisation: Creating plots and graphs due to the well-structured and interpretable data.
Coverage:
The dataset provides a comprehensive representation of housing features relevant to the Dutch market, ensuring high usability for educational and experimental projects.
License:
CC0 (Public Domain)
Who Can Use It:
This dataset is designed for students, researchers, data scientists, and machine learning practitioners seeking to explore real-world applications of AI in housing markets.
How to Use It:
- Develop predictive models for tax and retail value estimation.
- Evaluate housing energy efficiency or monumental status using classification techniques.
- Explore feature importance to understand what drives housing value.
- Benchmark machine learning algorithms on a synthetic, high-quality dataset.