Synthetic US Residential Listings
Synthetic Data Generation
Free
About
This dataset contains 3,000 synthetic residential property listings modelled on real U.S. house sales records and formatted like listings on major real estate platforms. It is suited to real estate analytics, machine learning model development, data visualisation projects, and web scraping practice. Each record describes one property through 16 features commonly used by investors, analysts, and estate agents. The data spans multiple U.S. states and cities and contains realistic values for listing price, area in square feet, bedroom and bathroom counts, and housing type.
Columns
- Price: The monetary value of the listing, specified in US Dollars.
- Address, City, State, Zipcode: Location attributes formatted according to U.S. conventions.
- Bedrooms, Bathrooms, Area (Sqft): The primary physical specifications of the residence.
- Lot Size: The size of the property's plot.
- Year Built: The year the property was constructed.
- Days on Market: The length of time the property has been actively listed for sale.
- Property Type: Classification of the dwelling, such as 'Apartment' or 'Condo'.
- MLS ID: The unique identification number assigned by the Multiple Listing Service.
- Listing Agent: The estate agent or firm responsible for the sale.
- Status: The current transactional stage of the property (e.g., Sold, Pending).
- Listing URL: A mock hyperlink designed to simulate a property details page.
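A quick header check can confirm that a downloaded file matches the 16 columns described above. The following is a minimal sketch using only the Python standard library. The exact header spellings in `EXPECTED_COLUMNS` are an assumption based on the column descriptions, and the real CSV may name or capitalise them differently; `check_header` is a hypothetical helper, not part of the dataset.

```python
import csv
import io

# Assumed header spellings, taken from the column descriptions above;
# the actual CSV may use different names or casing.
EXPECTED_COLUMNS = [
    "Price", "Address", "City", "State", "Zipcode",
    "Bedrooms", "Bathrooms", "Area (Sqft)", "Lot Size",
    "Year Built", "Days on Market", "Property Type",
    "MLS ID", "Listing Agent", "Status", "Listing URL",
]


def check_header(csv_text):
    """Return (missing, unexpected) column names for a CSV given as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the text has no rows at all
    found = [name.strip() for name in header]
    missing = [c for c in EXPECTED_COLUMNS if c not in found]
    unexpected = [c for c in found if c not in EXPECTED_COLUMNS]
    return missing, unexpected
```

If both returned lists are empty, the header matches the assumed schema; otherwise the lists show which names need to be mapped or renamed.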
Distribution
The data file is typically supplied in CSV format. It contains 3,000 records, each describing a unique property listing through 16 features. The dataset is reported as fully valid: 100% of the entries are verified, and there are no missing values in any key field.
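The record count and the absence of missing values are easy to check after download. The sketch below counts rows and empty cells per column with the standard library. `count_missing` is a hypothetical helper, and the three-column sample is invented for illustration rather than taken from the real file.

```python
import csv
import io


def count_missing(csv_text):
    """Return (row_count, {column: number of empty or absent cells})."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in (reader.fieldnames or [])}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in missing:
            value = row.get(name)
            # DictReader yields None for cells absent from a short row.
            if value is None or value.strip() == "":
                missing[name] += 1
    return row_count, missing


# Hypothetical two-row sample; the second row has no price.
sample = "Price,City,State\n450000,Sacramento,CA\n,Chicago,IL\n"
rows, gaps = count_missing(sample)
print(rows, gaps)
```

For the full file, the expectation from the description above would be a row count of 3,000 and a zero for every key column.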
Usage
This data set is ideally suited for:
- Conducting Exploratory Data Analysis (EDA).
- Training predictive models, including regression and classification types.
- Engaging in feature engineering and data preprocessing workflows.
- Developing mock-ups for real estate dashboards and web applications.
- Gaining practical experience using data science libraries and tools like Pandas, BeautifulSoup, or Power BI.
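As one illustration of the feature engineering use case, two common derived features are price per square foot and property age. This is a minimal sketch, assuming the column names "Price", "Area (Sqft)", and "Year Built" and assuming the values are plain numeric strings without currency symbols or thousands separators. `add_derived_features` and the output keys `price_per_sqft` and `age_years` are hypothetical names.

```python
def add_derived_features(listing, reference_year):
    """Return a copy of one listing with two derived features added.

    `listing` maps column names to string values, as csv.DictReader yields.
    `reference_year` is the year against which property age is measured.
    """
    price = float(listing["Price"])
    area = float(listing["Area (Sqft)"])
    year_built = int(listing["Year Built"])

    derived = dict(listing)  # leave the input record unchanged
    # Guard against a zero or negative area rather than dividing by it.
    derived["price_per_sqft"] = price / area if area > 0 else None
    derived["age_years"] = reference_year - year_built
    return derived


# Hypothetical listing, invented for illustration.
example = {"Price": "450000", "Area (Sqft)": "1800", "Year Built": "1995"}
enriched = add_derived_features(example, reference_year=2024)
```

Passing the reference year explicitly keeps the result reproducible, since the listing does not state which year the synthetic records are dated to.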
Coverage
The geographic scope includes multiple U.S. states and cities. Common states include California (CA) and Illinois (IL), and cities such as Los Angeles and Sacramento are highly represented. The 'Year Built' feature ranges from 1950 to 2023, and the 'Days on Market' metric ranges from 1 to 120 days. The data covers several dwelling classifications, including Apartments and Condos.
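The stated ranges can serve as sanity checks when loading the file. The sketch below flags any value outside them. It assumes the bounds are inclusive, that the columns are named "Year Built" and "Days on Market", and that the values parse as integers; `find_out_of_range` is a hypothetical helper.

```python
# Bounds taken from the coverage description; inclusivity is an assumption.
EXPECTED_RANGES = {
    "Year Built": (1950, 2023),
    "Days on Market": (1, 120),
}


def find_out_of_range(rows):
    """Return (row_index, column, value) for every value outside its range."""
    problems = []
    for index, row in enumerate(rows):
        for column, (low, high) in EXPECTED_RANGES.items():
            value = int(row[column])
            if not low <= value <= high:
                problems.append((index, column, value))
    return problems


# Hypothetical rows, invented for illustration; the second violates both ranges.
sample_rows = [
    {"Year Built": "1987", "Days on Market": "14"},
    {"Year Built": "1949", "Days on Market": "121"},
]
issues = find_out_of_range(sample_rows)
```

An empty result on the full file would be consistent with the coverage figures above.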
License
CC0: Public Domain
Who Can Use It
The dataset is intended for data scientists and machine learning engineers seeking realistic training data, real estate analysts requiring simulation inputs, and developers creating mock real estate applications. It is particularly useful for students and professionals who want to practise real estate analysis and data manipulation.
Dataset Name Suggestions
- Synthetic US Residential Listings
- Mock Zillow-Style House Sales Data
- US Real Estate Simulation Set
- 16-Feature US Home Sales Dataset
Attributes
Original Data Source: Synthetic US Residential Listings
