Weather Prediction Model Training Data
Data Science and Analytics
Free
About
This dataset contains synthetic weather data for ten major US cities: New York, Los Angeles, Chicago, Houston, Phoenix, Philadelphia, San Antonio, San Diego, Dallas, and San Jose. It covers four weather parameters (temperature, humidity, precipitation, and wind speed), with one million data points generated for each. The data is designed to simulate realistic weather patterns, including seasonal variations such as higher temperatures and increased precipitation during summer in New York and, conversely, lower temperatures and more precipitation during winter in Phoenix. It was generated with Python's Faker library and is intended for a range of analytical applications.
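As a rough illustration of what such a generation step could look like, the sketch below draws rows uniformly from the value ranges and date span given in this listing. It is not the script used to build the dataset: it swaps the Faker library for Python's standard `random` module, the function name `generate_rows` is hypothetical, and plain uniform sampling does not reproduce the seasonal adjustments or the reported averages.

```python
import random
from datetime import datetime, timedelta

# City names, date span, and value ranges are taken from this listing.
CITIES = [
    "New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
    "Philadelphia", "San Antonio", "San Diego", "Dallas", "San Jose",
]
START = datetime(2024, 1, 1)
END = datetime(2024, 5, 19)


def generate_rows(n, seed=0):
    """Return n synthetic weather rows as dictionaries (illustrative only)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    span_seconds = int((END - START).total_seconds())
    rows = []
    for _ in range(n):
        rows.append({
            "Location": rng.choice(CITIES),
            "Date_Time": START + timedelta(seconds=rng.randint(0, span_seconds)),
            # Assumption: uniform sampling, with no seasonal adjustment.
            "Temperature_C": rng.uniform(-20.0, 40.0),
            "Humidity_pct": rng.uniform(30.0, 90.0),
            "Precipitation_mm": rng.uniform(0.0, 15.0),
            "Wind_Speed_kmh": rng.uniform(0.0, 30.0),
        })
    return rows
```

Calling `generate_rows(5, seed=1)` returns five dictionaries keyed by the column names described in this listing.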
Columns
- Location: The specific city for which the weather data was simulated. This column contains 10 unique city names, with Phoenix and Chicago each representing 10% of the data, and the remaining 80% distributed among other locations.
- Date_Time: The precise date and time when the weather data point was recorded. The data spans from 1st January 2024 to 19th May 2024.
- Temperature_C: The temperature recorded in Celsius at the given location and time. Values range from -20°C to 40°C, with an average of 14.8°C.
- Humidity_pct: The humidity level expressed as a percentage. Humidity values range from 30% to 90%, with an average of 60%.
- Precipitation_mm: The amount of precipitation recorded in millimetres. Values range from 0mm to 15mm, with an average of 5.11mm.
- Wind_Speed_kmh: The speed of the wind recorded in kilometres per hour. Wind speeds range from 0 km/h to 30 km/h, with an average of 15 km/h.
All columns are fully populated and contain no missing or mismatched data.
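The stated columns and ranges can be checked before any analysis. The following is a minimal sketch using only Python's standard library; the function name `check_weather_csv` and the shape of its return value are illustrative assumptions, and the ranges are copied from the column descriptions above.

```python
import csv

# Ranges as stated in the column descriptions of this listing.
EXPECTED_RANGES = {
    "Temperature_C": (-20.0, 40.0),
    "Humidity_pct": (30.0, 90.0),
    "Precipitation_mm": (0.0, 15.0),
    "Wind_Speed_kmh": (0.0, 30.0),
}
EXPECTED_COLUMNS = ["Location", "Date_Time", *EXPECTED_RANGES]


def check_weather_csv(handle):
    """Scan CSV rows from an open text handle and summarise the numeric columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    totals = {c: 0.0 for c in EXPECTED_RANGES}
    out_of_range = {c: 0 for c in EXPECTED_RANGES}
    count = 0
    for row in reader:
        count += 1
        for col, (low, high) in EXPECTED_RANGES.items():
            value = float(row[col])  # raises ValueError on an empty or non-numeric cell
            totals[col] += value
            if not low <= value <= high:
                out_of_range[col] += 1

    means = {c: totals[c] / count for c in totals} if count else {}
    return {"rows": count, "means": means, "out_of_range": out_of_range}
```

To run it against the real file, open it with `open("weather_data.csv", newline="")` and pass the handle to `check_weather_csv`. Reading row by row keeps memory use low for a file of this size.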
Distribution
The dataset is provided as a CSV file named weather_data.csv, with a file size of 104.11 MB. It is structured in a tabular format and contains 1 million records, each with 6 distinct columns.
Usage
This dataset is ideally suited for a variety of analytical and educational purposes:
- Weather Prediction Models: Researchers and data scientists can utilise this data to develop, train, and validate sophisticated weather prediction models for different geographical areas.
- Climate Studies: It serves as a valuable resource for climate studies, enabling the analysis and understanding of weather patterns and long-term trends across various regions.
- Educational Purposes: Students and educators can leverage this dataset to gain practical experience in data analysis, visualisation techniques, and data modelling within the context of weather information.
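Before training a more elaborate prediction model on this data, it can help to establish a naive baseline to beat. The sketch below predicts each row's temperature as the mean temperature of its city, learned from the first 80% of the rows, and reports the mean absolute error on the remainder. The function names are hypothetical, and the positional split assumes the rows are already sorted by Date_Time, which this listing does not confirm.

```python
from collections import defaultdict


def fit_city_means(train_rows):
    """Learn the mean Temperature_C per Location from the training rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in train_rows:
        sums[row["Location"]] += float(row["Temperature_C"])
        counts[row["Location"]] += 1
    return {city: sums[city] / counts[city] for city in sums}


def mean_absolute_error(city_means, test_rows, fallback):
    """Average absolute error when each test row is predicted by its city mean."""
    errors = [
        abs(float(row["Temperature_C"]) - city_means.get(row["Location"], fallback))
        for row in test_rows
    ]
    return sum(errors) / len(errors)


def evaluate_baseline(rows, train_fraction=0.8):
    """Split rows by position, fit on the first part, score on the rest."""
    # Assumption: rows are in chronological order, so this is a time-based split.
    cut = int(len(rows) * train_fraction)
    train, test = rows[:cut], rows[cut:]
    city_means = fit_city_means(train)
    # Cities absent from the training part fall back to the overall training mean.
    overall = sum(float(r["Temperature_C"]) for r in train) / len(train)
    return mean_absolute_error(city_means, test, fallback=overall)
```

The rows can be any list of dictionaries with `Location` and `Temperature_C` keys, such as the output of `csv.DictReader`. Because the data is synthetic, a model that fails to beat this baseline by a clear margin may simply be reflecting the absence of learnable structure.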
Coverage
- Geographic Scope: The dataset covers ten major cities across the United States: New York, Los Angeles, Chicago, Houston, Phoenix, Philadelphia, San Antonio, San Diego, Dallas, and San Jose.
- Time Range: The data encompasses a period from 1st January 2024 to 19th May 2024.
- Data Availability Notes: The dataset incorporates realistic seasonal variations, simulating higher temperatures and precipitation during summer in locations like New York, and conversely, lower temperatures and increased precipitation during winter in cities such as Phoenix.
License
CC0: Public Domain
Who Can Use It
This dataset is designed for use by:
- Researchers
- Data Scientists
- Students
- Educators
They can use it for tasks such as developing predictive models, conducting climate analysis, and engaging in practical data science learning.
Dataset Name Suggestions
- US City Synthetic Weather Data
- Simulated Urban Weather Patterns
- Faker Weather Dataset for US Cities
- North American Urban Climate Simulation
- Weather Prediction Model Training Data
Attributes
Original Data Source: Weather Prediction Model Training Data