Southern California Basic Climate Data
Data Science and Analytics
About
This collection of meteorological readings supports fundamental machine learning and data analysis projects, specifically focusing on precipitation prediction. It was originally compiled for an introductory DIY Machine Learning project at the Indian Institute of Technology, Guwahati, aiming to teach beginners the workflow and techniques used in real-world ML tasks. The data provides daily weather metrics suitable for classification and regression studies related to regional climate patterns.
Columns
- STATION: The unique identification code for the measuring station.
- NAME: The full name and location of the station (Los Angeles Downtown USC, CA US).
- DATE: The specific calendar date of the observation.
- PRCP: Daily precipitation amount, measured in tenths of millimeters.
- TMAX: The maximum temperature recorded for the day, in tenths of degrees Celsius.
- TMIN: The minimum temperature recorded for the day, in tenths of degrees Celsius.
- TAVG: The average daily temperature, in tenths of degrees Celsius. Note: This column contains only missing values.
- AWND: The average daily wind speed, recorded in tenths of meters per second.
- PGTM: The time of the peak gust of wind, displayed in HHMM format (hours and minutes). Note: This column has a high percentage of missing values (97%).
- WDF2/WDF5: The directional angle of the fastest wind recorded over 2-minute and 5-second intervals, measured in degrees.
- WSF2/WSF5: The speed of the fastest wind recorded over 2-minute and 5-second intervals, measured in tenths of meters per second.
- WT01, WT02, WT08, etc.: Indicators marking the presence of specific weather types, such as fog (WT01), heavy fog (WT02), or smoke/haze (WT08).
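Because several columns are described as stored in tenths of a unit, a loader would typically rescale them before analysis. The following is a minimal sketch using only the Python standard library. It assumes that DATE is in ISO `YYYY-MM-DD` form and that missing values appear as blank fields, neither of which is confirmed by this listing. The station code `STN001` and all sample values are hypothetical placeholders, not real readings.

```python
import csv
import io
from datetime import date

# Columns that the listing describes as stored in tenths of a unit.
TENTHS_COLUMNS = ("PRCP", "TMAX", "TMIN", "TAVG", "AWND", "WSF2", "WSF5")


def to_float(raw):
    """Return a float, or None when the field is blank or absent."""
    raw = (raw or "").strip()
    return float(raw) if raw else None


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed, rescaled values."""
    parsed = {
        "STATION": row["STATION"],
        "NAME": row["NAME"],
        # Assumption: dates are ISO formatted (YYYY-MM-DD).
        "DATE": date.fromisoformat(row["DATE"]),
    }
    for col in TENTHS_COLUMNS:
        value = to_float(row.get(col))
        # Tenths of mm, degrees Celsius, or m/s become whole units.
        parsed[col] = None if value is None else value / 10.0
    return parsed


# Hypothetical two-row sample; the station code and values are invented.
SAMPLE = """STATION,NAME,DATE,PRCP,TMAX,TMIN,TAVG,AWND
STN001,"Los Angeles Downtown USC, CA US",2016-01-01,0,183,61,,11
STN001,"Los Angeles Downtown USC, CA US",2016-01-05,254,150,100,,20
"""

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for r in rows:
    print(r["DATE"], r["PRCP"], r["TMAX"], r["TMIN"], r["TAVG"])
```

To read the real file, `io.StringIO(SAMPLE)` would be replaced with an open file handle. If the downloaded file turns out to use different units, the division by ten should be dropped.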
Distribution
The data is provided as a CSV file with 16 distinct features and 1,827 records, one per daily observation. The total file size is approximately 228.32 kB. Most features are fully populated, but two columns show substantial gaps: TAVG contains only missing values, and PGTM is about 97% missing.
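The gaps described above can be measured directly once the rows are loaded. This is a minimal sketch that assumes each record is a dict of raw strings, such as `csv.DictReader` produces, and that a blank field means a missing value. The helper names, the 50% threshold, and the sample records are hypothetical.

```python
def missing_share(records, column):
    """Fraction of records whose value for `column` is blank or absent."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not (r.get(column) or "").strip())
    return missing / len(records)


def sparse_columns(records, columns, threshold=0.5):
    """Columns whose missing share is at or above `threshold`."""
    return [c for c in columns if missing_share(records, c) >= threshold]


# Hypothetical records for illustration only; real rows come from the CSV.
sample = [
    {"PRCP": "0", "TAVG": "", "PGTM": ""},
    {"PRCP": "13", "TAVG": "", "PGTM": "1432"},
    {"PRCP": "0", "TAVG": "", "PGTM": ""},
    {"PRCP": "0", "TAVG": "", "PGTM": ""},
]

for name in ("PRCP", "TAVG", "PGTM"):
    print(f"{name}: {missing_share(sample, name):.0%} missing")

print("candidates to drop:", sparse_columns(sample, ["PRCP", "TAVG", "PGTM"]))
```

On the full file, a check like this would be expected to flag TAVG and PGTM, which are usually dropped rather than imputed because so little of either column is populated.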
Usage
Ideal applications include introductory machine learning projects, particularly those focused on classification tasks like predicting whether precipitation occurred on a given day. It is excellent for practising exploratory data analysis, visualising time series trends in temperature and wind speed, and implementing basic weather forecasting models. It is frequently used by students learning core data science principles.
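For the rain/no-rain classification task, the target has to be derived from PRCP, and because the records form a time series, a chronological split is often preferred over a random one. The sketch below illustrates both steps under stated assumptions: a day counts as rainy when PRCP is above zero, the split is 80/20, and dates are ISO strings that sort correctly as text. The function names and sample values are hypothetical.

```python
def rain_label(prcp, threshold=0.0):
    """1 if precipitation exceeds the threshold, 0 if not, None if missing."""
    if prcp is None:
        return None
    return 1 if prcp > threshold else 0


def chronological_split(records, train_fraction=0.8):
    """Split date-sorted records so the test set is strictly later in time."""
    ordered = sorted(records, key=lambda r: r["DATE"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical records; PRCP is assumed to be already converted to millimeters.
sample = [
    {"DATE": "2016-01-03", "PRCP": 0.0},
    {"DATE": "2016-01-01", "PRCP": 0.0},
    {"DATE": "2016-01-05", "PRCP": 25.4},
    {"DATE": "2016-01-02", "PRCP": None},
    {"DATE": "2016-01-04", "PRCP": 1.3},
]

# Skip days with no precipitation reading, then attach the binary target.
labelled = [
    dict(r, RAIN=rain_label(r["PRCP"]))
    for r in sample
    if r["PRCP"] is not None
]
train, test = chronological_split(labelled)
print(len(train), "training days,", len(test), "test days")
```

Keeping the test set later in time than the training set avoids evaluating a model on days that precede the ones it learned from. A different rain threshold, such as one that ignores trace amounts, can be passed through the `threshold` argument.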
Coverage
Geographically, the data is strictly limited to observations taken at the Los Angeles Downtown USC weather station in California, United States. The temporal scope covers five years of daily records, running from 1 January 2016 to 31 December 2020. Because the data is meteorological, it has no demographic scope.
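The stated date range and the stated record count can be cross-checked against each other: 1 January 2016 to 31 December 2020 spans 1,827 calendar days, two of the five years being leap years, which matches one record per day. The following sketch shows that arithmetic along with a hypothetical helper, `missing_dates`, for spotting gaps in a loaded file. It assumes the observation dates have already been parsed into `datetime.date` objects.

```python
from datetime import date, timedelta

START = date(2016, 1, 1)   # first day stated in the Coverage section
END = date(2020, 12, 31)   # last day stated in the Coverage section


def expected_days(start, end):
    """Number of calendar days in the inclusive range [start, end]."""
    return (end - start).days + 1


def missing_dates(observed, start, end):
    """Calendar days in [start, end] that have no observation."""
    seen = set(observed)
    all_days = (start + timedelta(days=i) for i in range(expected_days(start, end)))
    return [day for day in all_days if day not in seen]


print(expected_days(START, END))

# Hypothetical three-day window with the middle day absent.
gaps = missing_dates(
    [date(2016, 1, 1), date(2016, 1, 3)],
    date(2016, 1, 1),
    date(2016, 1, 3),
)
print(gaps)
```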
License
CC0: Public Domain
Who Can Use It
- Students and Beginners: Utilising the data to complete foundational assignments, learn ML project workflows, and practise basic classification.
- Data Science Instructors: Employing the dataset as a reliable, simple example for teaching concepts like data cleaning (due to missing values in PGTM and TAVG) and feature engineering.
- Hobbyists: Running initial tests on standard algorithms like decision trees or logistic regression for environmental prediction.
Dataset Name Suggestions
- Los Angeles Daily Weather Metrics
- LA Precipitation and Temperature Records 2016-2020
- Southern California Basic Climate Data
- Machine Learning Starter Weather Dataset
Attributes
Original Data Source: Southern California Basic Climate Data
