Opendatabay APP

Southern California Basic Climate Data

Data Science and Analytics

Tags and Keywords

  • Weather
  • Los Angeles
  • Precipitation
  • Temperature
  • Beginner

Free

About

This collection of meteorological readings supports fundamental machine learning and data analysis projects, specifically focusing on precipitation prediction. It was originally compiled for an introductory DIY Machine Learning project at the Indian Institute of Technology, Guwahati, aiming to teach beginners the workflow and techniques used in real-world ML tasks. The data provides daily weather metrics suitable for classification and regression studies related to regional climate patterns.

Columns

  • STATION: The unique identification code for the measuring station.
  • NAME: The full name and location of the station (Los Angeles Downtown USC, CA US).
  • DATE: The specific calendar date of the observation.
  • PRCP: Daily precipitation amount, measured in tenths of millimeters.
  • TMAX: The maximum temperature recorded for the day, in tenths of degrees Celsius.
  • TMIN: The minimum temperature recorded for the day, in tenths of degrees Celsius.
  • TAVG: The average daily temperature, in tenths of degrees Celsius. Note: This column contains only missing values.
  • AWND: The average daily wind speed, recorded in tenths of meters per second.
  • PGTM: The time of the peak gust of wind, displayed in HHMM format (hours and minutes). Note: This column has a high percentage of missing values (97%).
  • WDF2/WDF5: The directional angle of the fastest wind recorded over 2-minute and 5-second intervals, measured in degrees.
  • WSF2/WSF5: The speed of the fastest wind recorded over 2-minute and 5-second intervals, measured in tenths of meters per second.
  • WT01, WT02, WT08, etc.: Indicators marking the presence of specific weather types, such as fog (WT01), heavy fog (WT02), or smoke/haze (WT08).
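Because several columns above are described as stored in tenths of a unit, a loading step usually rescales them. The following is a minimal sketch using only the Python standard library. It assumes the file really uses the tenths scaling stated in the column notes and that missing values appear as empty fields. The sample rows, including the station code `STN001` and the date format, are hypothetical placeholders and not real records from the dataset.

```python
import csv
import io

# Hypothetical sample rows for illustration only; these are NOT real records.
# Column names follow the listing; "STN001" is a placeholder station code.
SAMPLE_CSV = """STATION,NAME,DATE,PRCP,TMAX,TMIN,TAVG,AWND
STN001,"Los Angeles Downtown USC, CA US",2016-01-01,0,183,72,,12
STN001,"Los Angeles Downtown USC, CA US",2016-01-02,25,161,94,,8
"""

# Columns the listing describes as recorded in tenths of a unit.
TENTHS_COLUMNS = ("PRCP", "TMAX", "TMIN", "TAVG", "AWND")


def from_tenths(raw):
    """Convert a tenths-of-a-unit string to a float, or None if the field is empty."""
    raw = raw.strip()
    return float(raw) / 10 if raw else None


def load_rows(text):
    """Parse CSV text and rescale the tenths-based columns to whole units."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column in TENTHS_COLUMNS:
            record[column] = from_tenths(record[column])
        rows.append(record)
    return rows


rows = load_rows(SAMPLE_CSV)
```

With a real download, the same `load_rows` function could be fed the file contents instead of `SAMPLE_CSV`. It is worth checking a few values against known Los Angeles temperatures first to confirm the scaling.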

Distribution

The data is provided as a CSV file of approximately 228.32 kB with 16 columns and 1,827 records, one per daily observation. Most columns are fully populated, but two show substantial gaps: TAVG contains only missing values, and PGTM is about 97% missing.
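The gaps mentioned here can be measured directly. The sketch below computes the share of empty fields per column with the standard library. It assumes missing values are stored as empty strings, and the four sample rows are hypothetical values chosen only to exercise the function.

```python
import csv
import io

# Hypothetical rows for illustration only; NOT real records from the dataset.
SAMPLE_CSV = """DATE,PRCP,TAVG,PGTM
2016-01-01,0,,
2016-01-02,25,,1432
2016-01-03,0,,
2016-01-04,3,,
"""


def missing_share(text):
    """Return {column: fraction of rows whose field is empty} for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    total = 0
    for record in reader:
        total += 1
        for name in counts:
            value = record.get(name)
            # Treat absent or whitespace-only fields as missing (an assumption).
            if value is None or value.strip() == "":
                counts[name] += 1
    if total == 0:
        return {}
    return {name: counts[name] / total for name in counts}


shares = missing_share(SAMPLE_CSV)
```

Columns whose share is at or near 1.0 are usually dropped rather than imputed, since there is nothing to impute from.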

Usage

Ideal applications include introductory machine learning projects, particularly classification tasks such as predicting whether precipitation occurred on a given day. The dataset also suits practising exploratory data analysis, visualising time series trends in temperature and wind speed, and implementing basic weather forecasting models. It is frequently used by students learning core data science principles.
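The classification task mentioned here needs a binary target, and any model should be compared against a trivial baseline. The sketch below derives a rain/no-rain label and computes the accuracy of always predicting the more common class. Treating any PRCP value above zero as "precipitation occurred" is an assumption, since the listing does not define a threshold, and the input values are hypothetical.

```python
from collections import Counter

# Hypothetical PRCP values (None marks a missing reading); NOT real data.
prcp_values = [0.0, 2.5, 0.0, None, 0.3, 0.0]


def rain_labels(values):
    """Label each non-missing day 1 if PRCP > 0, else 0 (assumed threshold)."""
    return [1 if value > 0 else 0 for value in values if value is not None]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


labels = rain_labels(prcp_values)
baseline = majority_baseline_accuracy(labels)
```

In a dry climate most days carry the "no rain" label, so this baseline can be high. A classifier is only useful if it beats that figure, which is why accuracy alone can mislead on this kind of target.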

Coverage

Geographically, the data is strictly limited to observations taken at the Los Angeles Downtown USC weather station in California, United States. The temporal scope covers five years of daily records, running from 1 January 2016 through to 31 December 2020. As meteorological data, there is no associated demographic scope.
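The stated date range can be checked against the record count given under Distribution. The sketch below counts the calendar days in the range and lists any dates absent from a set of observations. The helper names are illustrative and not part of any dataset tooling.

```python
from datetime import date, timedelta

# Date range as stated in the listing.
START = date(2016, 1, 1)
END = date(2020, 12, 31)


def expected_days(start, end):
    """Number of calendar days from start to end, inclusive."""
    return (end - start).days + 1


def missing_dates(observed, start, end):
    """Return the dates in [start, end] that do not appear in observed."""
    seen = set(observed)
    gaps = []
    day = start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


n_days = expected_days(START, END)  # 1827, including leap days in 2016 and 2020
```

The result matches the 1,827 records reported for the file, which is consistent with exactly one row per day and no skipped dates. Running `missing_dates` on the parsed DATE column would confirm this.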

License

CC0: Public Domain

Who Can Use It

  • Students and Beginners: Utilising the data to complete foundational assignments, learn ML project workflows, and practise basic classification.
  • Data Science Instructors: Employing the dataset as a reliable, simple example for teaching concepts like data cleaning (due to missing values in PGTM and TAVG) and feature engineering.
  • Hobbyists: Running initial tests on standard algorithms like decision trees or logistic regression for environmental prediction.

Dataset Name Suggestions

  • Los Angeles Daily Weather Metrics
  • LA Precipitation and Temperature Records 2016-2020
  • Southern California Basic Climate Data
  • Machine Learning Starter Weather Dataset

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 23/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
