River Water Oxygen Prediction
Data Science and Analytics
Free
About
This dataset focuses on predicting dissolved oxygen (O2) levels in river water, utilising data collected from a state water monitoring system. It comprises five key indicators of river water quality across eight consecutive monitoring stations. The primary objective is to forecast the dissolved oxygen value at the eighth station based on readings from the preceding seven upstream stations. Data are provided as average monthly values, with observation periods varying from approximately 4 to 20 years across different stations. The dataset is structured to facilitate analysis of water quality trends and to support the development of predictive models.
Columns
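The sample file contains 36 columns: an Id plus five indicators for each of the seven upstream stations (35 feature columns). The target column described below is omitted from the test data, so a file that includes it has 37 columns. All values are monthly averages: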
- Id: A unique identifier for each monthly averaged data entry.
- target: The monthly averaged dissolved oxygen (O2) value at the designated target station, measured in mgO2/cubic decimetre. This is the value intended for prediction.
- O2_X (where X is 1-7): Monthly averaged dissolved oxygen levels from stations 1 through 7, measured in mgO2/cubic decimetre. These stations are located upstream from the target station, with station 1 being the closest.
- NH4_X (where X is 1-7): Monthly averaged concentrations of ammonium ions (NH4) from stations 1 through 7, measured in milligrams per cubic decimetre.
- NO2_X (where X is 1-7): Monthly averaged concentrations of nitrite ions (NO2) from stations 1 through 7, measured in milligrams per cubic decimetre.
- NO3_X (where X is 1-7): Monthly averaged concentrations of nitrate ions (NO3) from stations 1 through 7, measured in milligrams per cubic decimetre.
- BOD5_X (where X is 1-7): Monthly averaged biochemical oxygen demand (BOD5), determined over 5 days, from stations 1 through 7, measured in milligrams of oxygen per cubic decimetre.
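The naming scheme above can be spelled out programmatically, which is handy for checking a downloaded file against the expected schema. The following is a minimal sketch: the helper name `expected_columns` is hypothetical, and the column order it produces is an assumption (the listing gives the names but not their order in the file).

```python
# Indicators and station numbers as described in the column list above.
INDICATORS = ("O2", "NH4", "NO2", "NO3", "BOD5")
STATIONS = range(1, 8)  # upstream stations 1 through 7


def expected_columns(include_target=True):
    """Return the column names implied by the listing.

    The ordering is an assumption made for illustration only.
    """
    columns = ["Id"]
    if include_target:
        columns.append("target")
    for indicator in INDICATORS:
        columns.extend(f"{indicator}_{station}" for station in STATIONS)
    return columns


# With the target column there are 37 names; without it, 36.
print(len(expected_columns(include_target=True)))
print(len(expected_columns(include_target=False)))
```

Comparing `set(expected_columns(False))` with the header of test.csv is one way to confirm that the file matches this description.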
Distribution
The dataset is available in CSV format. A sample file, test.csv, is 7.57 KB and contains 36 columns with 63 values in each. The data consist of average monthly readings. The number of observations varies across stations, with observation periods ranging from 4 to approximately 20 years. Some columns, particularly those for stations 3-7, have a notable percentage of missing (NA) values. The test data omit the target column in anticipation of potential prediction competitions.
Usage
This dataset is ideally suited for various analytical and predictive tasks:
- Data Dependence Analysis: Conduct Exploratory Data Analysis (EDA) to uncover relationships and patterns among the different water quality indicators and across monitoring stations.
- Target Prediction: Develop and test models to accurately predict the dissolved oxygen levels at the target station using data from the upstream stations. This is a core machine learning application for environmental monitoring.
- Impact Analysis: Investigate the distinct influence of data from the initial two upstream stations (1-2) versus the subsequent five stations (3-7) on the accuracy of dissolved oxygen predictions.
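As a starting point for the impact analysis, the feature columns can be split into a near group (stations 1-2) and a far group (stations 3-7), and any model can be scored against a naive reference. The sketch below is an illustration only: the helper names are hypothetical, the baseline (predicting the target directly from one upstream O2 reading) is not a method described by the dataset authors, and the two rows are invented values, not data from the file.

```python
INDICATORS = ("O2", "NH4", "NO2", "NO3", "BOD5")


def feature_columns(stations):
    """Column names for the given station numbers, all five indicators."""
    return [f"{ind}_{s}" for ind in INDICATORS for s in stations]


NEAR = feature_columns(range(1, 3))  # stations 1-2
FAR = feature_columns(range(3, 8))   # stations 3-7


def naive_mae(rows, feature="O2_1"):
    """Mean absolute error when the target is predicted by one upstream
    reading. Rows where either value is missing (None) are skipped.
    Returns None if no row is usable."""
    errors = []
    for row in rows:
        value, target = row.get(feature), row.get("target")
        if value is None or target is None:
            continue
        errors.append(abs(float(value) - float(target)))
    return sum(errors) / len(errors) if errors else None


# Invented example rows, for illustration only.
rows = [
    {"target": 8.0, "O2_1": 7.5, "O2_3": None},
    {"target": 9.0, "O2_1": 9.5, "O2_3": 6.0},
]

print(len(NEAR), len(FAR))     # 10 25
print(naive_mae(rows, "O2_1"))  # 0.5
print(naive_mae(rows, "O2_3"))  # 3.0
```

Training one model on `NEAR` alone and another on `NEAR + FAR`, then comparing both against this baseline, gives a simple frame for judging how much the more distant stations contribute.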
Coverage
The dataset covers river water quality monitoring in Ukraine and uses data provided by the State Water Resources Agency of Ukraine. Values are monthly averages, and the observation period for individual stations spans between 4 and approximately 20 years. There is no demographic scope. The number of observations varies per station, and the training and test subsets are curated to have a similar percentage of non-NA values.
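The share of non-NA values per column can be checked with the standard library alone. The sketch below assumes that missing readings appear in the CSV as an empty field, `NA`, or `NaN`; the listing does not state the exact encoding, so adjust `MISSING` to match the file. The function name and the three-line sample are hypothetical and invented for illustration.

```python
import csv
import io

# Assumed encodings of a missing value; not confirmed by the listing.
MISSING = {"", "NA", "NaN"}


def non_na_share(csv_text):
    """Return {column: fraction of rows with a non-missing value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts, total = {}, 0
    for row in reader:
        total += 1
        for column, value in row.items():
            if column is None:  # extra fields beyond the header
                continue
            present = value is not None and value.strip() not in MISSING
            counts[column] = counts.get(column, 0) + int(present)
    if total == 0:
        return {}
    return {column: n / total for column, n in counts.items()}


# Invented sample, for illustration only.
sample = "Id,O2_1,O2_3\n0,7.5,NA\n1,9.5,6.0\n"
shares = non_na_share(sample)
print(shares)  # {'Id': 1.0, 'O2_1': 1.0, 'O2_3': 0.5}
```

Running the same function on the training and test files and comparing the resulting dictionaries is one way to verify that the two subsets have similar coverage.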
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
This dataset is particularly beneficial for:
- Environmental Researchers: To study river ecosystems, water pollution dynamics, and long-term water quality trends.
- Data Scientists and Machine Learning Practitioners: For developing and refining predictive models for environmental parameters, specifically dissolved oxygen.
- Hydrologists and Water Resource Managers: To gain insights into river health and inform management strategies, especially concerning oxygen levels critical for aquatic life.
- Students and Academics: As a practical resource for coursework, projects, and research in environmental science, data analysis, and machine learning.
Dataset Name Suggestions
- River Water Oxygen Prediction
- Ukrainian River Water Quality
- Dissolved Oxygen River Data
- Multi-Station River Health
- Aquatic BOD and Nutrient Monitoring
Attributes
Original Data Source: River Water Oxygen Prediction