Monthly River SSC Forecasting Dataset
Public Health & Epidemiology
About
This dataset supports predicting the suspended substances concentration (SSC) in river water. It contains data from eight consecutive monitoring stations in a state water monitoring system. The task is to predict SSC at a target station from measurements taken at seven upstream stations, which are numbered in ascending order of distance from the target (station 1 is the closest). The data are useful for understanding and forecasting river water quality, particularly with respect to the maximum permissible SSC level.
Columns
- Id: A unique identifier for each monthly averaged data entry.
- target: The monthly averaged suspended substances concentration (SSC) at the target station, in milligrams per cubic decimetre (mg/dm³, equivalent to mg/L). This column is deliberately omitted from the test data.
- 1-7: The monthly averaged SSC at upstream stations 1 through 7, also in mg/dm³. Station 1 is the closest to the target station and station 7 the farthest. Stations 1 and 2 have complete records, whereas stations 3-7 have a substantial share of missing values.
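Because stations 3-7 are incomplete, a natural first step is to measure how much data each station actually contributes. The sketch below uses only the Python standard library. It assumes the CSV headers are literally "Id", "target" and "1" to "7", and that a missing value appears as an empty cell or a NaN/NA token; both are assumptions, not documented properties of the files. The two rows in the example are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Union

# Assumed headers: "Id", "target" (training data only) and "1" .. "7".
STATION_COLUMNS = [str(i) for i in range(1, 8)]
# Assumption: a missing value is an empty cell or a NaN/NA token.
MISSING_TOKENS = {"", "nan", "na"}

Cell = Union[str, Optional[float]]


def parse_value(cell: Optional[str]) -> Optional[float]:
    """Convert one CSV cell to a float, or None if it is missing."""
    if cell is None or cell.strip().lower() in MISSING_TOKENS:
        return None
    return float(cell)


def load_rows(handle: TextIO) -> List[Dict[str, Cell]]:
    """Read every record, keeping Id as text and converting all other columns."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({key: value if key == "Id" else parse_value(value)
                     for key, value in record.items()})
    return rows


def missing_fraction(rows, columns=STATION_COLUMNS) -> Dict[str, float]:
    """Share of rows in which each station column is missing."""
    return {col: sum(row.get(col) is None for row in rows) / len(rows)
            for col in columns}


# Two made-up rows, for illustration only (not taken from the dataset).
sample = io.StringIO(
    "Id,target,1,2,3,4,5,6,7\n"
    "0,3.1,2.9,3.4,,,,,\n"
    "1,4.0,3.8,4.1,5.0,,,,\n"
)
print(missing_fraction(load_rows(sample)))
```

With the real data, pass a handle from open("test.csv", newline="") instead of the in-memory sample.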
Distribution
The data are monthly averages distributed in CSV format. For example, test.csv is approximately 1.51 KB and has 8 columns (Id and stations 1-7; target is omitted). Observation periods vary across stations, from about 4 to about 20 years. The training and test sets were selected so that the proportion of non-missing values is consistent across stations with different series lengths.
Usage
This dataset is ideally suited for various analytical and predictive tasks, including:
- Data Dependence Analysis: Performing Exploratory Data Analysis (EDA) to uncover relationships and patterns within the river water quality parameters.
- Water Quality Prediction: Developing regression models that predict the suspended substances concentration at the target station as accurately as possible.
- Impact Assessment: Analysing the individual influence of the closest upstream stations (1-2) compared to the more distant stations (3-7) on prediction accuracy.
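One simple way to start the prediction and impact-assessment tasks is to fit a separate one-variable linear regression of target on each upstream station and compare validation errors. The sketch below does this with ordinary least squares and root-mean-square error (RMSE) in plain Python. It is a baseline of our own choosing, not a method prescribed by the dataset authors. Because test.csv has no target column, it assumes you hold out part of the training data for validation. The row format (dicts keyed by "target" and "1" to "7", with None for missing values) is likewise an assumption.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def complete_pairs(rows: List[Row], station: str) -> List[Tuple[float, float]]:
    """(station SSC, target SSC) pairs from rows where both values are present."""
    return [(r[station], r["target"]) for r in rows
            if r.get(station) is not None and r.get("target") is not None]


def station_rmse(train: List[Row], valid: List[Row], station: str) -> Optional[float]:
    """RMSE on `valid` of a target ~ station line fitted on `train`.

    Returns None when there is too little data to fit or to evaluate.
    """
    fit_pairs = complete_pairs(train, station)
    eval_pairs = complete_pairs(valid, station)
    if len(fit_pairs) < 2 or not eval_pairs:
        return None
    xs, ys = zip(*fit_pairs)
    slope, intercept = fit_line(xs, ys)
    squared = [(slope * x + intercept - y) ** 2 for x, y in eval_pairs]
    return math.sqrt(sum(squared) / len(squared))
```

Calling station_rmse for each station from "1" to "7" gives a first comparison of the near stations (1-2) with the distant ones (3-7). Keep in mind that stations with missing values are scored on fewer, and different, validation rows, so their RMSE values are not strictly comparable.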
Coverage
The data originate from river water monitoring stations and relate to water bodies in Ukraine, where a maximum permissible SSC of 15 mg/dm³ is stipulated. All values are monthly averages, and observation periods for individual stations range from about 4 to about 20 years.
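For compliance-oriented work, observed or predicted values can be checked against the 15 mg/dm³ limit. The sketch below flags monthly averages that are strictly above the limit. Two assumptions apply: a value exactly equal to 15 counts as compliant, and comparing monthly averages directly with the limit is meaningful. The applicable regulation may define the assessment differently. Missing values are skipped, and the sample records are made up.

```python
from typing import Iterable, List, Optional, Tuple

# Maximum permissible SSC for Ukrainian water bodies, as stated in this description.
MAX_PERMISSIBLE_SSC = 15.0  # mg/dm³


def exceedances(
    records: Iterable[Tuple[str, Optional[float]]],
    limit: float = MAX_PERMISSIBLE_SSC,
) -> List[str]:
    """Return the Ids whose monthly average SSC is strictly above `limit`.

    Missing values (None) are skipped rather than treated as compliant or non-compliant.
    """
    return [row_id for row_id, ssc in records if ssc is not None and ssc > limit]


# Made-up (Id, SSC) pairs for illustration.
print(exceedances([("0", 12.0), ("1", 15.0), ("2", 15.5), ("3", None)]))  # ['2']
```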
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
This dataset is particularly beneficial for:
- Data Scientists: For developing and refining regression models to predict environmental metrics.
- Environmental Researchers: For studying river ecosystems, water pollution, and hydrological patterns.
- Hydrological Engineers: For designing and optimising water management and monitoring systems.
- Governmental Agencies: Especially those responsible for water resource management and public health, for compliance monitoring and strategic planning related to water quality.
Dataset Name Suggestions
- River Suspended Solids Concentration Predictor
- Ukrainian River Water Quality Data
- Upstream River Sediment Analysis
- Monthly River SSC Forecasting Dataset
- Water Monitoring Station Data
Attributes
Original Data Source: Monthly River SSC Forecasting Dataset