DS4C South Korea COVID-19 Data
Public Health & Epidemiology
Free
About
This dataset, known as DS4C: Data Science for COVID-19 in South Korea, provides structured information related to COVID-19 infection cases in South Korea [3]. It was created by reprocessing and structuring report materials from the KCDC (Korea Centers for Disease Control & Prevention) and local governments, which are known for their quick and transparent announcement of information [3, 4]. The primary purpose is to facilitate easy data analysis and to uncover meaningful patterns through the application of various data mining and visualisation techniques [3, 4]. A portion of this dataset has been recognised and accepted at NeurIPS 2020 [3]. Please be aware that updates to this dataset have ceased, and the PatientRoute.csv file is currently unavailable due to privacy concerns [5].
Columns
The Case.csv file, a sample component of this dataset, includes the following columns:
- case_id: A unique identifier for each infection case [6].
- province: Specifies the Special City, Metropolitan City, or Province(-do) where the case occurred. Examples include Seoul and Gyeonggi-do [7].
- city: Specifies the City(-si), County(-gun), or District(-gu) [7].
- group: A boolean indicator (TRUE/FALSE) to show if the case is part of a group infection [8].
- infection_case: The specific name of the infection group or other case descriptions, such as 'overseas inflow' [8].
- confirmed: The accumulated number of confirmed cases related to that infection event [9].
- latitude: The latitude coordinate (WGS84) of the infection group's location [9].
- longitude: The longitude coordinate (WGS84) of the infection group's location [10].
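The column list above can be turned into a typed record when loading the file. The following is a minimal sketch using only the Python standard library; the names `InfectionCase` and `read_cases` are hypothetical, and it assumes that header names may carry stray whitespace and that missing coordinates may appear as a non-numeric placeholder, neither of which is stated in this listing.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Column names as listed in the "Columns" section of this page.
EXPECTED_COLUMNS = [
    "case_id", "province", "city", "group",
    "infection_case", "confirmed", "latitude", "longitude",
]


@dataclass
class InfectionCase:
    """One row of Case.csv (hypothetical helper type, not part of the dataset)."""
    case_id: str
    province: str
    city: str
    group: bool
    infection_case: str
    confirmed: int
    latitude: Optional[float]
    longitude: Optional[float]


def _to_float(value: str) -> Optional[float]:
    # Assumption: a coordinate that cannot be parsed is treated as missing.
    try:
        return float(value)
    except ValueError:
        return None


def read_cases(lines: Iterable[str]) -> List[InfectionCase]:
    """Parse CSV text lines (for example an open file) into InfectionCase records."""
    reader = csv.reader(lines)
    header_row = next(reader, None)
    if header_row is None:
        raise ValueError("input is empty")
    # Assumption: header names may have surrounding whitespace, so strip them.
    header = [name.strip() for name in header_row]
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {header}")

    cases = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        record = dict(zip(header, (cell.strip() for cell in row)))
        cases.append(InfectionCase(
            case_id=record["case_id"],
            province=record["province"],
            city=record["city"],
            group=record["group"].upper() == "TRUE",  # TRUE/FALSE flag
            infection_case=record["infection_case"],
            confirmed=int(record["confirmed"]),
            latitude=_to_float(record["latitude"]),
            longitude=_to_float(record["longitude"]),
        ))
    return cases

# Possible usage, assuming Case.csv is in the working directory:
#     with open("Case.csv", newline="", encoding="utf-8") as handle:
#         cases = read_cases(handle)
```

Failing loudly on an unexpected header is a deliberate choice here: it surfaces a renamed or reordered column early rather than producing silently misaligned records.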
Distribution
The data is typically provided in CSV format [1]. The Case.csv sample file is 11.71 kB in size and contains 8 columns [6]. It consists of 174 valid records or rows [7-10]. The dataset's PatientRoute.csv file is currently not available due to privacy considerations [5].
Usage
This dataset is ideal for various applications and use cases, including:
- Applying data mining and visualisation techniques to find meaningful patterns related to COVID-19 spread and cases [3, 4].
- Conducting exploratory data analysis (EDA), such as analysing floating population data or identifying who spreads the coronavirus [5].
- Developing time series geospatial analyses using tools like Folium [5].
- Supporting research on public health and epidemiology, particularly in the context of disease outbreaks [3, 11, 12].
- Participating in data visualisation and AI competitions focused on COVID-19 [13].
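As one possible starting point for the exploratory analysis mentioned above, the sketch below totals the `confirmed` column per province and separates group infections from other cases. The function names `confirmed_by_province` and `rank_provinces` are hypothetical, and the code assumes each row is a mapping keyed by the column names exactly as listed under Columns (for example, rows produced by `csv.DictReader`).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def confirmed_by_province(rows: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, int]]:
    """Sum confirmed cases per province, split by the TRUE/FALSE group flag."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"group": 0, "non_group": 0})
    for row in rows:
        is_group = row["group"].strip().upper() == "TRUE"
        bucket = "group" if is_group else "non_group"
        totals[row["province"].strip()][bucket] += int(row["confirmed"])
    return dict(totals)


def rank_provinces(totals: Mapping[str, Mapping[str, int]]) -> List[str]:
    """Return province names ordered by total confirmed cases, largest first."""
    return sorted(totals, key=lambda name: sum(totals[name].values()), reverse=True)

# Possible usage, assuming Case.csv is in the working directory:
#     import csv
#     with open("Case.csv", newline="", encoding="utf-8") as handle:
#         totals = confirmed_by_province(csv.DictReader(handle))
#     print(rank_provinces(totals)[:5])
```

The same per-province totals, together with the latitude and longitude columns, could feed a map layer in a tool such as Folium; that step is left out here to keep the sketch free of third-party dependencies.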
Coverage
The dataset primarily covers COVID-19 infection cases within South Korea, encompassing data from various provinces and cities across the country [3, 7]. Geographic coordinates (latitude and longitude) are also provided for group infections [9, 10]. The data reflects the period when COVID-19 had infected more than 10,000 people in South Korea [3]. It is important to note that the dataset has stopped receiving updates [5]. While specific demographic groups are not explicitly listed in the Case.csv columns, the data pertains to individuals affected by the virus [6-10].
License
CC BY-NC-SA 4.0
Who Can Use It
This dataset is intended for a range of users interested in public health data and data analysis:
- Data scientists and analysts: To reprocess information, perform analyses, and find insights into COVID-19 patterns [3, 4].
- Researchers and academics: Particularly those in public health, epidemiology, and data science, as evidenced by partnerships with universities and research institutions [11, 12].
- Competitors in data challenges: Ideal for those participating in hackathons and competitions focused on COVID-19 visualisation and AI [13].
- Journalists and media outlets: For informing public understanding through news articles and blog posts about the pandemic's impact in South Korea [12].
Dataset Name Suggestions
- DS4C South Korea COVID-19 Data
- Korean COVID-19 Infection Cases
- KCDC COVID-19 Dataset
- South Korea Pandemic Data
Attributes
Original Data Source: DS4C South Korea COVID-19 Data