Opendatabay APP

Synthetic 2D Cluster Test Data

Synthetic Data Generation

Tags and Keywords

Clustering

Coordinates

Beginner

Tabular

K-means

Free

About

This dataset contains small, two-dimensional coordinate data for practising and demonstrating clustering techniques. It is a clean example for beginners and students learning fundamental machine learning concepts, particularly for visualising algorithms such as k-means. The data was generated synthetically with a cluster painting tool for educational purposes.

Columns

  • x: The x-coordinate of the data point. Values range from 417 to 699, with a mean of 577.
  • y: The y-coordinate of the data point. Values range from 201 to 472, with a mean of 349.
  • color: The class or cluster label assigned to the point, with values from 0 to 2.
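The column statistics above can be checked after download with a short script. This is a minimal sketch using only the Python standard library; it assumes the CSV header uses exactly the names x, y and color, the function name column_summary is hypothetical, and the sample rows are invented for illustration rather than taken from data.csv.

```python
import csv
import io
from statistics import mean


def column_summary(handle):
    """Return (min, max, mean) for x and y, plus the sorted color labels."""
    rows = list(csv.DictReader(handle))
    summary = {}
    for name in ("x", "y"):
        values = [float(row[name]) for row in rows]
        summary[name] = (min(values), max(values), mean(values))
    # Labels are parsed via float first so that "1" and "1.0" both work.
    summary["color"] = sorted({int(float(row["color"])) for row in rows})
    return summary


# Hypothetical rows for illustration only; they are NOT taken from data.csv.
sample = io.StringIO("x,y,color\n500,300,0\n600,400,1\n550,260,2\n")
result = column_summary(sample)
print(result)
```

To run it on the real file, pass an open file object instead of the sample, for example `with open("data.csv", newline="") as f: column_summary(f)`, and compare the output with the ranges and means listed above.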

Distribution

The data is delivered as a single CSV file named data.csv, 12.91 kB in size. It contains 3 columns and 336 valid records. The data is clean, with no mismatched entries or missing values in any column.
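The claims about record count and cleanliness can be verified rather than taken on trust. The sketch below is one possible check using the standard library; the function name integrity_report and the expected header names are assumptions, and the sample text is deliberately broken, invented input rather than content from data.csv.

```python
import csv
import io


def integrity_report(handle, expected_columns=("x", "y", "color")):
    """Count records and flag rows with missing or non-numeric cells."""
    reader = csv.reader(handle)
    header = next(reader)
    records = 0
    bad_rows = []  # 1-based record numbers that fail a check
    for number, row in enumerate(reader, start=1):
        records += 1
        # A short row or an empty cell counts as a missing value.
        if len(row) != len(header) or any(cell.strip() == "" for cell in row):
            bad_rows.append(number)
            continue
        try:
            for cell in row:
                float(cell)  # every cell is expected to be numeric
        except ValueError:
            bad_rows.append(number)
    return {
        "columns_ok": tuple(header) == tuple(expected_columns),
        "records": records,
        "bad_rows": bad_rows,
    }


# Hypothetical, deliberately flawed rows for illustration; NOT from data.csv.
sample = io.StringIO("x,y,color\n500,300,0\n600,,1\n550,260,two\n")
report = integrity_report(sample)
print(report)
```

If the listing is accurate, running the same function on the downloaded file should report 336 records and an empty list of bad rows.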

Usage

This dataset is well suited to introductory data science and machine learning work. Primary uses include building case studies that explain an algorithm, demonstrating data preparation steps, and practising the implementation of classification and clustering models. It is particularly useful for visualising how k-means clustering identifies centroids.
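To make the centroid idea concrete, here is a from-scratch sketch of Lloyd's algorithm, the standard iterative procedure usually meant by "k-means". It is written with the standard library for readability and is not the method of any particular clustering library; the function name kmeans, its parameters, and the six sample points are all hypothetical.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster 2D points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = []
    for _ in range(max_iterations):
        # Assignment step: give each point the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(point, centroids[j]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Hypothetical points in two well-separated groups; NOT taken from data.csv.
points = [(420, 210), (425, 205), (430, 215), (690, 460), (695, 470), (685, 465)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))
```

For the real dataset, the (x, y) pairs from data.csv would replace the sample points and k=3 would match the three values of the color column. Because the starting centroids are chosen at random, different seeds can give different results on less clearly separated data.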

Coverage

The data is entirely abstract and synthetic, consisting only of two-dimensional coordinate points. Consequently, there is no inherent geographical, demographic, or specific temporal scope associated with this collection. The data is static and is not expected to receive future updates.

License

CC0: Public Domain

Who Can Use It

  • Students: For assignments requiring simple, clear data input for clustering models.
  • Educators/Academics: For use in introductory programming notebooks and data science workshops.
  • Beginner Developers: For rapid testing and benchmarking of clustering library functions without needing heavy preprocessing.

Dataset Name Suggestions

  • Synthetic 2D Cluster Test Data
  • Introductory k-Means Practice Set
  • Small Coordinate Clustering Example

Attributes

Original Data Source: Synthetic 2D Cluster Test Data

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

02/11/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
