Synthetic Circle Point Collection
Synthetic Data Generation
Free
About
This synthetic dataset is a structured benchmark for evaluating clustering algorithms. The points are arranged in a geometric pattern of 100 distinct, well-defined groups, each corresponding to one circle. That layout makes the data useful for benchmarking machine learning models such as k-means, where clear class separation and structure recognition matter. Because class labels are included, the data supports both unsupervised clustering analysis and supervised classification tasks.
Columns
- x: The primary coordinate of the point along the horizontal axis. Values range from approximately -5 to 185, with a mean of 90.
- y: The secondary coordinate of the point along the vertical axis. Values range from approximately -5 to 185, with a mean of 90.
- class: The ground truth label indicating the specific circle membership for the point. These labels run from 0 up to 99, denoting the 100 separate clusters.
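The column layout above can be checked with a short loading sketch. It assumes the table is delivered as a CSV file with a header row naming x, y and class; the description only says "standard tabular format", so the file format is an assumption. The function names load_points and summarize are hypothetical helpers, and the three sample rows are made up for illustration.

```python
import csv
import io

def load_points(handle):
    """Parse rows with columns x, y, class into (x, y, label) tuples."""
    reader = csv.DictReader(handle)
    points = []
    for row in reader:
        points.append((float(row["x"]), float(row["y"]), int(row["class"])))
    return points

def summarize(points):
    """Return (min, max, mean) for each coordinate, plus the sorted label set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "x": (min(xs), max(xs), sum(xs) / len(xs)),
        "y": (min(ys), max(ys), sum(ys) / len(ys)),
        "labels": sorted({p[2] for p in points}),
    }

# Tiny in-memory stand-in; these values are illustrative, not real data.
sample = io.StringIO("x,y,class\n0.5,1.0,0\n1.5,3.0,0\n10.0,20.0,1\n")
summary = summarize(load_points(sample))
```

Run against the real file, the summary should show both coordinates spanning roughly -5 to 185 with a mean near 90, and labels running from 0 to 99.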
Distribution
The dataset is provided in a standard tabular format with 10,000 instances (rows). It has a fixed structure of 100 groups with exactly 100 points each. All fields are complete and valid, with no missing or mismatched values. The data file size is approximately 225 kB.
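The stated layout (10,000 rows, 100 classes, 100 points per class) is easy to verify once the labels are loaded. The following is a minimal sketch; check_balance is a hypothetical helper, and the labels list is a generated stand-in with the documented shape rather than the real column.

```python
from collections import Counter

def check_balance(labels, n_classes=100, per_class=100):
    """Return a list of problems; an empty list means the layout matches."""
    counts = Counter(labels)
    problems = []
    expected_rows = n_classes * per_class
    if len(labels) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(labels)}")
    for cls in range(n_classes):
        if counts.get(cls, 0) != per_class:
            problems.append(f"class {cls} has {counts.get(cls, 0)} points")
    unexpected = sorted(set(counts) - set(range(n_classes)))
    if unexpected:
        problems.append(f"unexpected labels: {unexpected}")
    return problems

# Stand-in labels shaped like the documented layout: 100 classes x 100 points.
labels = [cls for cls in range(100) for _ in range(100)]
balanced_report = check_balance(labels)       # no problems expected
skewed_report = check_balance(labels[:-1])    # one row removed on purpose
```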
Usage
The points are well suited to testing the accuracy and scalability of novel clustering algorithms. They serve as a benchmark for models that attempt to detect non-linear geometric structures in two dimensions. The dataset is also suitable for educational use, for example to illustrate density-based or centroid-based grouping principles.
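For the educational use mentioned above, centroid-based grouping can be illustrated with a small pure-Python version of Lloyd's algorithm (the standard k-means iteration). This is a teaching sketch, not a tuned implementation: the initial centroids are supplied by the caller, and the six points are made up. Whether a centroid-based method recovers the circles in this dataset depends on how they are laid out, which the description does not specify.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on 2D points, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda k: (px - centroids[k][0]) ** 2
                              + (py - centroids[k][1]) ** 2)
            for px, py in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = updated
    return assignment, centroids

# Two well-separated toy groups; values are illustrative only.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centers = kmeans(toy, centroids=[(0, 0), (10, 10)])
```

For the full dataset, k would be 100 and the starting centroids would need a real initialization strategy; a library implementation is the more practical choice at that scale.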
Coverage
The scope is purely abstract: a two-dimensional mathematical space. As synthetic data, it has no geographic location, time range, or demographic limits. The classes are balanced by design, with 100 points for each of the 100 classes (circles).
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Data Scientists: For validating the performance of their custom clustering models against a known ground truth.
- Machine Learning Researchers: For developing novel density estimation or geometric feature extraction techniques.
- Academics and Educators: For demonstrating core principles of unsupervised learning and class separation.
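Validating a clustering model against the known ground truth, as described in the list above, needs a metric that compares predicted cluster ids with the class column. One simple option is purity, sketched here under the assumption that both label sequences are plain Python lists in the same row order. The function name purity and the toy labels are illustrative.

```python
from collections import Counter, defaultdict

def purity(true_labels, predicted):
    """Fraction of points that fall in their cluster's majority class."""
    if len(true_labels) != len(predicted):
        raise ValueError("label sequences must have the same length")
    by_cluster = defaultdict(Counter)
    for truth, cluster in zip(true_labels, predicted):
        by_cluster[cluster][truth] += 1
    # Each cluster contributes the size of its largest ground-truth class.
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(true_labels)

truth = [0, 0, 0, 1, 1, 1]
perfect = purity(truth, [5, 5, 5, 9, 9, 9])  # cluster ids need not match class ids
mixed = purity(truth, [5, 5, 9, 9, 9, 9])    # one point lands in the wrong cluster
```

Purity rises trivially as the number of predicted clusters grows, so it is most meaningful when the model is asked for the same 100 clusters the dataset contains.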
Dataset Name Suggestions
- Structured Clustering Benchmark Data
- Synthetic Circle Point Collection
- 2D Geometric Clustering Challenge
- 100-Class Algorithm Evaluator
Attributes
Original Data Source: Synthetic Circle Point Collection
