Diamond Attribute and Price Data

Tags and Keywords

Diamond

Price

Carat

Cut

Classification

Diamond Attribute and Price Data Dataset on Opendatabay data marketplace

Free

About

This classic dataset records the prices and physical attributes of diamonds. The original collection contains almost 54,000 records of individual diamonds, and a key use is predicting a diamond's cut from the other measurements. Note that a substantial share of the records (14,184) appear to describe the same physical diamonds, likely measured from different angles. These duplicates can be found by looking for repeated rows once the angle-dependent variables (such as x, y, z, depth, and table) are excluded.
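The duplicate check described above can be sketched with the Python standard library. It assumes the CSV column names match the list under Columns and treats carat, cut, color, clarity, and price as the angle-independent fingerprint of a stone (an assumption: the description only names the columns to drop). `fingerprint`, `count_repeated`, and `load_rows` are hypothetical helpers. The description does not say whether 14,184 counts every row in a repeated group or only the extra copies, so the function returns both and is not guaranteed to reproduce that figure.

```python
import csv
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# Angle-independent columns used to fingerprint a physical diamond
# (assumption: every column except x, y, z, depth, and table).
KEY_COLUMNS = ("carat", "cut", "color", "clarity", "price")


def fingerprint(row: Mapping[str, str]) -> Tuple[str, ...]:
    """Key a row on its angle-independent columns only."""
    return tuple(row[col] for col in KEY_COLUMNS)


def count_repeated(rows: Iterable[Mapping[str, str]]) -> Tuple[int, int]:
    """Return (rows in a repeated group, extra copies beyond the first)."""
    counts = Counter(fingerprint(row) for row in rows)
    in_groups = sum(n for n in counts.values() if n > 1)
    extra_copies = sum(n - 1 for n in counts.values() if n > 1)
    return in_groups, extra_copies


def load_rows(path: str) -> List[dict]:
    """Read the CSV into memory as a list of column-name -> string dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, `count_repeated(load_rows("diamonds.csv"))` runs the check over the whole file. Listing the key columns explicitly, rather than dropping the angle-dependent ones from each row, keeps any extra index column in the CSV from making every row look unique. A repeated fingerprint suggests, but does not prove, that two rows describe the same stone.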

Columns

The dataset contains ten columns, offering detailed characteristics for each recorded diamond:
  • carat: The weight of the diamond, measured in carats. Values range from 0.2 to 5.01.
  • cut: The quality of the cut, graded Fair, Good, Very Good, Premium, or Ideal. The most common grades are 'Ideal' (40%) and 'Premium' (26%).
  • color: The diamond's colour grade, from D (best) to J (worst). The most common grades are 'G' (21%) and 'E' (18%).
  • clarity: A grade of the diamond's internal and external flaws, from I1 (worst) through SI2, SI1, VS2, VS1, VVS2, and VVS1 to IF (best). The most frequent clarities are 'SI1' (24%) and 'VS2' (23%).
  • depth: The total depth percentage, i.e. the depth (z) divided by the mean of the length and width (x and y). Values range from 43 to 79.
  • table: The width of the diamond's top facet relative to its widest point, as a percentage. Values range from 43 to 95.
  • price: The sale price of the diamond in US dollars, ranging from 326 to about 18.8 thousand.
  • x: The length (x dimension) of the diamond in millimetres, ranging from 0 to 10.7.
  • y: The width (y dimension) of the diamond in millimetres, ranging from 0 to 58.9.
  • z: The depth (z dimension) of the diamond in millimetres, ranging from 0 to 31.8.
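To work with these ten columns in code, each CSV row can be parsed into a typed record, with cut, colour, and clarity mapped to ordinal ranks so that a model sees their natural order. The grade orders below follow the ggplot2 `diamonds` documentation and are assumed, not confirmed by this listing, to apply to this CSV. `Diamond`, `parse_row`, `encode_grades`, and `load_diamonds` are hypothetical names.

```python
import csv
from dataclasses import dataclass
from typing import List, Mapping, Tuple

# Grade orders from lowest to highest (assumed to match this CSV's labels).
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


@dataclass(frozen=True)
class Diamond:
    carat: float
    cut: str
    color: str
    clarity: str
    depth: float
    table: float
    price: float
    x: float
    y: float
    z: float


def parse_row(row: Mapping[str, str]) -> Diamond:
    """Convert one csv.DictReader row into a typed record.

    Columns are looked up by name, so an extra index column is ignored.
    """
    return Diamond(
        carat=float(row["carat"]),
        cut=row["cut"],
        color=row["color"],
        clarity=row["clarity"],
        depth=float(row["depth"]),
        table=float(row["table"]),
        price=float(row["price"]),
        x=float(row["x"]),
        y=float(row["y"]),
        z=float(row["z"]),
    )


def encode_grades(d: Diamond) -> Tuple[int, int, int]:
    """Map cut, colour, and clarity to integer ranks (0 = lowest grade).

    Raises ValueError if a label is not in the expected order lists.
    """
    return (
        CUT_ORDER.index(d.cut),
        COLOR_ORDER.index(d.color),
        CLARITY_ORDER.index(d.clarity),
    )


def load_diamonds(path: str) -> List[Diamond]:
    """Parse every row of the CSV at `path`."""
    with open(path, newline="") as f:
        return [parse_row(row) for row in csv.DictReader(f)]
```

Ordinal ranks keep the ordering information that one-hot encoding discards; letting `encode_grades` raise on an unknown label surfaces spelling differences (for example "Very Good" vs "VeryGood") instead of silently mis-ranking them.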

Distribution

The dataset is distributed as a single CSV file (diamonds.csv, 3.22 MB) with 10 columns and 53.9 thousand valid records. The listing reports 100% validity across all records, with no mismatched or missing values. Note, however, that the x, y, and z ranges start at 0, which cannot be a real measurement, so some dimensions are effectively missing. The update frequency is listed as 'Never', so no further updates are expected.
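A 100% validity score only says that no cells are empty or mistyped. The column ranges still start at 0 for x, y, and z, and y and z reach 58.9 and 31.8, far beyond the largest length of 10.7. A minimal audit can flag zero dimensions and rows whose depth disagrees with the depth recomputed from x, y, and z. The formula 100 * 2z / (x + y) is taken from the ggplot2 `diamonds` documentation and assumed to hold here; the 1-percentage-point tolerance and all function names are hypothetical choices.

```python
from typing import Dict, Iterable, Mapping, Optional

DIMENSIONS = ("x", "y", "z")


def zero_dimension(row: Mapping[str, str]) -> bool:
    """True if any of x, y, z is 0, which cannot be a real measurement."""
    return any(float(row[c]) == 0.0 for c in DIMENSIONS)


def implied_depth(row: Mapping[str, str]) -> Optional[float]:
    """Depth % recomputed as 100 * 2z / (x + y); None if x + y is 0."""
    x, y, z = (float(row[c]) for c in DIMENSIONS)
    if x + y == 0:
        return None
    return 100.0 * 2.0 * z / (x + y)


def depth_mismatch(row: Mapping[str, str], tolerance: float = 1.0) -> bool:
    """True if recorded depth differs from implied depth by > tolerance points."""
    implied = implied_depth(row)
    if implied is None:
        return True  # depth cannot be checked without valid x and y
    return abs(implied - float(row["depth"])) > tolerance


def audit(rows: Iterable[Mapping[str, str]], tolerance: float = 1.0) -> Dict[str, int]:
    """Count rows, zero-dimension rows, and depth mismatches."""
    counts = {"rows": 0, "zero_dimension": 0, "depth_mismatch": 0}
    for row in rows:
        counts["rows"] += 1
        if zero_dimension(row):
            counts["zero_dimension"] += 1
        if depth_mismatch(row, tolerance):
            counts["depth_mismatch"] += 1
    return counts
```

The audit works on the string dicts produced by `csv.DictReader`, so it can run before any cleaning; flagged rows can then be dropped or imputed depending on the model.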

Usage

The dataset is well suited to classification tasks, for example training a Random Forest to predict individual diamond attributes. Its primary use case is building models that classify or estimate a diamond's cut from its physical properties.
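Before fitting a Random Forest to predict cut, it helps to know the accuracy a trivial model reaches. The sketch below does not train a Random Forest; it shows only the evaluation scaffolding: a seeded train/test split and a majority-class baseline. `split_rows`, `majority_baseline`, and `accuracy` are hypothetical helpers using only the standard library; a real classifier (for example scikit-learn's `RandomForestClassifier`) would replace the constant prediction.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(items: Sequence[T], test_fraction: float = 0.2,
               seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items with a fixed seed; return (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline(train_labels: Sequence[str]) -> str:
    """Most frequent label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions equal to the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

With rows loaded as dicts, one would split them, take `guess = majority_baseline([r["cut"] for r in train])`, and score `accuracy([guess] * len(test), [r["cut"] for r in test])`. Since 'Ideal' makes up about 40% of the cuts, this baseline should land near 0.40, and a useful cut classifier has to beat it clearly.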

Coverage

The dataset covers only diamond attributes (carat, cut, colour, clarity, and dimensions) and price. The source does not specify any geographic, temporal, or demographic coverage.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

The dataset suits a wide range of users, from beginners in data analysis to advanced practitioners. Intended users include data scientists, students, and machine learning engineers who want to practice classification modeling and price prediction on high-quality tabular data.

Dataset Name Suggestions

  • Diamond Attribute and Price Data
  • Diamond Quality Prediction
  • Machine Learning Diamond Attributes

Attributes

Original Data Source: Diamond Attribute and Price Data

Listing Stats

Views: 1
Downloads: 1
Listed: 07/10/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
