
Diamond Price Prediction Dataset

Retail & Consumer Behavior

Tags and Keywords

Diamond, Price, Carat, Cut, Clarity

Free

About

This dataset provides key features of diamonds to facilitate the prediction of their market prices. It is an excellent resource for data visualisation and understanding the various attributes that influence a diamond's value. The primary goal is to enable the development of models that can accurately predict diamond prices based on characteristics such as weight, cut quality, colour grade, and clarity.

Columns

  • carat: A numerical value representing the weight of the diamond, with one carat equivalent to 0.2 grams.
  • cut: An ordered categorical variable describing the quality of the cut, that is, how well a rough diamond has been shaped into a finished gem. It has 5 levels: "Fair", "Good", "Very Good", "Premium", and "Ideal". Better cuts produce more symmetrical and luminous diamonds.
  • colour: An ordered categorical variable indicating the diamond's colour grade. Colourless diamonds are preferred over those with a yellow tint. The dataset includes 7 grades, represented by the letters "D" (best) to "J" (worst), where "D" to "F" are considered colourless and "G" to "J" near colourless.
  • clarity: An ordered categorical variable reflecting the clearness of a diamond. It measures the presence and visibility of imperfections like cracks or mineral deposits. There are 8 ordered levels, ranging from "I1" (worst) to "IF" (best), where fewer and less noticeable imperfections signify better clarity.
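  • depth: A numerical value representing the total depth percentage of the diamond, conventionally its depth (z) divided by the mean of its length and width (x and y), expressed as a percentage.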
  • table: A numerical value representing the width of the top facet of the diamond relative to its widest point.
  • price: A numerical value representing the price of the diamond. The source does not specify the currency.
  • x: A numerical value representing the length of the diamond in millimetres.
  • y: A numerical value representing the width of the diamond in millimetres.
  • z: A numerical value representing the depth of the diamond in millimetres.
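
The three ordered categorical columns can be mapped to integer ranks before modelling. The sketch below is a minimal, standard-library illustration rather than a prescribed preprocessing step. The cut order is taken from the column list above; the colour order (J worst, D best) and the six intermediate clarity grades between "I1" and "IF" are assumptions based on the standard grading scales, and `encode_row` is a hypothetical helper name. The column name `colour` follows the spelling used in this description and may differ in the CSV header.

```python
# Ordered levels, worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]  # from the column list
# Assumption: standard colour scale, D (colourless) is best and J is worst.
COLOUR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
# Assumption: standard clarity scale between I1 (worst) and IF (best).
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

ORDERS = {"cut": CUT_ORDER, "colour": COLOUR_ORDER, "clarity": CLARITY_ORDER}


def encode_row(row):
    """Return a copy of `row` with the ordered categoricals replaced by 0-based ranks."""
    encoded = dict(row)
    for column, order in ORDERS.items():
        # list.index raises ValueError on an unknown level, which surfaces bad data early.
        encoded[column] = order.index(row[column])
    return encoded


# Made-up record for illustration; not taken from the dataset.
example = {"carat": 0.5, "cut": "Ideal", "colour": "E", "clarity": "SI2"}
print(encode_row(example))
```

Integer ranks preserve the ordering but also impose equal spacing between levels, which may or may not suit a given model; one-hot encoding is a common alternative when that assumption is unwanted.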

Distribution

The dataset is provided as a CSV data file and is approximately 2.77 MB in size. It contains 53,940 records across 10 distinct columns, with no missing or mismatched values reported for any feature. The data is structured in a tabular format, ideal for direct use in analytical and machine learning pipelines.
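
The stated shape and completeness can be checked after download. The following is a rough sketch using only the standard library; `summarise_csv` is a hypothetical helper, and the two sample rows are invented stand-ins for the real file (one has a deliberately empty cell).

```python
import csv
import io


def summarise_csv(handle):
    """Count records, columns, and empty cells in a CSV file-like object."""
    reader = csv.DictReader(handle)
    n_rows = 0
    n_missing = 0
    for row in reader:
        n_rows += 1
        for name in reader.fieldnames:
            value = row.get(name)
            # Short rows are read as None; empty or whitespace-only cells also count as missing.
            if value is None or not value.strip():
                n_missing += 1
    return {"rows": n_rows, "columns": len(reader.fieldnames or []), "missing": n_missing}


# Tiny in-memory stand-in for the real file; the values are made up for illustration.
sample = io.StringIO(
    "carat,cut,colour,clarity,depth,table,price,x,y,z\n"
    "0.50,Ideal,E,SI2,60.0,55.0,1000,5.0,5.0,3.0\n"
    "0.70,Good,G,VS1,,58.0,2000,5.6,5.7,3.5\n"
)
summary = summarise_csv(sample)
print(summary)
```

For the real file, pass a handle such as `open("diamonds.csv", newline="")` (the filename is an assumption). If the description above holds, the summary should report 53,940 rows, 10 columns, and 0 missing cells.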

Usage

This dataset is ideally suited for:
  • Predictive modelling: Developing and testing machine learning models to forecast diamond prices.
  • Data visualisation: Exploring relationships between diamond characteristics and price, identifying trends and patterns.
  • Classification tasks: Categorising diamonds based on their features or price ranges.
  • Regression analysis: Quantifying the impact of individual features on diamond pricing.
  • Educational purposes: Serving as a practical example for studying statistical analysis and machine learning techniques.
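
As one possible starting point for the regression use case, the sketch below fits log(price) against log(carat) by ordinary least squares using only the standard library. The log-log form is an illustrative modelling choice, not something stated in the source, `fit_log_log` is a hypothetical helper, and the data are synthetic rather than drawn from the dataset.

```python
import math


def fit_log_log(carats, prices):
    """Ordinary least squares fit of log(price) = intercept + slope * log(carat)."""
    xs = [math.log(c) for c in carats]
    ys = [math.log(p) for p in prices]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data following price = 1000 * carat**1.5 exactly; not taken from the dataset.
carats = [0.3, 0.5, 0.7, 1.0, 1.5, 2.0]
prices = [1000 * c ** 1.5 for c in carats]
slope, intercept = fit_log_log(carats, prices)
print(f"elasticity of price with respect to carat: {slope:.3f}")
print(f"fitted price at one carat: {math.exp(intercept):.1f}")
```

In this form the slope reads as an elasticity: the approximate percentage change in price for a one percent change in carat weight. A fuller model would add cut, colour, and clarity as further predictors.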

Coverage

The dataset focuses on the intrinsic characteristics of individual diamonds. There is no explicit geographical, time range, or demographic scope detailed within the provided information.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists and Machine Learning Engineers: To build and refine diamond price prediction models.
  • Business Analysts and Retailers: For market analysis, pricing strategies, and understanding value drivers in the diamond industry.
  • Researchers: Studying gemmology, consumer behaviour, or the economics of luxury goods.
  • Students: For academic projects involving regression, classification, and data visualisation.

Dataset Name Suggestions

  • Diamond Price Prediction Dataset
  • Gemstone Characteristics and Pricing
  • Precious Stone Value Predictor
  • Diamond Quality and Price Data

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 22/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
