Opendatabay APP

Developers Emoji Reference Dataset

Data Science and Analytics

Tags and Keywords

  • Emoji
  • Unicode
  • Classification
  • Machine-learning
  • Dataset

Free

About

This dataset contains 842 distinct emojis and their corresponding details, created for machine learning applications. It was scraped from Unicode Emojis and is particularly suited to projects involving image processing, such as comparing human facial features to emojis. It gives developers and researchers a foundation for exploring or building models for emoji recognition, multiclass classification, or sentiment analysis.

Columns

  • Emoji: Contains the visual emoji characters. There are 842 unique values in this column.
  • Unicode: Lists the Unicode code point for each emoji (e.g., U+1F601).
  • Bytes: Provides the UTF-8 byte sequence for each emoji, written as escaped hexadecimal (e.g., \xF0\x9F\x98\x81).
  • Description: A textual description of each emoji (e.g., GRINNING FACE WITH SMILING EYES).
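
The relationship between the Unicode, Bytes, and Description columns can be checked with the Python standard library. The sketch below is illustrative only: the helper names `parse_codepoint` and `format_utf8_bytes` are hypothetical, and it assumes the Bytes column holds UTF-8 bytes written as uppercase `\xNN` escapes, as in the example above.

```python
import unicodedata


def parse_codepoint(code):
    # Turn a string such as "U+1F601" into the single character it identifies.
    return chr(int(code.upper().replace("U+", ""), 16))


def format_utf8_bytes(char):
    # Encode the character as UTF-8 and write each byte as an escaped hex pair.
    return "".join(f"\\x{b:02X}" for b in char.encode("utf-8"))


char = parse_codepoint("U+1F601")
utf8_text = format_utf8_bytes(char)   # text in the style of the Bytes column
name = unicodedata.name(char)         # official Unicode character name
```

For the example row, `utf8_text` matches the Bytes example and `name` matches the Description example given in the column list.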

Distribution

The dataset is provided in a single CSV file named emoji_data.csv, with a file size of approximately 39.34 kB. It contains 842 rows, corresponding to 842 unique emojis, and has 4 columns. There are no missing or mismatched values.
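
A minimal loading sketch using only the standard library follows. It assumes the CSV header row uses exactly the column names listed above (Emoji, Unicode, Bytes, Description); the function names `load_emoji_rows` and `summarize` are hypothetical, and the two in-memory rows are a small stand-in for emoji_data.csv so the example runs without the download.

```python
import csv
import io

# Assumed header names; adjust if the real file labels its columns differently.
EXPECTED_COLUMNS = ["Emoji", "Unicode", "Bytes", "Description"]


def load_emoji_rows(source):
    # Read rows from an open text file and verify the header matches.
    reader = csv.DictReader(source)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)


def summarize(rows):
    # Count rows, distinct emojis, and empty cells.
    return {
        "rows": len(rows),
        "unique_emojis": len({r["Emoji"] for r in rows}),
        "missing_values": sum(1 for r in rows for v in r.values() if not v),
    }


# Stand-in data, not taken from the real file.
sample = io.StringIO(
    "Emoji,Unicode,Bytes,Description\n"
    "\U0001F601,U+1F601,\\xF0\\x9F\\x98\\x81,GRINNING FACE WITH SMILING EYES\n"
    "\U0001F602,U+1F602,\\xF0\\x9F\\x98\\x82,FACE WITH TEARS OF JOY\n"
)
rows = load_emoji_rows(sample)
summary = summarize(rows)
```

With the real file, the same functions could be called on `open("emoji_data.csv", newline="", encoding="utf-8")` (UTF-8 encoding is an assumption). If the listing is accurate, the summary should then report 842 rows, 842 unique emojis, and no missing values.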

Usage

This dataset suits a range of applications, including:
  • Developing machine learning programs using image processing with libraries like OpenCV.
  • Training multiclass classification models to categorise or identify emojis.
  • Building systems that analyse or suggest emojis based on text or facial expressions.
  • Powering applications in gaming, deep learning, and digital art.
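
As one illustration of suggesting emojis from text, the sketch below ranks emojis by how many words a query shares with each Description. This is a deliberately naive baseline, not a method described by the dataset's author: the function names are hypothetical, the three rows are stand-in examples, and common words such as "face" or "with" will match many descriptions unless filtered out.

```python
import re


def tokenize(text):
    # Lowercase the text and keep alphabetic words only.
    return set(re.findall(r"[a-z]+", text.lower()))


def suggest_emojis(text, rows, top_n=3):
    # Score each emoji by the number of words shared with the query text.
    query = tokenize(text)
    scored = []
    for row in rows:
        overlap = len(query & tokenize(row["Description"]))
        if overlap:
            scored.append((overlap, row["Emoji"]))
    # Highest overlap first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: -pair[0])
    return [emoji for _, emoji in scored[:top_n]]


# Stand-in rows shaped like the dataset's Emoji and Description columns.
rows = [
    {"Emoji": "\U0001F601", "Description": "GRINNING FACE WITH SMILING EYES"},
    {"Emoji": "\U0001F602", "Description": "FACE WITH TEARS OF JOY"},
    {"Emoji": "\U0001F63A", "Description": "SMILING CAT FACE WITH OPEN MOUTH"},
]
suggestions = suggest_emojis("tears of joy", rows)
cat_suggestions = suggest_emojis("smiling cat", rows)
```

A trained classifier or an embedding model would replace the word-overlap score in a real system; the baseline is mainly useful for checking that the data loads and joins as expected.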

Coverage

The dataset contains 842 emojis scraped from Unicode Emojis. Because it covers standardised digital characters, it has no specific geographic, demographic, or time-based scope. The data is static and is not expected to be updated.

License

CC0: Public Domain

Who Can Use It

  • Machine Learning Engineers: Can use this data to train and test image recognition and classification models.
  • Python and Full-Stack Developers: Suitable for integrating emoji-related features into applications.
  • Data Scientists: Can explore patterns in emoji descriptions, for example as input to sentiment analysis.
  • Researchers: Useful for studies in human-computer interaction and digital communication.
  • Hobbyists: A clean and simple dataset for learning web scraping or starting a first data project.

Dataset Name Suggestions

  • Unicode Emoji Set for Machine Learning
  • Complete Emoji Details (Unicode, Bytes, Description)
  • Emoji Classification & Image Processing Dataset
  • Developer's Emoji Reference Dataset

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 17/09/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
