Emoji Expressions Catalog
Data Science and Analytics
Free
About
This dataset is a comprehensive database of Unicode emojis, with detailed information for each of its 4,159 emoji entries. It is a resource for text mining and social media analytics, helping users extract emojis from digital communication and interpret their meaning in context. Attributes include each emoji's name, code points, qualification status, and categorisation into groups and sub-groups, which supports sentiment analysis and contextual interpretation.
Columns
The dataset is structured to provide essential details for each emoji, including:
- Code Points: One or more hexadecimal code points identifying the emoji.
- Status: Indicates the qualification status of the emoji (e.g., fully-qualified, minimally-qualified, unqualified, or component).
- Emoji Name: The official name of the emoji (e.g., "grinning face").
- Emoji Character: The visual representation of the emoji itself (e.g., 😀).
- E-version: The Unicode Emoji version in which the emoji was introduced or updated (e.g., E1.0).
- Group: The primary category the emoji belongs to (e.g., "Smileys & Emotion", "Animals & Nature").
- Sub-group: A more specific sub-category within its group (e.g., "face-smiling", "cat-face").
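As a minimal sketch of how these columns fit together, the Python snippet below reads a few rows and rebuilds each emoji from its code points. The header names (`code_points`, `status`, and so on) and the sample rows are assumptions made for illustration; the actual CSV headers may differ.

```python
import csv
import io

# Hypothetical sample in the shape described above; the real header
# names in the CSV file may differ.
SAMPLE_CSV = """code_points,status,emoji,e_version,name,group,sub_group
1F600,fully-qualified,😀,E1.0,grinning face,Smileys & Emotion,face-smiling
1F44B 1F3FD,fully-qualified,👋🏽,E1.0,waving hand: medium skin tone,People & Body,hand-fingers-open
263A,unqualified,☺,E0.6,smiling face,Smileys & Emotion,face-affection
"""


def code_points_to_str(code_points: str) -> str:
    """Turn space-separated hex code points (e.g. '1F44B 1F3FD') into a string."""
    return "".join(chr(int(cp, 16)) for cp in code_points.split())


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)

# Keep only the fully-qualified forms.
fully_qualified = [r for r in rows if r["status"] == "fully-qualified"]

for row in fully_qualified:
    print(code_points_to_str(row["code_points"]), row["name"], row["sub_group"])
```

Filtering on the Status column in this way avoids counting the same emoji more than once when a fully-qualified form and an unqualified form both appear.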
Distribution
The dataset is provided in CSV format and contains 4,159 records, covering all emojis available at the time of release, including their variations and skin tones. The associated raw text file is approximately 593.24 KB.
Usage
This dataset is ideal for various applications, including:
- Text mining and natural language processing (NLP) to identify and extract emojis from text.
- Social media analytics for sentiment analysis, trend tracking, and understanding user emotions.
- Developing applications that require robust emoji handling and categorisation.
- Academic research into digital communication and emotional expression.
- Building custom emoji keyboards or display systems.
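The extraction and analytics uses above can be sketched as a longest-match scan over the text, followed by a tally per group. The `EMOJI_GROUPS` lookup here is a hypothetical stand-in for a mapping built from the dataset's Emoji Character and Group columns; only four entries are included for illustration.

```python
from collections import Counter

# Hypothetical lookup (emoji string -> group). In practice this would be
# built from the dataset's Emoji Character and Group columns.
EMOJI_GROUPS = {
    "\U0001F600": "Smileys & Emotion",         # grinning face
    "\U0001F44B": "People & Body",             # waving hand
    "\U0001F44B\U0001F3FD": "People & Body",   # waving hand: medium skin tone
    "\U0001F431": "Animals & Nature",          # cat face
}


def extract_emojis(text: str, known: dict) -> list[str]:
    """Return known emojis found in text, preferring the longest match."""
    if not known:
        return []
    max_len = max(len(key) for key in known)
    found = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so a skin-tone sequence is not
        # split into its base emoji plus a stray modifier.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in known:
                found.append(candidate)
                i += size
                break
        else:
            i += 1  # no emoji starts at this position
    return found


text = "hi \U0001F44B\U0001F3FD and \U0001F600\U0001F431"
emojis = extract_emojis(text, EMOJI_GROUPS)
group_counts = Counter(EMOJI_GROUPS[e] for e in emojis)
print(emojis, dict(group_counts))
```

Matching the longest sequence first matters because many entries, such as skin-tone variants, are made of several code points whose first code point is itself a valid emoji.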
Coverage
The dataset covers all Unicode emojis, including their variations and skin tones, as of version 15.0 of the Unicode Emoji Database. Its scope is global, since emojis are used across languages and demographics.
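To restrict analysis to emojis available up to a given version, the E-version labels can be compared numerically. The sketch below assumes labels always follow the `E<major>.<minor>` pattern shown in the Columns section (e.g., E1.0); the helper names are hypothetical.

```python
def parse_e_version(label: str) -> tuple[int, int]:
    """Convert a label such as 'E15.0' into a comparable (major, minor) tuple."""
    major, minor = label.lstrip("E").split(".")
    return int(major), int(minor)


def introduced_by(label: str, cutoff: str) -> bool:
    """True if the emoji's version is no later than the cutoff version."""
    return parse_e_version(label) <= parse_e_version(cutoff)


labels = ["E0.6", "E1.0", "E13.1", "E15.0"]
supported = [v for v in labels if introduced_by(v, "E13.1")]
print(supported)
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes, since as plain text "E2.0" sorts after "E15.0".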
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Data Scientists working on text analysis and sentiment modelling.
- NLP Engineers developing language processing algorithms.
- Social Media Analysts aiming to gain deeper insights into online conversations.
- Researchers studying digital communication trends and emoji usage.
- Software Developers creating applications with emoji integration.
Dataset Name Suggestions
- Unicode Emoji List
- Emoji Character Database
- Global Emoji Data
- Emoji Expressions Catalog
- Digital Sentiment Emojis
Attributes
Original Data Source: Emoji Expressions Catalog