Opendatabay APP

Turkish Spam Detection Emails

Fraud Detection & Risk Management

Tags and Keywords

  • Tabular
  • Beginner
  • Email
  • NLP


Free

About

This dataset supports research into spam detection for the Turkish language. It is a collection of email messages, each labelled as either 'spam' or 'normal' (often called 'ham'), and is suited to training and evaluating machine learning models that identify unsolicited email. The emails were gathered from various personal accounts, which gives the collection a varied content base.

Columns

  • text: This column contains the full content of the email message.
  • class: This column holds the label assigned to the email. The listing gives 'spam', 'ham' (normal), and 'Other' as values; the counts under Distribution cover only spam and normal emails.
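
As a minimal sketch of how the two columns might be read, the Python snippet below parses a CSV with `text` and `class` headers using only the standard library. The helper name `read_emails`, the file name in the comment, the UTF-8 encoding, and the two sample rows are all assumptions made for illustration; none of them come from the dataset itself.

```python
import csv
import io


def read_emails(handle):
    """Return a list of (text, label) pairs from a CSV handle.

    Assumes a header row with 'text' and 'class' columns, as described
    in the listing. Labels are lower-cased, so 'Other' becomes 'other'.
    """
    rows = []
    for record in csv.DictReader(handle):
        text = (record.get("text") or "").strip()
        label = (record.get("class") or "").strip().lower()
        if text and label:  # skip rows missing either field
            rows.append((text, label))
    return rows


# Made-up two-row sample standing in for the real file.
sample = io.StringIO(
    "text,class\n"
    '"Tebrikler, ödül kazandınız! Hemen tıklayın.",spam\n'
    '"Yarınki toplantı saat onda.",ham\n'
)
emails = read_emails(sample)

# For a downloaded file (hypothetical name and encoding):
# with open("turkish_spam.csv", newline="", encoding="utf-8") as f:
#     emails = read_emails(f)
```

Passing an open file handle rather than a path keeps the helper easy to try on in-memory samples before pointing it at the real download.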

Distribution

The dataset is tabular, typically distributed as a CSV file. It contains 330 spam emails and 496 normal emails, for a total of 826 records. The file size is not specified.
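
The stated counts imply a moderately imbalanced dataset, which matters when judging a classifier. The sketch below, assuming the labels appear as the strings 'spam' and 'ham', rebuilds a label list from the listed counts (not from the real file) and derives the class shares and the accuracy of a trivial model that always predicts the majority class. The function name `class_summary` is hypothetical.

```python
from collections import Counter


def class_summary(labels):
    """Return (counts, total, shares) for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    return counts, total, shares


# Synthetic label list built from the counts stated in the listing.
labels = ["spam"] * 330 + ["ham"] * 496
counts, total, shares = class_summary(labels)

# Accuracy of always predicting the most frequent class.
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / total

print(total, majority_label, round(baseline_accuracy, 3))
```

Under these counts, always predicting 'ham' is right about 60% of the time, so any trained model should be compared against that figure rather than against 50%.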

Usage

This dataset suits applications in fraud detection, risk management, and natural language processing. It can be used to develop and test email classification algorithms, to build spam filters, and for general text classification on Turkish-language data.
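
To illustrate the kind of spam filter this data could train, here is a small multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. It is one possible baseline, not a method prescribed by the dataset, and the four training sentences are invented for the example. The tokenizer relies on `str.lower()`, which does not follow Turkish casing rules for 'I' and 'İ', so a real pipeline would likely need Turkish-aware normalisation.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"\w+")  # Unicode-aware word characters in Python 3


def tokenize(text):
    # Note: str.lower() maps 'I' to 'i', not to Turkish dotless 'ı'.
    return TOKEN.findall(text.lower())


def train(examples):
    """Count documents and tokens per label from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # ignore unseen words
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            # Add-one (Laplace) smoothed log likelihood.
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy examples; replace with (text, class) pairs from the CSV.
toy = [
    ("Tebrikler ödül kazandınız hemen tıklayın", "spam"),
    ("Bedava hediye kazanmak için tıklayın", "spam"),
    ("Yarınki toplantı saat onda", "ham"),
    ("Rapor ekte, toplantı notları da var", "ham"),
]
model = train(toy)
print(predict(model, "Bedava ödül için tıklayın"))
print(predict(model, "Toplantı notları ekte"))
```

With only 826 records, a held-out split or cross-validation would be needed to get a meaningful accuracy estimate; the toy run above only shows that the pieces fit together.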

Coverage

The dataset was created in 2019 and is released as version 1.0. The geographic origin of individual emails is not recorded, and the listing gives the region as global. The emails are primarily in Turkish and were sourced from diverse personal accounts.

License

CC-BY

Who Can Use It

This dataset is aimed at researchers, data scientists, and machine learning engineers. It is categorised as beginner-level, so it also suits those new to the field. It is relevant to anyone working on spam detection systems, natural language processing applications, or email content analysis in Turkish.

Dataset Name Suggestions

  • Turkish Spam V01
  • Turkish Email Classification Data
  • Turkish Spam Detection Emails

Attributes

Original Data Source: Turkish Spam V01

Listing Stats

  • Views: 3
  • Downloads: 0
  • Listed: 05/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
