Turkish Spam Detection Emails
Fraud Detection & Risk Management
Free
About
This dataset supports research into spam detection for the Turkish language. It is a collection of email messages, each labelled as either 'spam' or 'normal' (often called 'ham'), which makes it suitable for training and evaluating machine learning models that identify unsolicited email. The emails were gathered from various personal accounts, so the content is varied.
Columns
- text: This column contains the full content of the email message.
- class: This column indicates the classification of the email, with values such as 'spam', 'ham' (normal), or 'Other', reflecting whether the email is unsolicited or legitimate.
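As a rough illustration of the two-column layout, the sketch below reads rows with `text` and `class` fields using Python's standard `csv` module. The helper name `load_emails`, the file name in the comment, and the UTF-8 encoding are assumptions, and the two sample rows are made up for the example rather than taken from the dataset.

```python
import csv
import io
from collections import Counter


def load_emails(handle):
    """Read rows with 'text' and 'class' columns into (text, label) pairs."""
    reader = csv.DictReader(handle)
    return [(row["text"], row["class"].strip()) for row in reader]


# Made-up stand-in rows, NOT taken from the dataset.
sample = io.StringIO(
    "text,class\n"
    '"Tebrikler, ödül kazandınız!",spam\n'
    '"Toplantı yarın saat 10\'da.",ham\n'
)
emails = load_emails(sample)

# Count how many rows carry each label.
label_counts = Counter(label for _, label in emails)
print(label_counts)

# With a real file (hypothetical name, UTF-8 assumed):
# with open("turkish_spam.csv", newline="", encoding="utf-8") as f:
#     emails = load_emails(f)
```

Reading through `csv.DictReader` keeps commas and line breaks inside quoted email bodies intact, which matters for free-text fields like `text`.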
Distribution
The dataset is provided in tabular form, typically as a CSV file. It contains 330 spam emails and 496 normal emails, for a total of 826 records. The file size is not specified.
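The class counts above imply a moderately imbalanced split. A short calculation, using only the figures stated here, gives the class shares and the accuracy a trivial "always predict normal" model would reach, which is a useful floor when judging a classifier. The variable names are illustrative.

```python
# Counts as stated in the dataset description.
spam_count = 330
ham_count = 496
total = spam_count + ham_count

spam_share = spam_count / total
ham_share = ham_count / total

# Accuracy of always predicting the larger class.
majority_baseline = max(spam_share, ham_share)

print(f"total={total}, spam={spam_share:.1%}, normal={ham_share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Roughly 40% of the records are spam and 60% are normal, so a model should beat about 60% accuracy before it is doing anything useful.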
Usage
This dataset suits applications in fraud detection, risk management, and natural language processing. It can be used to develop and test email classification algorithms, build spam filters, and run general text classification experiments on Turkish-language data.
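One possible starting point for such a spam filter is a multinomial naive Bayes classifier with add-one smoothing, sketched below with the standard library only. This is a generic baseline chosen for illustration, not a method described by the dataset's authors. The function names and the four toy training sentences are made up. The `turkish_lower` helper is there because Python's default `str.lower()` maps 'I' to 'i', whereas Turkish pairs 'I' with 'ı' and 'İ' with 'i'.

```python
import math
import re
from collections import Counter, defaultdict


def turkish_lower(text):
    # Map the Turkish dotted/dotless capitals before the generic lower().
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text):
    return re.findall(r"\w+", turkish_lower(text))


def train(examples):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy examples, NOT taken from the dataset.
toy = [
    ("Tebrikler ödül kazandınız hemen tıklayın", "spam"),
    ("Bedava hediye kazanmak için tıklayın", "spam"),
    ("Toplantı yarın saat onda başlayacak", "ham"),
    ("Rapor ekte, yarın görüşelim", "ham"),
]
model = train(toy)
print(predict(model, "Bedava ödül için tıklayın"))
print(predict(model, "Toplantı yarın"))
```

On the real data, the (text, label) pairs would come from the `text` and `class` columns, with a portion of the records held out for evaluation.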
Coverage
The dataset was created in 2019 and is at version 1.0. The geographic origin of individual emails is not recorded, and coverage is listed as global. The emails are primarily in Turkish and were sourced from various personal accounts.
License
CC-BY
Who Can Use It
This dataset is useful for researchers, data scientists, and machine learning engineers. It is categorised as beginner-level, so it also suits those who are new to the field. It is relevant to anyone developing or improving spam detection systems, natural language processing applications, or other projects that analyse Turkish-language email content.
Dataset Name Suggestions
- Turkish Spam V01
- Turkish Email Classification Data
- Turkish Spam Detection Emails
Attributes
Original Data Source: Turkish Spam V01