FREE DATASET LIBRARY

Verified Data Provider

£0

Synthetic Turkish Identity and Address Collection

Synthetic Data Generation

Tags and Keywords

Turkish

Synthetic

NLP

Address

Identity

About

These synthetic Turkish identity records support the development and testing of name and address matching algorithms. The file includes detailed personal information fields such as names, surnames, contact details, and location data specific to Turkey. It is a resource for developers and data scientists who need to validate systems against Turkish-specific characters (such as ç, ğ, ı, İ, ö, ş, ü) and Turkish address formats without compromising real user privacy.

Columns

ID: Unique identifier for the record.
NAME_: First name of the individual (e.g., Deniz).
SURNAME: Family name (e.g., KALTAKCI).
NAMESURNAME: Combined first name and surname.
GENDER: Gender indicator, distributed as 55% 'K' (kadın, female) and 45% 'E' (erkek, male).
BIRTHDATE: Date of birth, generally spanning from 1950 to 1999.
EMAIL: Synthetic email address associated with the identity.
TCNUMBER: Turkish Republic identity number (11 digits).
TELNR: Telephone number.
CITY: Major city or province (e.g., İstanbul, Ankara).
TOWN: Town or sub-province name.
DISTRICT: District or neighbourhood name.
STREET: Specific street name or location description.
POSTALCODE: Numeric postal code.
ADDRESSTEXT: Full combined address string including street, town, and city.
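As a minimal sketch of how the column list above might be consumed, the following reads the CSV with Python's standard csv module and checks that all 15 columns are present. It assumes a comma-delimited, UTF-8 file with a header row that uses the column names exactly as listed; the listing does not confirm any of these details. The sample row is made up for illustration (only the name comes from the listing), and real value formats may differ.

```python
import csv
import io

# Column names as given in the listing.
EXPECTED_COLUMNS = [
    "ID", "NAME_", "SURNAME", "NAMESURNAME", "GENDER", "BIRTHDATE", "EMAIL",
    "TCNUMBER", "TELNR", "CITY", "TOWN", "DISTRICT", "STREET", "POSTALCODE",
    "ADDRESSTEXT",
]


def read_records(fileobj):
    """Read rows as dicts of strings and verify the expected header."""
    reader = csv.DictReader(fileobj)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# In-memory stand-in for the real file; all values are placeholders.
sample_csv = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,Deniz,KALTAKCI,Deniz KALTAKCI,K,1975-01-01,deniz@example.com,"
    "10000000146,05000000000,İstanbul,Kadıköy,Moda,Örnek Sokak,34710,"
    "Örnek Sokak Moda Kadıköy İstanbul\n"
)
records = read_records(io.StringIO(sample_csv))
print(records[0]["NAMESURNAME"], records[0]["CITY"])

# With the downloaded file (hypothetical filename):
# with open("turkish_identities.csv", encoding="utf-8", newline="") as f:
#     records = read_records(f)
```

Because csv.DictReader returns every field as a string, identifier-like columns such as TCNUMBER, TELNR, and POSTALCODE keep any leading zeros instead of being coerced to numbers.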

Distribution

The file is provided in CSV format, which is easier to access and more broadly compatible than Excel formats. It contains exactly 100,000 rows (records) and 15 columns. The structure is clean: key fields such as names, surnames, and city entries are 100% valid, and no missing values are reported in the sample.
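The row, column, and completeness figures above can be checked after download. The sketch below counts rows and empty values per column; the helper name completeness_report and the two sample rows are hypothetical, and it treats an empty or whitespace-only string as missing, which is an assumption about how gaps would appear in the file.

```python
def completeness_report(rows, columns):
    """Count rows and, per column, how many values are empty or absent."""
    empty = {name: 0 for name in columns}
    for row in rows:
        for name in columns:
            if not (row.get(name) or "").strip():
                empty[name] += 1
    return {"rows": len(rows), "columns": len(columns), "empty": empty}


# Made-up rows standing in for records read from the CSV.
sample_rows = [
    {"NAME_": "Deniz", "SURNAME": "KALTAKCI", "CITY": "İstanbul"},
    {"NAME_": "Deniz", "SURNAME": "", "CITY": "Ankara"},
]
report = completeness_report(sample_rows, ["NAME_", "SURNAME", "CITY"])
print(report)
```

On the full file, the listing's figures would correspond to a report with 100,000 rows, 15 columns, and zero empty values in the name, surname, and city columns.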

Usage

Ideal applications include testing name and address matching algorithms, training Natural Language Processing (NLP) models for Turkish text, and performing clustering analysis. It is also suitable for software testing environments where high-volume, realistic Turkish user data is required to verify database performance, field validation (such as TC Numbers), and UI localization.
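For the field validation use case, a TC Number validator might look like the sketch below. It applies the commonly documented checksum rules for Turkish identity numbers: 11 digits, a non-zero first digit, a tenth digit derived from the odd and even positions, and an eleventh digit equal to the sum of the first ten modulo 10. The listing does not say whether the synthetic TCNUMBER values satisfy this checksum, so treat that as something to verify rather than assume.

```python
def is_valid_tc_number(value):
    """Check the structure and both check digits of an 11-digit TC number."""
    if len(value) != 11 or not value.isascii() or not value.isdigit():
        return False
    digits = [int(ch) for ch in value]
    if digits[0] == 0:
        return False
    odd_sum = sum(digits[0:9:2])    # positions 1, 3, 5, 7, 9
    even_sum = sum(digits[1:8:2])   # positions 2, 4, 6, 8
    if (odd_sum * 7 - even_sum) % 10 != digits[9]:
        return False
    return sum(digits[:10]) % 10 == digits[10]


print(is_valid_tc_number("10000000146"))  # satisfies both check digits
print(is_valid_tc_number("10000000147"))  # last check digit is wrong
```

Passing the value as a string rather than an integer keeps the length check meaningful and matches how the column is read from CSV.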

Coverage

Geographically, the data covers Turkey, spanning both its European and Asian regions. It includes province-level data, with İstanbul representing approximately 23% of the entries and Ankara 9%. Demographically, the gender split is roughly balanced (55% 'K', 45% 'E') and birth dates generally fall between 1950 and 1999. As a synthetic set, it simulates real-world distributions and is released into the public domain.
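The shares quoted above can be recomputed from the file. This sketch uses collections.Counter to turn any column into value shares; the helper name value_shares and the four sample rows are hypothetical and only show the mechanics, not the real distribution.

```python
from collections import Counter


def value_shares(rows, column):
    """Return each distinct value's share of rows, most common first."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# Made-up rows; the real file has 100,000 records.
sample_rows = [
    {"CITY": "İstanbul", "GENDER": "K"},
    {"CITY": "İstanbul", "GENDER": "E"},
    {"CITY": "Ankara", "GENDER": "K"},
    {"CITY": "İzmir", "GENDER": "K"},
]
city_shares = value_shares(sample_rows, "CITY")
gender_shares = value_shares(sample_rows, "GENDER")
print(city_shares)
print(gender_shares)
```

Run against the full file, the CITY and GENDER shares should land close to the percentages given in the listing if those figures are accurate.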

License

CC0: Public Domain

Who Can Use It

Data Scientists: For training NLP models and testing clustering algorithms.
Software Engineers: For populating development databases and load testing.
QA Engineers: For validating form inputs and sorting logic involving Turkish characters.
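Sorting and case conversion are where Turkish characters most often break tests, because Turkish has both dotted (i, İ) and dotless (ı, I) forms and default case mappings pair them differently. The sketch below shows Turkish-aware lowercasing, uppercasing, and a sort key based on the Turkish alphabet. The function names are hypothetical, and this is a simplified stand-in for a proper locale-aware collation library.

```python
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
_ORDER = {ch: index for index, ch in enumerate(TURKISH_ALPHABET)}


def tr_lower(text):
    """Lowercase with Turkish rules: İ -> i and I -> ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def tr_upper(text):
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def tr_sort_key(text):
    """Sort key following Turkish alphabet order, case-insensitively.

    Characters outside the Turkish alphabet sort after all letters.
    """
    return [_ORDER.get(ch, len(_ORDER) + ord(ch)) for ch in tr_lower(text)]


names = ["Zonguldak", "Çorum", "Iğdır", "İzmir", "Cizre"]
print(sorted(names, key=tr_sort_key))
print(tr_lower("ISPARTA"), tr_upper("izmir"))
```

A plain sorted(names) orders by code point, which places Çorum and İzmir after Zonguldak, so a comparison between the two orderings makes a useful QA check.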

Dataset Name Suggestions

Synthetic Turkish Identity and Address Collection
100k Fake Turkish Customer Records
Turkish Names and Locations for Testing
Mock Turkish Demographic Data

Attributes

Original Data Source: Synthetic Turkish Identity and Address Collection

Listing Stats

Listed: 07/12/2025
Region: Global
Quality: 5 / 5
Version: 1.0
