Synthetic Turkish Identity and Address Collection
Synthetic Data Generation
Free
About
These synthetic Turkish identity records support the development and testing of name and address matching algorithms. The file includes detailed personal information fields such as names, surnames, contact details, and location data specific to Turkey. It serves as a resource for developers and data scientists who need to validate systems against non-ASCII Turkish letters (ç, ğ, ı, İ, ö, ş, ü) and Turkish address formats without compromising real user privacy.
Columns
- ID: Unique identifier for the record.
- NAME_: First name of the individual (e.g., Deniz).
- SURNAME: Family name (e.g., KALTAKCI).
- NAMESURNAME: Combined first name and surname.
- GENDER: Gender indicator, distributed as 55% 'K' and 45% 'E'.
- BIRTHDATE: Date of birth, generally spanning from 1950 to 1999.
- EMAIL: Synthetic email address associated with the identity.
- TCNUMBER: Turkish Republic identity number (11 digits).
- TELNR: Telephone number.
- CITY: Major city or province (e.g., İstanbul, Ankara).
- TOWN: Town or sub-province name.
- DISTRICT: District or neighbourhood name.
- STREET: Specific street name or location description.
- POSTALCODE: Numeric postal code.
- ADDRESSTEXT: Full combined address string including street, town, and city.
Distribution
The file is provided in CSV format, which is easier to access and more broadly compatible than Excel formats. It contains exactly 100,000 rows (records) and 15 columns. The data has a clean structure, with 100% validity across key fields such as names, surnames, and city entries, and no missing values reported in the sample.
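The row, column, and missing-value figures above can be checked after download. The following is a minimal sketch using only the Python standard library. It assumes a comma-delimited, UTF-8 file whose header matches the column names listed under Columns; the two sample rows, their date and phone formats, and the file name in the closing comment are made up for illustration and are not taken from the file.

```python
import csv
import io

# Column names as listed in the Columns section (15 in total).
EXPECTED_COLUMNS = [
    "ID", "NAME_", "SURNAME", "NAMESURNAME", "GENDER", "BIRTHDATE", "EMAIL",
    "TCNUMBER", "TELNR", "CITY", "TOWN", "DISTRICT", "STREET", "POSTALCODE",
    "ADDRESSTEXT",
]


def summarize(handle):
    """Return (row_count, header, missing_per_column) for an open CSV handle."""
    reader = csv.DictReader(handle)
    header = list(reader.fieldnames or [])
    missing = {name: 0 for name in header}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in header:
            # Treat empty or whitespace-only cells as missing.
            if not (row.get(name) or "").strip():
                missing[name] += 1
    return row_count, header, missing


# Two made-up rows standing in for the real file; the second has no EMAIL.
SAMPLE = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,Deniz,KALTAKCI,Deniz KALTAKCI,K,1975-04-12,deniz@example.com,"
    "10000000146,5550000001,İstanbul,Kadıköy,Moda,Örnek Sokak,34710,"
    "Örnek Sokak Moda Kadıköy İstanbul\n"
    "2,Ali,YILMAZ,Ali YILMAZ,E,1988-11-03,,10000000146,5550000002,"
    "Ankara,Çankaya,Kızılay,Deneme Caddesi,06420,"
    "Deneme Caddesi Kızılay Çankaya Ankara\n"
)

row_count, header, missing = summarize(io.StringIO(SAMPLE))
print(row_count, len(header), missing["EMAIL"])

# For the real file (hypothetical name), open it as UTF-8 text:
# with open("turkish_identities.csv", encoding="utf-8", newline="") as f:
#     print(summarize(f))
```

Reading every field as text, as `csv.DictReader` does, keeps leading zeros in values such as postal codes, which a numeric conversion would drop.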
Usage
Ideal applications include testing name and address matching algorithms, training Natural Language Processing (NLP) models for Turkish text, and performing clustering analysis. It is also suitable for software testing environments where high-volume, realistic Turkish user data is required to verify database performance, field validation (such as TC Numbers), and UI localization.
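For the TC Number field validation mentioned above, a validator can be sketched with the publicly documented check-digit rule for Turkish identity numbers: 11 digits, a non-zero first digit, a tenth digit derived from the first nine, and an eleventh digit derived from the first ten. The function name is hypothetical. The source does not state whether the synthetic TCNUMBER values satisfy this checksum, so some or all records may fail it.

```python
def is_valid_tc_number(value: str) -> bool:
    """Check the length, leading digit, and two check digits of a TC number."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    if value[0] == "0":
        return False
    digits = [int(ch) for ch in value]
    odd_sum = sum(digits[0:9:2])    # 1st, 3rd, 5th, 7th, 9th digits
    even_sum = sum(digits[1:8:2])   # 2nd, 4th, 6th, 8th digits
    tenth = (odd_sum * 7 - even_sum) % 10
    eleventh = sum(digits[:10]) % 10
    return digits[9] == tenth and digits[10] == eleventh


print(is_valid_tc_number("10000000146"))
print(is_valid_tc_number("10000000147"))
```

Running the validator over the TCNUMBER column and counting failures is a quick way to learn whether the file can serve as positive test data, negative test data, or both.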
Coverage
Geographically, the data covers Turkey, spanning both its European and Asian regions. It includes province-level data, with İstanbul representing approximately 23% of the entries and Ankara 9%. Demographically, the gender split is roughly even (55% 'K', 45% 'E') and birth dates generally fall between 1950 and 1999. As a synthetic set, it simulates real-world distributions and is released into the public domain.
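The shares quoted above can be recomputed from the CITY or GENDER column. The following is a minimal sketch; the helper name and the four-item sample list are illustrative assumptions, not values from the file.

```python
from collections import Counter


def value_shares(values):
    """Map each distinct value to its fraction of the total, most common first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# Made-up sample; with the real file, pass the CITY column values instead.
cities = ["İstanbul", "Ankara", "İstanbul", "İzmir"]
shares = value_shares(cities)
print(shares)
```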
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For training NLP models and testing clustering algorithms.
- Software Engineers: For populating development databases and load testing.
- QA Engineers: For validating form inputs and sorting logic involving Turkish characters.
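Sorting and case conversion are where Turkish text most often breaks generic code, because the language distinguishes dotted i/İ from dotless ı/I. Python's built-in `str.lower()` and `str.upper()` apply locale-independent rules, so `"I".lower()` yields `"i"` rather than `"ı"`. The helpers below are a minimal sketch with hypothetical names. They handle only the i/ı pairs explicitly and sort by the 29-letter Turkish alphabet; a production system would more likely rely on a locale-aware collation library.

```python
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
_RANK = {ch: index for index, ch in enumerate(TURKISH_ALPHABET)}


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: İ -> i and I -> ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_sort_key(text: str) -> list:
    """Sort key following Turkish alphabet order, ignoring case.

    Characters outside the Turkish alphabet (spaces, digits, q, w, x, ...)
    sort after all Turkish letters, ordered by code point.
    """
    return [
        _RANK.get(ch, len(TURKISH_ALPHABET) + ord(ch))
        for ch in turkish_lower(text)
    ]


names = ["Zonguldak", "Çankırı", "İzmir", "Isparta", "Ankara", "Şırnak", "Sivas"]
ordered = sorted(names, key=turkish_sort_key)
print(ordered)
print(turkish_upper("istanbul"), turkish_lower("ISPARTA"))
```

Comparing `sorted(names)` with `sorted(names, key=turkish_sort_key)` on the CITY or SURNAME column shows how far default code point ordering diverges from Turkish alphabetical order.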
Dataset Name Suggestions
- Synthetic Turkish Identity and Address Collection
- 100k Fake Turkish Customer Records
- Turkish Names and Locations for Testing
- Mock Turkish Demographic Data
Attributes
Original Data Source: Synthetic Turkish Identity and Address Collection
