Official Turkish Lexicon Data
About
This dataset is a collection of Turkish dictionary definitions. It is a resource for linguistic research, language analysis, natural language processing (NLP) tasks, and educational projects. It provides detailed definitions for a wide range of Turkish words and phrases, offering foundational material for studying the language.
Columns
- madde_id: A unique identifier for each dictionary entry.
- word_id: An identifier for the specific word within an entry.
- kac: A numerical column, possibly indicating a count or frequency related to the word.
- kelime_no: A word number or sequence identifier.
- cesit: A numerical value, potentially representing a classification or type of word.
- anlam_gor: A numerical column, likely indicating whether a meaning is visible or checked.
- on_taki: Text that precedes the entry word, such as '(birinin)'; this column has many missing values and diverse entries.
- madde: The entry word (headword), with a high number of unique values.
- cesit_say: A numerical count related to the classification or type.
- anlam_say: The count of meanings associated with a word, ranging from 1 to 56.
- taki: The suffix form shown with the entry; 'ği' is a common example.
- cogul_mu: A binary indicator, likely signifying if the word is plural.
- ozel_mi: A binary indicator, likely signifying if the word is a proper noun.
- lisan_kodu: A numerical code, likely identifying the language recorded in lisan.
- lisan: The language the word originates from, with 'Rumca' (Greek) occurring frequently.
- telaffuz: Pronunciation notes, such as 'l ince okunur' (the l is pronounced light, i.e. palatalised).
- birlesikler: Compound words or phrases, with 'kerli ferli' as an example.
- font: This column appears to be entirely empty.
- madde_duz: A plain form of the entry word, likely with diacritics such as circumflexes removed.
- gosterim_tarihi: The display date or last update date for the entry, ranging from 26 March 2019 to 31 July 2023.
- anlamlarListe: A JSON-like structure containing a list of meanings for each entry, including 'anlam_id', 'madde_id', 'anlam_sira', 'fiil', 'tipkes', 'anlam', 'gos', and 'ozelliklerListe'.
- atasozu: A JSON-like structure containing proverbs related to the entry.
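A minimal Python sketch for reading the nested anlamlarListe column, assuming each cell holds a JSON array of meaning objects with the keys listed above. Because the structure is only described as "JSON-like", the sketch falls back to ast.literal_eval for Python-style literals with single quotes. The function name parse_meanings and the sample cell are hypothetical and contain placeholder text, not dataset content.

```python
import ast
import json


def parse_meanings(cell):
    """Parse one anlamlarListe cell into a list of meaning dicts.

    Assumption: the cell holds a JSON array; if JSON parsing fails,
    fall back to a Python-literal parse (e.g. single-quoted keys).
    """
    if cell is None or not cell.strip():
        return []  # empty or missing cell: no meanings
    try:
        data = json.loads(cell)
    except json.JSONDecodeError:
        data = ast.literal_eval(cell)
    # Order senses by their position within the entry (anlam_sira),
    # converting to int in case the values are stored as strings.
    return sorted(data, key=lambda m: int(m.get("anlam_sira", 0)))


# Hypothetical cell with placeholder text, not a real dataset record.
sample = (
    '[{"anlam_id": "2", "madde_id": "7", "anlam_sira": "2", "anlam": "sense two"},'
    ' {"anlam_id": "1", "madde_id": "7", "anlam_sira": "1", "anlam": "sense one"}]'
)

for meaning in parse_meanings(sample):
    print(meaning["anlam_sira"], meaning["anlam"])
# prints:
# 1 sense one
# 2 sense two
```

The same function can be applied to the atasozu column if it turns out to use the same encoding, which is an assumption to verify on the actual file.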
Distribution
The dataset is provided in CSV format. It contains 21 columns and approximately 92,400 rows, with a file size of 77.03 MB.
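The column and row counts can be checked against a local copy with Python's standard csv module. This is a sketch under assumptions: the file name tdk_dictionary.csv is hypothetical, and the file is assumed to be UTF-8 with a header row. The field size limit is raised because long nested cells such as anlamlarListe may exceed the module's default limit of 131,072 characters.

```python
import csv
import io

# Raise csv's per-field limit; 2**31 - 1 fits a C long on all platforms.
csv.field_size_limit(2**31 - 1)


def summarize_csv(handle):
    """Return (column names, record count) for a CSV file-like object."""
    reader = csv.DictReader(handle)
    record_count = sum(1 for _ in reader)  # consumes the whole file
    return reader.fieldnames, record_count


# With the real file (file name is hypothetical):
# with open("tdk_dictionary.csv", encoding="utf-8", newline="") as f:
#     columns, records = summarize_csv(f)

# Self-contained demo on an in-memory CSV with placeholder values.
demo = io.StringIO("madde_id,madde\n1,a\n2,b\n")
columns, records = summarize_csv(demo)
print(columns, records)  # ['madde_id', 'madde'] 2
```

Counting records through csv.DictReader, rather than counting lines, keeps quoted fields that contain line breaks from being counted as extra rows.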
Usage
This dataset suits a variety of applications, including Turkish natural language processing tasks such as text analysis and sentiment analysis. It can also support educational projects focused on the Turkish language, linguistic research, detailed language studies, and the development of language-learning tools.
Coverage
The dataset focuses exclusively on the Turkish language, providing definitions from the Turkish Language Association (TDK). Entry display or update dates (gosterim_tarihi) span March 2019 to July 2023. The data is purely linguistic, so no demographic information applies.
License
CC0: Public Domain
Who Can Use It
This dataset is primarily intended for researchers, linguists, language enthusiasts, and anyone with a keen interest in the Turkish language. It supports use cases such as:
- Researchers: For academic studies on Turkish lexicography and etymology.
- Linguists: For analysing word structures, semantic relationships, and language evolution.
- Developers: For building natural language processing models, spell checkers, or translation tools for Turkish.
- Educators and Students: For language learning resources and educational projects.
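For spell checkers and lookup tools, headwords usually need normalising before matching. The sketch below is an illustration, not the dataset's own method: it lowercases with Turkish dotted/dotless i rules (Python's str.lower() does not apply them) and drops circumflexes, then indexes madde values to their madde_id values. Whether madde_duz already stores such a plain form is an assumption worth checking; the function names and sample rows are hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize_tr(word):
    """Lowercase with Turkish i rules and strip circumflexes.

    str.lower() maps 'I' to 'i' and 'İ' to 'i' plus a combining dot,
    which is wrong for Turkish, so those two letters are mapped first.
    """
    word = word.replace("I", "ı").replace("İ", "i").lower()
    # Decompose, drop only the combining circumflex (U+0302), recompose;
    # other diacritics (ç, ş, ğ, ö, ü) are restored by NFC.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = decomposed.replace("\u0302", "")
    return unicodedata.normalize("NFC", stripped)


def build_index(rows):
    """Map normalised headwords (madde) to the madde_id values using them."""
    index = defaultdict(list)
    for row in rows:
        index[normalize_tr(row["madde"])].append(row["madde_id"])
    return index


# Hypothetical rows with placeholder IDs, not real dataset records.
rows = [
    {"madde_id": "1", "madde": "kâr"},
    {"madde_id": "2", "madde": "kar"},
    {"madde_id": "3", "madde": "ılık"},
]
index = build_index(rows)
print(index[normalize_tr("KAR")])   # ['1', '2']
print(index[normalize_tr("ILIK")])  # ['3']
```

Folding circumflexes merges distinct words such as 'kâr' (profit) and 'kar' (snow), which is why the index keeps a list of IDs per key rather than a single entry.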
Dataset Name Suggestions
- TDK Turkish Dictionary Words
- Turkish Language Definitions
- Official Turkish Lexicon Data
- Turkish Dictionary Entries
Attributes
Original Data Source: Official Turkish Lexicon Data