English Word POS Tag Dataset
Education & Learning Analytics
About
This dataset contains roughly 370,000 English words, each paired with a Part-of-Speech (POS) tag. The tags were generated by applying the NLTK POS tagger to an existing corpus of English words. The dataset is intended for natural language processing (NLP), linguistic analysis, and educational technology, where it serves as a reference for the grammatical function of individual words.
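The generation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's actual script: the helper name `tag_in_isolation` and the stand-in `toy_tagger` are hypothetical, the index is assumed to start at 0, and the words are assumed to have been tagged one at a time. With NLTK installed, `nltk.pos_tag` (which takes a list of tokens and returns `(token, tag)` pairs) could be passed as the `tagger` argument instead of the toy function.

```python
from typing import Callable, Iterable, List, Tuple

# A tagger takes a list of tokens and returns (token, tag) pairs,
# which is the shape nltk.pos_tag uses.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def tag_in_isolation(words: Iterable[str], tagger: Tagger) -> List[Tuple[int, str, str]]:
    """Tag each word on its own and return (index, word, pos_tag) rows."""
    rows = []
    for index, word in enumerate(words):
        # The tagger sees a one-token "sentence", so it has no context to use.
        _, tag = tagger([word])[0]
        rows.append((index, word, tag))
    return rows


def toy_tagger(tokens: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical stand-in: words ending in 's' become NNS, all others NN."""
    return [(t, "NNS" if t.endswith("s") else "NN") for t in tokens]


rows = tag_in_isolation(["apple", "apples", "run"], toy_tagger)
print(rows)
```

Because each word is tagged without surrounding context, a word that can play several grammatical roles receives only one tag in a table built this way.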
Columns
- index: A numerical identifier for each entry in the dataset.
- word: The English word itself.
- pos_tag: The Part-of-Speech tag assigned to the word, adhering to the Penn Treebank tag set. Examples include:
- CC: coordinating conjunction
- CD: cardinal number
- DT: determiner
- EX: existential there
- FW: foreign word
- IN: preposition/subordinating conjunction
- JJ: adjective (e.g., large)
- JJR: adjective, comparative (e.g., larger)
- JJS: adjective, superlative (e.g., largest)
- LS: list item marker
- MD: modal (e.g., could, will)
- NN: noun, singular
- NNS: noun, plural
- NNP: proper noun, singular
- NNPS: proper noun, plural
- PDT: predeterminer
- POS: possessive ending (e.g., parent's)
- PRP: personal pronoun (e.g., hers, himself)
- PRP$: possessive pronoun (e.g., her, my)
- RB: adverb (e.g., occasionally, swiftly)
- RBR: adverb, comparative (e.g., greater)
- RBS: adverb, superlative (e.g., biggest)
- RP: particle (e.g., about)
- SYM: symbol
- TO: infinitive marker (e.g., to)
- UH: interjection (e.g., goodbye)
- VB: verb, base form (e.g., ask)
- VBG: verb, gerund or present participle (e.g., judging)
- VBD: verb, past tense (e.g., pleaded)
- VBN: verb, past participle (e.g., reunified)
- VBP: verb, present tense, not 3rd person singular (e.g., wrap)
- VBZ: verb, present tense, 3rd person singular (e.g., bases)
- WDT: wh-determiner (e.g., that, what)
- WP: wh-pronoun (e.g., who)
- WP$: possessive wh-pronoun
- WRB: wh-adverb (e.g., how)
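The three columns and the tag families listed above can be read and grouped along these lines. This is a sketch only: the CSV layout with a header row, the in-memory sample rows, and the helper names `coarse_class` and `load_rows` are assumptions made for illustration, and the grouping by tag prefix is one possible simplification rather than part of the dataset.

```python
import csv
import io

# Tiny in-memory sample in the assumed layout (index, word, pos_tag).
SAMPLE_CSV = """index,word,pos_tag
0,large,JJ
1,largest,JJS
2,judging,VBG
3,swiftly,RB
4,parents,NNS
5,who,WP
"""

# Penn Treebank tags in the same family share a prefix (NN, NNS, NNP, NNPS ...).
COARSE_PREFIXES = [
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
]


def coarse_class(tag):
    """Map a fine-grained tag to a coarse class, or 'other' if none matches."""
    for prefix, label in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return label
    return "other"


def load_rows(handle):
    """Read (word, pos_tag, coarse_class) tuples from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [(r["word"], r["pos_tag"], coarse_class(r["pos_tag"])) for r in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row)
```

To read a real file instead of the sample, an open file handle can be passed to `load_rows` in place of the `io.StringIO` object.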
Distribution
The dataset comprises approximately 370,100 records, each consisting of one English word and its Part-of-Speech tag. The tag distribution is heavily skewed toward nouns: singular nouns (NN) make up roughly 62% of the entries, plural nouns (NNS) roughly 13%, and all remaining tag types together about 24%. Each record maps a single word to a single tag.
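A distribution like the one quoted above can be recomputed from the pos_tag column with a simple frequency count. The sketch below uses a small hypothetical tag list rather than the real data, and the function name `tag_shares` is an assumption for illustration.

```python
from collections import Counter


def tag_shares(tags):
    """Return each tag's share of the entries in percent, most common first."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: round(100 * n / total, 1) for tag, n in counts.most_common()}


# Hypothetical sample: 5 singular nouns, 2 plural nouns, 1 adjective.
sample_tags = ["NN"] * 5 + ["NNS"] * 2 + ["JJ"]
shares = tag_shares(sample_tags)
print(shares)
```

Running the same function over the full pos_tag column would give the per-tag percentages, including the breakdown of the tags that are grouped together as "remaining" above.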
Usage
This dataset is well-suited for a variety of applications, including:
- Natural Language Processing (NLP): Useful as lexical training or lookup data for tasks such as POS tagging, text classification, and grammar analysis.
- Linguistic Research: Facilitates the study of English grammatical structures, word morphology, and syntactic patterns.
- Educational Tools: Ideal for developing language learning apps, grammar checkers, and vocabulary building exercises.
- Text Mining and Analysis: Enables deeper insights into unstructured text by identifying the grammatical role of individual words.
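As one concrete illustration of the NLP and text-mining uses listed above, the word-to-tag pairs can serve as a lookup table for a very simple baseline tagger. This is a sketch under assumptions: the function names `build_lexicon` and `tag_tokens` and the sample pairs are hypothetical, and falling back to NN for unknown words is a design choice motivated by NN being the most frequent tag in the dataset.

```python
def build_lexicon(pairs):
    """Build a lowercase word -> tag dictionary from (word, pos_tag) pairs."""
    return {word.lower(): tag for word, tag in pairs}


def tag_tokens(tokens, lexicon, default="NN"):
    """Tag tokens by dictionary lookup, using `default` for unknown words."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]


# Hypothetical sample pairs in the dataset's (word, pos_tag) shape.
lexicon = build_lexicon([("swiftly", "RB"), ("ask", "VB"), ("largest", "JJS")])
tagged = tag_tokens(["Ask", "swiftly", "zyzzyva"], lexicon)
print(tagged)
```

A lookup tagger of this kind ignores context, so it cannot distinguish between different uses of the same word. It is best treated as a baseline or as a feature source for a context-aware model.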
Coverage
The dataset covers English words and their grammatical categories. It is a general-purpose English word list and is not restricted to any particular region, time period, or demographic group.
License
CC0
Who Can Use It
- AI and Machine Learning Developers: For creating and improving NLP models and algorithms.
- Linguists and Academic Researchers: For conducting scholarly investigations into English grammar and lexicography.
- Educators and Students: For teaching, learning, and developing educational resources related to language arts.
- Data Scientists and Analysts: For preparing and enriching text data in diverse analytical projects.
- Software Developers: Especially those creating applications with language processing functionalities.
Dataset Name Suggestions
- English Word POS Tag Dataset
- 370k English Word Corpus with Grammatical Tags
- Annotated English Lexicon for NLP
- NLTK Tagged English Words
- Part-of-Speech Tagged English Dictionary
Attributes
Original Data Source: 370k English words corpus