Opendatabay APP

Linguistic Parts of Speech Data

Data Science and Analytics

Tags and Keywords

Computer

Text

Data

NLP

Languages

Free

About

This dataset provides a collection of words categorised by part of speech and is intended for educational use. It includes a CSV file of statistical counts and separate files listing the individual words for each part of speech. The data is freely available and aimed primarily at academic and learning applications in areas such as natural language processing and text analysis. It distinguishes 'pure' words, which belong to only one part of speech, from 'impure' ones, and gives specific counts for pure adjectives and adverbs.

Columns

The core description file features the following columns:
  • Parts of Speech: The name given to each part of speech (e.g., noun, verb, adjective).
  • Count: The total number of words available for each specific part of speech.
  • Pure (Top): The count of words that belong exclusively to this part of speech and cannot function as any other. For instance, 143 top adjectives and 47 top adverbs are considered pure in this dataset.
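As a minimal sketch of how the description file could be read, the Python snippet below parses the three columns listed above with the standard csv module. The sample text stands in for the real file: the column names come from this listing, the 143 and 47 pure counts are the ones quoted above, and the Count values are placeholders rather than real figures. The function name read_description is hypothetical, and the sketch assumes every Count and Pure (Top) cell holds an integer.

```python
import csv
import io

# Hypothetical stand-in for the description file. Column names follow the
# listing; the Count values are placeholders, not figures from the dataset.
SAMPLE = """Parts of Speech,Count,Pure (Top)
adjective,1000,143
adverb,500,47
"""


def read_description(handle):
    """Return {part_of_speech: (count, pure_top)} from a CSV file object."""
    summary = {}
    for row in csv.DictReader(handle):
        name = row["Parts of Speech"].strip().lower()
        # Assumes both numeric cells are plain integers.
        summary[name] = (int(row["Count"]), int(row["Pure (Top)"]))
    return summary


summary = read_description(io.StringIO(SAMPLE))
for name, (count, pure) in summary.items():
    print(f"{name}: {count} words, {pure} pure")
```

To use it on the downloaded file, pass an opened file instead of the in-memory sample, for example open(path, newline="", encoding="utf-8"), where path is wherever the description CSV was saved.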

Distribution

The data files are in CSV format. The dataset consists of a content description CSV file that provides counts and a folder containing all words, organised into separate files named by part of speech. The 'Parts of Speech' column contains 8 values, all of them unique. Total row or record counts for the full word collection are not provided, although percentage summaries (e.g., 63%, 13%, 25%) are included for various categories.
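Because the words are split into one file per part of speech, the 'pure' words described in this listing can in principle be recomputed by checking which words appear under only one part of speech. The Python sketch below shows that idea on small in-memory lists. The example words are illustrative and not taken from the dataset, the function name pure_words is hypothetical, and the listing does not say how the published 'Pure (Top)' counts were actually derived, so the results need not match them.

```python
from collections import Counter


def pure_words(words_by_pos):
    """Map each part of speech to the words that appear under no other one."""
    # Count how many parts of speech each word occurs in
    # (duplicates within one list are counted once).
    occurrences = Counter(
        word for words in words_by_pos.values() for word in set(words)
    )
    return {
        pos: {word for word in set(words) if occurrences[word] == 1}
        for pos, words in words_by_pos.items()
    }


# Illustrative word lists, not taken from the dataset.
example = {
    "noun": ["light", "run", "table"],
    "verb": ["run", "light", "eat"],
    "adjective": ["light", "happy"],
}

result = pure_words(example)
for pos, words in result.items():
    print(pos, sorted(words))
```

Loading the real files into the words_by_pos dictionary depends on their internal layout, which the listing does not specify beyond the files being named by part of speech.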

Usage

This dataset is ideally suited for:
  • Data Science and Analytics projects focusing on linguistic patterns.
  • Natural Language Processing (NLP) research and development.
  • Text Mining and Text Cleaning applications.
  • Studying Languages and grammatical structures.
  • Educational purposes in Computer Science and related fields.

Coverage

The dataset's regional coverage is global. Version 1.0 was listed on 16th June 2025. The listing gives no notes on demographic scope or on data availability for particular groups or years.

License

CC0

Who Can Use It

This dataset is intended primarily for educational purposes. Ideal users include:
  • Students and researchers in data science, computer science, and linguistics.
  • Developers working on NLP models and text analysis tools.
  • Anyone interested in linguistic data analysis and understanding parts of speech.

Dataset Name Suggestions

  • Parts of Speech Word Collection
  • PoS Word Corpus
  • Linguistic Parts of Speech Data
  • Grammatical Word Classification

Attributes

Listing Stats

VIEWS: 0

DOWNLOADS: 0

LISTED: 16/06/2025

REGION: GLOBAL

UDQS (UNIVERSAL DATA QUALITY SCORE): 5 / 5

VERSION: 1.0
