Topic_classification_dataset
Data Science and Analytics
Free
About
I built this dataset by combining other datasets to make topic classification easier to work with.
It contains 6 topics:
Politics
Health
Emotion
Financial
Sport
Science
The texts under each topic are news items, articles, answers, or comments.
- The file "topic_classification_data.csv" contains the original text data.
- The file "2CLEAN" contains the same text data with NLP processing applied to the text.
The NLP processing steps are:
1. Text cleaning:
   - Normalize the text.
   - Remove punctuation marks.
   - Remove stop words.
   - Remove HTML tags.
   - Remove special characters.
   - Remove emojis.
   - Fix contractions.
2. POS tagging
3. Lemmatization
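The text-cleaning steps above can be sketched with the Python standard library alone. This is not the code used to produce "2CLEAN": the `clean_text` function, the tiny `CONTRACTIONS` and `STOP_WORDS` tables, and the order of the steps are all assumptions for illustration, and the sketch assumes English text. POS tagging and lemmatization are left out because they need an NLP library.

```python
import html
import re
import unicodedata

# Hypothetical, deliberately tiny tables; a real pipeline would use fuller lists.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "it's": "it is",
    "i'm": "i am",
}
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "this", "i", "am"}

TAG_RE = re.compile(r"<[^>]+>")           # anything that looks like an HTML tag
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")  # punctuation, special characters, emojis


def clean_text(text: str) -> str:
    """Apply the listed cleaning steps in one assumed order."""
    # Remove HTML: decode entities such as &amp;, then strip tags.
    text = html.unescape(text)
    text = TAG_RE.sub(" ", text)
    # Normalize: unify Unicode forms, lowercase, and use straight apostrophes.
    text = unicodedata.normalize("NFKC", text).lower().replace("\u2019", "'")
    # Fix contractions before punctuation removal destroys the apostrophes.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Remove punctuation, special characters, and emojis in a single pass.
    # Note: this also drops non-ASCII letters, which is fine only for English.
    text = NON_ALNUM_RE.sub(" ", text)
    # Remove stop words and collapse whitespace.
    return " ".join(t for t in text.split() if t not in STOP_WORDS)
```

For example, `clean_text("<p>I can't believe it's SO good!!! 😀 &amp; cheap</p>")` returns `"cannot believe so good cheap"`. Expanding contractions before stripping punctuation is a deliberate choice here: once the apostrophe is gone, "can't" becomes "can t" and can no longer be matched.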
License
CC0
Original Data Source: Topic_classification_dataset