News Category Dataset
Entertainment & Media Consumption
About
This dataset is a follow-up to the News Category Dataset. It contains 45.5k news headlines from 2012 to 2018, obtained from HuffPost. The goal was to give beginners an easy-to-use dataset, so the data has been cleaned and filtered and the target feature has been balanced, unlike in the original dataset.
Content
This data contains:
- 45500 rows and 5 columns
- Target column: Category (Business, Politics, Food & Drink, TRAVEL, Parenting, STYLE & BEAUTY, Wellness, World news, Sports, Entertainment)
- Each category class contains 4500 rows
- NaN values appear only in the keywords column
- The original dataset also had many third-person statements (such as "This statement is irrelevant," say the officials)
- A keywords column has been added, holding the main keywords extracted from each URL (URLs were in the original dataset)
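To give a feel for how keywords can be pulled out of a URL, here is a minimal sketch. It is not the exact method used to build this dataset: the function name `extract_keywords`, the example URLs, the assumed `_<two letters>_<hex id>` suffix pattern, and the hyphen-joined output format are all illustrative assumptions.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def extract_keywords(url: str) -> Optional[str]:
    """Return the hyphen-joined slug words of a URL, or None if there are none.

    Hypothetical helper: the URL layout it expects is an assumption,
    not something documented for this dataset.
    """
    # Take the last path segment, e.g. "some-headline-words_us_5b08..."
    path = urlparse(url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1]

    # Drop a trailing "_<two letters>_<hex id>" suffix (assumed pattern).
    slug = re.sub(r"_[a-z]{2}_[0-9a-f]+$", "", slug)

    # Keep only purely alphabetic tokens as keywords.
    words = [w for w in slug.split("-") if w.isalpha()]
    return "-".join(words) if words else None


# Made-up URLs, used only to show the behaviour.
with_slug = extract_keywords(
    "https://www.example.com/entry/easy-weeknight-pasta-recipes_us_5b081ab4e4b0"
)
without_slug = extract_keywords("https://www.example.com/")
print(with_slug)
print(without_slug)
```

A URL without a usable slug gives `None` here, which is one plausible way missing values could end up in a keywords column.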
Inspiration
I found the original dataset hard to work with, so I cleaned it and made an easier-to-use version. I hope it helps fellow beginners get started with NLP!
Original Data Source: News Category Dataset