GadgetByte Blog Content Dataset
Social Media and Networking
Free
About
This dataset is a collection of 7,245 blog posts scraped from GadgetByte Nepal, a prominent tech review and news portal. Spanning 2013 to 2023, it is a resource for studying technology trends, news coverage, and expert opinion as presented by a key industry voice in Nepal. It is intended to support research, analysis, and decision-making in the technology sector using real-world tech content.
Columns
- title: The title of each blog post, detailing the main subject.
- tags: Relevant keywords or categories associated with the blog post, aiding in content classification.
- author: The name of the individual who authored the blog post. This column contains 2,731 unique authors, with 'Sanjeev' accounting for 21% and 'Yural Maskey' for 13% of posts.
- published_date: The publication date of the blog post, in month, day, year order (MD,Y format).
- official_date: The complete datetime stamp of the blog post's publication.
- posts: The full text content of each blog post.
- featured_image: Links to the primary image associated with the blog post. This column contains 7,230 unique image links.
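The column list above can be checked against a file when it is loaded. The following is a minimal sketch using only the Python standard library. The column names come from the list above; the date layouts in `DATE_FORMATS`, the helper names, and the sample row (including its `official_date` value) are assumptions for illustration and may not match the actual file.

```python
import csv
import io
from datetime import datetime

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "title", "tags", "author", "published_date",
    "official_date", "posts", "featured_image",
]

# Assumed layouts for published_date; the real file may use a different one.
DATE_FORMATS = ("%B %d, %Y", "%b %d, %Y")


def parse_published_date(text):
    """Return a date for text such as 'November 16, 2013', or None if no assumed layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def read_posts(handle):
    """Read all rows from an open CSV handle after checking the header for the documented columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical one-row sample standing in for the real CSV file.
SAMPLE = io.StringIO(
    "title,tags,author,published_date,official_date,posts,featured_image\n"
    'Example post,"Samsung, Review",Example Author,"November 16, 2013",'
    "2013-11-16T10:00:00,Body text,https://example.com/a.jpg\n"
)
rows = read_posts(SAMPLE)
first_date = parse_published_date(rows[0]["published_date"])
```

With the real file, `SAMPLE` would be replaced by an open file handle; returning `None` for unparsed dates makes it easy to count how many rows do not fit the assumed layouts.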
Distribution
The dataset comprises 7,245 individual blog posts, typically formatted as a CSV file. While the exact file size is not specified, it includes a wealth of textual data, image links, and metadata. The title column features 7,233 unique values, indicating very few duplicate titles. The tags column shows variety, with 'Samsung' being a tag in 8% of the posts.
Usage
This dataset is ideal for:
- Natural Language Processing (NLP) research, including topic modelling, sentiment analysis, and text summarisation on tech-related content.
- Analysing technology trends and shifts over a decade in a specific regional context.
- Content strategy development, by understanding popular topics, author contributions, and reader engagement patterns.
- Media studies focusing on tech journalism and online publishing.
- Developing AI models requiring large corpora of specialised text data.
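As a starting point for the topic and trend analyses listed above, the share of posts carrying each tag can be computed directly from the tags column. This is a rough sketch: it assumes each tags cell is a comma-separated string, which the description does not confirm, and the function names are illustrative.

```python
from collections import Counter


def split_tags(raw, sep=","):
    """Split a raw tags cell into cleaned, lower-cased tags (the separator is an assumption)."""
    return [t.strip().lower() for t in raw.split(sep) if t.strip()]


def tag_share(rows, sep=","):
    """Map each tag to the fraction of posts that carry it at least once."""
    counts = Counter()
    for row in rows:
        # set() ensures a tag repeated within one post is counted once.
        counts.update(set(split_tags(row.get("tags") or "", sep)))
    total = len(rows)
    return {tag: n / total for tag, n in counts.items()} if total else {}
```

Applied to the full dataset, a result such as `tag_share(rows).get("samsung")` could be compared with the tag share reported in the description, keeping in mind that lower-casing merges tags that differ only in case.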
Coverage
The dataset covers a time range from 16th November 2013 to 15th August 2023, providing a historical record of technology news and reviews over approximately ten years. Geographically, the data originates from Nepal, as published by GadgetByte Nepal, but the content often discusses global technology products and trends, making it relevant for a global audience interested in tech. Data availability is consistent across the specified years, with annual post counts ranging from 205 to 1,228.
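The coverage figures above (date range and annual post counts) can be recomputed once publication dates have been parsed. The sketch below assumes the dates are already available as `datetime.date` objects, with `None` for any value that could not be parsed; the function names are illustrative.

```python
from collections import Counter
from datetime import date


def posts_per_year(dates):
    """Count posts per calendar year, skipping unparsed (None) dates."""
    counts = Counter(d.year for d in dates if d is not None)
    return dict(sorted(counts.items()))


def coverage_span(dates):
    """Return (earliest, latest) publication date, or None if no date is known."""
    known = [d for d in dates if d is not None]
    return (min(known), max(known)) if known else None
```

For example, `coverage_span([date(2013, 11, 16), date(2023, 8, 15)])` returns the two dates as a pair, and the minimum and maximum of `posts_per_year(...).values()` can be compared with the annual range quoted above.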
License
CC0
Who Can Use It
- Data Scientists and NLP Researchers: For building and testing language models, performing text analytics, and understanding content structure.
- Market Analysts: To identify emerging technology trends, competitive landscapes, and consumer interest within the tech sector.
- Journalists and Researchers: For historical analysis of tech news, author contributions, and media influence.
- Tech Enthusiasts and Developers: To gain insights into product reviews, news, and the evolution of technology discussions.
Dataset Name Suggestions
- GadgetByte Nepal Tech Blog Archive
- Nepali Tech News Articles 2013-2023
- GadgetByte Blog Content Dataset
- Nepal Technology Blog Posts
Attributes
Original Data Source: GadgetByte Nepal Blog Posts