Twitter-Based Influenza Activity Data

Social Media and Networking

Tags and Keywords

  • Earth and Nature
  • Text
  • NLP
  • Public Health
  • Diseases

Free

About

This dataset supports forecasting the spatiotemporal patterns of influenza outbreaks, that is, where and when they occur. It is built from influenza-related tweets, primarily originating from the United States. For each week and state, the input is a set of keyword counts from that week's tweets, and the target is whether an influenza outbreak will occur in that state during the subsequent week. An outbreak is recorded when the activity level, as defined by the CDC Flu Activity Map, reaches "high".

Columns

  • ID: An identifier for entries.
  • flu_X_tr: Input data for training, representing keyword counts for tweets from various locations and weeks.
  • flu_Y_tr: Output data for training, indicating the occurrence of an influenza outbreak (0 for no event, 1 for an event) for specific states in the next week.
  • flu_X_te: Input data for testing, structured like flu_X_tr (keyword counts per location and week).
  • flu_Y_te: Output data for testing, structured like flu_Y_tr (0 for no event, 1 for an event in the next week).
  • flu_locs: A list detailing the states covered by the data.
  • flu_keywords: A list of 525 specified keywords used for analysis.
  • Label Count: Provides ranges and counts of values, for instance, 0.00 - 52.40 with 53 entries, up to 471.60 - 524.00 with 53 entries.

Distribution

The dataset is typically provided in CSV format. It includes 525 distinct keywords. The input data (flu_X_tr and flu_X_te) consists of keyword counts over all tweets from a state within a week. The output data (flu_Y_tr and flu_Y_te) records whether an influenza outbreak occurred in that state in the subsequent week, encoded as zero (no event) or one (event). The dataset contains 524 unique values across its various segments.
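To make the input format concrete, the sketch below shows one way a keyword-count vector for a single state and week could be built from raw tweet text. The listing does not describe how the original counts were produced, so the tokenisation rule, the function name keyword_counts, and the three-keyword list are assumptions for illustration only; the real dataset uses 525 keywords.

```python
import re
from collections import Counter
from typing import List, Sequence


def keyword_counts(tweets: Sequence[str], keywords: Sequence[str]) -> List[int]:
    """Count keyword occurrences over all tweets for one state and one week.

    The result is aligned with `keywords`: position i holds the count of keywords[i].
    Tokenisation (lowercase runs of letters and apostrophes) is an assumption.
    """
    vocab = set(keywords)
    counts: Counter = Counter()
    for text in tweets:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in vocab:
                counts[token] += 1
    return [counts[k] for k in keywords]


# Hypothetical keywords and tweets, not taken from the dataset.
keywords = ["flu", "fever", "cough"]
tweets = [
    "Down with the flu, fever all night",
    "This cough won't quit. Flu season!",
    "Great game today",
]
vector = keyword_counts(tweets, keywords)
print(vector)
```

Each row of the input data would then correspond to one such vector for a given state and week.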

Usage

This dataset is ideal for developing predictive models to forecast influenza outbreak events. It can be utilised for research into spatiotemporal disease patterns, enabling the creation of early warning systems for public health initiatives. Additionally, it supports applications focused on identifying and analysing influenza-related social media discussions.
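As a rough illustration of the prediction task, the following sketch trains a small logistic regression on keyword counts to predict the next-week outbreak flag. It is a baseline under stated assumptions, not the method used by the dataset's authors: the toy counts, the two-keyword feature vectors, the log scaling, and the hyperparameters are all made up for the example.

```python
import math
from typing import List, Sequence, Tuple


def log_scale(counts: Sequence[int]) -> List[float]:
    """Compress raw keyword counts with log(1 + x) so large counts do not dominate."""
    return [math.log1p(c) for c in counts]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.2, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias with full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_outbreak(w: Sequence[float], b: float, counts: Sequence[int]) -> int:
    """Return 1 if the model predicts an outbreak next week, else 0."""
    x = log_scale(counts)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical training rows: counts of two keywords per state-week,
# with the outbreak flag for the following week.
train_counts = [[1, 0], [2, 1], [0, 0], [30, 12], [25, 9], [40, 15]]
train_labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic([log_scale(c) for c in train_counts], train_labels)
print(predict_outbreak(weights, bias, [35, 14]))
```

With the real data, the training rows would come from flu_X_tr and flu_Y_tr and evaluation would use flu_X_te and flu_Y_te; a library implementation would normally replace this hand-written trainer.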

Coverage

The geographic scope of the dataset is limited to the United States, at state level. Observations are weekly, and the prediction task is whether an influenza outbreak occurs at the next time step (the following date or week). Influenza activity levels are categorised from minimal to high, and an outbreak is indicated only when the activity level is high according to the CDC Flu Activity Map.
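The labelling rule described above can be sketched as a small mapping from an activity level to the binary outbreak flag. The listing names only the end points "minimal" and "high"; the intermediate level names and the function name outbreak_label are assumptions made for this example.

```python
# Assumed ordering of activity levels; only "minimal" and "high" are named in the listing.
ACTIVITY_LEVELS = ("minimal", "low", "moderate", "high")


def outbreak_label(level: str) -> int:
    """Map an activity level to the binary target: 1 for "high", 0 otherwise."""
    normalized = level.strip().lower()
    if normalized not in ACTIVITY_LEVELS:
        raise ValueError(f"unknown activity level: {level!r}")
    return 1 if normalized == "high" else 0


# Hypothetical weekly levels for one state.
weekly_levels = ["minimal", "low", "High", "moderate"]
labels = [outbreak_label(level) for level in weekly_levels]
print(labels)
```

Rejecting unknown level names, rather than silently mapping them to 0, keeps typos in the source data from being counted as non-outbreak weeks.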

License

CC-BY

Who Can Use It

This dataset is suitable for data scientists and machine learning engineers interested in building predictive models for disease surveillance. Public health researchers and epidemiologists can use it for studying influenza spread patterns and developing intervention strategies. It is also relevant for social media analysts and natural language processing (NLP) practitioners focused on health-related text data.

Dataset Name Suggestions

  • Influenza Outbreak Event Prediction via Twitter
  • US Flu Outbreak Forecasting Dataset
  • Twitter-Based Influenza Activity Data
  • Spatiotemporal Flu Prediction Dataset

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 26/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
