Stock Volatility via Social Sentiment
Stock & Market Data
Free
About
This collection of financial and social media metrics is designed for investigating the connection between public sentiment expressed on Twitter and subsequent movements in company share prices. It contains 862,231 labeled instances, enough to explore in depth how social media messages may affect stock performance at the company level. Each instance pairs sentiment scores from two sources, a Long Short-Term Memory (LSTM) model and TextBlob, with traditional market metrics such as returns, trading volume, and volatility, supporting analysis of market reactions from one to seven days after a tweet.
Columns
- TWEET: The textual content of the message posted on Twitter (String).
- STOCK: The symbol identifying the company's stock referenced in the tweet (String).
- DATE: The specific date when the tweet was published (Date).
- LAST_PRICE: The most recent closing price of the company's stock at the time of the tweet (Float).
- 1_DAY_RETURN: The measure of gain or loss experienced by the stock over the day immediately following the tweet (Float).
- 2_DAY_RETURN: The measure of gain or loss experienced by the stock over the two days following the tweet (Float).
- 3_DAY_RETURN: The measure of gain or loss experienced by the stock over the three days following the tweet (Float).
- 7_DAY_RETURN: The measure of gain or loss experienced by the stock over the seven days following the tweet (Float).
- PX_VOLUME: The trading volume of the stock at the time of the tweet (Integer).
- VOLATILITY_10D: A measure quantifying market fluctuation across a ten-day interval (Float).
- VOLATILITY_30D: A measure quantifying market fluctuation across a thirty-day interval (Float).
- LSTM_POLARITY: The sentiment score assigned using the Long Short-Term Memory machine learning model (Float).
- TEXTBLOB_POLARITY: The sentiment score assigned using the TextBlob algorithm (Float).
- MENTION: Indicates the number of times the associated stock was mentioned within the tweet (Integer).
Distribution
The dataset contains 862,231 labeled instances of tweet sentiment matched with stock return metrics. The primary data file, full_dataset-release.csv, totals 235.98 MB and consists of 14 columns. The underlying file format is typically CSV.
Usage
This data is well suited to quantitative finance research and machine learning applications. Users can apply descriptive tools such as histograms, or regression methods, to identify relationships between the content and sentiment of tweets and the corresponding financial data points, such as 1-day or 7-day returns. It also suits natural language processing (NLP) models that aim to predict future market trends from textual data. Additionally, researchers can investigate how different types of tweets (e.g., positive versus negative, factual versus opinionated) relate to stock prices across various time frames.
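One simple way to start on the sentiment-versus-return question is sketched below: a Pearson correlation between a polarity column and a return column, plus the average return grouped by the sign of the polarity. The function names and the five value pairs are hypothetical and invented for illustration; they are not results from the dataset. A correlation of this kind is descriptive only and does not show that tweets move prices.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)


def mean_return_by_sign(polarity, returns):
    """Average return for tweets with negative, neutral and positive polarity."""
    buckets = {"negative": [], "neutral": [], "positive": []}
    for p, r in zip(polarity, returns):
        key = "positive" if p > 0 else "negative" if p < 0 else "neutral"
        buckets[key].append(r)
    return {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}


# Made-up values standing in for TEXTBLOB_POLARITY and 1_DAY_RETURN.
polarity = [0.8, -1.0, 0.0, 0.5, -0.3]
returns_1d = [0.01, -0.02, 0.00, 0.02, -0.01]

r = pearson(polarity, returns_1d)
groups = mean_return_by_sign(polarity, returns_1d)
```

The same two functions can be pointed at any polarity column (LSTM_POLARITY or TEXTBLOB_POLARITY) and any return horizon to compare how the relationship changes from 1 to 7 days.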
Coverage
The sources indicate coverage across multiple companies and associated stock symbols, providing data points tied to tweets and subsequent market performance metrics. Specific geographic scope, exact time range, or demographic focus details are not available in the provided material.
License
CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
Who Can Use It
Intended users include financial analysts seeking novel data streams for market prediction, academic researchers examining how social media activity affects financial markets, and data scientists developing and testing text-based predictive models that anticipate market shifts driven by social events or news.
Dataset Name Suggestions
- Social Media Sentiment & Stock Returns Linkage
- Twitter Influence on Share Prices
- Financial Market Reaction to Tweet Polarity
- Stock Volatility via Social Sentiment
Attributes
Original Data Source: Stock Volatility via Social Sentiment
