Political Communication & Bias Dataset
About
This dataset contains social media messages from US Senators and other American politicians, classified by human contributors. Each of the several thousand messages is labelled along three dimensions: its intended audience (national, or the politician's own constituency), its bias (neutral/bipartisan or biased/partisan), and its substance, which ranges from policy statements and informational updates to announcements of media appearances and attacks on other candidates. This makes the dataset a useful resource for studying political discourse and communication strategies.
Columns
- unit_id: A unique identifier for each individual social media message.
- golden: A boolean flag indicating whether the record is considered a 'gold standard' for classification.
- unit_state: Represents the current processing state of the data unit.
- trusted_judgments: The count of trusted human judgments applied to the message's classification.
- last_judgment_at: The timestamp indicating when the last judgment or update was made on the record.
- audience: Categorises the intended recipient of the message as either 'national' or the author's own 'constituency'.
- audience_confidence: A numerical score reflecting the confidence level of the assigned audience classification.
- bias: Classifies the political stance or leaning of the message as either 'neutral' or 'partisan'.
- bias_confidence: A numerical score indicating the confidence level of the assigned bias classification.
- message: Describes the core substance or topic of the social media post, with categories such as 'policy', 'personal', 'informational', 'announcement of a media appearance', 'an attack on another candidate', or 'Other'.
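To work with the columns above in Python, each CSV row can be parsed into a typed, validated record. The sketch below uses only the standard library. The exact label spellings, the text encoding of the `golden` flag ("true"/"false"), the `MessageRecord` and `parse_row` names, and the sample row are all assumptions made for illustration; the sample is hypothetical and not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Label values as described in the Columns list; the exact spelling and
# casing used in the CSV file is an assumption and may need adjusting.
AUDIENCES = {"national", "constituency"}
BIASES = {"neutral", "partisan"}


@dataclass
class MessageRecord:
    """Hypothetical typed view of one row (unit_state and last_judgment_at omitted)."""
    unit_id: str
    golden: bool
    trusted_judgments: int
    audience: str
    audience_confidence: float
    bias: str
    bias_confidence: float
    message: str


def parse_row(row: dict) -> MessageRecord:
    """Convert one csv.DictReader row into a validated MessageRecord."""
    audience = row["audience"].strip().lower()
    bias = row["bias"].strip().lower()
    if audience not in AUDIENCES:
        raise ValueError(f"unexpected audience: {row['audience']!r}")
    if bias not in BIASES:
        raise ValueError(f"unexpected bias: {row['bias']!r}")
    return MessageRecord(
        unit_id=row["unit_id"],
        # Assumes the flag is stored as text such as "true"/"false".
        golden=row["golden"].strip().lower() == "true",
        trusted_judgments=int(row["trusted_judgments"]),
        audience=audience,
        audience_confidence=float(row["audience_confidence"]),
        bias=bias,
        bias_confidence=float(row["bias_confidence"]),
        message=row["message"].strip(),
    )


# Hypothetical sample row, for illustration only (not real dataset content).
sample = io.StringIO(
    "unit_id,golden,unit_state,trusted_judgments,last_judgment_at,"
    "audience,audience_confidence,bias,bias_confidence,message\n"
    "1,false,example,1,2015-08-04 21:17,national,1.0,partisan,1.0,policy\n"
)
records = [parse_row(r) for r in csv.DictReader(sample)]
```

Rejecting unknown labels early makes spelling or casing differences in the real file surface immediately rather than silently skewing later counts.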
Distribution
The dataset is typically provided as a CSV file containing approximately 5,000 records, one row per classified message, which makes it straightforward to load into standard data analysis tools.
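Because the data is a flat CSV, a quick first look at the label distribution needs nothing beyond the standard library. The sketch below counts messages per (audience, bias) pair; the function name `audience_bias_counts`, the file name, and the UTF-8 encoding in the usage comment are assumptions, not details given by the dataset.

```python
import csv
from collections import Counter
from typing import Iterable


def audience_bias_counts(lines: Iterable[str]) -> Counter:
    """Count messages per (audience, bias) pair from CSV text lines.

    Labels are lower-cased so that casing differences do not split counts.
    """
    reader = csv.DictReader(lines)
    return Counter(
        (row["audience"].strip().lower(), row["bias"].strip().lower())
        for row in reader
    )


# Usage with a local copy of the file (file name and encoding are assumptions;
# try encoding="latin-1" if UTF-8 decoding fails):
# with open("political_social_media.csv", newline="", encoding="utf-8") as f:
#     for (audience, bias), n in audience_bias_counts(f).most_common():
#         print(audience, bias, n)
```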
Usage
This dataset is suited to applications including:
- Developing and training Natural Language Processing (NLP) models for text classification.
- Conducting political analysis to understand communication patterns and strategies of US politicians.
- Performing sentiment analysis and bias detection in political discourse.
- Researching the effectiveness of political messaging tailored for different audiences.
- Studying the evolution of political communication on social media platforms.
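When training classifiers on these labels, one common precaution is to keep only rows whose label confidence is high. The sketch below filters rows on the `<label>_confidence` column and sets gold-standard rows aside; skipping `golden` rows so they can be held out for evaluation is a design choice, not something the dataset prescribes. The 0.8 cut-off, the assumption that confidence scores lie in [0, 1], and the `high_confidence_rows` name are all hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical cut-off; assumes confidence scores lie in [0, 1].
MIN_CONFIDENCE = 0.8


def high_confidence_rows(
    rows: Iterable[dict], label: str = "bias", threshold: float = MIN_CONFIDENCE
) -> Iterator[dict]:
    """Yield non-gold rows whose f"{label}_confidence" meets the threshold.

    `label` is a column name such as "bias" or "audience"; its confidence
    column is assumed to follow the naming used in the Columns list.
    """
    for row in rows:
        # Hold gold-standard rows out (e.g. for a separate evaluation set).
        if row["golden"].strip().lower() == "true":
            continue
        if float(row[f"{label}_confidence"]) >= threshold:
            yield row


# Usage (after `import csv`, with a hypothetical file name):
# with open("political_social_media.csv", newline="", encoding="utf-8") as f:
#     training_rows = list(high_confidence_rows(csv.DictReader(f), label="audience"))
```

Raising the threshold trades training-set size for label reliability, so it is worth checking how many rows survive before settling on a value.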
Coverage
The dataset focuses on social media messages from US Senators and other American politicians. The messages were collected primarily during August 2015, so the data reflects the communication styles and content distribution of US political figures in that period.
License
CC0
Who Can Use It
This dataset is suitable for:
- Data Scientists and Machine Learning Engineers: For building and refining NLP models, especially for text classification and bias detection in political content.
- Academic Researchers: For studies on political science, communication, computational social science, and public opinion.
- Political Analysts and Strategists: To gain insights into political messaging effectiveness, audience targeting, and partisan communication.
- AI/LLM Developers: For training and fine-tuning large language models with real-world political discourse data.
- Media Monitoring Agencies: For tracking and analysing political narratives and biases across social media.
Dataset Name Suggestions
- Classification of Pol Social
- US Political Social Media Messages
- American Politicians' Social Media Classification
- Political Communication & Bias Dataset
- US Congressional Social Media Data
Attributes
Original Data Source: Classification of Pol Social