Tweets from Spanish Politicians Dataset
Social Media and Networking
Free
About
This dataset provides a corpus of tweets from Spanish politicians, primarily designed to support Natural Language Processing (NLP) research and development in the Spanish language [3]. While NLP is a rapidly advancing field, much of its focus remains on English. This dataset aims to address this gap by offering a valuable resource for understanding and processing natural language in Spanish, specifically focusing on political discourse [3]. It includes tweets from major Spanish political parties such as PSOE, PP, VOX, Unidas Podemos, and Ciudadanos [3]. The creators hope to inspire Spanish-speaking users to share their NLP knowledge and contribute to the community [3].
Columns
- cuenta: The hashed name of the Twitter account that published the tweet [3, 4].
- partido: The political party to which the user belongs [3, 4].
- timestamp: The exact moment the tweet was published [3, 4].
- tweet: The full textual content of the tweet [3, 4].
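As a minimal sketch of how these four columns might be read, the example below parses a tab-separated sample with Python's standard csv module and checks that the expected column names are present. The sample rows, the values in them (including the timestamp format and party labels), and the helper name read_tweets are illustrative assumptions, not taken from the dataset itself.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ["cuenta", "partido", "timestamp", "tweet"]


def read_tweets(handle):
    """Read tab-separated tweet records from a text handle into a list of dicts.

    Hypothetical helper: the field separator is assumed to be a tab, as the
    dataset description notes. Default csv quoting rules are used, which may
    need adjusting if tweets in the real file contain stray quote characters.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Synthetic sample rows for illustration only; real values will differ.
SAMPLE = (
    "cuenta\tpartido\ttimestamp\ttweet\n"
    "a1b2c3\tpsoe\t2020-01-01 10:00:00\tBuenos días a todos.\n"
    "d4e5f6\tpp\t2020-01-01 10:05:00\tHoy presentamos nuestra propuesta.\n"
)

rows = read_tweets(io.StringIO(SAMPLE))
print(len(rows), rows[0]["partido"])
```

With a real file, the same helper could be called on a handle opened with encoding="utf-8" and newline="", which is how the csv module recommends opening files. The file name and encoding would need to be checked against the actual download.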
Distribution
The dataset is distributed as a delimited text file. It is described as CSV, but the source specifies a tab ('\t') as the field separator [1, 4]. The total number of rows is not stated in the available information; the cuenta column is reported to have 245,369 unique values [4].
Usage
This dataset is suited to applications in Natural Language Processing and political science [3], such as:
- Training NLP models for the Spanish language.
- Analysing political discourse and sentiment in Spain.
- Developing language understanding algorithms specific to Spanish political contexts.
- Academic research on social media trends and political communication.
- Creating Notebooks and sharing knowledge within the NLP community, particularly for Spanish speakers [3].
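A first exploratory step for several of the uses above could look like the sketch below: counting tweets per party and listing the most frequent words for one party. It assumes the records are already available as dictionaries with partido and tweet keys. The sample rows, the helper names, and the simple letter-based tokenizer are illustrative assumptions, and no stop-word filtering is applied.

```python
import re
from collections import Counter

# Matches runs of letters, including accented Spanish characters,
# while excluding digits and underscores.
WORD_PATTERN = re.compile(r"[^\W\d_]+")


def tweets_per_party(rows):
    """Count how many tweets each party has in the given records."""
    return Counter(row["partido"] for row in rows)


def top_words(rows, party, n=10):
    """Return the n most frequent lowercase words in one party's tweets."""
    counts = Counter()
    for row in rows:
        if row["partido"] == party:
            counts.update(WORD_PATTERN.findall(row["tweet"].lower()))
    return counts.most_common(n)


# Synthetic records for illustration only; not taken from the dataset.
sample_rows = [
    {"partido": "psoe", "tweet": "Hola España, hola Madrid"},
    {"partido": "pp", "tweet": "Buenos días España"},
    {"partido": "psoe", "tweet": "Madrid avanza"},
]

print(tweets_per_party(sample_rows))
print(top_words(sample_rows, "psoe", n=2))
```

For real analysis, a Spanish stop-word list and handling for mentions, hashtags, and URLs would likely be needed before word counts become informative.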
Coverage
The dataset's geographic scope is Spain, and it consists exclusively of tweets written by Spanish politicians from prominent parties [3]. The content is entirely in Spanish [3]. The time range is captured by the timestamp column, though the start and end dates of the coverage period are not specified [3].
License
CC0
Who Can Use It
- NLP researchers and developers: Especially those focusing on Spanish language processing [3].
- Data scientists: Interested in social media analysis, sentiment analysis, or topic modelling.
- Academics and students: Studying political science, linguistics, or social media communication in Spain [3].
- Spanish-speaking Kaggle users: Encouraged to use the dataset for learning and sharing knowledge [3].
Dataset Name Suggestions
- Spanish Political Tweets Corpus
- Tweets from Spanish Politicians
- Spain Parliamentarian Tweets
- Spanish NLP Political Data
- Política España Tweets
Attributes
Original Data Source: Tweets Política España