Arabic Soccer News Corpus
Entertainment & Media Consumption
Free
About
This dataset contains Arabic news articles about the Saudi MBS football league, a competition that has seen heightened rivalry and significant investment in recent years. It is intended for analytical tasks such as sentiment analysis, predictive modelling, and clustering of news content, so that users can study how the media describes and perceives the league's intense competition. The articles were published between 12 December 2020 and 25 January 2021 and report on events, team activities, and match-related discussions.
Columns
- writer: The individual or organisation responsible for writing the news article.
- location: The geographical location associated with the news piece, such as Riyadh or Jeddah.
- date: The date the news was published, formatted as yyyy-mm-dd.
- time: The time the news was published, formatted as hh:mm.
- news: The full textual content of the news article.
- title: The headline or title of the news article.
- class: A categorical indicator for the type of news:
  - 0: Informative news specifically about teams.
  - 1: News directly related to a football match.
  - 2: General news not specific to particular teams.
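The column list above can be turned into a small typed loader. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the column names listed above, which the listing does not confirm. The names `parse_row`, `load_articles`, and `SAMPLE_CSV` are hypothetical, and the sample row is illustrative rather than taken from the dataset.

```python
import csv
import io
from datetime import datetime

# Labels taken from the class description in the column list.
CLASS_LABELS = {
    0: "informative news about teams",
    1: "match-related news",
    2: "general news",
}


def parse_row(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    # date is yyyy-mm-dd and time is hh:mm, per the column description.
    published = datetime.strptime(
        f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M"
    )
    label = int(row["class"])
    if label not in CLASS_LABELS:
        raise ValueError(f"unexpected class value: {label}")
    return {
        "writer": row["writer"].strip(),
        "location": row["location"].strip(),
        "published": published,
        "title": row["title"].strip(),
        "news": row["news"].strip(),
        "class": label,
    }


def load_articles(handle):
    """Read every row from an open CSV file or file-like object."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Illustrative sample only; this row is not taken from the dataset.
SAMPLE_CSV = """writer,location,date,time,news,title,class
Example Writer,Riyadh,2020-12-12,18:30,Sample article body,Sample headline,1
"""

articles = load_articles(io.StringIO(SAMPLE_CSV))
print(articles[0]["published"], CLASS_LABELS[articles[0]["class"]])
```

To read the real file, pass a handle opened with `encoding="utf-8"` and `newline=""` instead of the in-memory sample. The encoding is an assumption and may need adjusting for the actual CSV.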
Distribution
This dataset is provided as a CSV (Comma Separated Values) file. The precise total number of rows is not stated, although a total of 1996 values is referenced, which suggests the overall scale of the data. The articles date predominantly from late 2020 to early 2021, and the counts reported for specific days or multi-day periods range from 26 to 752 records.
Usage
This dataset is ideal for:
- Sentiment analysis to gauge public and media sentiment towards MBS league teams and events.
- Predictive modelling to forecast the class or type of news.
- Text clustering to identify common themes or narratives within Saudi football news.
- Analysing news trends, such as identifying the most frequent words per month or determining locations with the highest news coverage.
- Developing Natural Language Processing (NLP) applications for Arabic sports content.
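The trend analyses listed above (most frequent words per month, locations with the highest coverage) can be sketched with the standard library alone. This is a rough illustration, not a recommended pipeline. It tokenises on whitespace, which is a crude simplification for Arabic because prefixes and clitics stay attached to words. The function names and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter, defaultdict

# Illustrative rows only (not from the dataset): (date, location, news text).
SAMPLE_ROWS = [
    ("2020-12-15", "Riyadh", "الهلال يفوز على النصر"),
    ("2020-12-20", "Jeddah", "الاتحاد يتعادل مع الهلال"),
    ("2021-01-05", "Riyadh", "النصر يستعد لمواجهة الهلال"),
]


def words_per_month(rows):
    """Count whitespace-separated tokens per yyyy-mm month."""
    counts = defaultdict(Counter)
    for date_text, _location, text in rows:
        month = date_text[:7]  # "yyyy-mm-dd" -> "yyyy-mm"
        counts[month].update(text.split())
    return counts


def coverage_by_location(rows):
    """Count how many articles are associated with each location."""
    return Counter(location for _date, location, _text in rows)


monthly = words_per_month(SAMPLE_ROWS)
locations = coverage_by_location(SAMPLE_ROWS)

print(monthly["2020-12"].most_common(1))
print(locations.most_common(1))
```

For real analysis, a proper Arabic tokeniser and a stop-word list would likely give more meaningful frequency counts than the whitespace split used here.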
Coverage
The dataset primarily covers Saudi Arabian football news related to the MBS league. The news articles were published between 12 December 2020 and 25 January 2021. Geographical coverage includes locations such as Riyadh and Jeddah, and the articles come from a range of writers. No notes on availability for particular demographic groups are given, as the focus is on news content.
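As a quick sanity check on the stated publication window, a loader could flag rows whose date falls outside it. The sketch below assumes dates arrive as `yyyy-mm-dd` strings, as described in the column list. The constant and function names are hypothetical.

```python
from datetime import date

# Publication window stated in the coverage notes.
COVERAGE_START = date(2020, 12, 12)
COVERAGE_END = date(2021, 1, 25)


def in_coverage_window(date_text):
    """Return True if a yyyy-mm-dd string lies within the stated window."""
    published = date.fromisoformat(date_text)
    return COVERAGE_START <= published <= COVERAGE_END


print(in_coverage_window("2020-12-31"))
print(in_coverage_window("2021-02-01"))
```

Both endpoints are treated as inclusive here, which is an interpretation of "between" rather than something the listing specifies.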
License
CC BY-NC-SA
Who Can Use It
This dataset is suitable for:
- Data scientists and machine learning engineers working on NLP, sentiment analysis, or classification tasks in Arabic.
- Sports analysts and researchers interested in media coverage and trends within Saudi football.
- Media companies looking to understand content performance and audience engagement with sports news.
- Academics studying Arabic text, media, or sports sociology.
Dataset Name Suggestions
- Saudi Football News
- MBS League News Articles
- Arabic Soccer News Corpus
- Saudi Pro League Media Data
- Middle East Football News
Attributes
Original Data Source: Saudi Soccer News - Arabic