Opendatabay APP

Allegro Articles Summarization Source-Target Pairs

Data Science and Analytics

Tags and Keywords: Text, Summarisation, Articles, NLP, Training

Free

About

This dataset is designed for training and evaluating text summarisation models. It consists of source-target pairs: the source column holds the original text or article, and the target column holds the reference summary for that text. The material supports research and development in text summarisation, with a focus on Allegro Articles summarisation tasks. Researchers can use it to build models that automatically summarise long-form texts across domains such as news articles, blog posts, and academic papers.

Columns

The dataset is provided as three separate files: train.csv, validation.csv, and test.csv. Each file contains two columns:
  • source: The source text or article from Allegro Articles from which the summary is to be generated (Text).
  • target: The reference summary of the corresponding source text (Text).
For instance, the test.csv file contains these 2 columns and 20.3k valid records.
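
The two-column layout described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: the helper name load_pairs and the in-memory sample rows are hypothetical, and only the column names source and target come from the listing.

```python
import csv
import io

# Column names taken from the listing; everything else here is illustrative.
EXPECTED_COLUMNS = ("source", "target")


def load_pairs(file_obj):
    """Read (source, target) pairs from an open CSV file object."""
    reader = csv.DictReader(file_obj)
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [(row["source"], row["target"]) for row in reader]


# Tiny in-memory stand-in for one of the CSV files (hypothetical content).
sample = io.StringIO(
    "source,target\n"
    '"First article, with a comma.",First summary\n'
    '"Second article.",Second summary\n'
)
pairs = load_pairs(sample)
print(len(pairs), "pairs loaded")
```

For a downloaded file, the same function could be called on open("test.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption, since the listing does not state one. Very long articles may also require raising csv.field_size_limit.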

Distribution

The material is distributed as three CSV files: train.csv, validation.csv, and test.csv. The test.csv file is 62.18 MB in size. Its source and target fields each contain 20.3k valid records (100% validity), with zero mismatched or missing records. No updates are expected.
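
The validity figures quoted above can be re-checked locally after download. The sketch below assumes that "valid" means a non-empty value after trimming whitespace. That definition is an assumption, as the listing does not say how validity is computed. The function name validity_report and the sample rows are hypothetical.

```python
def validity_report(rows, fields=("source", "target")):
    """Return the percentage of non-empty values per field.

    "Valid" here means non-empty after stripping whitespace, which is an
    assumed definition and not one given by the listing.
    """
    total = len(rows)
    report = {}
    for field in fields:
        valid = sum(1 for row in rows if (row.get(field) or "").strip())
        report[field] = 100.0 * valid / total if total else 0.0
    return report


# Hypothetical rows, shaped like the output of csv.DictReader.
rows = [
    {"source": "An article.", "target": "A summary."},
    {"source": "Another article.", "target": ""},  # empty target
]
print(validity_report(rows))
```

On the real test.csv, a result of 100.0 for both fields would match the listing's claim.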

Usage

This resource is suited to training and evaluating text summarisation models, particularly for generating summaries from source articles. Developers can use the data to build systems that automatically generate concise summaries from longer texts. Because every technique can be run on the same source articles, the dataset also allows different summarisation methods to be compared directly. Trained models can then be tested on the unseen instances in test.csv.
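
As an illustration of comparing techniques on the same source articles, the sketch below scores two trivial baselines against the target summaries. The scoring function is a simplified unigram-overlap F1 in the spirit of ROUGE-1, not an official ROUGE implementation. All function names and the sample pair are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate, reference):
    """Simplified unigram-overlap F1 between a candidate and a reference."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lead_words(source, n=5):
    """Baseline 1: the first n whitespace-separated words of the source."""
    return " ".join(source.split()[:n])


def first_sentence(source):
    """Baseline 2: the text up to the first sentence-ending punctuation."""
    return re.split(r"(?<=[.!?])\s+", source.strip(), maxsplit=1)[0]


def mean_score(summarise, pairs):
    """Average unigram F1 of a summariser over (source, target) pairs."""
    scores = [unigram_f1(summarise(src), tgt) for src, tgt in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical pair standing in for rows of test.csv.
pairs = [("The cat sat on the mat. It then slept all day.",
          "The cat sat on the mat.")]
for name, fn in [("lead_words", lead_words), ("first_sentence", first_sentence)]:
    print(name, round(mean_score(fn, pairs), 3))
```

A trained model can be plugged in by passing any function that maps a source string to a summary string. Keeping test.csv out of training and tuning preserves it as the unseen set.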

Coverage

The scope is Allegro Articles summarisation tasks. The content consists of source texts or articles paired with their target summaries, and is described as covering domains such as news articles, blog posts, and academic papers. The data is split into subsets for training (train.csv), validation (validation.csv), and testing (test.csv).

License

CC0 1.0 Universal (CC0 1.0) - Public Domain

Who Can Use It

The dataset is intended for researchers and developers working on text summarisation techniques, algorithm development, and state-of-the-art Natural Language Processing (NLP) models. Practitioners who want to evaluate and analyse the effectiveness of different summarisation strategies can also use it. The material holds the maximum usability rating of 10.00.

Dataset Name Suggestions

  • Allegro Articles Summarization Source-Target Pairs
  • NLP Text Summarization Training and Evaluation Resource
  • Allegro Text Summarization Corpus

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 18/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
