Allegro Articles Summarization Source-Target Pairs
Data Science and Analytics
Free
About
This dataset is tailored for training and evaluating text summarisation models. It consists of source-target pairs: the source column holds the original article text, and the target column holds the desired summary of that text. The material supports research and development in text summarisation, focusing on Allegro Articles summarisation tasks. Researchers can use it to build models that automatically summarise long-form texts across domains such as news articles, blog posts, and academic papers.
Columns
The dataset is provided across three separate files: validation.csv, train.csv, and test.csv. Each file contains two columns:
- source: The source text or article from Allegro Articles from which the summary is to be generated (Text).
- target: The desired summary of the corresponding source text (Text).
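The two-column layout above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: it assumes each file is UTF-8 encoded and starts with a header row naming the source and target columns, and the function name load_pairs is hypothetical.

```python
import csv

# Full articles can exceed the csv module's default field limit of
# 131072 characters, so raise it before reading.
csv.field_size_limit(2**31 - 1)

EXPECTED_COLUMNS = ("source", "target")  # column names as listed above


def load_pairs(path):
    """Read one split file and return a list of (source, target) tuples.

    Assumes UTF-8 text and a header row; both are assumptions, not
    facts stated by the dataset listing.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in EXPECTED_COLUMNS if name not in header]
        if missing:
            raise ValueError(f"{path} is missing columns: {missing}")
        return [(row["source"], row["target"]) for row in reader]
```

Calling load_pairs("train.csv") would then give the pairs for the training split, and the same call works for validation.csv and test.csv.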
For instance, the test.csv file contains 2 columns across 20.3k valid records.
Distribution
The material is distributed in three distinct CSV files: validation.csv, train.csv, and test.csv. The test.csv file is 62.18 MB in size. Its source and target fields each contain 20.3k valid records, with 100% validity and no mismatched or missing records. The expected update frequency is Never.
Usage
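A sensible first step when working with the files is to re-check the validity figures reported for test.csv. The sketch below assumes that "valid" means a value that is present and not blank; the listing does not define the term, and the function name count_invalid is hypothetical.

```python
import csv


def count_invalid(rows, columns=("source", "target")):
    """Count rows whose value in each column is missing or blank.

    `rows` is any iterable of dict-like rows, for example a
    csv.DictReader. Returns (total_rows, {column: invalid_count}).
    """
    invalid = {column: 0 for column in columns}
    total = 0
    for row in rows:
        total += 1
        for column in columns:
            value = row.get(column)
            # DictReader fills absent trailing fields with None.
            if value is None or not value.strip():
                invalid[column] += 1
    return total, invalid


# Example use against a local copy (the path is an assumption):
# csv.field_size_limit(2**31 - 1)  # long articles may exceed the default limit
# with open("test.csv", newline="", encoding="utf-8") as handle:
#     total, invalid = count_invalid(csv.DictReader(handle))
```

If the reported figures hold under this definition, both counts in the returned dictionary would be zero for test.csv.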
This resource is suited for training and evaluating text summarisation models, particularly for generating summaries from source articles. Developers can use the data to build systems that automatically generate concise summaries from longer texts. Running different algorithms on the same source articles allows summarisation techniques to be compared directly. Trained models can then be tested on the unseen instances in test.csv.
Coverage
The scope focuses on Allegro Articles summarisation tasks. The content consists of source articles paired with their target summaries and covers domains such as news articles, blog posts, and academic papers. The data is curated and split into subsets for training (train.csv), validation (validation.csv), and testing (test.csv).
License
CC0 1.0 Universal (CC0 1.0) - Public Domain
Who Can Use It
The dataset is intended for researchers and developers focused on text summarisation techniques, algorithm development, and advancing state-of-the-art Natural Language Processing (NLP) models. Practitioners interested in evaluating and analysing the effectiveness of different summarisation strategies can also use this resource. The material holds a maximum usability rating of 10.00.
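Comparing summarisation strategies on the same source articles could look roughly like the sketch below. It is an illustration under stated assumptions: the score is a simplified ROUGE-1-style unigram F1, not an official ROUGE implementation, and the two baselines and all function names are hypothetical stand-ins for real models.

```python
import re
from collections import Counter


def tokens(text):
    # Lowercased word tokens; \w matches Unicode letters in Python 3.
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate, reference):
    """ROUGE-1-style F1 from clipped unigram overlap (simplified)."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lead_sentence(source):
    """Baseline A: the first sentence of the article."""
    return re.split(r"(?<=[.!?])\s+", source.strip(), maxsplit=1)[0]


def first_words(source, n=30):
    """Baseline B: the first n whitespace-separated words."""
    return " ".join(source.split()[:n])


def mean_score(summarise, pairs):
    """Average unigram F1 of a summariser over (source, target) pairs."""
    scores = [unigram_f1(summarise(src), tgt) for src, tgt in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Given a list of (source, target) pairs from test.csv, calling mean_score(lead_sentence, pairs) and mean_score(first_words, pairs) scores both baselines on identical inputs; a trained model can be passed in the same way as any function that maps a source string to a summary string.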
Dataset Name Suggestions
- Allegro Articles Summarization Source-Target Pairs
- NLP Text Summarization Training and Evaluation Resource
- Allegro Text Summarization Corpus
Attributes
Original Data Source: Allegro Articles Summarization Source-Target Pairs
