ScisummNet Co

Data Science and Analytics

Tags and Keywords

earth and nature

beginner

text

nlp

deep learning

pytorch

Free

About

Context: This large corpus can be used to train summarization models for scientific papers that make use of citations, which supports research on supervised methods.
Previous datasets for scientific document summarization were small, with only a few dozen articles. This dataset includes 1,000 examples, far more than prior work.
Content: I acquired this dataset in XML format from the original data source. The CL-Scisumm project developed ScisummNet, the first large-scale, human-annotated dataset for scientific summarization. It provides over 1,000 papers from the ACL Anthology Network together with their citation networks (e.g., citation sentences and citation counts) and comprehensive, manually written summaries.
The text column contains every token of the research paper, and the summary column contains the corresponding summary of that paper.
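As a rough illustration of the two-column layout described above, the following Python sketch reads (text, summary) pairs and computes how long each summary is relative to its paper. It assumes the data has been exported to CSV with a header row naming the `text` and `summary` columns. The listing does not state the file name or format of the download, so the in-memory sample and the helper names `load_pairs` and `compression_ratio` are hypothetical.

```python
import csv
import io


def load_pairs(csv_file):
    """Read (text, summary) pairs from an open CSV file object.

    Assumes a header row with 'text' and 'summary' columns,
    as described in the listing.
    """
    reader = csv.DictReader(csv_file)
    return [(row["text"], row["summary"]) for row in reader]


def compression_ratio(text, summary):
    """Summary length divided by paper length, in whitespace tokens."""
    text_tokens = text.split()
    if not text_tokens:
        return 0.0
    return len(summary.split()) / len(text_tokens)


# Toy stand-in for the real file; replace with open(<your path>, newline="").
sample = io.StringIO(
    "text,summary\n"
    '"We propose a parser . It is fast and accurate .","A fast parser ."\n'
)

pairs = load_pairs(sample)
ratios = [compression_ratio(text, summary) for text, summary in pairs]
```

A check like this can help when choosing maximum input and output lengths for a summarization model, since full papers are much longer than their summaries.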
Acknowledgements: This dataset was made possible by the CL-Scisumm shared task, which has been organized since 2014 for papers in the computational linguistics and NLP domain.
Inspiration: Train state-of-the-art (SOTA) models on this dataset and try to outperform the model proposed by the ScisummNet authors.

License

CC-BY-SA
Original Data Source: ScisummNet Corpus

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

22/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0