ScisummNet Co
Data Science and Analytics
Free
About
Context
This large corpus can be used to train scientific paper summarization models that make use of citations, facilitating research in supervised methods.
Previous datasets for scientific document summarization are small, containing only several dozen articles. This dataset includes 1,000 examples, far more than prior work.
Content
I acquired this dataset from the original source (the ScisummNet Corpus) in XML format. The CL-Scisumm project developed ScisummNet, the first large-scale, human-annotated dataset for scientific paper summarization. It provides over 1,000 papers from the ACL Anthology Network together with their citation networks (e.g., citation sentences and citation counts) and comprehensive, manually written summaries.
The text column contains the full text of each research paper, and the summary column contains the corresponding summary.
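As a rough illustration of the two-column layout described above, the sketch below reads (text, summary) pairs and reports how much shorter each summary is than its paper. It assumes the data is stored as CSV with headers named exactly text and summary; the sample rows, the load_pairs and compression_ratio helpers, and the whitespace tokenization are all made up for the example and are not part of the dataset.

```python
import csv
import io

# Hypothetical two-row stand-in for the real file; the actual file name and
# exact column headers are assumptions, not taken from the dataset.
SAMPLE_CSV = """text,summary
"We present a parser for dependency trees . It is fast and accurate .","A fast dependency parser ."
"This paper studies word alignment models for translation .","Word alignment for translation ."
"""


def load_pairs(handle):
    """Return a list of (text, summary) tuples from a CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row["text"], row["summary"]) for row in reader]


def compression_ratio(text, summary):
    """Summary length divided by paper length, counted in whitespace tokens."""
    text_tokens = text.split()
    summary_tokens = summary.split()
    return len(summary_tokens) / len(text_tokens) if text_tokens else 0.0


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
for text, summary in pairs:
    print(f"{len(text.split()):>4} tokens -> {len(summary.split()):>3} tokens "
          f"(ratio {compression_ratio(text, summary):.2f})")
```

To run this on a real file, the in-memory sample could be swapped for open(path, newline="", encoding="utf-8"), where path is wherever the download was saved. Full papers are long, so the limit set by csv.field_size_limit may need to be raised before reading.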
Acknowledgements
This dataset was made possible by the CL-Scisumm shared task, which has been organized since 2014 for papers in the computational linguistics and NLP domain.
Inspiration
State-of-the-art summarization models can be trained on this dataset, with the goal of outperforming the model proposed by the ScisummNet authors.
License
CC-BY-SA
Original Data Source: ScisummNet Corpus