Keyphrase Generation Beyond the Boundaries of Title and Abstract

Krishna Garg; Jishnu Ray Chowdhury; Cornelia Caragea

doi:10.18653/v1/2022.findings-emnlp.427

Abstract

Keyphrase generation aims at generating important phrases (keyphrases) that best describe a given document. In scholarly domains, current approaches have largely used only the title and abstract of the articles to generate keyphrases. In this paper, we comprehensively explore whether integrating additional information, from the full text of a given article or from semantically similar articles, can help a neural keyphrase generation model. We find that adding sentences from the full text, particularly in the form of an extractive summary of the article, can significantly improve the generation of both types of keyphrases: those present in the text and those absent from it. Experimental results with three widely used keyphrase generation models, along with one of the latest transformer models suitable for longer documents, Longformer Encoder-Decoder (LED), validate this observation. We also present FullTextKP, a new large-scale scholarly dataset for keyphrase generation. Unlike prior large-scale datasets, FullTextKP includes the full text of the articles along with the title and abstract. We release the source code at https://github.com/kgarg8/FullTextKP.
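The distinction between present and absent keyphrases depends on which text is treated as the source. The following is a minimal sketch of that split, assuming plain lowercase token matching without stemming. The function names (`tokenize`, `is_present`, `split_keyphrases`) and the example strings are hypothetical and are not taken from the paper's released code or from FullTextKP.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (no stemming, for simplicity)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_present(keyphrase, source_tokens):
    """True if the keyphrase's tokens occur as a contiguous run in the source."""
    kp = tokenize(keyphrase)
    n = len(kp)
    if n == 0:
        return False
    return any(source_tokens[i:i + n] == kp
               for i in range(len(source_tokens) - n + 1))


def split_keyphrases(keyphrases, source_text):
    """Partition keyphrases into (present, absent) relative to source_text."""
    tokens = tokenize(source_text)
    present = [kp for kp in keyphrases if is_present(kp, tokens)]
    absent = [kp for kp in keyphrases if not is_present(kp, tokens)]
    return present, absent


# Made-up illustrative input (an assumption, not real dataset content).
title_and_abstract = "Keyphrase generation with neural models. We study scholarly documents."
extra_sentence = "We summarize the full text with an extractive summary."
keyphrases = ["keyphrase generation", "extractive summary", "information retrieval"]

# Source restricted to title and abstract.
present_short, absent_short = split_keyphrases(keyphrases, title_and_abstract)

# Source extended with one additional sentence from the full text.
present_long, absent_long = split_keyphrases(
    keyphrases, title_and_abstract + " " + extra_sentence
)
```

In this toy input, "extractive summary" counts as absent when only the title and abstract are used and as present once the extra sentence is appended. This shows only how the present/absent labels shift with the input; it says nothing about model quality.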

Anthology ID: 2022.findings-emnlp.427
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5809–5821
URL: https://aclanthology.org/2022.findings-emnlp.427/
DOI: 10.18653/v1/2022.findings-emnlp.427
Cite (ACL): Krishna Garg, Jishnu Ray Chowdhury, and Cornelia Caragea. 2022. Keyphrase Generation Beyond the Boundaries of Title and Abstract. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5809–5821, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Keyphrase Generation Beyond the Boundaries of Title and Abstract (Garg et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.427.pdf
Software: 2022.findings-emnlp.427.software.zip
Video: https://aclanthology.org/2022.findings-emnlp.427.mp4
