- 著者
-
牧野 拓哉
岩倉 友哉
- 出版者
- 一般社団法人 人工知能学会
- 雑誌
- 人工知能学会論文誌 (ISSN:13460714)
- 巻号頁・発行日
- vol.35, no.6, pp.B-K46_1-8, 2020-11-01 (Released:2020-11-01)
- 参考文献数
- 25
- 被引用文献数
-
1
Pointer-generator, which is the one of the strong baselines in neural summarization models, generates summaries by selecting words from a set of words (output vocabulary) and words in source documents. A conventional method for constructing output vocabulary collects highly frequent words in summaries of training data. However, highly frequent words in summaries could be usually a high possibility to be frequent in source documents. Thus, an output vocabulary constructed by the conventional method is redundant for pointer-generator because pointergenerator can copy words in source documents. We propose a vocabulary construction method that selects words included in each summary but not included in its source text of each pair. Experimental results on CNN/Daily Mail corpus and NEWSROOM corpus showed that our method contributes to improved ROUGE scores while obtaining high ratios of generating novel words that do not occur in source documents.