Publication list: 山崎治子 (author)

OA Japanese Stopwords for Keyword Extraction Suited to Content Inference (内容推測に適したキーワード抽出のための日本語ストップワード)

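Authors: 國府久嗣, 山崎治子, 野坂政司
Publisher: 日本感性工学会 (Japan Society of Kansei Engineering)
Journal: 日本感性工学会論文誌 (Transactions of Japan Society of Kansei Engineering) (ISSN: 1884-0833)
Volume, issue, pages, publication date: vol.12, no.4, pp.511-518, 2013 (Released: 2013-12-11)
Number of references: 17
Number of citing works: 3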

Extracting keywords from target text data is essential for any analysis that aims to describe the substantive characteristics of message content. Among the available alternatives, we chose a stopword filter because it is simple yet effective. The filter we present consists of non-content words and low-content words. Non-content-bearing words are mainly function words and were removed using part-of-speech (POS) tag information. Among the remaining words, those with a high occurrence rate are candidates for keywords, but they usually still include some low-content words, such as delexical verbs. This article presents a stopword list of low-content words, compiled through manual procedures based on human judgment applied to 40 text files from the CASTEL/J database, and examines it from the standpoint of general versatility.
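
The two-stage filter described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the POS tag names, the tiny stopword set, and the function `extract_keywords` are all hypothetical, and the input is assumed to be already tokenized and POS-tagged by some morphological analyzer.

```python
from collections import Counter

# Assumption: POS tags treated as non-content (function) words.
# The tag names are illustrative, not those of any particular tagger.
FUNCTION_POS = {"particle", "auxiliary", "conjunction", "symbol"}

# Assumption: a tiny stand-in for a low-content stopword list
# (e.g. delexical verbs). This is NOT the list published in the paper.
LOW_CONTENT_STOPWORDS = {"する", "なる", "ある", "こと"}


def extract_keywords(tagged_tokens, top_n=3):
    """Return the top_n most frequent words that survive both filter stages."""
    # Stage 1: drop non-content words using POS tag information.
    content = [word for word, pos in tagged_tokens if pos not in FUNCTION_POS]
    # Stage 2: drop low-content words using the stopword list.
    kept = [word for word in content if word not in LOW_CONTENT_STOPWORDS]
    # Remaining high-frequency words are treated as keyword candidates.
    return Counter(kept).most_common(top_n)


# Hypothetical pre-tagged input as (surface form, POS) pairs.
tagged = [
    ("言語", "noun"), ("を", "particle"), ("研究", "noun"), ("する", "verb"),
    ("こと", "noun"), ("は", "particle"), ("言語", "noun"), ("の", "particle"),
    ("分析", "noun"), ("に", "particle"), ("なる", "verb"),
]

keywords = extract_keywords(tagged, top_n=1)
print(keywords)
```

In this sketch the POS stage removes particles, while the stopword stage removes words such as する and なる that pass the POS stage but carry little content, which mirrors the distinction between non-content and low-content words drawn in the abstract.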