Authors
國府 久嗣 山崎 治子 野坂 政司
Publisher
日本感性工学会
Journal
日本感性工学会論文誌 (ISSN:18840833)
Volume, issue, pages, and publication date
vol.12, no.4, pp.511-518, 2013 (Released:2013-12-11)
Number of references
17
Number of citations
3

Extracting keywords from target text data is essential for analyses that characterize the substance of message content. Among the available alternatives, we chose a stopword filter because it is simple yet effective. The filter we present consists of non-content words and low-content words. Non-content words are mainly function words and were removed using part-of-speech (POS) tag information. Among the remaining words, those with high occurrence rates were candidate keywords; however, they usually included some low-content words, such as delexical verbs. This article presents a stopword list of such low-content words, compiled through manual, intuition-based procedures using 40 text files from the CASTEL/J database, and examines its general versatility.
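
The two-stage filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the POS tag names, the contents of `LOW_CONTENT_STOPWORDS`, the function `extract_keywords`, and the sample tokens are all hypothetical, and the input is assumed to be already tokenized and POS-tagged by some external tool.

```python
from collections import Counter

# Hypothetical POS tags standing in for function-word (non-content) categories.
FUNCTION_POS = {"particle", "auxiliary", "conjunction", "symbol"}

# Hypothetical low-content stopwords (e.g. delexical verbs); NOT the paper's list.
LOW_CONTENT_STOPWORDS = {"する", "なる", "ある", "こと"}


def extract_keywords(tagged_tokens, top_n=3):
    """Return up to top_n frequent words that survive both filter stages."""
    # Stage 1: drop non-content words using POS tag information.
    content = [word for word, pos in tagged_tokens if pos not in FUNCTION_POS]
    # Stage 2: drop low-content words using the stopword list.
    kept = [word for word in content if word not in LOW_CONTENT_STOPWORDS]
    # The most frequent remaining words are treated as keyword candidates.
    return [word for word, _ in Counter(kept).most_common(top_n)]


# Illustrative, hand-tagged sample (assumed input format: (word, POS) pairs).
sample = [
    ("言語", "noun"), ("を", "particle"), ("研究", "noun"), ("する", "verb"),
    ("言語", "noun"), ("は", "particle"), ("文化", "noun"), ("で", "auxiliary"),
    ("ある", "verb"), ("言語", "noun"), ("と", "particle"), ("文化", "noun"),
    ("の", "particle"), ("研究", "noun"),
]

keywords = extract_keywords(sample)
print(keywords)
```

In this sketch the POS stage removes particles and auxiliaries, while verbs such as する and ある pass that stage and are only removed by the stopword list, which mirrors the distinction the abstract draws between non-content and low-content words.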