Document Frequency Analysis for Keyword Extraction (キーワード抽出を実現する文書頻度分析)

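Authors: 武田善行 (Yoshiyuki Takeda), 梅村恭司 (Kyoji Umemura)
Publisher: Information Processing Society of Japan (一般社団法人情報処理学会)
Journal: IPSJ SIG Technical Reports, Natural Language Processing (NL) (情報処理学会研究報告自然言語処理) (ISSN: 0919-6072)
Volume, issue, pages, date: vol. 2001, no. 112, pp. 27-32, 2001-11-20
References: 7
Cited by: 2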

Adaptation is the degree to which a substring appears two or more times in a document, given that it appears at least once in that document. The repeated occurrence of keywords has been observed in English, and this paper confirms that it can also be observed in Japanese and Chinese. As in English, the adaptation of a keyword tends to be uncorrelated with its frequency. In contrast, the estimated adaptation of randomly selected strings varies widely. We carried out this analysis on abstracts of Japanese technical papers and on several years of Japanese newspaper articles, and tried to extract keywords using the difference between these distributions. We show that adaptation carries information from which keyword boundaries can be identified.
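The definition of adaptation in the abstract can be read as a ratio of two document frequencies: the number of documents containing the substring at least twice, divided by the number containing it at least once. The following is a minimal sketch under that reading, not the authors' implementation. The function names `count_occurrences` and `adaptation`, the choice to count overlapping matches, and the toy documents are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def count_occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing overlapping matches (an assumption)."""
    if not sub:
        raise ValueError("substring must be non-empty")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are also counted.
        start = text.find(sub, start + 1)
    return count


def adaptation(docs: Iterable[str], sub: str) -> Optional[float]:
    """Estimate P(count >= 2 | count >= 1) for sub over a document collection.

    Returns None when no document contains sub, since the ratio is undefined.
    """
    df1 = 0  # documents containing sub at least once
    df2 = 0  # documents containing sub at least twice
    for doc in docs:
        n = count_occurrences(doc, sub)
        if n >= 1:
            df1 += 1
        if n >= 2:
            df2 += 1
    if df1 == 0:
        return None
    return df2 / df1


# Hypothetical toy collection, for illustration only.
docs = [
    "keyword extraction uses keyword statistics",
    "a keyword appears here once",
    "no match in this document",
    "keyword, keyword, and keyword again",
]
# Three documents contain "keyword"; two of them contain it at least twice.
print(adaptation(docs, "keyword"))
```

Comparing this estimate across candidate substrings (for example, a string and its one-character extensions or truncations) is one way to explore the boundary signal the abstract describes; the paper's actual extraction procedure is not given here.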

https://ci.nii.ac.jp/naid/110002935334

Mentions

Hatena Bookmark (1 user, 1 post)

Twitter (1 user, 1 post, 0 favorites)

"Not all that related, but as far as keyword extraction goes, this one is cute." http://ci.nii.ac.jp/naid/110002935334/
