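A Method of Selecting Characteristic Words Based on a Combinatorial Probability Model - An Application of the Hypergeometric Distribution -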

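Authors: 久光徹, 丹羽芳樹
Publisher: Information Processing Society of Japan (一般社団法人情報処理学会)
Journal: IPSJ SIG Technical Reports, Natural Language Processing (NL) (ISSN: 09196072)
Volume, issue, pages, and publication date: vol.2000, no.107, pp.85-90, 2000-11-21
Number of references: 4
Number of citing works: 5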

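Selecting the words that characterize a given document set is a useful technique with a wide range of applications. We interpret "characterizing a document set" as "occurring unusually often in the document set" and, to capture this, propose weighting each word w in a document set D by the following probability. Let N be the number of words in the entire collection D_0, K the frequency of w in D_0, n the number of words in D, and k the frequency of w in D. The smaller the probability that "when N balls include K red balls, n balls drawn at random contain k or more red balls," the larger the weight given to w. We show the effectiveness of this measure through a comparative experiment on five measures, and also describe an efficient method of computing the above probability.

This paper proposes a method of selecting "characteristic words" from a document set. The selection is done by using the weight that is assigned to each word in the document set. The weight is calculated by using the hypergeometric distribution. A comparative evaluation of five methods of word weighting (including tf-idf and SMART) revealed that the proposed method is superior to existing methods. An efficient method of calculating the hypergeometric probability is also shown.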
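The probability described in the abstract is the upper tail of a hypergeometric distribution: P(X >= k) = sum over i from k to min(K, n) of C(K, i) C(N-K, n-i) / C(N, n). The following is a minimal sketch of one way to turn it into a word weight. It is not the efficient computation method the paper refers to, which the abstract does not describe. The function names, the use of the negative log probability as the weight, and the toy counts are all assumptions made for illustration.

```python
import math


def log_comb(n, r):
    """Natural log of the binomial coefficient C(n, r), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def log_upper_tail(N, K, n, k):
    """log P(X >= k) for X ~ Hypergeometric(N balls, K red, n drawn)."""
    lo = max(k, 0, n - (N - K))  # smallest feasible count that is >= k
    hi = min(K, n)               # largest feasible count
    if lo > hi:
        return float("-inf")     # k red balls cannot be drawn at all
    denom = log_comb(N, n)
    log_terms = [
        log_comb(K, i) + log_comb(N - K, n - i) - denom
        for i in range(lo, hi + 1)
    ]
    # Log-sum-exp keeps very small probabilities from underflowing to zero.
    m = max(log_terms)
    total = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return min(0.0, total)       # clamp rounding error so that P <= 1


def weight(N, K, n, k):
    """Assumed weight: -log P(X >= k); a rarer tail gives a larger weight."""
    return -log_upper_tail(N, K, n, k)


# Hypothetical counts: a collection of 1000 words, a subset of 100 words,
# and a word that occurs 50 times in the collection overall.
w_frequent = weight(1000, 50, 100, 20)  # far above the expected 5 occurrences
w_typical = weight(1000, 50, 100, 6)    # close to the expected 5 occurrences
```

Working in log space matters here because N and K are word counts over a whole collection, so the binomial coefficients overflow ordinary floating point long before the probability itself becomes uninteresting.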

https://ci.nii.ac.jp/naid/110002935251
