Japanese Morphological Analysis Using a Statistical Language Model and N-best Search (統計的言語モデルとN-best探索を用いた日本語形態素解析法)

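Author: Masaaki Nagata (永田昌明)
Publisher: Information Processing Society of Japan (一般社団法人情報処理学会)
Journal: IPSJ Journal (情報処理学会論文誌) (ISSN: 18827764)
Volume, issue, pages, date: vol. 40, no. 9, pp. 3420-3431, 1999-09-15
Cited by: 15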

This paper proposes a new method for Japanese morphological analysis that uses a statistical language model and an N-best search algorithm. Because it includes a probabilistic model of unknown words, the method can accurately analyze arbitrary Japanese sentences, and it can output any desired number of morphological analysis candidates in order of decreasing probability. The language model was trained on a subset of the EDR corpus (about 190 thousand sentences, 4.7 million words) and tested on 100 sentences of open text. Word segmentation accuracy was 94.6% recall and 93.5% precision for the top candidate, and 97.8% recall and 88.3% precision for the top five candidates.
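The abstract combines three ingredients: a probabilistic language model, a model for unknown words, and an N-best search that returns ranked candidates scored by word-segmentation recall and precision. The sketch below illustrates these ideas in miniature and is not the paper's method. A word unigram model over a hypothetical toy lexicon stands in for the statistical language model. A crude length-based spelling model (hypothetical constants `UNK_LOG_PROB`, `CHAR_LOG_PROB`, `MAX_UNK_LEN`) stands in for the unknown-word model. The search is a simple k-best dynamic program over character positions, swapped in for the paper's own N-best algorithm. The scoring helper assumes that a word counts as correct when its (start, end) character span matches the gold segmentation.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL toy lexicon: word -> unigram probability (not from the paper).
LEXICON: Dict[str, float] = {
    "東京": 0.02, "都": 0.01, "東": 0.005, "京都": 0.01,
    "に": 0.05, "住む": 0.01, "住": 0.001, "む": 0.001,
}
# HYPOTHETICAL unknown-word model: a fixed "unknown word" probability times a
# uniform per-character spelling probability, for strings up to MAX_UNK_LEN.
MAX_UNK_LEN = 4
UNK_LOG_PROB = math.log(1e-5)
CHAR_LOG_PROB = math.log(1 / 3000)

Hypothesis = Tuple[float, List[str]]  # (log probability, words)


def word_log_prob(word: str) -> float:
    """Lexicon words use their unigram probability; other strings use the unknown-word model."""
    if word in LEXICON:
        return math.log(LEXICON[word])
    return UNK_LOG_PROB + len(word) * CHAR_LOG_PROB


def n_best_segmentations(sentence: str, n: int) -> List[Hypothesis]:
    """Return up to n segmentations of sentence, highest log probability first.

    best[i] keeps the n best hypotheses covering sentence[:i]. Because a unigram
    score is a sum of per-word terms, pruning to n per position loses nothing.
    """
    best: List[List[Hypothesis]] = [[] for _ in range(len(sentence) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(sentence) + 1):
        candidates: List[Hypothesis] = []
        for start in range(end):
            word = sentence[start:end]
            if word not in LEXICON and len(word) > MAX_UNK_LEN:
                continue  # too long to be proposed as an unknown word
            lp = word_log_prob(word)
            for score, words in best[start]:
                candidates.append((score + lp, words + [word]))
        best[end] = heapq.nlargest(n, candidates, key=lambda h: h[0])
    return best[-1]


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into its set of (start, end) character spans."""
    spans: Set[Tuple[int, int]] = set()
    pos = 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def recall_precision(gold: List[str], system: List[str]) -> Tuple[float, float]:
    """Word-segmentation recall and precision; a word is correct if its span matches gold."""
    g, s = word_spans(gold), word_spans(system)
    correct = len(g & s)
    return correct / len(g), correct / len(s)


if __name__ == "__main__":
    for score, words in n_best_segmentations("東京都に住む", 3):
        print(f"{score:8.3f}  {'/'.join(words)}")
```

With these toy numbers, 東京/都/に/住む ranks first and 東/京都/に/住む second. Scoring the second against the first as gold gives recall 0.5 and precision 0.5, because only に and 住む share spans. A context-dependent model (for example a word or part-of-speech bigram) would need the previous word or tag in the DP state, which is where more elaborate N-best search algorithms become useful.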


https://ci.nii.ac.jp/naid/110002725063
