Corpus-Based Thesaurus: Positioning Unknown Words in an Existing Thesaurus Using Statistical Information


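Author: Naohiko Uramoto (浦本直彦)
Publisher: Information Processing Society of Japan (一般社団法人情報処理学会)
Journal: Transactions of Information Processing Society of Japan (情報処理学会論文誌) (ISSN:18827764)
Volume, issue, pages, publication date: vol.37, no.12, pp.2182-2189, 1996-12-15
Cited by: 11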

This paper describes a method for extending a thesaurus using an existing medium-sized thesaurus and a corpus, as a foundation for building a corpus-based thesaurus. For a word that is not in the thesaurus, statistical information extracted from the corpus is used to determine which part of the thesaurus the word is most likely to belong to. Automatically acquiring the classification criteria (viewpoints) of the thesaurus makes it possible to estimate a word's position efficiently. Using this knowledge, a system was built that provides a set of functions for computing a word's position in the extended thesaurus, its hypernyms, the similarity between words, and so on.

This paper describes the development of a corpus-based thesaurus system. For this purpose, a method for positioning unknown words in an existing thesaurus is proposed. A likely area of the thesaurus for an unknown word is estimated by integrating the human intuition buried in the thesaurus with statistical data extracted from the corpus. To overcome the problem of data sparseness, distinguishing features called "viewpoints" of each node are extracted automatically and used to calculate the similarity between the unknown word and a word in the thesaurus. The results of an experiment confirm the contribution of the viewpoints to the positioning task. By using functions for accessing the thesaurus with viewpoints, users can get information about words in the thesaurus, including unknown words.
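The abstract gives the idea of viewpoint-based positioning but not its formulas, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that corpus statistics are co-occurrence counts per word, that a node's viewpoints are the context features most over-represented among its member words, and that similarity is cosine similarity restricted to those features. All names (`viewpoints`, `rank_nodes`, the toy `thesaurus` and `cooc` data) are hypothetical.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pooled(words, cooc):
    """Sum the co-occurrence counts of a group of words."""
    total = Counter()
    for w in words:
        total.update(cooc[w])
    return total

def viewpoints(members, others, cooc, top_n=2):
    """Assumed scoring: features relatively more frequent inside the node than outside."""
    inside, outside = pooled(members, cooc), pooled(others, cooc)
    n_in = sum(inside.values()) or 1
    n_out = sum(outside.values()) or 1
    score = {f: inside[f] / n_in - outside[f] / n_out for f in inside}
    ranked = sorted(score, key=lambda f: (-score[f], f))
    return {f for f in ranked[:top_n] if score[f] > 0}

def rank_nodes(unknown_vec, thesaurus, cooc, top_n=2):
    """Rank thesaurus nodes by mean viewpoint-restricted similarity to the unknown word."""
    results = []
    for node, members in thesaurus.items():
        others = [w for n, ws in thesaurus.items() if n != node for w in ws]
        vp = viewpoints(members, others, cooc, top_n)
        u = {k: v for k, v in unknown_vec.items() if k in vp}
        sims = [cosine(u, {k: v for k, v in cooc[w].items() if k in vp})
                for w in members]
        results.append((node, sum(sims) / len(sims)))
    return sorted(results, key=lambda r: -r[1])

# Toy data, invented purely for illustration.
thesaurus = {"animal": ["dog", "cat"], "vehicle": ["car", "bus"]}
cooc = {
    "dog": {"bark": 3, "eat": 2, "run": 1},
    "cat": {"eat": 3, "sleep": 2, "run": 1},
    "car": {"drive": 4, "park": 2, "run": 1},
    "bus": {"drive": 3, "ride": 2, "run": 1},
}
horse = {"eat": 2, "ride": 1, "run": 2}  # word absent from the thesaurus

best_node, best_score = rank_nodes(horse, thesaurus, cooc)[0]
print(best_node, round(best_score, 3))
```

In this toy setup the shared feature `run` is not selected as a viewpoint for either node, since it does not distinguish them. That is one way restricting the comparison to a few distinguishing features could reduce the effect of sparse or uninformative counts.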

2008-11-13 22:26:00
2 Hatena Bookmarks

https://ci.nii.ac.jp/naid/110002723110

Mentions

Hatena Bookmark (2 users, 2 posts)

[japanese][database]

[自然言語処理 (natural language processing)]

Collected URLs

https://ci.nii.ac.jp/naid/110002723110/ (2)