Authors
乾 孝司 村上 浩司 橋本 泰一 内海 和夫 石川 正道
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume, issue, pages and publication date
vol.24, no.6, pp.469-479, 2009 (Released:2009-08-07)
Number of references
28
Number of citations
1

This paper presents a method for improving the performance of organization name recognition, a subtask of named entity recognition (NER). Although gazetteers (lists of NEs) are known to be effective features for supervised machine learning approaches to NER, previous methods have applied them in a very simple way: gazetteers have been used only to find exact matches between the input text and the NEs they contain. The proposed method generates regular expression rules from gazetteers and uses these rules to perform high-coverage search based on looser matches between the input text and NEs. To generate the rules, we focus on two well-known characteristics of NE expressions: 1) most NE expressions can be divided into two parts, a class-reference part and an instance-reference part, and 2) in most NE expressions the class-reference part is located at the suffix position. A pattern mining algorithm is run on the set of NEs in the gazetteers to find frequent word sequences from which NEs are constructed. We then adopt as suffix rules only those word sequences that have the class-reference part at the suffix position. Experimental results show that the proposed method improves the performance of organization name recognition, achieving an F-value of 84.58 on the evaluation data.
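The idea of turning gazetteer entries into looser suffix-based regular expression rules can be illustrated with a minimal sketch. It is not the authors' implementation: plain frequency counting of word-level suffixes stands in for the pattern mining algorithm, and the function names (`mine_suffix_rules`, `compile_rules`), the thresholds, the English toy gazetteer, and the "capitalized words followed by a suffix" pattern are all assumptions made for illustration.

```python
import re
from collections import Counter


def mine_suffix_rules(gazetteer, min_support=2, max_len=2):
    """Return word-level suffixes that occur in at least min_support entries.

    Simplification: frequent-suffix counting replaces the pattern mining
    step described in the abstract.
    """
    counts = Counter()
    for name in gazetteer:
        words = name.split()
        # Keep at least one word for the instance-reference part.
        for n in range(1, min(max_len, len(words) - 1) + 1):
            counts[tuple(words[-n:])] += 1
    return [suffix for suffix, c in counts.items() if c >= min_support]


def compile_rules(suffixes):
    """Compile one regex: capitalized word(s) followed by a mined suffix."""
    # Longer alternatives first so the most specific suffix is tried first.
    alternatives = sorted((" ".join(s) for s in suffixes), key=len, reverse=True)
    suffix_pattern = "|".join(re.escape(a) for a in alternatives)
    return re.compile(r"\b(?:[A-Z][\w&-]*\s)+(?:" + suffix_pattern + r")\b")


# Hypothetical toy gazetteer of organization names.
gazetteer = [
    "Acme Trading Company",
    "Tokyo Motor Company",
    "Kanto Bank",
    "Mizuho Bank",
    "Sakura Institute",
]

rules = mine_suffix_rules(gazetteer)  # [('Company',), ('Bank',)]
pattern = compile_rules(rules)

text = "She joined Osaka Steel Company after leaving Nagoya Bank."
print(pattern.findall(text))  # ['Osaka Steel Company', 'Nagoya Bank']
```

Neither matched name appears in the toy gazetteer, which is the point of the looser matching: an exact-match lookup would find nothing in this sentence. In a supervised NER setting, such matches would typically be supplied to the learner as features rather than treated as final decisions.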
Authors
山本 幹雄 乾 孝司
Publisher
筑波大学
Journal
基盤研究(B)
Volume, issue, pages and publication date
2009

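In this research, we developed a method for extracting rules that enable accurate, long-distance phrase reordering. The distinguishing feature of the extracted phrase reordering rules is that they are lexicalized around the translation correspondences of function words (such as particles), which play an important role in phrase reordering. This makes it possible to reorder phrases while accurately capturing the structure of the sentence being translated. Japanese-English translation experiments showed that the proposed rules improve translation performance.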