Authors
Dan Han, Pascual Martínez-Gómez, Yusuke Miyao, Katsuhito Sudoh, Masaaki Nagata
Publisher
Information and Media Technologies Editorial Board
Journal
Information and Media Technologies (ISSN:18810896)
Volume, issue, pages, and publication date
vol.9, no.3, pp.272-301, 2014 (Released:2014-09-15)
Number of references
44

In statistical machine translation, Chinese and Japanese form a well-known long-distance language pair that poses difficulties for word alignment techniques. Pre-reordering methods have proven efficient and effective; however, they need reliable parsers to extract the syntactic structure of source sentences. In this work, we first propose a framework that uses only part-of-speech (POS) tags and unlabeled dependency parse trees to minimize the influence of parse errors, and that encodes linguistic knowledge about structural differences in the form of reordering rules. We show significant improvements in translation quality for news-domain sentences over state-of-the-art reordering methods. Second, we explore the relationship between dependency parsing and our pre-reordering method from two aspects: POS tags and dependencies. We observe the effects of different parse errors on reordering performance by combining empirical and descriptive approaches. In the empirical approach, we quantify the distribution of general parse errors along with reordering quality. In the descriptive approach, we extract seven influential error patterns and examine their correlations with reordering errors.
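
To make the idea of rule-based pre-reordering over POS tags and unlabeled dependencies concrete, the following is a minimal sketch under stated assumptions, not the rule set from the paper. It assumes a single-rooted tree given as a list of head indices (`-1` for the root), Penn Chinese Treebank style POS tags in which verb tags start with `V`, and one illustrative rule: a verb is moved after all of its dependents, approximating Japanese head-final order. The function name `head_final_order` is hypothetical.

```python
from typing import List


def head_final_order(heads: List[int], pos: List[str]) -> List[int]:
    """Return token indices reordered so that each verb follows its dependents.

    heads[i] is the index of token i's head, or -1 for the root.
    pos[i] is the POS tag of token i; tags starting with "V" are treated
    as verbs (an assumption of this sketch).
    """
    n = len(heads)
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for i, h in enumerate(heads):
        if h == -1:
            root = i
        else:
            children[h].append(i)
    if root == -1:
        raise ValueError("dependency tree has no root")

    def visit(i: int) -> List[int]:
        left = [c for c in children[i] if c < i]
        right = [c for c in children[i] if c > i]
        out: List[int] = []
        for c in left:
            out.extend(visit(c))
        if pos[i].startswith("V"):
            # Illustrative rule: emit right dependents before the verb.
            for c in right:
                out.extend(visit(c))
            out.append(i)
        else:
            # Non-verbs keep their original position among dependents.
            out.append(i)
            for c in right:
                out.extend(visit(c))
        return out

    return visit(root)


# Toy Chinese sentence "我 吃 苹果" (I eat apple), with the verb as root.
tokens = ["我", "吃", "苹果"]
pos_tags = ["PN", "VV", "NN"]
head_idx = [1, -1, 1]
order = head_final_order(head_idx, pos_tags)
reordered = [tokens[i] for i in order]
```

In this toy case the subject-verb-object input is rearranged into subject-object-verb order, which is closer to Japanese word order. Because the rule depends only on POS tags and head links, a wrong tag or a wrong head changes the output order, which is the kind of interaction between parse errors and reordering quality that the abstract describes.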