Publication list: 谷川龍司 (author)

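IR Discrimination between native-speaker and non-native-speaker English documents using long part-of-speech strings as document features (長い品詞列を文書特徴とした母語話者英文書・非母語話者英文書の判別)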

Authors: 行野顕正, 青木さやか, 谷川龍司, et al.
Publisher: Kyushu University
Journal: Research reports on information science and electrical engineering of Kyushu University (ISSN: 1342-3819)
Volume, issue, pages, publication date: vol.11, no.2, pp.115-119, 2006-09

We propose using long, low-frequency part-of-speech (POS) strings to separate native English documents from non-native English documents. Previous work ignored long POS strings because their frequencies in training data are too small to estimate their probabilities. Meanwhile, a study on language identification showed that long, low-frequency byte strings are useful for distinguishing among similar languages. Language identification and the separation of native from non-native English documents are similar in some respects; for example, long POS strings are more peculiar to one class than short ones, although POS strings and byte strings differ. We can therefore expect higher accuracy from using long, low-frequency POS strings. This paper describes several experiments, which show that the proposed method achieves higher accuracy than previous ones.


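LE_004 Discrimination between native-speaker and non-native-speaker documents in English by applying language identification techniques (Field E: Natural Language) (言語識別技術を応用した英語における母語話者文書・非母語話者文書の判別)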

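Authors: 青木さやか, 冨浦洋一, 行野顕正, 谷川龍司
Publisher: FIT Steering Committee (IEICE and IPSJ)
Journal: 情報科学技術レターズ (Information Technology Letters)
Volume, issue, pages, publication date: vol.5, pp.85-88, 2006-08-21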


https://ci.nii.ac.jp/naid/110007640907