Author
金 明哲
Publisher
日本行動計量学会
Journal
行動計量学 (ISSN:03855481)
Volume, issue, pages, and publication date
vol.41, no.1, pp.35-46, 2014
Number of citing works
5

Text classification results often vary with the details of the data analysis, including the feature data, the classification method, and the parameter sets adopted. The author of an anonymous text can generally be identified by extracting a set of distinctive features from the text and then using those features to find the most likely author. Numerous efforts have been made to develop more robust feature extraction techniques and classification algorithms, but an important open issue is how to select the feature datasets and the classification method. To address this issue, we propose an integrated classification algorithm that extracts multiple feature datasets from differing viewpoints and aspects of a text and applies multiple strong classifiers to those datasets. The proposed method achieved 100% accuracy in identifying the authors of literary works and student essays, and it identified the author of all but 1 of 60 diaries written by 6 different people. It achieved accuracy equivalent to or better than that of any single strong classifier applied to an individual feature dataset. Furthermore, the accuracy in identifying the authors of student essays increased by roughly two percentage points.
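
The general idea in the abstract (several feature datasets, several classifiers per dataset, one combined decision) can be illustrated with a minimal sketch. The abstract does not say which features or classifiers the paper uses, or how their outputs are combined, so everything below is an assumption made for illustration: the two feature views (`char_bigrams`, `word_lengths`), the two simple stand-in classifiers (`nearest_neighbor`, `nearest_centroid`, which are far weaker than the strong classifiers the abstract refers to), the majority vote, and the name `identify_author` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def char_bigrams(text):
    """Hypothetical feature view 1: relative frequency of character bigrams."""
    grams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}


def word_lengths(text):
    """Hypothetical feature view 2: relative frequency of word lengths."""
    lens = Counter(str(len(w)) for w in text.split())
    total = sum(lens.values()) or 1
    return {k: c / total for k, c in lens.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest_neighbor(train, query):
    """Stand-in classifier 1: label of the most similar training text."""
    return max(train, key=lambda item: cosine(item[0], query))[1]


def nearest_centroid(train, query):
    """Stand-in classifier 2: label of the most similar per-author mean vector."""
    sums, counts = {}, Counter()
    for vec, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, Counter())
        for k, v in vec.items():
            acc[k] += v
    centroids = {
        lab: {k: v / counts[lab] for k, v in acc.items()}
        for lab, acc in sums.items()
    }
    return max(centroids, key=lambda lab: cosine(centroids[lab], query))


def identify_author(texts, labels, unknown,
                    extractors=(char_bigrams, word_lengths),
                    classifiers=(nearest_neighbor, nearest_centroid)):
    """Run every classifier on every feature view, then take a majority vote.

    The majority vote is an assumed combination rule, not the paper's.
    """
    votes = []
    for extract in extractors:
        train = [(extract(t), lab) for t, lab in zip(texts, labels)]
        query = extract(unknown)
        for classify in classifiers:
            votes.append(classify(train, query))
    return Counter(votes).most_common(1)[0][0], votes
```

With two feature views and two classifiers, `identify_author` collects four votes per unknown text. Combining votes this way means a weak result from one feature view or one classifier can be outvoted by the others, which is one plausible reading of why an integrated approach could match or exceed any single classifier on a single feature dataset.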

Mentions

Twitter (2 users, 2 posts, 2 favorites)

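I received a copy of 金明哲 (2014), 「統合的分類アルゴリズムを用いた文章の書き手の識別」 (roughly, "Identifying the writer of a text using an integrated classification algorithm"), 『行動計量学』 41(1), 35-46 http://t.co/NSCresVMgm. It is the paper version of the material presented in a special session at last year's conference. I look forward to studying it.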
