Authors
山崎 高弘, 常盤 欣一朗
Publisher
一般社団法人 電気学会
Journal
電気学会論文誌C(電子・情報・システム部門誌) (ISSN:03854221)
Volume, issue, pages, and publication date
vol.132, no.9, pp.1524-1532, 2012-09-01 (Released:2012-09-01)
Number of references
10

This paper describes a method of readability assessment for web documents. Readability is the ease with which a text can be read and understood. We hypothesize that readability is determined by whether a reader can easily grasp the structure of a text, and that the impression a text gives and its complexity are significant factors. We extract features related to impression and complexity from the plain text and from additional data such as HTML tags. To compare the effect of the extracted features, we estimate readability ranks using machine learning. We conduct 5-fold cross-validation for each domain and calculate the root mean squared error between the actual rank and the estimated rank. The cross-validation experiments confirm that our method achieves high performance, which demonstrates the effectiveness of extracting impression and complexity features for readability assessment.
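
The evaluation protocol named in the abstract (5-fold cross-validation scored by the root mean squared error between actual and estimated ranks) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the random fold assignment, the pooling of all held-out predictions into a single error value, and the mean-rank baseline standing in for the learner are all hypothetical, since the abstract does not specify them.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rank sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean_baseline(features, ranks):
    """Hypothetical stand-in learner: always predicts the mean training rank."""
    mean_rank = sum(ranks) / len(ranks)
    return lambda feature_vector: mean_rank


def cross_validated_rmse(features, ranks, fit=fit_mean_baseline, k=5, seed=0):
    """Train on k-1 folds, predict the held-out fold, and pool the errors.

    Assumption: a single RMSE is computed over all held-out predictions;
    averaging per-fold RMSE values would be an equally plausible reading.
    """
    actual, predicted = [], []
    for held_out in k_fold_indices(len(ranks), k, seed):
        held = set(held_out)
        train = [i for i in range(len(ranks)) if i not in held]
        model = fit([features[i] for i in train], [ranks[i] for i in train])
        for i in held_out:
            actual.append(ranks[i])
            predicted.append(model(features[i]))
    return rmse(actual, predicted)


if __name__ == "__main__":
    # Toy data, invented for illustration: one feature vector per document.
    toy_features = [[float(i), float(i % 3)] for i in range(20)]
    toy_ranks = [1 + (i % 5) for i in range(20)]
    print(cross_validated_rmse(toy_features, toy_ranks))
```

Any regressor can be passed as `fit`, provided it returns a callable that maps a feature vector to an estimated rank. Running the same routine once per domain and once per feature set would give the kind of comparison the abstract describes.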