Authors
Akiyo HIRAI, Rie KOIZUMI
Publisher
Japan Language Testing Association
Journal
日本言語テスト学会研究紀要 (JLTA Journal) (ISSN: 2433-006X)
Volume / Pages / Publication date
vol.11, pp.1-20, 2008-09-20 (Released:2017-08-07)
Citation count
1

Among the different types of rating scales for scoring speaking performance, the EBB (Empirically derived, Binary-choice, Boundary-definition) scale is claimed to be easy to use and highly reliable (Turner & Upshur, 1996, 2002). However, it has been questioned whether the EBB scale can be applied to other tasks. In this study, therefore, an EBB scale was compared with an analytic scale in terms of validity, reliability, and practicality. Fifty-two EFL learners were asked to read and retell four stories in a semi-direct Story Retelling Speaking Test (SRST). Their performances were scored with the two rating scales, and the scores were compared using generalizability theory, a multitrait-multimethod approach, and a questionnaire administered to the raters. The EBB scale, which consists of four criteria, proved more generalizable (i.e., reliable) than the analytic scale and generally assessed the intended constructs. However, the present EBB scale turned out to be less practical than the analytic scale because of its binary format and the larger number of levels in each criterion. Further revisions toward a better scale for the SRST are suggested.
Authors
Akiyo HIRAI, Yusuke KONDO, Ryoko FUJITA
Publisher
The Japan Association for Language Education and Technology
Journal
外国語教育メディア学会機関誌 (Language Education & Technology) (ISSN: 2185-7792)
Volume / Pages / Publication date
vol.58, pp.17-41, 2021 (Released:2021-08-18)

This study examines the accuracy of an automated speech scoring system. The system graded English language learners’ retelling performances on five features, and its scores were compared with those given by both non-native and native English-speaking (NNES and NES) raters. The results show that, of the five features, words per second was the most consistent predictor of both NNES and NES evaluations. However, the NNES rater tended to attend more to exact word overlap between the speech utterances and the original text, whereas the NES raters focused more on similarity of meaning and gave credit to rephrased expressions. In addition, the correspondence between the automated scores and the human ratings was moderate (exact agreement = 48% to 65%; rs = .48 to .52), lower than that between the NNES and NES scores (rs = .70). These results indicate that the automated scoring system for retelling performances may be applicable to low-stakes tests, provided that transcriptions of learners’ utterances are available.