- Authors
- Yasuichi Nakayama, Yasushi Kuno, Hiroyasu Kakuda
- Publisher
- Information Processing Society of Japan
- Journal
- Journal of Information Processing (ISSN:18826652)
- Volume, issue, pages, and publication date
- vol.28, pp.733-743, 2020 (Released:2020-11-15)
- Number of references
- 23
- Number of citations
- 1
There is a great need to evaluate and test programming performance, and two schemes have been used for this purpose. Constructed response (CR) tests have the examinee write programs on a blank sheet (or at a computer keyboard). This scheme can evaluate programming performance, but it is difficult to apply at large scale because skilled human graders are required (automatic evaluation has been attempted but is not yet widely used). Multiple choice (MC) tests have the examinee choose the correct answer from a list (often corresponding to a “hidden” portion of a complete program). This scheme can be used at large scale with computer-based testing or mark-sense cards. However, many teachers and researchers suspect that a good score does not necessarily indicate the ability to write programs from scratch. We propose a third method, split-paper (SP) testing. Our scheme splits a correct program into its individual lines, shuffles the lines, adds “wrong answer” lines, and prefixes each line with a choice symbol. The examinee answers with a sequence of choice symbols corresponding to the correct program, which can easily be graded automatically by computer. In particular, we propose using edit distance (Levenshtein distance) in the scoring scheme, which seems to have an affinity with the SP scheme. The research question is whether SP tests scored with an edit-distance-based scheme measure programming performance as CR tests do. To answer it, we conducted an experiment in college programming classes with 60 students, comparing SP tests against CR tests. SP and CR test scores were correlated in multiple settings, and the correlations were statistically significant. We therefore tentatively conclude that SP tests with automatic scoring based on edit distance are useful tools for evaluating programming performance.
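
The SP procedure described in the abstract (split, shuffle, add wrong-answer lines, label with choice symbols, score by edit distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `make_sp_test`, `levenshtein`, and `sp_score` are hypothetical, the choice symbols are assumed to be the letters A to Z, and the normalization `1 - distance / len(answer_key)` is an assumption made here for illustration, since the abstract does not state the exact scoring formula.

```python
import random
import string


def make_sp_test(program_lines, distractor_lines, seed=0):
    """Build a split-paper item from a correct program and wrong-answer lines.

    Returns (choices, answer_key): `choices` maps a choice symbol to a line of
    text, and `answer_key` is the symbol sequence that rebuilds the program.
    """
    # Remember each correct line's position; distractors get position None.
    pool = [(line, pos) for pos, line in enumerate(program_lines)]
    pool += [(line, None) for line in distractor_lines]
    if len(pool) > len(string.ascii_uppercase):
        raise ValueError("this sketch supports at most 26 lines")
    random.Random(seed).shuffle(pool)

    choices = {}
    answer_key = [None] * len(program_lines)
    for symbol, (line, pos) in zip(string.ascii_uppercase, pool):
        choices[symbol] = line
        if pos is not None:
            answer_key[pos] = symbol
    return choices, answer_key


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sp_score(answer, answer_key):
    """Assumed scoring rule: 1.0 for a perfect answer, clipped at 0.0."""
    distance = levenshtein(answer, answer_key)
    return max(0.0, 1.0 - distance / len(answer_key))
```

Under this assumed rule, an answer identical to the key scores 1.0, and each missing, extra, or misplaced symbol lowers the score in proportion to the edits needed to repair it, so partially correct programs receive partial credit rather than zero.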