- Authors
- Wei GAO, Lin HAN, Rongcai ZHAO, Yingying LI, Jian LIU
- Publisher
- The Institute of Electronics, Information and Communication Engineers
- Journal
- IEICE Transactions on Information and Systems (ISSN:09168532)
- Volume, issue, pages, and publication date
- vol.E100.D, no.1, pp.91-106, 2017-01-01 (Released:2017-01-01)
- Number of references
- 32
Single-instruction multiple-data (SIMD) extensions provide an energy-efficient platform for scaling the performance of media and scientific applications while still retaining post-programmability. However, the major challenge is to translate the parallel resources of the SIMD hardware into real application performance. Currently, compilers use all the slots in the vector register when they exploit the SIMD parallelism of programs; this can be called sufficient vectorization. Sufficient vectorization means that all the data in the vector register is valid. Because every slot the vector register provides must be used, the sufficient vectorization method gives up the chance to vectorize programs with low SIMD parallelism. In addition, the speedup obtained by full use of the vector register is sometimes not as great as that obtained by partial use. In particular, as the vector registers provided by SIMD extensions become longer, the sufficient vectorization method cannot completely exploit the SIMD parallelism of programs. Therefore, an insufficient vectorization method is proposed, which refers to partial use of the vector register. First, the scenarios to which insufficient vectorization applies are analyzed. Second, methods for computing the inter-iteration and intra-iteration SIMD parallelism of loops are put forward. Furthermore, based on the relationship between the parallelism and the vector factor, a method is established for choosing between the vectorization methods, so that programs are vectorized as well as possible. Finally, a code generation strategy for insufficient vectorization is presented. Benchmark test results show that the insufficient vectorization method vectorizes 107.5% more programs than the sufficient vectorization method, and that the performance it achieves is 12.1% higher than that achieved by the sufficient vectorization method.
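
The relationship between SIMD parallelism and the vector factor described in the abstract can be illustrated with a minimal Python sketch. This is not the selection method from the paper, whose details the abstract does not give. It assumes a simple threshold rule: use every slot when the available parallelism reaches the vector factor, use only some slots when the parallelism is at least 2 but below the vector factor, and stay scalar otherwise. The names `Strategy`, `vector_factor`, and `choose_strategy` are hypothetical, and the sketch ignores the cost comparison that would be needed to cover the case where partial use outperforms full use.

```python
from enum import Enum


class Strategy(Enum):
    SCALAR = "scalar"              # no SIMD code is generated
    INSUFFICIENT = "insufficient"  # only some vector register slots hold valid data
    SUFFICIENT = "sufficient"      # every vector register slot holds valid data


def vector_factor(register_bits: int, element_bits: int) -> int:
    """Number of elements (slots) that fit in one vector register."""
    if register_bits <= 0 or element_bits <= 0:
        raise ValueError("bit widths must be positive")
    return register_bits // element_bits


def choose_strategy(parallelism: int, vf: int) -> Strategy:
    """Pick a vectorization strategy from a loop's SIMD parallelism.

    Assumed rule (illustrative only, not taken from the paper):
      parallelism >= vf       -> sufficient vectorization
      2 <= parallelism < vf   -> insufficient vectorization
      parallelism < 2         -> leave the code scalar
    """
    if vf < 2:
        raise ValueError("a vector register needs at least two slots")
    if parallelism >= vf:
        return Strategy.SUFFICIENT
    if parallelism >= 2:
        return Strategy.INSUFFICIENT
    return Strategy.SCALAR


# Example: 256-bit registers holding 64-bit elements give a vector factor of 4.
vf = vector_factor(256, 64)
decision = choose_strategy(3, vf)  # three independent operations, four slots
```

Under this assumed rule, a loop with three independent 64-bit operations on a 256-bit register would be rejected by a vectorizer that demands all four slots, but is still selected for partial-register vectorization.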