Authors
Eiji YOSHIYA Tomoya NAKANISHI Tsuyoshi ISSHIKI
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences (ISSN:09168508)
Volume, issue, pages, and publication date
vol.E105-A, no.7, pp.1061-1069, 2022-07-01
Cited by
2

In Internet of Things (IoT) applications, systems-on-chip (SoCs) with embedded processors are widely used. As an embedded processor, RISC-V, which is license-free and has an extensible instruction set, is receiving attention. However, designing such embedded processors requires enormous effort, both to achieve a highly efficient microarchitecture in terms of performance, power consumption, and circuit area, and to verify the design while running complex software, including modern operating systems such as Linux. In this paper, we propose a method for directly describing the RTL structure of a pipelined RISC-V processor with cache memories, a memory management unit (MMU), and an AXI bus interface using the C++ language. This pipelined processor C++ model serves as a functional simulator of the complete RISC-V core, whereas our C2RTL framework translates the processor C++ model into a cycle-accurate RTL description in Verilog-HDL and an RTL-equivalent C model. Our processor design methodology using the C2RTL framework is unique compared to other existing methodologies because both the simulation and RTL models are derived from the same C++ source, which greatly simplifies the design verification and optimization processes. The effectiveness of our design methodology is demonstrated on a RISC-V processor that runs Linux on an FPGA board, with the original C++ processor model and the RTL-equivalent C model achieving significantly shorter simulation times than a commercial RTL simulator.
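To make the idea of describing register-transfer structure in C++ more concrete, the following is a minimal sketch of a cycle-based model. It is not the C2RTL framework or its API, which the abstract does not detail. The two-stage pipeline, the names `PipeRegs` and `TinyPipeline`, and the arithmetic performed in each stage are all assumptions made purely for illustration. The sketch only shows the general pattern: next-state values are computed from the current registers and committed together, as at a clock edge.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-stage pipeline: stage 1 adds, stage 2 doubles.
// All names are illustrative; this is not the C2RTL API.
struct PipeRegs {
    uint32_t s1_sum = 0;        // pipeline register between stage 1 and stage 2
    bool     s1_valid = false;  // marks s1_sum as holding real data
    uint32_t out = 0;           // output register of stage 2
    bool     out_valid = false; // marks out as holding real data
};

struct TinyPipeline {
    PipeRegs regs;  // register values visible during the current cycle

    // One clock cycle: every next-state value is computed from the current
    // registers only, then all registers are committed at once.
    void step(bool in_valid, uint32_t a, uint32_t b) {
        PipeRegs next;
        // Stage 2 reads what stage 1 wrote in the previous cycle.
        next.out = regs.s1_sum * 2u;
        next.out_valid = regs.s1_valid;
        // Stage 1 consumes this cycle's inputs.
        next.s1_sum = a + b;
        next.s1_valid = in_valid;
        regs = next;  // models the clock edge
    }
};
```

Calling `step(true, 3, 4)` followed by `step(false, 0, 0)` leaves `regs.out == 14` with `regs.out_valid` set, so the result appears two cycles after the input, as expected for a two-stage pipeline.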
Authors
Farhan Shafiq Tsuyoshi Isshiki Dongju Li
Publisher
Information Processing Society of Japan (一般社団法人 情報処理学会)
Journal
IPSJ Transactions on System LSI Design Methodology (ISSN:18826687)
Volume, issue, pages, and publication date
vol.10, pp.13-27, 2017 (Released:2017-02-03)
Number of references
38

Accurate and fast performance estimation methods for modern and future multi-core systems are the focal point of much research due to the complexity associated with such architectures. The communication architecture of such systems has a huge impact on the performance and power of the whole system. Architects need to explore many design possibilities by using performance estimation techniques at early stages of design, so that design decisions can be made earlier in the design cycle. Software developers, meanwhile, need to develop and test applications for the target architecture and gather performance measurements as early in the design cycle as possible. Full system simulation techniques provide accurate performance values but are extremely time consuming. Static analysis techniques are fast but cannot capture the dynamic behavior associated with shared resource contention and arbitration. Moreover, synthetic traffic patterns have been used to analyze the communication architecture; however, such patterns are not realistic enough. We propose a statistical model to predict the dynamic cost of bus arbitration on the performance of a bus architecture. The proposed model uses workload traces of actual applications and benchmarks to capture real application traffic behavior. Statistics on the traffic patterns are collected and input to the analytical model, which calculates performance values for the communication architecture under consideration. By knowing the performance measures, designers can avoid over- and under-design of the communication architecture. This paper builds on a previously developed performance estimation model. The previous work modeled single and burst bus transfers; however, only one interfering bus master at a time was considered for each blocked bus request. 
The proposed, improved-accuracy model considers multiple interfering masters for each blocked request, improving the estimation accuracy, especially for traffic-intensive applications and architectures with many processing elements (PEs). Experiments are performed on two architectures: 4 PEs connected via a shared bus, and 8 PEs connected via a shared bus. Results show no significant difference in accuracy compared to the previously developed model for the low-traffic applications SPARSE and ROBOT, but a notable accuracy improvement for traffic-intensive applications. Maximum estimation error is reduced from 1.75% to 0.6% for FPPPP and from 13.91% to 8.8% for FFT on the 4PE architecture. On the 8PE architecture, maximum estimation error is reduced from 11.8% to 2.7% for the FPPPP benchmark. Moreover, the simulation speed-up of the proposed technique over the simulation method is reported.
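As an illustration of the quantity such a model estimates, the sketch below computes per-master arbitration wait cycles directly from a small request trace on a shared bus. It is not the statistical model from the paper, whose equations the abstract does not give. The `BusRequest` layout, the function name `arbitration_wait`, and the first-come-first-served arbitration policy (ties broken by lower master id) are assumptions chosen for simplicity. It also treats issue times as fixed, ignoring that a blocked master would delay its own later requests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One bus transaction in a hypothetical trace format.
struct BusRequest {
    int      master;  // id of the issuing processing element
    uint64_t issue;   // cycle at which the request is raised
    uint64_t length;  // cycles the transfer holds the bus (1 = single, >1 = burst)
};

// Returns the total cycles each master spent blocked, assuming
// first-come-first-served arbitration with ties going to the lower id.
std::vector<uint64_t> arbitration_wait(std::vector<BusRequest> trace,
                                       int num_masters) {
    std::stable_sort(trace.begin(), trace.end(),
        [](const BusRequest& x, const BusRequest& y) {
            if (x.issue != y.issue) return x.issue < y.issue;
            return x.master < y.master;
        });
    std::vector<uint64_t> wait(num_masters, 0);
    uint64_t bus_free = 0;  // first cycle at which the bus is idle
    for (const BusRequest& r : trace) {
        assert(r.master >= 0 && r.master < num_masters);
        // A request starts when it has been issued and the bus is idle.
        uint64_t start = std::max(bus_free, r.issue);
        wait[r.master] += start - r.issue;  // cycles spent blocked
        bus_free = start + r.length;
    }
    return wait;
}
```

For the trace `{{0, 0, 4}, {1, 1, 2}, {2, 2, 1}}` with three masters, the function returns waits of 0, 3, and 4 cycles. Master 2 is delayed by both master 0 and master 1, which is the multiple-interfering-masters situation the abstract describes.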