著者
Satoshi Matsuoka Hideharu Amano Kengo Nakajima Koji Inoue Tomohiro Kudoh Naoya Maruyama Kenjiro Taura Takeshi Iwashita Takahiro Katagiri Toshihiro Hanawa Toshio Endo
雑誌
研究報告ハイパフォーマンスコンピューティング(HPC) (ISSN:21888841)
巻号頁・発行日
vol.2016-HPC-155, no.32, pp.1-14, 2016-08-01

Slowdown and inevitable end in exponential scaling of processor performance, the end of the so-called “Moore's Law”is predicted to occur around 2025-2030 timeframe. Because CMOS semiconductor voltage is also approaching its limits, this means that logic transistor power will become constant, and as a result, the system FLOPS will cease to improve, resulting in serious consequences for IT in general, especially supercomputing. Existing attempts to overcome the end of Moore 's law are rather limited in their future outlook or applicability. We claim that data-oriented parameters, such as bandwidth and capacity, or BYTES, are the new parameters that will allow continued performance gains for periods even after computing performance or FLOPS ceases to improve, due to continued advances in storage device technologies and optics, and manufacturing technologies including 3-D packaging. Such transition from FLOPS to BYTES will lead to disruptive changes in the overall systems from applications, algorithms, software to architecture, as to what parameter to optimize for, in order to achieve continued performance growth over time. We are launching a new set of research efforts to investigate and devise new technologies to enable such disruptive changes from FLOPS to BYTES in the Post-Moore era, focusing on HPC, where there is extreme sensitivity to performance, and expect the results to disseminate to the rest of IT.
著者
Hideharu AMANO Akiya JOURAKU Kenichiro ANJO
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Communications (ISSN:09168516)
巻号頁・発行日
vol.E86-B, no.12, pp.3385-3391, 2003-12-01

A framework of dynamically adaptive hardware mechanism on multicontext reconfigurable devices is proposed, and as an example, an adaptive switching fabric is implemented on NEC's novel reconfigurable device DRP (Dynamically Reconfigurable Processor). In this switch, contexts for the full crossbar and alternative hadware modules, which provide larger bandwidth but can treat only a limited pattern of packet inputs, are prepared. Using the quick context switching functionality, a context for the full crossbar is replaced by alternative contexts according to the packet inputs pattern. If the context corresponding to requested alternative hadware modules is not inside the chip, it is loaded from outside chip to currently unused context memory, then replaced with the full size crossbar. If the traffic includes a lot of packets for specific destinations, a set of contexts frequently used in the traffic is gathered inside the chip like a working set stored in a cache. 4 4 mesh network connected with the proposed adaptive switches is simulated, and it appears that the latency between nodes is improved three times when the traffic between neighboring four nodes is dominant.