Authors
Shungo Kumazawa, Kazushi Kawamura, Thiem Van Chu, Masato Motomura, Jaehoon Yu
Publisher
IJNC Editorial Committee
Journal
International Journal of Networking and Computing (ISSN: 2185-2839)
Volume, Issue, Pages and Publication Date
vol.11, no.2, pp.215-230, 2021 (Released: 2021-07-08)
Number of References
14

Training machine learning models on edge devices is constrained by power consumption and computing cost. This paper introduces ExtraFerns, a hardware-oriented training method for a unique subset of decision tree ensembles, which significantly reduces memory access and optimizes each tree in parallel. ExtraFerns combines the advantages of extraTrees and randomFerns. Like extraTrees, it generates nodes by randomly selecting attributes and generating thresholds. Like randomFerns, it then builds ferns, which are decision trees that share identical nodes at each depth. In contrast to other ensemble methods, which use greedy optimization, ExtraFerns attempts global optimization of each fern. Experimental results show that ExtraFerns requires only 4.3% and 4.1% of the memory accesses needed to train randomForest and extraTrees models, with accuracy drops of 3.0% and 1.2%, respectively. This paper also proposes applying lightweight random projection to ExtraFerns as a preprocessing step, which further improved accuracy by up to 2.0% on image datasets.
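The fern structure described in the abstract can be illustrated with a minimal Python sketch. It assumes that each depth of a fern holds one randomly chosen (attribute, threshold) test, that thresholds are drawn uniformly between the observed minimum and maximum of the attribute, and that leaves store class counts. The function names `build_fern`, `leaf_index`, and `predict` are hypothetical, and the sketch does not attempt the global optimization step of ExtraFerns, whose details the abstract does not give.

```python
import random
from collections import Counter


def leaf_index(tests, row):
    """Map a sample to a leaf: one bit per depth, since every node at a depth shares one test."""
    idx = 0
    for attr, thr in tests:
        idx = (idx << 1) | int(row[attr] > thr)
    return idx


def build_fern(X, y, depth, rng):
    """Build one fern from randomly selected attributes and thresholds (assumed sampling scheme)."""
    n_features = len(X[0])
    tests = []
    for _ in range(depth):
        attr = rng.randrange(n_features)
        values = [row[attr] for row in X]
        thr = rng.uniform(min(values), max(values))
        tests.append((attr, thr))
    # A fern of the given depth has 2**depth leaves; each leaf keeps a class histogram.
    leaves = [Counter() for _ in range(2 ** depth)]
    for row, label in zip(X, y):
        leaves[leaf_index(tests, row)][label] += 1
    return tests, leaves


def predict(ferns, row):
    """Sum the class histograms of the leaves reached in every fern and return the top class."""
    votes = Counter()
    for tests, leaves in ferns:
        votes.update(leaves[leaf_index(tests, row)])
    return votes.most_common(1)[0][0] if votes else None
```

Because all nodes at one depth share the same test, a fern needs only `depth` comparisons and a single table lookup per sample, which may be part of why the structure suits hardware with limited memory bandwidth.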
Authors
Eric S. Fukuda, Hiroaki Inoue, Takashi Takenaka, Dahoo Kim, Tsunaki Sadahisa, Tetsuya Asai, Masato Motomura
Publisher
Information Processing Society of Japan
Journal
Journal of Information Processing (ISSN: 1882-6652)
Volume, Issue, Pages and Publication Date
vol.23, no.2, pp.143-152, 2015-03-15

Memcached has been widely adopted as a technology for improving the response speed of web servers by caching data in DRAM on distributed servers. Because of its importance, the acceleration of memcached has been studied on various platforms. Among them, the FPGA appears to be the most attractive platform for running memcached, and several research groups have tried to obtain much higher performance from it than from a CPU. The difficulty encountered there, however, is how to manage large memory (gigabytes of DRAM) from memcached hardware built in an FPGA. Some groups are trying to solve this problem by using an embedded CPU for memory allocation, and another group is employing an SSD. Unlike other approaches that try to replace memcached itself on FPGAs, our approach augments the software memcached running on the host CPU by caching its data and some operations at the FPGA-equipped network interface card (NIC) mounted on the server. The locality of memcached data enables the FPGA NIC to achieve a fairly high hit rate with a smaller memory. In this paper, we describe the architecture of the proposed NIC cache and evaluate its effectiveness with a standard key-value store (KVS) benchmarking tool. Our evaluation shows that our system is effective if the workload has temporal locality but does not handle workloads well without such a characteristic. We further propose methods to overcome this problem and evaluate them. As a result, we estimate that the latency improved by up to 3.5 times over software memcached running on a high-performance CPU.
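The effect of temporal locality on a small cache placed in front of memcached can be illustrated with a minimal software sketch. It assumes an LRU eviction policy and write-through updates, neither of which is stated in the abstract, and it models the host memcached as a plain dictionary. The class name `NicCache` and its methods are hypothetical and do not describe the authors' hardware design.

```python
from collections import OrderedDict


class NicCache:
    """Small front cache that falls back to a larger backing store on a miss."""

    def __init__(self, capacity, backend):
        self.capacity = capacity      # number of entries the small cache can hold
        self.backend = backend        # dict standing in for the host memcached
        self.entries = OrderedDict()  # insertion order doubles as LRU order
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.backend.get(key)      # forward the request to the host
        if value is not None:
            self._store(key, value)
        return value

    def set(self, key, value):
        self.backend[key] = value          # write-through (assumed policy)
        self._store(key, value)

    def _store(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```

With a capacity of four entries, repeatedly reading the same three keys misses only on the first access to each key, whereas a single pass over many distinct keys never hits. This matches the behaviour the abstract reports: the cache helps when the workload has temporal locality and does little otherwise.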