Authors
柳瀬 利彦 廣木 桂一 伊藤 昭博 柳井 孝介
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume, issue, pages, and publication date
vol.26, no.5, pp.621-637, 2011 (Released:2011-07-20)
Number of references
34

We propose a computing platform for parallel machine learning. As learning from large-scale data has become common, parallelization techniques are increasingly applied to machine learning algorithms to reduce calculation time. The main problems of parallelization are implementation cost and calculation overhead. First, we formulate a MapReduce programming model specialized for parallel machine learning. It represents a learning algorithm as an iteration of the following two phases: applying data to the machine learning model and updating the model parameters. This model can describe various machine learning algorithms, such as k-means clustering, the EM algorithm, and linear SVM, at an implementation cost comparable to that of the original MapReduce. Second, we propose a fast machine learning platform that reduces the processing overhead of the iterative procedures in machine learning. Machine learning algorithms repeatedly read the same training data in the data application phase. Our platform keeps the training data in the local memory of each worker throughout the iterations, which accelerates data access. We evaluate the performance of our platform in three experiments. Our platform executes k-means clustering 2.85 to 118 times faster than the MapReduce approach, and achieves a 9.51-fold speedup with 40 processing cores relative to 8 cores. We also show the performance of Variational Bayes clustering and linear SVM implemented on our platform.
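
The two-phase structure described in the abstract can be illustrated with a minimal single-process sketch of k-means clustering. This is not the authors' platform or its API: the names `apply_data`, `update_model`, and `kmeans` are hypothetical, and a plain list of in-memory partitions stands in for the training data that each worker keeps resident across iterations. A sequential loop over partitions replaces the parallel workers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
# Per-cluster partial statistics: (coordinate sums of assigned points, count)
Stats = List[Tuple[List[float], int]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def apply_data(partition: List[Point], centroids: List[Point]) -> Stats:
    """Data application phase (hypothetical name): scan one resident partition."""
    dim = len(centroids[0])
    stats: Stats = [([0.0] * dim, 0) for _ in centroids]
    for point in partition:
        # Assign the point to its nearest centroid.
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(point, centroids[k]))
        sums, count = stats[nearest]
        for d in range(dim):
            sums[d] += point[d]
        stats[nearest] = (sums, count + 1)
    return stats


def update_model(all_stats: List[Stats], centroids: List[Point]) -> List[Point]:
    """Model update phase (hypothetical name): merge partial statistics."""
    new_centroids: List[Point] = []
    for k, old in enumerate(centroids):
        total = [0.0] * len(old)
        count = 0
        for stats in all_stats:
            sums, n = stats[k]
            total = [t + s for t, s in zip(total, sums)]
            count += n
        # Keep the old centroid if no point was assigned to this cluster.
        new_centroids.append(tuple(t / count for t in total) if count else old)
    return new_centroids


def kmeans(partitions: List[List[Point]], centroids: List[Point],
           iterations: int = 10) -> List[Point]:
    # The partitions are loaded once and reused in every iteration;
    # only the small model (the centroids) changes between iterations.
    for _ in range(iterations):
        all_stats = [apply_data(p, centroids) for p in partitions]
        centroids = update_model(all_stats, centroids)
    return centroids


# Assumed toy data: two partitions, as if held by two workers.
partitions = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0)],
]
result = kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)], iterations=5)
print(result)  # [(0.0, 0.5), (10.0, 10.5)]
```

In this sketch only the per-cluster sums and counts move between the two phases, while the training points stay where they were loaded. That mirrors the idea stated in the abstract of avoiding repeated reads of the same training data, though the actual platform's communication and scheduling are not described here.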