Authors
Fuyumasa Takatsu, Kohei Hiraga, Osamu Tatebe
Publisher
Information Processing Society of Japan
Journal
Journal of Information Processing (ISSN: 1882-6652)
Volume, pages, publication date
vol. 25, pp. 438-447, 2017 (released 2017-06-15)
Number of references
36
Number of citing articles
2

The convergence of high-performance computing (HPC) and big data, known as extreme big data, exposes a weakness of storage systems such as distributed file systems: file creation is not optimized. Large workloads cause many processes to create many files simultaneously, for example when writing checkpoints. To improve file creation, we designed PPFS, a scale-out distributed file system for post-petascale systems. PPFS consists of PPMDS, a scale-out distributed metadata server, and PPOSS, a scalable distributed storage server for flash storage. PPMDS achieves high file creation performance by storing metadata in a key-value store and by using non-blocking distributed transactions to update multiple entries simultaneously. PPOSS relies on PPOST, an object storage system that manages the underlying low-level storage, such as the Fusion IO ioDrive, a PCI Express-attached flash device that supports OpenNVM. The PPFS prototype attains high file creation performance through a file creation optimization termed bulk creation, which reduces the amount of communication between PPMDS and PPOSS. In addition, to improve the I/O performance of PPOSS when a client process and PPOSS run on the same node, PPOSS accesses the local storage device directly. With a further file creation optimization called object prefetching, the PPFS prototype achieves 138,000 operations per second for file creation with five metadata servers and 128 client processes, 2.52 times the performance of IndexFS. With the local access optimization, PPOSS reaches its limit at a block size of 16 KiB, a 1.5-fold improvement over performance before the optimization. These evaluations indicate that PPFS scales well in both file creation and I/O performance, as required for post-petascale systems.
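The abstract does not describe how bulk creation or object prefetching work internally. One plausible reading, sketched here purely as an assumption, is that the metadata server obtains storage objects from the storage servers in batches ahead of time, so that most file creations need no metadata-to-storage round trip. The classes `StorageServer` and `MetadataServer`, the `batch_size` parameter, the helper `rpcs_for`, and the dict standing in for the key-value store are all hypothetical and are not the PPFS implementation.

```python
"""Hypothetical sketch: batching object allocation so that most file
creations avoid a metadata-server -> storage-server round trip."""

from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StorageServer:
    """Stand-in for a PPOSS-like object store (hypothetical)."""
    next_id: int = 0
    rpc_count: int = 0

    def allocate_objects(self, n: int) -> List[int]:
        # One simulated RPC returns n fresh object IDs.
        self.rpc_count += 1
        ids = list(range(self.next_id, self.next_id + n))
        self.next_id += n
        return ids


@dataclass
class MetadataServer:
    """Stand-in for a PPMDS-like server; a dict plays the key-value store."""
    storage: StorageServer
    batch_size: int = 1
    kvs: Dict[Tuple[str, str], int] = field(default_factory=dict)
    _pool: List[int] = field(default_factory=list)

    def _next_object(self) -> int:
        # Refill the local pool only when it is empty, so with
        # batch_size=B only one RPC is issued per B file creations.
        if not self._pool:
            self._pool = self.storage.allocate_objects(self.batch_size)
        return self._pool.pop()

    def create(self, parent: str, name: str) -> int:
        # Metadata key: (parent directory, file name) -> object ID.
        key = (parent, name)
        if key in self.kvs:
            raise FileExistsError(f"{parent}/{name}")
        obj = self._next_object()
        self.kvs[key] = obj
        return obj


def rpcs_for(n_files: int, batch_size: int) -> int:
    """Count storage RPCs needed to create n_files checkpoint files."""
    mds = MetadataServer(StorageServer(), batch_size=batch_size)
    for i in range(n_files):
        mds.create("/ckpt", f"rank{i}.dat")
    return mds.storage.rpc_count


if __name__ == "__main__":
    print(rpcs_for(1000, 1))    # 1000: one round trip per file
    print(rpcs_for(1000, 100))  # 10: one round trip per 100 files
```

In this sketch the saving comes entirely from amortizing round trips; the trade-off is that object IDs prefetched into `_pool` but never used are simply lost if the metadata server restarts.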
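Authors
Ken T. Murata, Hidenobu Watanabe, Kazunori Yamamoto, Eizen Kimura, Masahiro Tanaka, Osamu Tatebe, Kentaro Ukawa, Kazuya Muranaga, Yutaka Suzuki, Hirotsugu Kojima
Publisher
The Institute of Electronics, Information and Communication Engineers (IEICE)
Journal
IEICE Communications Express (ISSN: 2187-0136)
Volume, issue, pages, publication date
vol. 3, no. 2, pp. 74-79, 2014-02-25 (released 2014-02-25)
Number of references
6
Number of citing articles
4 7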

A variety of satellite missions are carried out every year. Most of these satellites yield big data, which calls for high-performance data processing technologies. We have been developing a cloud system, the NICT Science Cloud, for big data analyses of Earth and space observations made by spacecraft. In this study, we propose a new technique for processing big data based on the observation that high-speed I/O (reading and writing data files) matters more than data processing speed. We adopt the Pwrake task scheduler for easy development and management of parallel data processing. Using a long-term set of scientific observation data from the GEOTAIL satellite, we examine the performance of the system on the NICT Science Cloud. We achieved data processing more than 100 times faster than in traditional data processing environments.
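The abstract does not say how Pwrake assigns tasks to nodes. Because the study stresses that file I/O dominates, the following sketch assumes a locality-aware policy: each per-file task runs on the node that already stores its input file, and spills to the least-loaded node only when that node has taken its fair share. Pwrake itself is a Rake-based workflow tool written in Ruby; the `schedule` function, its parameters, and the fair-share cap are hypothetical illustrations in Python, not Pwrake's API or its actual scheduling algorithm.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def schedule(file_location: Dict[str, str],
             nodes: List[str]) -> Tuple[Dict[str, List[str]], int]:
    """Hypothetical locality-aware assignment of one task per input file.

    file_location maps each input file to the node that stores it.
    Returns (node -> assigned files, number of tasks that read remotely).
    """
    # Fair-share cap: no node takes more than ceil(files / nodes) tasks,
    # so some node always has room when a task must spill.
    cap = math.ceil(len(file_location) / len(nodes))
    load: Dict[str, List[str]] = defaultdict(list)
    remote_reads = 0
    for f in sorted(file_location):
        home = file_location[f]
        if home in nodes and len(load[home]) < cap:
            load[home].append(f)  # local read: no network transfer
        else:
            # Spill to the least-loaded node; this task reads its file remotely.
            target = min(nodes, key=lambda n: len(load[n]))
            load[target].append(f)
            remote_reads += 1
    return dict(load), remote_reads


if __name__ == "__main__":
    files = {"d0": "n0", "d1": "n0", "d2": "n0", "d3": "n1"}
    print(schedule(files, ["n0", "n1"]))
    # ({'n0': ['d0', 'd1'], 'n1': ['d2', 'd3']}, 1)
```

The fair-share cap trades some locality for load balance: without it, all three `n0` files would queue on `n0` while `n1` sat mostly idle.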