Authors
Wataru Endo, Shigeyuki Sato, Kenjiro Taura
Publisher
Information Processing Society of Japan
Journal
Journal of Information Processing (ISSN:18826652)
Volume, pages, and publication date
vol.30, pp.269-282, 2022 (Released:2022-03-15)
Number of references
26
Number of citations
1

User-level threading or task-parallel systems have been developed over decades to provide efficient and flexible threading features missing from kernel-level threading for both parallel and concurrent programming. Some of the existing state-of-the-art user-level threading libraries provide interfaces for customizing the implementation of thread scheduling to adapt to different workloads from both applications and upper-level systems. However, most of them are built as large sets of monolithic components that achieve customizability at additional cost through concrete C APIs. We have noticed that the zero-overhead abstraction of C++ is beneficial for assembling flexible user-level threading in a clearer manner. To demonstrate our ideas, we have implemented a new user-level threading library, ComposableThreads, which provides customizability while minimizing the interfacing costs. We show that users can pick, insert, or replace the individual classes of ComposableThreads for their own purposes. ComposableThreads offers several characteristic abstractions for building high-level constructs of user-level threading, including suspended threads (one-shot continuations) and lock delegators. We evaluate both the customizability and the performance of our runtime system through microbenchmark and application results.
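The idea of replacing individual classes without paying an interfacing cost can be illustrated with a minimal policy-based sketch. It is not the ComposableThreads API: the names `Scheduler`, `FifoPolicy`, `LifoPolicy`, and `run_order` are hypothetical, and the tasks are plain callables rather than user-level threads. What it shows is only that a scheduling policy passed as a template parameter is resolved at compile time, so swapping it needs no function-pointer or virtual-call layer.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// std::function is used for brevity only; it adds type-erasure overhead
// that a real zero-overhead design would avoid.
using Task = std::function<void()>;

// Hypothetical policy: run the oldest queued task first.
struct FifoPolicy {
    static Task pop(std::deque<Task>& q) {
        Task t = std::move(q.front());
        q.pop_front();
        return t;
    }
};

// Hypothetical policy: run the most recently queued task first.
struct LifoPolicy {
    static Task pop(std::deque<Task>& q) {
        Task t = std::move(q.back());
        q.pop_back();
        return t;
    }
};

// The policy is a template parameter, so Policy::pop is bound statically
// and can be inlined; replacing the policy is a one-word change at the use site.
template <class Policy>
class Scheduler {
public:
    void spawn(Task t) { queue_.push_back(std::move(t)); }

    // Runs queued tasks (including any spawned while running) until none remain.
    void run_all() {
        while (!queue_.empty()) {
            Task t = Policy::pop(queue_);
            t();
        }
        assert(queue_.empty());
    }

private:
    std::deque<Task> queue_;
};

// Spawns three tasks numbered 1..3 and records the order in which they run.
template <class Policy>
std::vector<int> run_order() {
    std::vector<int> order;
    Scheduler<Policy> sched;
    for (int i = 1; i <= 3; ++i) {
        sched.spawn([&order, i] { order.push_back(i); });
    }
    sched.run_all();
    return order;
}
```

Calling `run_order<FifoPolicy>()` yields the tasks in spawn order, while `run_order<LifoPolicy>()` yields them in reverse; the `Scheduler` class itself is unchanged in both cases.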
Authors
Takato Hideshima, Shigeyuki Sato, Kenjiro Taura
Publisher
Information Processing Society of Japan
Journal
Journal of Information Processing (ISSN:18826652)
Volume, pages, and publication date
vol.30, pp.464-475, 2022 (Released:2022-06-15)
Number of references
31

Page-based distributed shared memory (PDSM) is a programming environment on distributed-memory computers that allows programmers to freely allocate shared regions in a virtual address space accessible from any computer. It hides distributed physical memory from programmers and enables shared-memory programming over the uniform virtual address space. PDSM systems are typically equipped with a coherent cache to improve performance while hiding communication, but the cache management cost is treated as an implementation detail and is complex and implicit. Consequently, programs easily fail to gain speedup, and it is difficult to perform cost-aware programming to remedy this. In this study, we explore cost-aware programming for ArgoDSM, a state-of-the-art PDSM. In particular, we observe that three measures are effective for reducing PDSM-derived costs: 1) informing the PDSM of changes in access patterns to shared regions, 2) inspecting the data to be placed in shared regions, and 3) performing writes with an awareness of the original owner of the shared region. Based on this observation, we extend ArgoDSM with APIs that support these measures. We performed cost-aware programming on the extended ArgoDSM for benchmark programs and experimentally showed that PDSM-derived costs can be significantly reduced. The proposed programming measures significantly improve situations in which performance falls below sequential performance, and they allow programs to benefit from the scalability of distributed-memory computers under the high-level abstraction of PDSM.