Containerization approaches based on namespaces offered by the Linux kernel have seen an increasing popularity in the HPC community both as a means to isolate applications and as a format to package and distribute them. However, their adoption and usage in HPC systems faces several challenges. These include difficulties in unprivileged running and building of scientific application container images directly on HPC resources, increasing heterogeneity of HPC architectures, and access to specialized networking libraries available only on HPC systems. These challenges of container-based HPC application development closely align with the several advantages that a new universal intermediate binary format called WebAssembly (Wasm) has to offer. These include a lightweight userspace isolation mechanism and portability across operating systems and processor architectures. In this paper, we explore the usage of Wasm as a distribution format for MPI-based HPC applications. To this end, we present MPIWasm, a novel Wasm embedder for MPI-based HPC applications that enables high-performance execution of Wasm code, has low-overhead for MPI calls, and supports high-performance networking interconnects present on HPC systems. We evaluate the performance and overhead of MPIWasm on a production HPC system and AWS Graviton2 nodes using standardized HPC benchmarks. Results from our experiments demonstrate that MPIWasm delivers competitive native application performance across all scenarios. Moreover, we observe that Wasm binaries are 139.5x smaller on average as compared to the statically-linked binaries for the different standardized benchmarks.
Mon 27 FebDisplayed time zone: Eastern Time (US & Canada) change
13:50 - 15:10
|A Programming Model for GPU Load Balancing|
|Exploring the Use of WebAssembly in HPC|
Mohak Chadha Chair of Computer Architecture and Parallel Systems, Technical University of Munich, Nils Krueger Chair of Computer Architecture and Parallel Systems, Technical University of Munich, Jophin John Chair of Computer Architecture and Parallel Systems, Technical University of Munich, Anshul Jindal Chair of Computer Architecture and Parallel Systems, Technical University of Munich, Michael Gerndt TUM, Shajulin Benedict Indian Institute of Information Technology Kottayam, Kerala, India
|Fast and Scalable Channels in Kotlin Coroutines|
|High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs|