A Scalable Hybrid Total FETI Method for Massively Parallel FEM Simulations (PPoPP 2023 - Main Conference)

Who

Kehao Lin, Chunbao Zhou, Yan Zeng, Ningming Nie, Jue Wang, Shigang Li, Yangde Feng, Yangang Wang, Kehan Yao, Tiechui Yao, Jilin Zhang, Jian Wan

Track

PPoPP 2023 Main Conference

When

Mon 27 Feb 2023, 15:40 - 16:00 (GMT-05:00, Eastern Time), at Montreal 4. Session 3: Practice. Chair(s): I-Ting Angelina Lee

Abstract

The Hybrid Total Finite Element Tearing and Interconnecting (HTFETI) method plays an important role in solving large-scale and complex engineering problems. This method needs to handle numerous matrix-vector multiplications. Directly calling the vendor-optimized library for general matrix-vector multiplication (gemv) on GPUs leads to low performance, since the library is not optimized for the different matrix sizes that arise in HTFETI, i.e., different row and column sizes. In addition, state-of-the-art graph partitioning methods cannot guarantee load balancing for HTFETI, since the matrix size is determined by the length of the subdomain boundary. To solve these problems, we first port gemv to a multi-stream pipeline scheme and develop a new batched kernel function on GPU, which bring a 15%~30% throughput improvement and a 37% average GFLOPs improvement, respectively. We also propose a multi-grained load-balancing scheme based on graph repartitioning and work-stealing, which reduces the load imbalance ratio from 1.5 to 1.05~1.09. We have successfully applied the scalable HTFETI method to simulate the whole core assembly of the China Experimental Fast Reactor (CEFR) for steady-state analysis, and the weak-scaling and strong-scaling efficiencies reach 78% and 72% on 12,288 GPUs, respectively. As far as we know, this is the first time that HTFETI has been used in large-scale and high-fidelity whole core assembly simulation.
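
The abstract quotes a load imbalance ratio and weak and strong scaling efficiencies, but this page does not define them. The sketch below assumes the commonly used definitions (maximum load over mean load; ideal time over measured time), which may differ from those in the paper. The function names and sample workloads are hypothetical and are not taken from the authors' work.

```python
from statistics import mean


def load_imbalance_ratio(loads):
    """Maximum per-process load divided by the mean load.

    Assumed definition: 1.0 means perfectly balanced.
    """
    return max(loads) / mean(loads)


def strong_scaling_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Fixed total problem size: ideal time shrinks by p_base / p_scaled."""
    return (t_base * p_base) / (t_scaled * p_scaled)


def weak_scaling_efficiency(t_base, t_scaled):
    """Fixed work per process: ideal time stays constant as processes grow."""
    return t_base / t_scaled


if __name__ == "__main__":
    # Hypothetical per-GPU workloads in arbitrary units (not data from the paper).
    loads = [150.0, 90.0, 80.0, 80.0]
    print(load_imbalance_ratio(loads))
    print(strong_scaling_efficiency(100.0, 4, 50.0, 16))
    print(weak_scaling_efficiency(10.0, 12.5))
```

Under these assumed definitions, a ratio of 1.5 means the busiest process carries 50% more work than the average, and a ratio of 1.05 to 1.09 means it carries only 5% to 9% more.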

Kehao Lin

Hangzhou Dianzi University

Chunbao Zhou

Computer Network Information Center, Chinese Academy of Sciences

Yan Zeng

Hangzhou Dianzi University

Ningming Nie

Computer Network Information Center, Chinese Academy of Sciences

Jue Wang

Computer Network Information Center, Chinese Academy of Sciences

Shigang Li

Beijing University of Posts and Telecommunications, China

Yangde Feng

Computer Network Information Center, Chinese Academy of Sciences

Yangang Wang

Computer Network Information Center, Chinese Academy of Sciences

Kehan Yao

Hangzhou Dianzi University

Tiechui Yao

Computer Network Information Center, Chinese Academy of Sciences

Jilin Zhang

Hangzhou Dianzi University

Jian Wan

Hangzhou Dianzi University

Session Program

Mon 27 Feb
Displayed time zone: Eastern Time (US & Canada)

15:40 - 17:00	Session 3: Practice. Main Conference, at Montreal 4. Chair(s): I-Ting Angelina Lee (Washington University in St. Louis, USA)

15:40 20m Talk		A Scalable Hybrid Total FETI Method for Massively Parallel FEM Simulations. Main Conference. Kehao Lin (Hangzhou Dianzi University), Chunbao Zhou (Computer Network Information Center, Chinese Academy of Sciences), Yan Zeng (Hangzhou Dianzi University), Ningming Nie (Computer Network Information Center, Chinese Academy of Sciences), Jue Wang (Computer Network Information Center, Chinese Academy of Sciences), Shigang Li (Beijing University of Posts and Telecommunications), Yangde Feng (Computer Network Information Center, Chinese Academy of Sciences), Yangang Wang (Computer Network Information Center, Chinese Academy of Sciences), Kehan Yao (Hangzhou Dianzi University), Tiechui Yao (Computer Network Information Center, Chinese Academy of Sciences), Jilin Zhang (Hangzhou Dianzi University), Jian Wan (Hangzhou Dianzi University)
16:00 20m Talk		Lifetime-based Optimization for Simulating Quantum Circuits on a New Sunway Supercomputer. Main Conference. Yaojian Chen (Tsinghua University), Yong Liu (National Supercomputer center in wuxi), Xinmin Shi (Information Engineering University), Jiawei Song (National Supercomputer center in wuxi), Xin Liu (National Supercomputer center in wuxi), Lin Gan (Tsinghua University), Chu Guo (Information Engineering University), Haohuan Fu (Tsinghua University), Jie Gao (National Research Centre of Parallel Engineering and Technology), Dexun Chen (National Supercomputer center in wuxi), Guangwen Yang (Tsinghua University)
16:20 20m Talk		High-Performance Filters for GPUs. Main Conference. Hunter James McCoy (University of Utah), Steven Hofmeyr (Lawrence Berkeley National Laboratory), Katherine Yelick (University of California at Berkeley & Lawrence Berkeley National Lab), Prashant Pandey (University of Utah)
16:40 20m Talk		High-Performance and Scalable Agent-Based Simulation with BioDynaMo. Main Conference. Lukas Breitwieser (European Organization for Nuclear Research (CERN), ETH Zurich), Ahmad Hesam (Delft University of Technology), Fons Rademakers (European Organization for Nuclear Research (CERN)), Juan Gómez Luna (ETH Zurich), Onur Mutlu (ETH Zurich)