A Scalable Hybrid Total FETI Method for Massively Parallel FEM Simulations
The Hybrid Total Finite Element Tearing and Interconnecting (HTFETI) method plays an important role in solving large-scale and complex engineering problems. The method has to perform numerous matrix-vector multiplications. Directly calling the vendor-optimized library for general matrix-vector multiplication (gemv) on GPU leads to low performance, since it is not optimized for the different matrix sizes that occur in HTFETI, i.e., different row and column sizes. In addition, state-of-the-art graph partitioning methods cannot guarantee load balancing for HTFETI, since the matrix size is determined by the length of the subdomain boundary. To solve these problems, we first port gemv to a multi-stream pipeline scheme and develop a new batched kernel function on GPU, which bring a 15% to 30% throughput improvement and a 37% average GFLOPS improvement, respectively. We also propose a multi-grained load-balancing scheme based on graph repartitioning and work-stealing, which reduces the load imbalance ratio from 1.5 to between 1.05 and 1.09. We have successfully applied the scalable HTFETI method to simulate the whole core assembly of the China Experimental Fast Reactor (CEFR) for steady-state analysis, and the weak-scaling and strong-scaling efficiencies reach 78% and 72% on 12,288 GPUs, respectively. As far as we know, this is the first time that HTFETI has been used in large-scale and high-fidelity whole core assembly simulation.
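To make the load imbalance ratio concrete, here is a minimal Python sketch. It assumes the ratio is the maximum per-worker load divided by the mean per-worker load, a common definition that the abstract does not spell out, and it assumes each worker's load can be estimated from the dense gemv cost of the matrices assigned to it. The function names and the matrix shapes are hypothetical and are not taken from the paper.

```python
def gemv_flops(rows, cols):
    """Floating-point operations for one dense y = A @ x with A of shape (rows, cols)."""
    return 2 * rows * cols


def worker_load(shapes):
    """Estimated load of one worker: total gemv cost over its assigned matrices."""
    return sum(gemv_flops(r, c) for r, c in shapes)


def load_imbalance_ratio(loads):
    """Maximum load divided by mean load (assumed definition); 1.0 means perfect balance."""
    if not loads:
        raise ValueError("loads must not be empty")
    avg = sum(loads) / len(loads)
    if avg == 0:
        raise ValueError("mean load must be positive")
    return max(loads) / avg


# Hypothetical assignment: the matrix shapes owned by each of four workers.
# Shapes differ because, per the abstract, matrix size follows the
# length of the subdomain boundary.
assignment = [
    [(100, 100), (100, 100), (100, 100)],  # worker 0: 60,000 flops
    [(100, 100), (100, 100)],              # worker 1: 40,000 flops
    [(200, 100)],                          # worker 2: 40,000 flops
    [(100, 100)],                          # worker 3: 20,000 flops
]

loads = [worker_load(shapes) for shapes in assignment]
ratio = load_imbalance_ratio(loads)
print(loads, ratio)
```

Under this definition, the slowest worker in the example carries 1.5 times the average load, so roughly a third of the time spent by the other workers in a synchronized step is idle waiting. A ratio close to 1.0, such as the 1.05 to 1.09 reported in the abstract, means that wait is nearly eliminated.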
Mon 27 Feb (displayed time zone: Eastern Time, US & Canada)
15:40 - 17:00 | Session 3: Practice | Main Conference, at Montreal 4 | Chair(s): I-Ting Angelina Lee, Washington University in St. Louis, USA
15:40 (20m) Talk | A Scalable Hybrid Total FETI Method for Massively Parallel FEM Simulations | Main Conference | Kehao Lin, Hangzhou Dianzi University; Chunbao Zhou, Computer Network Information Center, Chinese Academy of Sciences; Yan Zeng, Hangzhou Dianzi University; Ningming Nie, Computer Network Information Center, Chinese Academy of Sciences; Jue Wang, Computer Network Information Center, Chinese Academy of Sciences; Shigang Li, Beijing University of Posts and Telecommunications; Yangde Feng, Computer Network Information Center, Chinese Academy of Sciences; Yangang Wang, Computer Network Information Center, Chinese Academy of Sciences; Kehan Yao, Hangzhou Dianzi University; Tiechui Yao, Computer Network Information Center, Chinese Academy of Sciences; Jilin Zhang, Hangzhou Dianzi University; Jian Wan, Hangzhou Dianzi University
16:00 (20m) Talk | Lifetime-based Optimization for Simulating Quantum Circuits on a New Sunway Supercomputer | Main Conference | Yaojian Chen, Tsinghua University; Yong Liu, National Supercomputer Center in Wuxi; Xinmin Shi, Information Engineering University; Jiawei Song, National Supercomputer Center in Wuxi; Xin Liu, National Supercomputer Center in Wuxi; Lin Gan, Tsinghua University; Chu Guo, Information Engineering University; Haohuan Fu, Tsinghua University; Jie Gao, National Research Centre of Parallel Engineering and Technology; Dexun Chen, National Supercomputer Center in Wuxi; Guangwen Yang, Tsinghua University
16:20 (20m) Talk | High-Performance Filters for GPUs | Main Conference | Hunter James McCoy, University of Utah; Steven Hofmeyr, Lawrence Berkeley National Laboratory; Katherine Yelick, University of California at Berkeley & Lawrence Berkeley National Lab; Prashant Pandey, University of Utah
16:40 (20m) Talk | High-Performance and Scalable Agent-Based Simulation with BioDynaMo | Main Conference | Lukas Breitwieser, European Organization for Nuclear Research (CERN) and ETH Zurich; Ahmad Hesam, Delft University of Technology; Fons Rademakers, European Organization for Nuclear Research (CERN); Juan Gómez Luna, ETH Zurich; Onur Mutlu, ETH Zurich