Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are key components of scientific computing across many fields. Although their design has been highly optimized for modern processors, they still consume a considerable amount of energy. Because CPU-GPU heterogeneous systems are commonly used for matrix decompositions, this work aims to further improve the energy saving of one-sided matrix decompositions on such systems. We first build an Algorithm-Based Fault Tolerance protected overclocking technique (ABFT-OC) that lets us exploit overclocking reliably for key matrix decomposition operations. We then design an energy-saving matrix decomposition framework, Bi-directional Slack Reclamation (BSR), that intelligently combines the capabilities of ABFT-OC and DVFS to maximize energy saving while maintaining performance and reliability. Experiments show that BSR saves up to 11.7% more energy than the current best energy-saving optimization approach with no performance degradation, and achieves up to 14.1% $Energy \times Delay^2$ reduction. BSR also enables a Pareto-efficient performance-energy trade-off, providing up to 1.43$\times$ performance improvement at no extra energy cost.
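ABFT-OC builds on algorithm-based fault tolerance: checksums encoded into the inputs are carried through the computation, so an error introduced by an unreliable (for example, overclocked) unit shows up as a checksum mismatch. The sketch below illustrates the classic row/column checksum scheme on a plain matrix multiply, the kind of operation that dominates the trailing-matrix update in blocked Cholesky, LU, and QR. It is a generic illustration under that assumption, not the authors' ABFT-OC implementation; the function names and the `fault` parameter used to inject a simulated silent error are hypothetical.

```python
# Generic ABFT checksum sketch (Huang-Abraham style); NOT the authors' ABFT-OC code.
# Function names and the `fault` injection hook are hypothetical, for illustration only.


def matmul(A, B):
    """Plain triple-loop product of list-of-lists matrices."""
    k, m = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(m)] for row in A]


def abft_matmul(A, B, fault=None, tol=1e-9):
    """Multiply with checksums; locate and correct a single corrupted entry.

    fault: optional (i, j, delta) added to the product to mimic a silent
    error from an unreliable (e.g. overclocked) computation.
    """
    n, m = len(A), len(B[0])
    # Append a row of column sums to A and a column of row sums to B.
    Ac = A + [[sum(col) for col in zip(*A)]]
    Br = [row + [sum(row)] for row in B]
    Cf = matmul(Ac, Br)  # (n+1) x (m+1) full-checksum product
    if fault is not None:
        i, j, delta = fault
        Cf[i][j] += delta
    # Residuals: stored checksum minus the recomputed sum over the data part.
    row_res = [Cf[i][m] - sum(Cf[i][:m]) for i in range(n)]
    col_res = [Cf[n][j] - sum(Cf[i][j] for i in range(n)) for j in range(m)]
    bad_rows = [i for i, r in enumerate(row_res) if abs(r) > tol]
    bad_cols = [j for j, r in enumerate(col_res) if abs(r) > tol]
    C = [row[:m] for row in Cf[:n]]
    if not bad_rows and not bad_cols:
        return C, []
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # A single data error sits at the intersection; the row residual
        # equals the negated error, so adding it restores the true value.
        i, j = bad_rows[0], bad_cols[0]
        C[i][j] += row_res[i]
        return C, [(i, j)]
    # Multiple errors or a corrupted checksum: fall back to recomputation.
    raise RuntimeError("ABFT check failed: uncorrectable error, recompute")


if __name__ == "__main__":
    C, fixed = abft_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], fault=(1, 0, 100))
    print(C, fixed)  # [[19, 22], [43, 50]] [(1, 0)]
```

Encoding and checking the checksums adds only O(n^2) work to an O(n^3) product, which is what makes this style of protection cheap enough to pair with overclocking. A real floating-point deployment would need a tolerance scaled to the matrix norms and rounding error rather than the fixed `tol` assumed here.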
Tue 28 Feb
Displayed time zone: Eastern Time (US & Canada)
13:50 - 15:10
Session 5: Decompositions, Main Conference at Montreal 4
Chair(s): Milind Chabbi Uber Technologies Inc.
TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition
Lizhi Xiang University of Utah, Miao Yin Rutgers University, Chengming Zhang Indiana University, Aravind Sukumaran-Rajam Meta, Saday Sadayappan University of Utah, USA, Bo Yuan Rutgers University, Dingwen Tao Indiana University
Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
Jieyang Chen University of Alabama at Birmingham, Xin Liang University of Kentucky, Kai Zhao University of Alabama at Birmingham, Hadi Zamani Sabzi University of California, Riverside, Laxmi Bhuyan University of California, Riverside, Zizhong Chen University of California, Riverside
End-to-End LU Factorization of Large Matrices on GPUs
Yang Xia, Peng Jiang The University of Iowa, Rajiv Ramnath The Ohio State University, Gagan Agrawal Augusta University
Fast Eigenvalue Decomposition via WY Representation on Tensor Core
Shaoshuai Zhang University of Houston, Ruchi Shah University of Houston, Hiroyuki Ootomo Tokyo Institute of Technology, Rio Yokota Tokyo Institute of Technology, Panruo Wu University of Houston