Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems (PPoPP 2023 - Main Conference)

Who

Jieyang Chen, Xin Liang, Kai Zhao, Hadi Zamani Sabzi, Laxmi Bhuyan, Zizhong Chen

Track

PPoPP 2023 Main Conference

Time Zone

All times are shown in the conference time zone, (GMT-05:00) Eastern Time (US & Canada).

When

Tue 28 Feb 2023, 14:10 - 14:30, at Montreal 4. Session 5: Decompositions. Chair(s): Milind Chabbi

Abstract

One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are key components of scientific computing in many different fields. Although their design has been highly optimized for modern processors, they still consume a considerable amount of energy. As CPU-GPU heterogeneous systems are commonly used for matrix decompositions, in this work we aim to further improve the energy saving of one-sided matrix decompositions on CPU-GPU heterogeneous systems. We first build an Algorithm-Based Fault Tolerance protected overclocking technique (ABFT-OC) that enables us to exploit reliable overclocking for key matrix decomposition operations. Then, we design an energy-saving matrix decomposition framework, Bi-directional Slack Reclamation (BSR), that intelligently combines the capabilities provided by ABFT-OC and DVFS to maximize energy saving while maintaining performance and reliability. Experiments show that BSR saves up to 11.7% more energy than the current best energy-saving optimization approach with no performance degradation, and achieves up to 14.1% reduction in $Energy \times Delay^2$. BSR also enables a Pareto-efficient performance-energy trade-off, providing up to 1.43$\times$ performance improvement at no extra energy cost.
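
As a rough illustration of the checksum idea behind Algorithm-Based Fault Tolerance in general, the sketch below checks a small matrix product against column checksums and flags a column whose sum no longer matches. It is a generic textbook-style example, not the authors' ABFT-OC implementation: the function names (`matmul`, `column_checksum`, `find_faulty_columns`), the tolerance, and the injected fault are all assumptions made for illustration.

```python
# Generic checksum-based ABFT check for C = A * B.
# Illustrative only; this is NOT the paper's ABFT-OC implementation.

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def column_checksum(m):
    """Return a 1 x n matrix holding the sum of each column of m."""
    return [[sum(row[j] for row in m) for j in range(len(m[0]))]]

def find_faulty_columns(a, b, c, tol=1e-9):
    """Compare the column sums of c against (column sums of a) * b.

    In exact arithmetic the two agree, because e^T (A B) = (e^T A) B.
    Columns whose sums differ by more than tol are reported.
    The tolerance value is an assumption chosen for this small example.
    """
    expected = matmul(column_checksum(a), b)[0]
    actual = column_checksum(c)[0]
    return [j for j, (x, y) in enumerate(zip(expected, actual))
            if abs(x - y) > tol]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = matmul(a, b)
clean = find_faulty_columns(a, b, c)   # no mismatch expected

# Hypothetical fault injection: corrupt one element of the result.
c[1][0] += 0.5
faulty = find_faulty_columns(a, b, c)  # column 0 should be flagged
```

A check of this kind costs one extra vector-matrix product per verification, which is small relative to the full product; that low overhead is the usual reason checksum schemes are considered for guarding computations that may produce occasional errors.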

Jieyang Chen

University of Alabama at Birmingham

Xin Liang

University of Kentucky

Kai Zhao

University of Alabama at Birmingham

Hadi Zamani Sabzi

University of California, Riverside

Laxmi Bhuyan

University of California, Riverside

Zizhong Chen