POSTER: Learning to Parallelize in a Shared-Memory Environment with Transformers
In recent years, the computing world has shifted to multi- and many-core shared-memory architectures. As a result, there is a growing need to exploit these architectures by introducing shared-memory parallelization schemes, such as OpenMP, into applications. Nevertheless, inserting the OpenMP work-sharing loop construct into code, especially legacy code, is challenging due to pervasive pitfalls in managing parallel shared memory. To ease this task, many source-to-source (S2S) compilers have been created over the years, tasked with inserting OpenMP directives into code automatically. Besides their limited robustness to input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. In this work, we propose leveraging recent advances in machine learning, specifically in natural language processing (NLP), to suggest the need for an OpenMP work-sharing loop directive and its data-sharing attribute clauses, the building blocks of concurrent programming. We train several transformer models, collectively named PragFormer, for these tasks and show that they outperform statistically trained baselines and automatic S2S parallelization compilers, both in classifying the overall need for a parallel for directive and in introducing private and reduction clauses. In the future, our corpus can be used for additional tasks, up to generating entire OpenMP directives. The source code and database for our project are available on GitHub and HuggingFace: https://github.com/Scientific-Computing-Lab-NRCN/PragFormer | https://huggingface.co/spaces/Pragformer/PragFormer-demo
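To make the task framing concrete, the sketch below shows one way such a classifier can be queried: the serial loop's source text is tokenized as plain text, and a transformer with a sequence-classification head predicts whether a parallel for directive is warranted. This is a minimal sketch, not the released PragFormer pipeline; the CodeBERT backbone, the label mapping (0 = no pragma, 1 = pragma), and the 512-token limit are illustrative assumptions.

```python
# Minimal sketch of the PragFormer-style task framing: classify whether a
# serial for-loop warrants an OpenMP work-sharing directive.
# Assumptions: "microsoft/codebert-base" as a code-pretrained backbone and
# the label order {0: no pragma, 1: pragma}; the released PragFormer
# checkpoints may differ -- see the project's GitHub/HuggingFace pages.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

backbone = "microsoft/codebert-base"  # assumed backbone, not the released model
tokenizer = AutoTokenizer.from_pretrained(backbone)
# A freshly initialized head yields random outputs; in practice it must first
# be fine-tuned on a labeled corpus of loops with/without OpenMP directives.
model = AutoModelForSequenceClassification.from_pretrained(backbone, num_labels=2)

loop_src = """
for (int i = 0; i < n; ++i) {
    c[i] = a[i] + b[i];
}
"""

inputs = tokenizer(loop_src, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 2)
pred = logits.argmax(dim=-1).item()
print("suggest '#pragma omp parallel for':", pred == 1)
```

Under this framing, analogous binary heads can be trained for the private and reduction clause decisions described in the abstract.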
Sun 26 Feb, 18:00 - 20:00 (displayed time zone: Eastern Time, US & Canada)
18:00 (2h) | Poster | POSTER: Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU | Main Conference | Muhammad Osama (University of California, Davis), Duane Merrill (NVIDIA Corporation), Cris Cecka (NVIDIA Corporation), Michael Garland (NVIDIA), John D. Owens (University of California, Davis) | Pre-print
18:00 (2h) | Poster | POSTER: Unexpected Scaling in Path Copying Trees | Main Conference | Vitaly Aksenov (Inria & ITMO University), Trevor Brown (University of Toronto), Alexander Fedorov (IST Austria), Ilya Kokorin (ITMO University)
18:00 (2h) | Poster | POSTER: Transactional Composition of Nonblocking Data Structures | Main Conference | Wentao Cai (University of Rochester), Haosen Wen (University of Rochester), Michael L. Scott (University of Rochester)
18:00 (2h) | Poster | POSTER: The ERA Theorem for Safe Memory Reclamation | Main Conference
18:00 (2h) | Poster | POSTER: AArch64 Atomics: Might they be harming your performance? | Main Conference
18:00 (2h) | Poster | POSTER: Fast Parallel Exact Inference on Bayesian Networks | Main Conference | Jiantong Jiang (The University of Western Australia), Zeyi Wen (The Hong Kong University of Science and Technology (Guangzhou)), Atif Mansoor (The University of Western Australia), Ajmal Mian (The University of Western Australia)
18:00 (2h) | Poster | POSTER: High-Throughput GPU Random Walk with Fine-tuned Concurrent Query Processing | Main Conference | Cheng Xu (Shanghai Jiao Tong University), Chao Li (Shanghai Jiao Tong University), Pengyu Wang (Shanghai Jiao Tong University), Xiaofeng Hou (Hong Kong University of Science and Technology), Jing Wang (Shanghai Jiao Tong University), Shixuan Sun (National University of Singapore), Minyi Guo (Shanghai Jiao Tong University), Hanqing Wu (Alibaba Inc), Dongbai Chen (Alibaba Inc), Xiangwen Liu (Alibaba Inc)
18:00 (2h) | Poster | POSTER: Efficient All-reduce for Distributed DNN Training in Optical Interconnect Systems | Main Conference | Fei Dai (University of Otago), Yawen Chen (University of Otago), Zhiyi Huang (University of Otago), Haibo Zhang (University of Otago), Fangfang Zhang (Qilu University of Technology)
18:00 (2h) | Poster | POSTER: CuPBoP: A framework to make CUDA portable | Main Conference | Ruobing Han (Georgia Institute of Technology), Jun Chen (Georgia Institute of Technology), Bhanu Garg (Georgia Institute of Technology), Jeffrey Young (Georgia Institute of Technology), Jaewoong Sim (Seoul National University), Hyesoon Kim (Georgia Tech)
18:00 (2h) | Poster | POSTER: Generating Fast FFT Kernels on CPUs via FFT-Specific Intrinsics | Main Conference | Zhihao Li (SKLP, Institute of Computing Technology, CAS), Haipeng Jia (SKLP, Institute of Computing Technology, CAS), Yunquan Zhang (SKLP, Institute of Computing Technology, CAS), Yuyan Sun (Huawei Technologies Co., Ltd), Yiwei Zhang (SKLP, Institute of Computing Technology, CAS), Tun Chen (SKLP, Institute of Computing Technology, CAS)
18:00 (2h) | Poster | POSTER: Learning to Parallelize in a Shared-Memory Environment with Transformers | Main Conference | Pre-print