CUDA, as one of the most popular choices for GPU programming, can be executed only on NVIDIA GPUs. To execute CUDA on non-NVIDIA devices, researchers have proposed to translate CUDA to other programming languages. However, this approach cannot achieve high coverage due to the challenges in source-to-source translation.
We propose a framework, CuPBoP, that executes CUDA programs on non-NVIDIA devices without relying on other programming languages. CuPBoP consists of two parts. The compilation part applies transformations on CUDA host/kernel IRs. The runtime part consists of the runtime libraries for CUDA built-in functions. For the CPU backends, compared with the existing frameworks, CuPBoP achieves the highest coverage on all CPUs that we evaluate (x86, aarch64, RISC-V).
We make CuPBoP publicly available to inspire more works in this area