GPU Optimization with SYCL
Optimized for | Description |
---|---|
OS | Linux* Ubuntu 18.04, 20; Windows* 10 |
Hardware | Skylake with GEN9 or newer |
Software | Intel® oneAPI DPC++ Compiler, Jupyter Notebooks, Intel DevCloud |
The primary focus of this document is GPUs. Each section covers a different topic to guide you on your path to creating optimized solutions.
Designing high-performance software requires you to think differently than you might normally do when writing software. You need to be aware of the hardware on which your code is intended to run, and the characteristics that control the performance of that hardware. Your goal is to structure the code such that it produces correct answers, but does so in a way that maximizes the hardware’s ability to execute the code.
This workshop also familiarizes you with the use of Jupyter notebooks as a front-end for all training exercises. It is designed to be used on the DevCloud and includes details on submitting batch jobs in the DevCloud environment.
At the end of this course, you will be able to:
- Optimize your SYCL code to run faster and more efficiently on GPUs.

Prerequisites:
- C++ Programming
- SYCL Programming
Modules | Description |
---|---|
Introduction to GPU Optimization | + Phases in the Optimization Workflow + Locality Matters + Parallelization + GPU Execution Model Overview |
Thread Mapping and Occupancy | + nd_range Kernel + Thread Synchronization + Mapping Work-groups to Xe-cores for Maximum Occupancy + Intel® GPU Occupancy Calculator |
Memory Optimizations | Memory Optimization - Buffers: + Buffer Accessor Modes + Optimizing Memory Movement Between Host and Device + Avoid Declaring Buffers in a Loop + Avoid Moving Data Back and Forth Between Host and Device. Memory Optimization - USM: + Overlapping Data Transfer from Host to Device + Avoid Copying Unnecessary Block of Data + Copying Memory from Host to USM Device Allocation |
Kernel Submission | + Kernel Launch + Executing Multiple Kernels + Submitting Kernels to Multiple Queues + Avoid Redundant Queue Construction |
Kernel Programming | + Considerations for Selecting Work-group Size + Removing Conditional Checks + Avoiding Register Spills |
Shared Local Memory | + SLM Size and Work-group Size + Bank Conflicts + Using SLM as Cache + Data Sharing and Work-group Barriers |
Sub-Groups | + Sub-group Sizes + Sub-group Size vs. Maximum Sub-group Size + Vectorization and Memory Access + Data Sharing |
Atomic Operations | + Data Types for Atomic Operations + Atomic Operations in Global vs Local Space |
Kernel Reduction | + Reduction Using Atomic Operations + Reduction Using Shared Local Memory + Reduction Using Sub-Groups + Reduction Using SYCL Reduction Kernel |
Each module folder has a Jupyter Notebook file (`*.ipynb`), which can be opened in Jupyter Lab to view the training content, edit code, and compile and run it. Along with the Notebook file, there is a `lab` folder and a `src` folder with the SYCL source code for the samples used in the Notebook. The module folder also has `run_*.sh` files, which can be used in a shell terminal to compile and run each sample.
- `01_{Module_Name}`
  - `lab`
    - `{sample_code_name}.cpp` - (sample code editable via Jupyter Notebook)
  - `src`
    - `{sample_code_name}.cpp` - (copy of sample code)
  - `01_{Module_Name}.ipynb` - (Jupyter Notebook with training content and sample codes)
  - `run_{sample_code_name}.sh` - (script to compile and run `{sample_code_name}.cpp`)
- `License.txt`
- `Readme.md`
The training content can be accessed locally after installing the necessary tools, or directly on Intel DevCloud without any installation.
The Jupyter notebooks are tested and can be run on Intel DevCloud without any installation. Follow these steps to access them on Intel DevCloud:
- Register on Intel DevCloud
- Log in, Get Started, and launch Jupyter Lab
- Open a Terminal in Jupyter Lab, `git clone` the repo, and access the Notebooks
The Jupyter Notebooks can also be downloaded to a local computer and accessed as follows:
- Install Intel oneAPI Base Toolkit on local computer: Installation Guide
- Install Jupyter Lab on local computer: Installation Guide
- git clone the repo and access the Notebooks using Jupyter Lab
The Jupyter Notebooks can also be viewed on GitHub, and you can run the code from the command line:
- Install Intel oneAPI Base Toolkit on local computer (Linux): Installation Guide
- `git clone` the repo
- Open a command-line terminal and use the `run_*.sh` script in each module to compile and run the code.
Code samples are licensed under the MIT license. See License.txt for details.
Third-party program licenses can be found here: third-party-programs.txt