
Title

GPU Optimization with SYCL

Requirements

| Optimized for | Description |
| :--- | :--- |
| OS | Linux* Ubuntu 18.04, 20; Windows* 10 |
| Hardware | Skylake with GEN9 or newer |
| Software | Intel® oneAPI DPC++ Compiler, Jupyter Notebooks, Intel DevCloud |

Purpose

The primary focus of this training is optimizing SYCL code for GPUs. Each module covers a different topic to guide you on the path to creating optimized solutions.

Designing high-performance software requires you to think differently than you might normally do when writing software. You need to be aware of the hardware on which your code is intended to run, and the characteristics that control the performance of that hardware. Your goal is to structure the code such that it produces correct answers, but does so in a way that maximizes the hardware’s ability to execute the code.

This course also familiarizes you with Jupyter notebooks as the front end for all training exercises. The workshop is designed to be used on Intel DevCloud and includes details on submitting batch jobs in the DevCloud environment.

At the end of this course, you will be able to:

  • Optimize your SYCL code to run faster and more efficiently on GPUs.

Content Details

Pre-requisites

  • C++ Programming
  • SYCL Programming

Training Modules

  • Introduction to GPU Optimization
    + Phases in the Optimization Workflow
    + Locality Matters
    + Parallelization
    + GPU Execution Model Overview
  • Thread Mapping and Occupancy
    + nd_range Kernel
    + Thread Synchronization
    + Mapping Work-groups to Xe-cores for Maximum Occupancy
    + Intel® GPU Occupancy Calculator
  • Memory Optimizations
    Memory Optimization - Buffers:
    + Buffer Accessor Modes
    + Optimizing Memory Movement Between Host and Device
    + Avoid Declaring Buffers in a Loop
    + Avoid Moving Data Back and Forth Between Host and Device
    Memory Optimization - USM:
    + Overlapping Data Transfer from Host to Device
    + Avoid Copying Unnecessary Block of Data
    + Copying Memory from Host to USM Device Allocation
  • Kernel Submission
    + Kernel Launch
    + Executing Multiple Kernels
    + Submitting Kernels to Multiple Queues
    + Avoid Redundant Queue Construction
  • Kernel Programming
    + Considerations for Selecting Work-group Size
    + Removing Conditional Checks
    + Avoiding Register Spills
  • Shared Local Memory
    + SLM Size and Work-group Size
    + Bank Conflicts
    + Using SLM as Cache
    + Data Sharing and Work-group Barriers
  • Sub-Groups
    + Sub-group Sizes
    + Sub-group Size vs. Maximum Sub-group Size
    + Vectorization and Memory Access
    + Data Sharing
  • Atomic Operations
    + Data Types for Atomic Operations
    + Atomic Operations in Global vs. Local Space
  • Kernel Reduction
    + Reduction Using Atomic Operations
    + Reduction Using Shared Local Memory
    + Reduction Using Sub-Groups
    + Reduction Using SYCL Reduction Kernel
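
To give a feel for the kind of code these modules work with, below is a minimal, illustrative SYCL sketch (not one of the repository's samples). It uses USM device allocations with explicit host-to-device copies and an nd_range kernel with an explicit work-group size. The device selector, problem size N, and work-group size WG are arbitrary choices for illustration and assume a SYCL-capable GPU is available.

```cpp
// Illustrative sketch only (not a sample from this repository): vector add
// using USM device allocations and an nd_range kernel with an explicit
// work-group size. Assumes a SYCL-capable GPU is available.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  constexpr size_t N  = 1024;   // problem size (illustrative)
  constexpr size_t WG = 256;    // work-group size (illustrative; must divide N)

  sycl::queue q{sycl::gpu_selector_v};

  std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

  // USM device allocations
  float *d_a = sycl::malloc_device<float>(N, q);
  float *d_b = sycl::malloc_device<float>(N, q);
  float *d_c = sycl::malloc_device<float>(N, q);

  // Copy inputs from host to the USM device allocations
  q.memcpy(d_a, a.data(), N * sizeof(float));
  q.memcpy(d_b, b.data(), N * sizeof(float));
  q.wait();

  // nd_range kernel: global range N, work-group size WG
  q.parallel_for(sycl::nd_range<1>{N, WG}, [=](sycl::nd_item<1> item) {
     size_t i = item.get_global_id(0);
     d_c[i] = d_a[i] + d_b[i];
   }).wait();

  // Copy the result back to the host
  q.memcpy(c.data(), d_c, N * sizeof(float)).wait();
  std::cout << "c[0] = " << c[0] << "\n";

  sycl::free(d_a, q);
  sycl::free(d_b, q);
  sycl::free(d_c, q);
  return 0;
}
```

With the Intel oneAPI DPC++ compiler installed, a standalone sample like this can be compiled with icpx -fsycl; the run_*.sh scripts described under Content Structure perform the equivalent compile-and-run steps for the bundled samples.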

Content Structure

Each module folder has a Jupyter Notebook file (*.ipynb), which can be opened in JupyterLab to view the training content, edit code, and compile and run it. Along with the Notebook file, there are lab and src folders with the SYCL source code for the samples used in the Notebook. The module folder also has run_*.sh scripts, which can be used in a shell terminal to compile and run each sample.

  • 01_{Module_Name}
    • lab
      • {sample_code_name}.cpp - (sample code editable via Jupyter Notebook)
    • src
      • {sample_code_name}.cpp - (copy of sample code)
    • 01_{Module_Name}.ipynb - (Jupyter Notebook with training content and sample codes)
    • run_{sample_code_name}.sh - (script to compile and run {sample_code_name}.cpp)
    • License.txt
    • Readme.md

Install Directions

The training content can be accessed locally on your computer after installing the necessary tools, or it can be accessed directly on Intel DevCloud without any installation.

Access using Intel DevCloud

The Jupyter notebooks are tested on Intel DevCloud and can be run there without any installation. Follow the steps below to access the notebooks on Intel DevCloud:

  1. Register on Intel DevCloud
  2. Log in, select Get Started, and launch JupyterLab
  3. Open a terminal in JupyterLab, git clone the repo, and access the Notebooks

Local Installation of oneAPI Tools and JupyterLab

The Jupyter Notebooks can be downloaded to your local computer and accessed as follows:

  • Install the Intel oneAPI Base Toolkit on your local computer: Installation Guide
  • Install JupyterLab on your local computer: Installation Guide
  • git clone the repo and access the Notebooks using JupyterLab

Local Installation of oneAPI Tools and Use of the Command Line

The Jupyter Notebooks can be viewed on GitHub, and the code can be run from the command line:

  • Install the Intel oneAPI Base Toolkit on your local computer (Linux): Installation Guide
  • git clone the repo
  • Open a command-line terminal and use the run_*.sh script in each module to compile and run the code.

License

Code samples are licensed under the MIT license. See License.txt for details.

Third-party program licenses can be found here: third-party-programs.txt