
Title

GPU Optimization with SYCL

Requirements

| Optimized for | Description |
| :--- | :--- |
| OS | Linux* Ubuntu 18.04, 20; Windows* 10 |
| Hardware | Skylake with GEN9 or newer |
| Software | Intel® oneAPI DPC++ Compiler, Jupyter Notebooks, Intel DevCloud |

Purpose

The primary focus of this training is optimizing SYCL code for GPUs. Each module covers a different topic to guide you on the path to creating optimized solutions.

Designing high-performance software requires you to think differently than you might normally do when writing software. You need to be aware of the hardware on which your code is intended to run, and the characteristics that control the performance of that hardware. Your goal is to structure the code such that it produces correct answers, but does so in a way that maximizes the hardware’s ability to execute the code.

This course also familiarizes you with Jupyter notebooks as the front end for all training exercises. The workshop is designed to be used on Intel DevCloud and includes details on submitting batch jobs in the DevCloud environment.

At the end of this course, you will be able to:

  • Optimize your SYCL code to run faster and more efficiently on GPUs.

Content Details

Pre-requisites

  • C++ Programming
  • SYCL Programming

Training Modules

  • Introduction to GPU Optimization
    + Phases in the Optimization Workflow
    + Locality Matters
    + Parallelization
    + GPU Execution Model Overview
  • Thread Mapping and Occupancy
    + nd_range Kernel
    + Thread Synchronization
    + Mapping Work-groups to Xe-cores for Maximum Occupancy
    + Intel® GPU Occupancy Calculator
  • Memory Optimizations
    Memory Optimization - Buffers:
    + Buffer Accessor Modes
    + Optimizing Memory Movement Between Host and Device
    + Avoid Declaring Buffers in a Loop
    + Avoid Moving Data Back and Forth Between Host and Device
    Memory Optimization - USM:
    + Overlapping Data Transfer from Host to Device
    + Avoid Copying Unnecessary Block of Data
    + Copying Memory from Host to USM Device Allocation
  • Kernel Submission
    + Kernel Launch
    + Executing Multiple Kernels
    + Submitting Kernels to Multiple Queues
    + Avoid Redundant Queue Construction
  • Kernel Programming
    + Considerations for Selecting Work-group Size
    + Removing Conditional Checks
    + Avoiding Register Spills
  • Shared Local Memory
    + SLM Size and Work-group Size
    + Bank Conflicts
    + Using SLM as Cache
    + Data Sharing and Work-group Barriers
  • Sub-Groups
    + Sub-group Sizes
    + Sub-group Size vs. Maximum Sub-group Size
    + Vectorization and Memory Access
    + Data Sharing
  • Atomic Operations
    + Data Types for Atomic Operations
    + Atomic Operations in Global vs. Local Space
  • Kernel Reduction
    + Reduction Using Atomic Operations
    + Reduction Using Shared Local Memory
    + Reduction Using Sub-Groups
    + Reduction Using SYCL Reduction Kernel
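
To give a feel for the kind of code these modules work with, below is a minimal, illustrative SYCL sketch (not one of the repository's samples). It uses USM device allocations with explicit host-to-device copies and an nd_range kernel with an explicit work-group size. The device selector, problem size N, and work-group size WG are arbitrary choices for illustration and assume a SYCL-capable GPU is available.

```cpp
// Illustrative sketch only (not a sample from this repository): vector add
// using USM device allocations and an nd_range kernel with an explicit
// work-group size. Assumes a SYCL-capable GPU is available.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  constexpr size_t N  = 1024;   // problem size (illustrative)
  constexpr size_t WG = 256;    // work-group size (illustrative; must divide N)

  sycl::queue q{sycl::gpu_selector_v};

  std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

  // USM device allocations
  float *d_a = sycl::malloc_device<float>(N, q);
  float *d_b = sycl::malloc_device<float>(N, q);
  float *d_c = sycl::malloc_device<float>(N, q);

  // Copy inputs from host to the USM device allocations
  q.memcpy(d_a, a.data(), N * sizeof(float));
  q.memcpy(d_b, b.data(), N * sizeof(float));
  q.wait();

  // nd_range kernel: global range N, work-group size WG
  q.parallel_for(sycl::nd_range<1>{N, WG}, [=](sycl::nd_item<1> item) {
     size_t i = item.get_global_id(0);
     d_c[i] = d_a[i] + d_b[i];
   }).wait();

  // Copy the result back to the host
  q.memcpy(c.data(), d_c, N * sizeof(float)).wait();
  std::cout << "c[0] = " << c[0] << "\n";

  sycl::free(d_a, q);
  sycl::free(d_b, q);
  sycl::free(d_c, q);
  return 0;
}
```

With the Intel oneAPI DPC++ compiler installed, a standalone sample like this can be compiled with icpx -fsycl; the run_*.sh scripts described under Content Structure perform the equivalent compile-and-run steps for the bundled samples.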

Content Structure

Each module folder has a Jupyter Notebook file (*.ipynb), which can be opened in JupyterLab to view the training content, edit code, and compile and run it. Along with the Notebook file, there are lab and src folders with the SYCL source code for the samples used in the Notebook. The module folder also has run_*.sh scripts, which can be used in a shell terminal to compile and run each sample.

  • 01_{Module_Name}
    • lab
      • {sample_code_name}.cpp - (sample code editable via Jupyter Notebook)
    • src
      • {sample_code_name}.cpp - (copy of sample code)
    • 01_{Module_Name}.ipynb - (Jupyter Notebook with training content and sample codes)
    • run_{sample_code_name}.sh - (script to compile and run {sample_code_name}.cpp)
    • License.txt
    • Readme.md

Install Directions

The training content can be accessed locally on your computer after installing the necessary tools, or it can be accessed directly on Intel DevCloud without any installation.

Access using Intel DevCloud

The Jupyter notebooks are tested on Intel DevCloud and can be run there without any installation. Follow the steps below to access the notebooks on Intel DevCloud:

  1. Register on Intel DevCloud
  2. Log in, select Get Started, and launch JupyterLab
  3. Open a terminal in JupyterLab, git clone the repo, and access the Notebooks

Local Installation of oneAPI Tools and JupyterLab

The Jupyter Notebooks can be downloaded to your local computer and accessed as follows:

  • Install the Intel oneAPI Base Toolkit on your local computer: Installation Guide
  • Install JupyterLab on your local computer: Installation Guide
  • git clone the repo and access the Notebooks using JupyterLab

Local Installation of oneAPI Tools and Use of the Command Line

The Jupyter Notebooks can be viewed on GitHub, and the code can be run from the command line:

  • Install the Intel oneAPI Base Toolkit on your local computer (Linux): Installation Guide
  • git clone the repo
  • Open a command-line terminal and use the run_*.sh script in each module to compile and run the code.

License

Code samples are licensed under the MIT license. See License.txt for details.

Third-party program licenses can be found here: third-party-programs.txt