Course Outline
Introduction
- Understanding OpenCL
- OpenCL compared to CUDA and SYCL
- Overview of OpenCL features and architecture
- Setting up the Development Environment
Getting Started
- Creating a new OpenCL project using Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying output using printf and fprintf
OpenCL API
- Understanding the role of the OpenCL API in host programs
- Using the OpenCL API to query device information and capabilities
- Using the OpenCL API to create contexts, command queues, buffers, kernels, and events
- Using the OpenCL API to enqueue commands such as read, write, copy, map, unmap, execute, and wait
- Using the OpenCL API to handle errors and exceptions
OpenCL C
- Understanding the role of OpenCL C in device programs
- Writing OpenCL C kernels that execute on the device and manipulate data
- Using OpenCL C data types, qualifiers, operators, and expressions
- Using OpenCL C built-in functions, including math, geometric, and relational operations
- Using OpenCL C extensions and optional features, such as atomics, images, and cl_khr_fp16 (half-precision floating point)
OpenCL Memory Model
- Understanding the difference between host and device memory models
- Utilizing OpenCL memory spaces such as global, local, constant, and private
- Utilizing OpenCL memory objects such as buffers, images, and pipes
- Utilizing OpenCL memory access modes such as read-only, write-only, and read-write
- Utilizing the OpenCL memory consistency model and synchronization mechanisms
OpenCL Execution Model
- Understanding the difference between host and device execution models
- Defining parallelism using OpenCL work-items, work-groups, and ND-ranges
- Using OpenCL work-item functions like get_global_id, get_local_id, and get_group_id
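- Using OpenCL work-group functions like barrier and the work_group_reduce_<op> and work_group_scan_inclusive_<op> / work_group_scan_exclusive_<op> families (OpenCL 2.0 and later)
- Using OpenCL work-item query functions like get_num_groups, get_global_size, and get_local_size to inspect the ND-range dimensions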
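As a rough illustration of how the index functions listed above relate to each other, the following is a minimal host-side sketch in plain C, not an OpenCL kernel. The `NDRange1D` struct and the `num_groups` and `global_id` helpers are hypothetical names invented for this example. It assumes a one-dimensional ND-range with a zero global offset and a global size that is an exact multiple of the work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of a 1-D ND-range (zero global offset,
   uniform work-groups). Not part of the OpenCL API. */
typedef struct {
    size_t global_size; /* total work-items, as get_global_size(0) would report */
    size_t local_size;  /* work-items per work-group, as get_local_size(0) would report */
} NDRange1D;

/* Number of work-groups, mirroring what get_num_groups(0) would report. */
size_t num_groups(NDRange1D r) {
    assert(r.local_size > 0 && r.global_size % r.local_size == 0);
    return r.global_size / r.local_size;
}

/* Global ID rebuilt from a group ID and a local ID, mirroring the identity
   get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0). */
size_t global_id(NDRange1D r, size_t group_id, size_t local_id) {
    assert(group_id < num_groups(r) && local_id < r.local_size);
    return group_id * r.local_size + local_id;
}
```

For example, with 1024 work-items in groups of 64, `num_groups` gives 16 and the work-item with local ID 5 in group 3 has global ID 197.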
Debugging
- Understanding common errors and bugs in OpenCL programs
- Using the Visual Studio Code debugger to set breakpoints and inspect variables and call stacks
- Using CodeXL to debug and analyze OpenCL programs on AMD devices
- Using Intel VTune to profile and analyze OpenCL programs on Intel devices
- Using NVIDIA Nsight to debug and analyze OpenCL programs on NVIDIA devices
Optimization
- Understanding factors affecting OpenCL program performance
- Using OpenCL vector data types and vectorization techniques to improve arithmetic throughput
- Using OpenCL loop unrolling and loop tiling techniques to reduce control overhead and increase locality
- Using OpenCL local memory and related functions to optimize memory accesses and bandwidth
- Using OpenCL event profiling and profiling tools to measure execution time and resource utilization and to guide optimization
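To make the loop tiling topic above more concrete, here is a small sketch of the idea in plain C on the host, not OpenCL kernel code. The function name `matmul_tiled` and the tile size `TILE` are assumptions chosen for illustration. In an OpenCL kernel, a tile would typically correspond to a work-group, with the tile's data staged in local memory.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4 /* hypothetical tile edge; real values are tuned per device */

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Computes c = a * b for n x n row-major matrices.
   The three outer loops walk over tiles; the three inner loops stay
   inside one tile so the same small blocks of a, b and c are reused
   while they are still in cache. n need not be a multiple of TILE. */
void matmul_tiled(const float *a, const float *b, float *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n * n; ++i) {
        c[i] = 0.0f;
    }
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        float a_ik = a[i * n + k];
                        for (size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += a_ik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The result is the same as an untiled triple loop; only the traversal order changes, which is what improves locality.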
Summary and Next Steps
Requirements
- Proficiency in C/C++ and an understanding of parallel programming concepts
- Fundamental knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Target Audience
- Developers aiming to learn how to use OpenCL for programming heterogeneous devices and exploiting their parallelism
- Developers seeking to write portable and scalable code capable of running on diverse platforms and devices
- Programmers interested in exploring low-level heterogeneous programming aspects and optimizing code performance
28 Hours