Course Outline
Introduction
- What constitutes GPU programming?
- The benefits of using CUDA with Python
- Core concepts: Threads, Blocks, and Grids
Overview of CUDA Features and Architecture
- Comparing GPU and CPU architectures
- Understanding SIMT (Single Instruction, Multiple Threads)
- The CUDA programming model
Setting up the Development Environment
- Installing the CUDA Toolkit and drivers
- Installing Python and Numba
- Configuring and verifying the environment
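A typical way to verify the setup after installation is to check the toolkit, the driver, and Numba's view of the device. These commands assume the CUDA Toolkit, drivers, and Numba are already installed; their output depends on the machine:

```shell
# Check the CUDA compiler shipped with the Toolkit
nvcc --version

# Check the driver version and visible GPUs
nvidia-smi

# Ask Numba what it can see (lists detected devices and support status)
python -c "from numba import cuda; cuda.detect()"
```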
Fundamentals of Parallel Programming
- Introduction to parallel execution
- Understanding threads and thread hierarchies
- Working with warps and synchronization mechanisms
Working with the Numba Compiler
- Introduction to Numba
- Writing CUDA kernels using Numba
- Understanding the @cuda.jit decorator
Building a Custom CUDA Kernel
- Writing and launching a basic kernel
- Utilizing threads for element-wise operations
- Managing grid and block dimensions
Memory Management
- Different types of GPU memory (global, shared, local, constant)
- Data transfer between host and device
- Optimizing memory usage and preventing bottlenecks
Advanced Topics in GPU Acceleration
- Shared memory and synchronization strategies
- Employing streams for asynchronous execution
- Basics of multi-GPU programming
Converting CPU-based Applications to GPU
- Profiling CPU code
- Identifying sections suitable for parallelization
- Translating logic into CUDA kernels
Troubleshooting
- Debugging CUDA applications
- Common errors and their resolutions
- Tools and techniques for testing and validation
Summary and Next Steps
- Review of key concepts
- Best practices in GPU programming
- Resources for continued learning
Requirements
- Experience with Python programming
- Familiarity with NumPy (including ndarrays, ufuncs, etc.)
Target Audience
- Developers
14 Hours
Testimonials (1)
Very interactive with various examples, with a good progression in complexity between the start and the end of the training.