Syllabus
Review of Traditional Computer Architecture: Basic Five-Stage RISC Pipeline, Cache Memory, Register File, SIMD Instructions. GPU Architectures: Streaming Multiprocessors, Cache Hierarchy, The Graphics Pipeline. Introduction to CUDA Programming. Multi-dimensional Mapping of Data Space, Synchronization, Warp Scheduling, Divergence, Memory Access Coalescing. Optimization Examples: Optimizing Reduction Kernels, Kernel Fusion, Thread and Block. OpenCL Basics, OpenCL for Heterogeneous Computing. Application Design: Efficient Neural Network Training/Inference.