Limits on the advancement of high-end single-threaded processors have forced new paradigms in software and algorithm development. Research and high-performance applications often require massively parallel systems for simulation, data processing, and data analysis. Several architectures, including NVIDIA's CUDA platform and Intel's Xeon Phi, provide highly parallel performance at low cost. However, algorithms for massively parallel systems are difficult to design and optimize.
In this course, we will focus on the design and development of algorithms that take advantage of highly parallel co-processors, such as the NVIDIA GPU and Intel Xeon Phi, in order to solve research-related problems. The course will include an overview of GPU architectures and principles of programming massively parallel systems. Topics covered will include designing and optimizing parallel algorithms, using available heterogeneous libraries, and case studies in linear systems, n-body problems, deep learning, and differential equations.
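To give a flavor of the programming model the course covers (device memory allocation, host-device transfers, and kernel launches, as in HW1), here is a minimal CUDA vector-addition sketch. It is illustrative only and not part of any assignment; the kernel name and sizes are arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host allocations and initialization.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device allocations and host-to-device transfers.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256, blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The grid-sizing expression `(n + threads - 1) / threads` rounds up so that every element is covered, which is why the kernel guards with `if (i < n)`.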
[ PDF ] Syllabus
[ PDF ] HW1 - CUDA API, memory allocation, data transfers
[ PDF ] HW2 - Grid design, warps, and control divergence
[ PDF ] HW3 - Memory transactions, alignment, and efficiency
[ PDF ] Final project rubric
[ PDF ] PA1 - Separable Convolution of large images
- [ PPM ] [ source ] Sombrero Galaxy
- [ PPM ] [ source ] Whirlpool Galaxy
- [ PPM ] [ source ] Hereford Cathedral
- [ PPM ] [ source ] Bansberia Royal Estate
[ PDF ] PA2 - Shared memory general matrix multiplication
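A common starting point for a shared-memory matrix multiplication like PA2 is tiling: each thread block stages one tile of each input matrix into shared memory, synchronizes, and accumulates partial dot products from the staged tiles. The sketch below assumes square row-major matrices and an illustrative 16x16 tile size; it is one possible approach, not the assignment's required solution.

```cuda
#define TILE 16

// C = A * B for n x n row-major matrices; one TILE x TILE output tile per block.
__global__ void tiledGemm(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        // Stage one tile of A and one tile of B, zero-padding past the edges.
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] =
            (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();

        // Accumulate the partial dot product from the staged tiles.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < n && col < n) C[row * n + col] = acc;
}
```

Each input element is read from global memory once per tile pass rather than once per output element, which is the main source of the speedup over a naive kernel; the two `__syncthreads()` calls keep loads and reads of the shared tiles from overlapping across iterations.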