CUDA GPU assignment
Graphics processing units (GPUs) can contain up to hundreds of Processing Engines (PEs). They achieve performance levels of hundreds of GFLOPS (10^9 floating point operations per second). In the past, GPUs were highly specialized and not generally programmable: they could only be used to speed up graphics processing. Today, they are becoming more and more general purpose. The latest GPUs from ATI and NVIDIA can be programmed in C and OpenCL. For this lab we will use NVIDIA GPUs together with the CUDA programming environment, which is based on C.
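To give a feel for the CUDA programming model described above, the following is a minimal sketch (not part of the assignment itself) of a complete CUDA program: it copies data to the GPU, launches a vector-addition kernel across many threads, and copies the result back. All names are illustrative.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel: each GPU thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard against out-of-range threads
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;          // one million elements
    size_t bytes = n * sizeof(float);

    // Allocate and initialize host (CPU) memory.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device (GPU) memory and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back to the host and check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Compiled with nvcc, each of the million additions is performed by its own GPU thread; the explicit cudaMemcpy calls reflect the separate host and device memory spaces that the assignment steps below revolve around.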
The purpose of this assignment is to get familiar with multiprocessor architectures and their programming models. State-of-the-art multicore processors may contain dozens of cores on a single die. The figures below show examples of such processors. The trend of going multicore poses new challenges to both computer architects and programmers. Putting hundreds of cores on a die is not difficult, but designing a memory hierarchy that keeps them busy is. On the other hand, programming dozens of cores requires programmers to think 'parallel'. In this assignment, we will try to tackle these challenges from the viewpoints of both computer architects and programmers.
Image Processing with OpenCL (Bachelor students)
Students map their own image processing algorithm onto the GPU using OpenCL. Similar to a tutorial, they perform the following steps: (1) copy data to and from the GPU's memory, (2) create a basic kernel implementation in OpenCL that produces the correct results, and (3) improve the performance of the kernel. Finally, the students are free to test their OpenCL implementation on different GPUs from AMD and NVIDIA. They will then be given a reference CUDA implementation, so they can compare performance between the two programming languages and across different architectures.
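As an illustration of step (2), a basic OpenCL kernel might look like the sketch below, which converts an RGB image to grayscale. This is an assumed example algorithm, not the assignment's reference code; the kernel name, buffer layout, and parameters are all hypothetical.

```c
/* Hypothetical OpenCL C kernel: RGB-to-grayscale conversion.
 * One work-item processes one pixel; the image is assumed to be
 * stored as interleaved 8-bit RGB in row-major order. */
__kernel void rgb_to_gray(__global const uchar *rgb,
                          __global uchar *gray,
                          const int width,
                          const int height)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x >= width || y >= height)   /* guard for padded work sizes */
        return;

    int idx = y * width + x;
    /* Standard ITU-R BT.601 luminance weights. */
    float lum = 0.299f * rgb[3 * idx]
              + 0.587f * rgb[3 * idx + 1]
              + 0.114f * rgb[3 * idx + 2];
    gray[idx] = (uchar)lum;
}
```

On the host side, step (1) corresponds to creating the rgb and gray buffers with clCreateBuffer and transferring pixel data with clEnqueueWriteBuffer/clEnqueueReadBuffer, while the kernel itself is launched over a two-dimensional work size (width x height) with clEnqueueNDRangeKernel. Step (3) would then revisit this naive version, for example by coalescing memory accesses or using vector types.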