OPTIMIZATION & VECTORIZATION
In the Optimization & Vectorization course, I worked with two other students to optimize two different programs. We followed a structured optimization process:
1) Profile: determine hotspots.
2) Analyze hotspots: determine scalability.
3) Apply high-level optimizations to hotspots.
4) Profile again.
5) Parallelize.
6) Use GPGPU.
7) Profile again.
8) Apply low-level optimizations to hotspots.
9) Repeat steps 7 and 8 until we're satisfied.
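Steps 1 and 2 are about measurement. A sampling profiler is the usual tool for finding hotspots. Once a hotspot is known, a small timing harness like the C++ sketch below can show how its cost scales with input size. `countClosePairs` is a hypothetical stand-in hotspot, not code from either project, and `timeMs` and `reportScaling` are illustrative helper names.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical hotspot: a naive all-pairs proximity count, O(n^2).
long long countClosePairs(const std::vector<float>& xs, float radius) {
    long long count = 0;
    const float r2 = radius * radius;
    for (std::size_t i = 0; i < xs.size(); ++i)
        for (std::size_t j = i + 1; j < xs.size(); ++j) {
            float d = xs[i] - xs[j];
            if (d * d < r2) ++count;
        }
    return count;
}

// Times one call of f in milliseconds using a monotonic clock.
template <typename F>
double timeMs(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Step 2 sketch: double n and watch how the time grows.
// Roughly 4x per doubling points to O(n^2) scaling.
void reportScaling() {
    for (std::size_t n = 1000; n <= 8000; n *= 2) {
        std::vector<float> xs(n);
        for (std::size_t i = 0; i < n; ++i) xs[i] = static_cast<float>(i % 997);
        long long pairs = 0;
        double ms = timeMs([&] { pairs = countClosePairs(xs, 2.0f); });
        // Printing the result keeps the compiler from discarding the work.
        std::printf("n=%zu  pairs=%lld  %.2f ms\n", n, pairs, ms);
    }
}
```

Calling `reportScaling()` from `main` prints one line per input size. If the time roughly quadruples each time `n` doubles, the hotspot scales quadratically, which makes it a candidate for a high-level fix in step 3.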
TANK GAME
[Images: the tank game before optimization and after optimization]
In the first project, we were tasked with optimizing a simple tank game using only high-level and low-level optimizations (no parallelization or GPGPU). As a high-level optimization, we implemented a grid as a spatial partitioning structure. This simplified calculations in several parts of the game, because a query only needs to look at objects in nearby cells instead of every object in the world. We also implemented many low-level optimizations, such as using efficient operations, normalizing data, managing memory, avoiding conditional branches, and reducing cache misses.
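A uniform grid is one common way to build this kind of spatial partitioning structure. The C++ sketch below is a minimal version under stated assumptions: a rectangular world, square cells, and tanks identified by integer indices into a position array. `Grid`, `Vec2`, and the method names are hypothetical and not taken from the project. The sketch also illustrates one "efficient operation": it compares squared distances, so no square root is needed.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical uniform grid: the world [0, width) x [0, height) is split
// into square cells of side cellSize; each cell stores tank indices.
class Grid {
public:
    Grid(float width, float height, float cellSize)
        : cellSize_(cellSize),
          cols_(static_cast<int>(std::ceil(width / cellSize))),
          rows_(static_cast<int>(std::ceil(height / cellSize))),
          cells_(static_cast<std::size_t>(cols_) * rows_) {}

    void clear() { for (auto& c : cells_) c.clear(); }

    void insert(int id, Vec2 p) {
        cells_[cellIndex(cellCoord(p.x, cols_), cellCoord(p.y, rows_))].push_back(id);
    }

    // Collects ids of tanks within radius of p by visiting only the cells
    // the query square around p overlaps, instead of every tank.
    void query(Vec2 p, float radius, const std::vector<Vec2>& positions,
               std::vector<int>& out) const {
        int x0 = cellCoord(p.x - radius, cols_), x1 = cellCoord(p.x + radius, cols_);
        int y0 = cellCoord(p.y - radius, rows_), y1 = cellCoord(p.y + radius, rows_);
        const float r2 = radius * radius;
        for (int cy = y0; cy <= y1; ++cy)
            for (int cx = x0; cx <= x1; ++cx)
                for (int id : cells_[cellIndex(cx, cy)]) {
                    float dx = positions[id].x - p.x, dy = positions[id].y - p.y;
                    // Squared-distance test: avoids a sqrt per candidate.
                    if (dx * dx + dy * dy <= r2) out.push_back(id);
                }
    }

private:
    // Maps a world coordinate to a cell coordinate, clamped to the grid.
    int cellCoord(float v, int count) const {
        int c = static_cast<int>(std::floor(v / cellSize_));
        return c < 0 ? 0 : (c >= count ? count - 1 : c);
    }
    std::size_t cellIndex(int cx, int cy) const {
        return static_cast<std::size_t>(cy) * cols_ + cx;
    }

    float cellSize_;
    int cols_, rows_;
    std::vector<std::vector<int>> cells_;
};
```

In a game loop, a grid like this would typically be cleared and refilled once per frame. Choosing a cell size close to the typical query radius means each query touches only a few cells.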
After implementing all optimizations, we improved the game's performance 100-fold. In the images above, each player starts with 1000 tanks, and the frame rate went from around 0.6 frames per second to 60 frames per second.
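The same result can also be expressed as time per frame, which is how per-frame budgets are often discussed:

```latex
\text{speedup} = \frac{60\ \text{fps}}{0.6\ \text{fps}} = 100,
\qquad
t_{\text{frame}}: \frac{1}{0.6\ \text{fps}} \approx 1.67\ \text{s}
\;\longrightarrow\;
\frac{1}{60\ \text{fps}} \approx 16.7\ \text{ms}.
```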
IMAGE FILTERING WITH ADAPTIVE MANIFOLDS
The second project was a real-time image filtering application based on adaptive manifolds. Here we focused mainly on GPGPU optimizations. We converted most of the code to OpenCL to parallelize the work, which allowed the program to run on both the CPU and the GPU at the same time. We also applied some high-level optimizations, such as using box filters, as well as low-level optimizations. With all optimizations in place, the application ran 3.8 times faster.
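A box filter is cheap because it can be computed with a running sum, so the cost per sample does not depend on the filter radius. The C++ sketch below shows this in 1D. `boxFilter1D` is a hypothetical name, clamp-to-edge borders are an assumption, and this is plain CPU code rather than the project's OpenCL kernels.

```cpp
#include <cstddef>
#include <vector>

// 1D box filter of radius r using a sliding-window sum. Each output costs
// O(1) regardless of r, versus O(r) for a direct convolution. Borders are
// handled by clamping sample indices to [0, n-1] (an assumption).
std::vector<float> boxFilter1D(const std::vector<float>& in, int r) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    if (n == 0) return out;
    auto at = [&](int i) { return in[i < 0 ? 0 : (i >= n ? n - 1 : i)]; };
    const float norm = 1.0f / static_cast<float>(2 * r + 1);

    // Sum of the first window, centered on index 0.
    float sum = 0.0f;
    for (int k = -r; k <= r; ++k) sum += at(k);

    for (int i = 0; i < n; ++i) {
        out[i] = sum * norm;
        // Slide the window right by one: add the entering sample,
        // subtract the leaving one.
        sum += at(i + r + 1) - at(i - r);
    }
    return out;
}
```

A 2D box blur is separable, so an image can be filtered by running this pass along every row and then along every column.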