This document outlines strategies for tuning program performance on POWER9 processors. It discusses how performance bottlenecks can arise in the processor front-end and back-end and describes some compiler flags, pragmas, and source code techniques for addressing these bottlenecks. These include techniques like unrolling, inlining, prefetching, parallelization with OpenMP, and leveraging GPUs with OpenACC. Hands-on exercises are provided to demonstrate applying these optimizations to a Jacobi application and measuring the performance impacts.