This document provides an overview of CUDA (Compute Unified Device Architecture), NVIDIA's parallel computing platform and programming model, which lets software developers harness the parallel compute engines in NVIDIA GPUs. It covers the GPU hardware architecture, with its many scalar processors and concurrently executing threads; the CUDA programming model, in which host CPU code launches parallel kernels that run across many GPU threads; the memory hierarchy and data transfers between host and device memory; and programming basics such as compiling with nvcc and allocating and copying data between host and device.
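As a concrete illustration of those programming basics, a minimal CUDA program typically follows the pattern the overview describes: allocate device memory, copy inputs from host to device, launch a kernel across many threads, and copy results back. The sketch below uses a hypothetical vector-addition kernel (`vecAdd` and the file name `vec_add.cu` are illustrative names, not from the document):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each GPU thread handles one element of the arrays.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host allocations and initialization.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device allocations.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);

    // Copy inputs host -> device.
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    // Copy result device -> host (also synchronizes with the kernel).
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Such a program is compiled with the nvcc toolchain, e.g. `nvcc vec_add.cu -o vec_add`, which splits the source into host code for the CPU compiler and device code for the GPU.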