This document discusses performance optimization of clustering algorithms on GPUs. It begins with an introduction to clustering and surveys four families of methods: hierarchical, partitioning, density-based, and grid-based. It then covers GPU architecture and the CUDA programming model for parallel processing. The Markov Clustering (MCL) algorithm is described in the context of hierarchical clustering, followed by its implementation on GPUs using CUDA and sparse matrix storage formats. The results show shorter execution times on GPUs than on CPUs when clustering large datasets. Proposed future work is to implement other clustering algorithms on different GPU platforms.
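To make the MCL algorithm mentioned above concrete, here is a minimal CPU-side sketch in plain Python. It is not the GPU implementation the document describes. It assumes a small dense adjacency matrix stored as nested lists, added self-loops, and an inflation exponent of 2. The function names (`normalize_columns`, `expand`, `inflate`, `mcl`) are hypothetical and chosen only for illustration. The sketch shows the two alternating steps of MCL: expansion (squaring the column-stochastic matrix) and inflation (element-wise powering followed by column renormalization).

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise every entry to the power r, then renormalize."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Run MCL on a dense adjacency matrix and return clusters as sorted lists.

    Assumptions of this sketch: self-loops are added before normalizing,
    and iteration stops once the largest entry change drops below tol.
    """
    n = len(adjacency)
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        diff = max(abs(nxt[i][j] - m[i][j])
                   for i in range(n) for j in range(n))
        m = nxt
        if diff < tol:
            break
    # Each row with nonzero entries marks an attractor; the columns it
    # covers are the members of that attractor's cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(graph))  # prints [[0, 1, 2], [3, 4, 5]]
```

The expansion step is a matrix-matrix product, which dominates the running time. For large graphs the matrix is mostly zeros, so a sparse storage format avoids storing and multiplying those zero entries, and the product parallelizes naturally across GPU threads.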