This document describes a system that converts sign language gestures to text and speech using image processing and convolutional neural networks (CNNs). The system captures images of hand gestures with a camera, applies preprocessing techniques such as blurring and thresholding, and then feeds the result to a CNN trained on a gesture dataset to recognize the gesture and convert it to text and speech. The system recognized gestures for letters and numbers with about 85% accuracy. Future work may expand the dataset to include more signs and move toward word- and sentence-level recognition.
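
As a rough illustration of the pipeline described above, the sketch below captures a single frame, applies Gaussian blurring and Otsu thresholding, classifies the gesture with a pretrained CNN, and speaks the predicted label. This is a minimal sketch under stated assumptions, not the document's actual implementation: the model file `gesture_cnn.h5`, the 64x64 grayscale input size, the letter/digit label list, and the pyttsx3 speech backend are all assumptions introduced for illustration.

```python
# Hypothetical sketch of the capture -> preprocess -> CNN -> speech pipeline.
# Assumed: a Keras model saved as "gesture_cnn.h5" taking 64x64 grayscale
# input, a matching label list, and pyttsx3 for text-to-speech.
import cv2
import numpy as np
import pyttsx3
from tensorflow.keras.models import load_model

# Assumed class labels: letters A-Z followed by digits 0-9.
LABELS = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ") + [str(d) for d in range(10)]

model = load_model("gesture_cnn.h5")  # assumed pretrained gesture CNN
engine = pyttsx3.init()               # text-to-speech engine

# Capture one frame from the default camera.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("Could not read a frame from the camera")

# Preprocessing: grayscale -> Gaussian blur -> Otsu threshold, matching the
# blurring and thresholding steps the document mentions (parameter values
# here are illustrative).
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
_, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Resize and normalize to the assumed CNN input shape (1, 64, 64, 1).
x = cv2.resize(thresh, (64, 64)).astype("float32") / 255.0
x = x.reshape(1, 64, 64, 1)

# Classify the gesture, print the text, and speak it aloud.
pred = model.predict(x)
text = LABELS[int(np.argmax(pred))]
print("Recognized:", text)
engine.say(text)
engine.runAndWait()
```

In a full system this loop would run continuously over video frames, with a hand-region detector and prediction smoothing, but the single-frame version above shows each stage the summary names in order.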