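SAM (Segment Anything Model) is a promptable segmentation model: given an image and a prompt such as points, a bounding box, a coarse mask, or (experimentally) free-form text, it returns a segmentation mask for the indicated object. It was trained on SA-1B, a dataset of over 1.1 billion masks on 11 million images, collected with a model-in-the-loop approach in which SAM assisted annotation and was retrained as the data grew. The architecture is transformer-based: an image encoder produces an image embedding, a prompt encoder embeds points, boxes, text, and masks, and a lightweight mask decoder combines the two to predict masks. SAM shows strong zero-shot segmentation performance, often competitive with or better than fully supervised baselines, without any fine-tuning on the target datasets.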
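To make the split between the image side and the prompt side concrete, here is a toy sketch of a promptable segmentation interface. It is not SAM's code or API: `Prompt`, `encode_image`, and `decode_mask` are hypothetical names, the "encoder" only normalizes pixel intensities, and the "decoder" is a simple similarity threshold. The one idea it is meant to illustrate, under those assumptions, is that the image can be encoded once and then queried with several different prompts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale image as rows of intensities
Mask = List[List[int]]     # binary mask as rows of 0/1


@dataclass
class Prompt:
    """One prompt. Field names are hypothetical, not SAM's API."""
    point: Optional[Tuple[int, int]] = None          # (row, col) foreground click
    box: Optional[Tuple[int, int, int, int]] = None  # (r0, c0, r1, c1), inclusive


def encode_image(image: Image) -> Image:
    """Stand-in for the expensive image encoder: rescale intensities to [0, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0  # avoid division by zero on constant images
    return [[(v - lo) / scale for v in row] for row in image]


def decode_mask(embedding: Image, prompt: Prompt, tol: float = 0.2) -> Mask:
    """Stand-in for the lightweight decoder: combine embedding and prompt.

    A pixel is selected if it lies inside the box (whole image when no box
    is given) and, when a point is given, its value is within `tol` of the
    value at the clicked pixel.
    """
    h, w = len(embedding), len(embedding[0])
    r0, c0, r1, c1 = prompt.box if prompt.box is not None else (0, 0, h - 1, w - 1)
    ref = embedding[prompt.point[0]][prompt.point[1]] if prompt.point is not None else None
    mask = []
    for r in range(h):
        row = []
        for c in range(w):
            inside = r0 <= r <= r1 and c0 <= c <= c1
            similar = ref is None or abs(embedding[r][c] - ref) <= tol
            row.append(int(inside and similar))
        mask.append(row)
    return mask


image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 8],
]

embedding = encode_image(image)  # computed once, reused for every prompt below
mask_point = decode_mask(embedding, Prompt(point=(1, 1)))
mask_point_box = decode_mask(embedding, Prompt(point=(1, 1), box=(1, 1, 2, 2)))
```

In this toy example the click alone also selects the similar-looking pixel in the bottom-right corner, while adding a box restricts the mask to the 2x2 object. A real model resolves such ambiguity with learned features rather than a threshold, so treat this only as an illustration of the interface shape.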