This document describes how locality-sensitive hashing (LSH) can be used to detect trips with overlapping routes in large GPS datasets. It outlines two main challenges: GPS data is noisy, and the search space is large. In the approach described, each trip is represented as a set of area segments, the overlap between two trips is measured by the Jaccard similarity of their segment sets, and MinHash maps similar trips to the same bucket with high probability. Multiple hash functions are applied to increase the probability that similar trips share a bucket. The document also discusses approaches for efficient distributed processing on Spark, including reducing network usage. Future work involves migrating to the Spark ML APIs and handling streaming inserts.
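To make the pipeline summarized above concrete, here is a minimal single-machine sketch in plain Python. It is not the Spark implementation the document refers to. The segment IDs (`s1`, `s2`, ...), the function names, the salted BLAKE2b hash family, and the choice of 16 hash functions are all assumptions made for illustration. Two trips become a candidate pair if they share a bucket for at least one hash function, which is one common way to combine several MinHash functions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed: int, segment: str) -> int:
    """Deterministic 64-bit hash of a segment ID, salted by the seed.

    Assumption: a salted BLAKE2b digest stands in for a MinHash hash family.
    """
    digest = hashlib.blake2b(
        segment.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(segments: set, num_hashes: int = 16) -> list:
    """One minimum hash value per seeded hash function (non-empty set only)."""
    return [min(_hash(seed, s) for s in segments) for seed in range(num_hashes)]


def candidate_pairs(trips: dict, num_hashes: int = 16) -> set:
    """Pairs of trip IDs that share a bucket for at least one hash function."""
    buckets = defaultdict(list)
    for trip_id, segments in trips.items():
        for i, value in enumerate(minhash_signature(segments, num_hashes)):
            # The bucket key combines the hash function index and its minimum.
            buckets[(i, value)].append(trip_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical trips, each a set of area-segment IDs.
trips = {
    "t1": {"s1", "s2", "s3", "s4"},
    "t2": {"s2", "s3", "s4", "s5"},
    "t3": {"s9", "s10"},
}
print(jaccard(trips["t1"], trips["t2"]))  # 0.6
print(candidate_pairs(trips))
```

For a single hash function, the probability that two trips land in the same bucket equals their Jaccard similarity, so adding hash functions raises the chance that overlapping trips collide at least once. Only the candidate pairs then need an exact Jaccard check, which is what shrinks the search space.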