This document discusses several techniques for document clustering and retrieval: cosine similarity, k-means clustering, hierarchical clustering, and the EM algorithm. Cosine similarity measures how similar two document vectors are by the cosine of the angle between them. It is widely used to compare documents, and higher values indicate more similar documents; for non-negative term weights the value lies between 0 and 1. K-means clustering partitions documents into k groups so as to minimize intra-cluster distance, which is the same as making the documents in each cluster as similar as possible to their cluster centroid. Hierarchical (agglomerative) clustering builds a dendrogram of document clusters by repeatedly merging the two most similar clusters. The EM algorithm computes maximum likelihood estimates of a clustering model's parameters when the data are incomplete, treating each document's cluster membership as unobserved. Clusters can be evaluated with internal metrics such as intra-cluster similarity, which should be high, and inter-cluster dissimilarity, which should also be high.
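The techniques summarized above can be illustrated with a small, dependency-free sketch. It makes several assumptions that are not part of the original text. Documents are represented as raw term-frequency dicts from a lowercase whitespace split, with no TF-IDF or stemming. The k-means step uses the spherical variant, which assigns documents by highest cosine similarity and uses length-normalized centroids; it is seeded deterministically with the first k documents. The hierarchical step uses average linkage. All function names (`tf_vector`, `spherical_kmeans`, `agglomerative`, `internal_scores`) are hypothetical helpers chosen for illustration, and the EM algorithm is not covered.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumption: raw term frequencies from a lowercase whitespace split.
    return dict(Counter(text.lower().split()))


def norm(v):
    return math.sqrt(sum(w * w for w in v.values()))


def unit(v):
    # Scale a sparse vector to unit length (zero vectors are returned unchanged).
    n = norm(v)
    return {t: w / n for t, w in v.items()} if n else dict(v)


def cosine(a, b):
    # Cosine similarity of two sparse {term: weight} vectors.
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return sum(w * b.get(t, 0.0) for t, w in a.items()) / (na * nb)


def spherical_kmeans(vecs, k, max_iter=20):
    # Spherical k-means: assign each document to the centroid with the highest
    # cosine, then recompute each centroid as the normalized mean of its members.
    # Assumption: the first k documents seed the centroids (deterministic).
    vecs = [unit(v) for v in vecs]
    centroids = [dict(v) for v in vecs[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vecs]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            total = {}
            for v, lab in zip(vecs, labels):
                if lab == j:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            if total:  # an emptied cluster keeps its previous centroid
                centroids[j] = unit(total)
    return labels


def average_link(vecs, ci, cj):
    # Average pairwise cosine between two clusters of document indices.
    sims = [cosine(vecs[a], vecs[b]) for a in ci for b in cj]
    return sum(sims) / len(sims)


def agglomerative(vecs, k=1):
    # Bottom-up clustering: repeatedly merge the most similar pair of clusters
    # (average linkage is an assumed choice). `merges` records the dendrogram.
    clusters = [[i] for i in range(len(vecs))]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = average_link(vecs, clusters[a], clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merges.append((clusters[a], clusters[b], s))
        clusters[a] = clusters[a] + clusters[b]  # new list; old ones stay in merges
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges


def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def internal_scores(vecs, labels):
    # Mean cosine over same-cluster pairs (intra) and cross-cluster pairs (inter).
    # A good clustering has high intra-cluster and low inter-cluster similarity.
    intra, inter = [], []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            (intra if labels[i] == labels[j] else inter).append(cosine(vecs[i], vecs[j]))
    return _mean(intra), _mean(inter)


docs = ["cats purr cats sleep", "dogs bark dogs run",
        "cats sleep and purr", "dogs run and bark"]
vecs = [tf_vector(d) for d in docs]
labels = spherical_kmeans(vecs, k=2)
clusters, merges = agglomerative(vecs, k=2)
intra, inter = internal_scores(vecs, labels)
print(labels, clusters, intra, inter)
```

On this toy corpus both methods group the "cats" documents apart from the "dogs" documents. Calling `agglomerative(vecs)` with the default `k=1` would continue merging until one cluster remains, so `merges` would then hold the full dendrogram.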