CANTINA is a content-based approach to detecting phishing websites. It examines the text of a web page along with surface characteristics such as the age of the domain and the presence of known images. To build a lexical signature for a page, it computes a term frequency-inverse document frequency (TF-IDF) weight for each word and keeps the highest-weighted words. It then submits that signature to a search engine and checks whether the page's domain appears among the top results: a match suggests the page is legitimate, while the absence of a match suggests a phishing attempt. The system aims to detect phishing URLs and links so that users do not disclose personal information on fraudulent websites or through malicious emails.
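
The signature-and-search check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`tokenize`, `lexical_signature`, `looks_legitimate`), the smoothed IDF formula, the default values of `k` and `top_n`, and the small reference corpus are all assumptions made for illustration. The search step is left out: the caller is assumed to supply the result URLs from whatever search engine it uses. The sketch also compares full hostnames, which is a simplification of domain matching.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight.

    corpus is a list of reference documents (strings) used only to
    estimate document frequencies; it is an assumption of this sketch.
    """
    tokens = tokenize(page_text)
    if not tokens:
        return []
    tf = Counter(tokens)
    doc_tokens = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_tokens)
    scores = {}
    for term, count in tf.items():
        # Number of reference documents that contain the term.
        df = sum(1 for d in doc_tokens if term in d)
        # Smoothed IDF (an assumed variant) so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(tokens)) * idf
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def looks_legitimate(page_url, result_urls, top_n=30):
    """Return True if the page's hostname is among the first top_n result hostnames."""
    host = urlparse(page_url).hostname
    if host is None:
        return False
    result_hosts = {urlparse(u).hostname for u in result_urls[:top_n]}
    return host in result_hosts
```

In use, a caller would pass the page text to `lexical_signature`, send the returned terms to a search engine as a query, and hand the returned URLs to `looks_legitimate`. A page that imitates a well-known brand tends to yield that brand's terms as its signature, so the search results point at the real site rather than at the imitation's hostname.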