CANTINA
A Content-Based Approach to Detecting Phishing Web Sites
ABSTRACT
• CANTINA is a content-based approach to detecting phishing web sites.
• It examines whether the content of a page is legitimate or not.
• It detects phishing URLs and links.
INTRODUCTION
• Phishing
A kind of attack in which victims are tricked by
spoofed emails and fraudulent web sites into giving
up personal information
• How many phishing sites are there?
9,255 unique phishing sites were reported in June 2006 alone.
• How much does phishing cost each year?
An estimated $1 billion to $2.8 billion per year.
EXISTING SYSTEM
• Netcraft (surface characteristics)
• SpoofGuard (surface characteristics and blacklist)
• Cloudmark (blacklist)
PROPOSED SYSTEM
• Detects phishing websites
• Examines the text-based content of a page (using TF-IDF) along with its surface characteristics.
• Surface heuristics include:
-Age of domain.
-Known images.
-Suspicious URL.
-Suspicious links.
• Detects phishing links in users' email.
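The URL-based heuristics listed above can be approximated with a few string checks. The sketch below is a minimal Python illustration, not CANTINA's actual implementation: the function name `url_heuristics`, the specific checks (IP-address host, `@` sign, dash in the host name, five or more dots) and their thresholds are assumptions chosen for illustration. Age of domain and known images need external data (WHOIS records, a store of brand logos) and are left out.

```python
import ipaddress
from urllib.parse import urlparse


def url_heuristics(url: str) -> dict[str, bool]:
    """Return a few surface-level URL red flags.

    The checks and thresholds are illustrative assumptions,
    not values taken from the CANTINA system.
    """
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        is_ip = True
    except ValueError:
        is_ip = False
    return {
        "ip_address_host": is_ip,            # e.g. http://192.168.0.1/login
        "at_sign": "@" in url,               # browser ignores text before '@'
        "dash_in_host": "-" in host,         # e.g. ebay-verify.example.net
        "many_dots": host.count(".") >= 5,   # deeply nested subdomains
    }


print(url_heuristics("http://ebay.com@signin-secure.example.net/"))
```

Here `urlparse` strips the `ebay.com@` user-info part, so the real host checked is `signin-secure.example.net`, which is exactly the trick the `@` heuristic is meant to catch.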
TF-IDF ALGORITHM
• Term Frequency (TF)
–The number of times a given term appears
in a specific document
–Measure of the importance of the term
within the particular document
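• Inverse Document Frequency (IDF)
–Measures how rare a term is across an entire collection of documents; terms that appear in many documents get a low IDF
• A high TF-IDF weight means the term occurs often in the given document (high TF) but rarely in the rest of the collection (high IDF)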
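The TF-IDF weighting described above can be sketched in a few lines. This is an illustrative Python version assuming the common textbook form tf(t, d) × log(N / df(t)) with raw counts for TF; CANTINA's exact normalization and smoothing may differ, and the function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(doc_tokens: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """Return the TF-IDF weight of every term in doc_tokens.

    TF  = raw count of the term in the document.
    IDF = log(N / df), where N is the (non-empty) corpus size and df is
          the number of corpus documents containing the term, floored at 1.
    """
    doc_sets = [set(d) for d in corpus]
    n_docs = len(doc_sets)
    weights = {}
    for term, count in Counter(doc_tokens).items():
        df = max(1, sum(term in d for d in doc_sets))
        weights[term] = count * math.log(n_docs / df)
    return weights


# Toy corpus: the first document is the page being analysed.
corpus = [
    ["ebay", "sign", "in", "ebay", "the"],
    ["bank", "sign", "in", "the"],
    ["news", "today", "the"],
]
weights = tf_idf(corpus[0], corpus)
print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

In this example `ebay` ranks highest (frequent in the page, absent elsewhere), while `the`, which appears in every document, gets a weight of exactly 0.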
REAL EBAY WEBPAGE
FAKE EBAY WEBPAGE
MODULES
• Parsing the web pages
• Generating the lexical signature
• Testing Process
• Report Generation
Parsing the web pages
• Links, anchor tags, form tags and attachments in the web pages are turned into corresponding Text Link, HTML Link, etc.
• This is done by parsing the text of each page.
• Uses the HTML Parser API, which extracts information from HTML code.
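As a rough illustration of the parsing step, the sketch below uses Python's standard-library `html.parser` as a stand-in for the HTML Parser API named on the slide (which is a separate library). The class name `PageParser` and the choice to collect link targets, form actions and visible text are assumptions for illustration.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect link targets, form actions and visible text from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.forms: list[str] = []
        self.text: list[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action") or "")
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside scripts and styles.
        if not self._skip and data.strip():
            self.text.append(data.strip())


page = (
    '<html><body><a href="https://signin.ebay.com">Sign in</a>'
    '<form action="/login"><input name="pass"></form>'
    "<script>var x = 1;</script></body></html>"
)
p = PageParser()
p.feed(page)
p.close()
print(p.links, p.forms, p.text)
```

The collected text is what the next module would tokenize before computing TF-IDF weights.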
Generating the lexical signature
• The TF-IDF algorithm is used to generate lexical signatures.
• Calculate the TF-IDF value for each word in a document.
• Select the words with the highest values (the original CANTINA work uses the top five).
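Selecting the highest-weighted terms is a simple sort. This hypothetical helper `lexical_signature` takes any term-to-weight mapping (for example, TF-IDF weights) and returns the top k terms; the default k = 5 reflects the signature length we understand CANTINA to use, and the example weights are made up for illustration.

```python
def lexical_signature(weights: dict[str, float], k: int = 5) -> list[str]:
    """Return the k terms with the highest weight.

    Ties are broken alphabetically so the signature is deterministic.
    """
    ranked = sorted(weights, key=lambda term: (-weights[term], term))
    return ranked[:k]


# Made-up TF-IDF weights for a page.
example_weights = {"ebay": 2.2, "signin": 0.4, "account": 0.4,
                   "password": 1.1, "the": 0.0}
signature = lexical_signature(example_weights, k=3)
print(signature)  # ['ebay', 'password', 'account']
```

Breaking ties by term makes the same page always produce the same search query, which keeps results reproducible.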
Testing Process
• Feed this lexical signature to a search
engine.
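• Check whether the domain name of the current web page matches the domain name of any of the top N search results.
• If it does, the page is considered legitimate; otherwise it is flagged as phishing.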
Report Generation
• If a page is legitimate, the system returns “legitimate”.
• If a page is phishing, the system returns “phishing”.
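The testing and report steps can be combined into one decision function. In this sketch the search engine is a caller-supplied function, because no specific search API is given; `fake_search`, `registered_domain` and `cantina_verdict` are hypothetical names, n = 30 is an illustrative default, and the domain comparison naively uses the last two host labels (a real system would consult the Public Suffix List).

```python
from typing import Callable
from urllib.parse import urlparse


def registered_domain(url: str) -> str:
    """Naively reduce a URL to its last two host labels, e.g. 'ebay.com'.

    Simplification: mishandles suffixes such as 'co.uk'.
    """
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def cantina_verdict(
    page_url: str,
    signature: list[str],
    search: Callable[[str], list[str]],
    n: int = 30,
) -> str:
    """Return "legitimate" if the page's domain is among the top-n results."""
    results = search(" ".join(signature))[:n]  # query = lexical signature
    page_domain = registered_domain(page_url)
    if any(registered_domain(r) == page_domain for r in results):
        return "legitimate"
    return "phishing"  # no match, or no results at all


# Hypothetical stand-in for a real search engine API.
def fake_search(query: str) -> list[str]:
    return ["https://www.ebay.com/", "https://pages.ebay.com/help"]


print(cantina_verdict("https://signin.ebay.com/ws/eBayISAPI.dll",
                      ["ebay", "sign", "password"], fake_search))  # legitimate
```

A phishing copy hosted on, say, `ebay-verify.example.net` shares the page text but not the domain, so it would not appear among the results and would be reported as “phishing”.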
APPLICATIONS
• Used to detect fraudulent websites and emails.
• Protects users from giving up personal information such as credit card numbers, bank details and account passwords.
• Used to detect suspicious links in email.
CONCLUSION
• CANTINA is a content-based approach for detecting phishing websites.
• User-friendly interface for users.
• Anti-phishing website that protects users from giving away their personal information.
