AI Phishing Report
2. Team Members
- Aniket Jaiswal: Backend Developer & Integration Specialist
- Anupam Tiwari: Project Lead & Machine Learning Developer
- Ayush Vishwakarma: Frontend Developer & UI/UX Designer
3. Problem Statement
Phishing websites mimic legitimate ones to steal sensitive information such as login credentials and financial data. Blacklists, the traditional defence, only recognize URLs that have already been reported, so they miss new and evolving threats and leave users exposed to data theft and financial loss.
4. Objectives
- Detect phishing websites using Machine Learning (ML).
- Integrate heuristic features and APIs (VirusTotal, WHOIS, Google Safe Browsing).
- Develop a real-time Chrome extension for user protection.
- Ensure transparency through explainable AI (LIME, SHAP).
- Provide an admin dashboard with visual analytics and reporting tools.
5. System Architecture
Components:
- Frontend: Chrome Extension (JavaScript, HTML, CSS), Streamlit Dashboard (Python)
- Backend: Flask REST API
- Database: Firebase for logging and real-time sync
- ML Models: Random Forest and SVM (scikit-learn), XGBoost (xgboost library)
- APIs: WHOIS, VirusTotal, Google Safe Browsing
Architecture Diagram: (Include flowchart visualizing URL processing through frontend, backend, ML prediction, and browser alert)
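The report does not state how the backend merges the ML prediction with the API lookups. As a minimal sketch of one possible rule, the code below treats a hit from either reputation service as decisive and otherwise thresholds the model probability, nudged upward for very young domains. The `Signals` fields, the `combine` function, the 0.5 threshold, the 30-day cutoff, and the 0.2 bump are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Evidence gathered for one URL (field names are assumptions)."""
    ml_probability: float              # phishing probability from the model, 0.0 to 1.0
    virustotal_flagged: bool           # True if the VirusTotal lookup reported the URL
    safe_browsing_flagged: bool        # True if Google Safe Browsing reported the URL
    domain_age_days: Optional[int]     # from WHOIS; None when the lookup failed


def combine(signals: Signals, threshold: float = 0.5, young_domain_days: int = 30) -> str:
    """Return "phishing" or "safe" from the combined evidence."""
    # Assumed rule: a hit on either reputation service is decisive.
    if signals.virustotal_flagged or signals.safe_browsing_flagged:
        return "phishing"
    score = signals.ml_probability
    # Assumed heuristic: a recently registered domain raises the score.
    if signals.domain_age_days is not None and signals.domain_age_days < young_domain_days:
        score = min(1.0, score + 0.2)
    return "phishing" if score >= threshold else "safe"
```

With these assumed defaults, a URL scored 0.35 by the model is reported as safe on a long-established domain but as phishing on a five-day-old one.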
6. Dataset and Feature Engineering
Sources:
- PhishTank
- Kaggle Phishing Dataset
- Alexa Top 1M (for legitimate URLs)
Preprocessing Steps:
- Clean and normalize data
- Encode categorical variables
- Handle class imbalance with SMOTE or undersampling
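The undersampling option can be illustrated with the standard library alone. The sketch below randomly drops rows from the larger classes until every class matches the smallest one. The `undersample` function and its `label_of` parameter are hypothetical names, and the row layout (URL, label) is an assumption. SMOTE is not shown here; it is usually taken from the imbalanced-learn package.

```python
import random
from collections import defaultdict


def undersample(rows, label_of, seed=0):
    """Randomly shrink larger classes so every class has as many rows as the smallest."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    if not by_label:
        return []
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)      # avoid leaving the rows grouped by class
    return balanced


# Example: 8 legitimate rows (label 0) and 2 phishing rows (label 1).
rows = [("http://a.example/%d" % i, 0) for i in range(8)]
rows += [("http://b.example/%d" % i, 1) for i in range(2)]
balanced = undersample(rows, label_of=lambda row: row[1])
```

After the call, `balanced` holds two rows of each class. Undersampling discards data, so it suits cases where the majority class is large enough to spare it.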
Feature Categories:
- URL-Based: URL length, presence of '@', redirections
- Domain-Based: Age, expiration time, WHOIS data
- Security-Based: HTTPS presence, SSL certificate validity
- Content-Based: HTML pattern analysis (optional)
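The features that depend only on the URL string can be sketched without any network access. The code below covers URL length, presence of '@', a redirection indicator, and HTTPS presence. The `url_features` function and its key names are hypothetical, and reading "redirections" as a "//" that appears after the scheme is one assumed interpretation. Domain age and SSL certificate validity need WHOIS and TLS lookups and are left out.

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Extract string-only features from a URL (no network access)."""
    # Everything after "scheme://"; the whole string if no scheme is present.
    rest = url.split("://", 1)[-1]
    return {
        "url_length": len(url),
        # An '@' lets the text before it masquerade as the real host.
        "has_at_symbol": "@" in url,
        # Assumed interpretation of "redirections": a "//" after the scheme.
        "has_double_slash_redirect": "//" in rest,
        "uses_https": urlparse(url).scheme == "https",
    }


suspicious = url_features("http://login.example.com@evil.example.net//redirect/path")
ordinary = url_features("https://www.example.com/account")
```

Here `suspicious` has the '@' and double-slash flags set and `uses_https` false, while `ordinary` has both flags clear and `uses_https` true.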
7. Machine Learning Models
- Algorithms Used: Random Forest, XGBoost, SVM
- Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC
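To make the label-based metrics concrete, the sketch below computes accuracy, precision, recall, and F1 from true and predicted labels, treating phishing (label 1) as the positive class. The `binary_metrics` function and the sample labels are illustrative, not results from this project. ROC-AUC is omitted because it needs predicted scores rather than hard labels; scikit-learn provides it as `sklearn.metrics.roc_auc_score`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = phishing, 0 = safe.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

For a phishing detector, recall counts the phishing pages that were caught, while precision counts how many warnings were justified, so the two are worth reporting separately rather than relying on accuracy alone.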
8. Browser Extension Workflow
- Monitors every opened URL
- Sends the URL to the backend for classification
- Backend returns the prediction
- Displays a red warning screen when the URL is classified as phishing
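The backend side of this exchange can be sketched as a plain function that a web framework route would wrap. The JSON shape (`url` in the request; `verdict` and `probability` in the response), the `handle_check` name, the pluggable `score_url` callable, and the 0.5 threshold are all assumptions, since the report does not specify the API contract.

```python
import json


def handle_check(body, score_url, threshold=0.5):
    """Return (HTTP status, JSON text) for one classification request.

    score_url is any callable mapping a URL string to a phishing
    probability between 0.0 and 1.0; it stands in for the trained model.
    """
    try:
        url = json.loads(body)["url"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a 'url' field"})
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        return 400, json.dumps({"error": "url must start with http:// or https://"})
    probability = float(score_url(url))
    verdict = "phishing" if probability >= threshold else "safe"
    return 200, json.dumps({"url": url, "verdict": verdict, "probability": probability})


# Example with a stub scorer that flags every URL.
status, payload = handle_check('{"url": "http://evil.example/login"}', lambda url: 0.9)
```

In this sketch the extension would show the warning screen when the returned `verdict` is "phishing" and let the page load otherwise. Rejecting malformed requests with a 400 status keeps bad input away from the model.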
9. Admin Dashboard
- Built with Streamlit
- Visualizes logs and user-submitted suspicious links
- Shows model predictions, charts, and live analytics
Chart: (Include pie chart of phishing vs safe URLs, trend line of daily detection counts)
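The two charts reduce to two aggregations over the detection log: a count per verdict for the pie chart and a count of phishing detections per day for the trend line. The sketch below assumes each log entry is a dict with `date` (YYYY-MM-DD) and `verdict` keys; that schema, the `summarize` function, and the sample entries are hypothetical.

```python
from collections import Counter


def summarize(logs):
    """Return (verdict counts, sorted daily phishing counts) for the dashboard charts."""
    entries = list(logs)  # allow generators, since the data is scanned twice
    verdict_counts = Counter(entry["verdict"] for entry in entries)
    daily_phishing = Counter(
        entry["date"] for entry in entries if entry["verdict"] == "phishing"
    )
    # ISO dates sort chronologically as plain strings.
    return verdict_counts, sorted(daily_phishing.items())


# Made-up log entries for illustration.
logs = [
    {"date": "2025-03-02", "verdict": "phishing"},
    {"date": "2025-03-01", "verdict": "safe"},
    {"date": "2025-03-01", "verdict": "phishing"},
    {"date": "2025-03-02", "verdict": "phishing"},
    {"date": "2025-03-02", "verdict": "safe"},
]
verdict_counts, trend = summarize(logs)
```

The first result feeds the phishing vs safe pie chart and the second the daily trend line. Days with no phishing detections are absent from `trend`, so a plotting layer may want to fill them with zeros.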
10. Future Scope
- Deep Learning integration for advanced pattern detection
- Voice-based phishing alerts for accessibility
- Integration into antivirus suites or secure browsers
- Continuous model retraining with live data
- Telegram/WhatsApp bot for URL checks
12. Conclusion
Phishing attacks are growing in scale and sophistication. Our hybrid system combines ML, heuristics, and external APIs to detect phishing in real time and warn users before harm occurs. With a Chrome extension and an explainable model ready for deployment, this project bridges academic learning with real-world impact.
13. References
- PhishTank: https://www.phishtank.com/
- Kaggle Phishing Dataset
- Alexa Top 1M List
- VirusTotal API
- WHOIS API
- Google Safe Browsing API
14. Appendix
(Include graphs, confusion matrix, screenshots of extension, dashboard UI, LIME/SHAP visualization samples)
Prepared by: Ayush Vishwakarma & Team
Institution: Deen Dayal Upadhyay Gorakhpur University
Department: Computer Science and Engineering
Semester: 6th
Year: 2025