0% found this document useful (0 votes)
3 views3 pages

project

The document consists of three main files for a phishing website detection application using Flask. The 'index.html' file provides the front-end interface for users to input URLs for phishing detection, while 'app.py' handles the backend logic, including model loading and prediction. The 'train_model.py' script trains a machine learning model using XGBoost and LightGBM algorithms, evaluates their performance, and saves the best model for future predictions.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views3 pages

project

The document consists of three main files for a phishing website detection application using Flask. The 'index.html' file provides the front-end interface for users to input URLs for phishing detection, while 'app.py' handles the backend logic, including model loading and prediction. The 'train_model.py' script trains a machine learning model using XGBoost and LightGBM algorithms, evaluates their performance, and saves the best model for future predictions.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
You are on page 1/ 3

index.

html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Phishing Website Detection</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div class="animated-background">
<canvas id="background-canvas"></canvas>
</div>

<div class="content">
<!-- Animated Header -->
<div class="header">
<h1>⚡ Cybercrime Detection Portal ⚡</h1>
<p>Secure your digital world with cutting-edge phishing detection.</p>
</div>

<!-- URL Detection Form -->


<div class="form-container">
<form action="/predict" method="post" id="phishing-form">
<label for="url">Enter URL:</label>
<input type="text" id="url" name="url"
placeholder="https://example.com" required>
<button type="submit" class="detect-btn">Detect Phishing</button>
</form>
</div>
</div>

<script src="script.js"></script>
</body>
</html>

app.py:
from flask import Flask, request, render_template
import joblib
import numpy as np
import pandas as pd

# Initialize Flask app


app = Flask(__name__)

# Load the trained model


model_path = 'C:/Users/priya/phishingwebsite/phishing_model.pkl' # Ensure this
path is correct
model = joblib.load(model_path)

# Route for the homepage


@app.route('/')
def home():
return render_template('index.html')

# Route for handling predictions


@app.route('/predict', methods=['POST'])
def predict():
# Get the input URL from the form
url = request.form['url']

# Step 1: Extract features from the input URL


# Implement this function based on your dataset
features = extract_features_from_url(url)

# Step 2: Ensure the features align with the training set


features = pd.DataFrame([features]) # Convert to DataFrame

# Step 3: Make a prediction


prediction = model.predict(features)[0]
result = "Phishing" if prediction == 1 else "Legitimate"

# Return the result to the frontend


return render_template('index.html', prediction_text=f"The URL is: {result}")

# Feature extraction logic (implement based on your dataset)


def extract_features_from_url(url):
# Replace this with your actual feature extraction logic
return {
'url_length': len(url),
'dot_count': url.count('.'),
'slash_count': url.count('/'),
# Add other features based on your dataset here
}

if __name__ == "__main__":
app.run(debug=True)

train_model.py:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report
import joblib
import matplotlib.pyplot as plt
import seaborn as sns
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
import warnings
warnings.filterwarnings('ignore') # Suppress warnings for cleaner output

# Step 2: Load the dataset


# Replace with the path to your dataset
data_path = 'C:/Users/priya/phishingwebsite/phishing_dataset.csv'
data = pd.read_csv(data_path)
print("Dataset loaded successfully.")

# Step 3: Data Preprocessing


# Check and remove any missing values
print("No missing values found. Proceeding without removing rows.")

# Step 4: Feature Selection


X = data.drop('class', axis=1) # Features
y = data['class'].map({-1: 0, 1: 1})

# Step 5: Split the dataset


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Step 6: Train models with advanced algorithms
# XGBoost
xgb_model = XGBClassifier(use_label_encoder=False, eval_metric='mlogloss',
random_state=42)
xgb_model.fit(X_train, y_train)
print("XGBoost model trained successfully.")

# LightGBM
lgbm_model = LGBMClassifier(random_state=42)
lgbm_model.fit(X_train, y_train)
print("LightGBM model trained successfully.")

# Step 7: Evaluate the models


y_pred_xgb = xgb_model.predict(X_test)
y_pred_lgbm = lgbm_model.predict(X_test)

accuracy_xgb = accuracy_score(y_test, y_pred_xgb)


accuracy_lgbm = accuracy_score(y_test, y_pred_lgbm)

print(f"XGBoost Accuracy: {accuracy_xgb * 100:.2f}%")


print(f"LightGBM Accuracy: {accuracy_lgbm * 100:.2f}%")

print("\nClassification Report (XGBoost):\n", classification_report(y_test,


y_pred_xgb))
print("\nClassification Report (LightGBM):\n", classification_report(y_test,
y_pred_lgbm))

# Step 8: Save the best model


# Step 8: Save the best model
if accuracy_xgb > accuracy_lgbm:
best_model = xgb_model
print("XGBoost is the best model based on test accuracy.")
else:
best_model = lgbm_model
print("LightGBM is the best model based on test accuracy.")

model_path = 'C:/Users/priya/phishingwebsite/phishing_model.pkl'
joblib.dump(best_model, model_path)
print(f"Best model saved to {model_path}.")

# Visualize feature importance


# Improved Feature Importance Visualization
importances = best_model.feature_importances_
features = X.columns

# Sorting the feature importances for better visualization


sorted_indices = np.argsort(importances)[::-1] # Sort in descending order
plt.figure(figsize=(10, 8))
sns.barplot(x=importances[sorted_indices], y=features[sorted_indices])
plt.title("Feature Importance (Sorted)")
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.show()

You might also like