Finalproject Report Flight Price

Final year project report

Uploaded by

khanhuzaif348
Copyright
© © All Rights Reserved

TABLE OF CONTENTS

DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES

CHAPTER 1. INTRODUCTION
1.1. Overview
1.2. Importance
1.3. Objectives

CHAPTER 2. LITERATURE SURVEY
2.1. Previous Work
2.2. Techniques and Algorithms
2.3. Challenges

CHAPTER 3. METHODOLOGY
3.1. Data Collection
3.2. Preprocessing
3.3. EDA
3.4. Feature Engineering
3.5. Model Verification
3.6. Deploy the Machine Learning Model
3.7. Monitoring

CHAPTER 4. IMPLEMENTATION
4.1. Tools Used
4.2. System Architecture
4.3. User Interface
4.4. Coding
4.4.1. Front end page
4.4.2. Prediction Page
4.5. Testing
4.5.1. Type of Testing
4.6. System Testing
4.7. Manual Testing

CHAPTER 5. RESULTS AND DISCUSSION
5.1. Performance Evaluation
5.2. Comparison

CHAPTER 6. CONCLUSION
6.1. Summary
6.2. Achievement
6.3. Future Work

7. REFERENCES

8. BIOGRAPHY
1. Introduction
Machine learning is a subfield of Artificial Intelligence (AI) concerned with
algorithms and techniques that extract useful information from data.

Machine learning methods are well suited to big data, since manually processing
such vast volumes of data would be impractical without the support of machines.

In computer science, machine learning attempts to solve problems algorithmically rather
than purely mathematically. It is therefore based on creating algorithms that permit the
machine to learn.

There are two general groups of machine learning: supervised and unsupervised.
In supervised learning, the program is trained on a pre-determined (labelled) data set so
that it can make predictions when new data is given. In unsupervised learning, the program
tries to find relationships and hidden patterns in the data.
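
The difference between the two groups can be illustrated with a minimal sketch. It assumes scikit-learn is installed and uses made-up toy numbers rather than the project's data: a regressor learns from labelled pairs, while a clustering algorithm groups points that carry no labels.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised: every input comes with a known target value (toy numbers).
X_labelled = [[1], [2], [3], [4]]
y_labelled = [2, 4, 6, 8]
regressor = LinearRegression().fit(X_labelled, y_labelled)
prediction = regressor.predict([[5]])[0]   # prediction for an unseen input

# Unsupervised: no targets, the algorithm looks for structure on its own.
X_unlabelled = [[1.0], [1.2], [9.0], [9.5]]
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = clusterer.fit_predict(X_unlabelled)

print("Supervised prediction for x=5:", round(prediction, 2))
print("Cluster assigned to each point:", list(cluster_labels))
```

Flight price prediction belongs to the first group, because every historical record carries the price that was actually charged.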

1.1. Overview

The Flight Price Prediction project is designed to harness machine learning to forecast
flight ticket prices accurately. Using large datasets of historical flight information,
the project aims to construct a predictive model that can serve as a decision-making tool
both for travellers planning their trips and for airlines managing their pricing strategies.

1.2. Importance
The ability to predict flight prices has significant implications for the travel industry. For
consumers, it means the potential to secure cost-effective travel by identifying the best times
to purchase tickets. For airlines, it represents an opportunity to fine-tune pricing models,
enhance revenue management, and stay competitive in a fluctuating market.

1.3. Objectives

• To develop a predictive model that can accurately forecast flight prices.

• To analyse historical flight data to understand pricing patterns.

• To provide a tool that can help consumers and airlines make data-driven decisions
2. LITERATURE SURVEY

2.1. Previous Work

This section summarizes the existing research and projects related to flight price
prediction. It involves a comprehensive review of academic papers, industry reports, and
existing systems that have attempted to predict flight prices. Previous work may include
studies that employed machine learning algorithms, statistical models, or hybrid approaches
to forecast airfare. The subsection should highlight the methodologies, datasets used, and key
findings of these studies. Additionally, it may discuss the limitations or gaps identified in the
literature, which the current project aims to address.

2.2. Techniques and Algorithms:

Here, the section delves into the various techniques and algorithms commonly used in flight
price prediction. It provides an overview of the different machine learning algorithms such as
linear regression, decision trees, random forests, support vector machines, and neural
networks that have been applied in predicting flight prices. Additionally, it discusses the
feature engineering methods and data preprocessing techniques utilized to prepare the input
data for these algorithms. The subsection may also explore any specific approaches or
modifications tailored to the unique characteristics of flight price data, such as seasonality,
route networks, and pricing dynamics.
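
Seasonality is one characteristic that is often handled with a cyclical encoding. The sketch below is an illustrative assumption, not a technique taken from the studies surveyed: it maps a month number onto a circle so that December and January end up next to each other. The function names `cyclical_encode` and `distance` are hypothetical.

```python
import math

def cyclical_encode(value, period):
    """Map a periodic value (e.g. month 1-12) to a point on the unit circle."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def distance(p, q):
    """Euclidean distance between two encoded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

january = cyclical_encode(1, 12)
february = cyclical_encode(2, 12)
december = cyclical_encode(12, 12)

# As plain integers, December (12) is far from January (1);
# on the circle they are neighbours, exactly like January and February.
print(distance(december, january), distance(january, february))
```

The two encoded coordinates can then be passed to a regression model in place of the raw month number.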
2.3. Challenges

This subsection outlines the challenges and complexities associated with flight price prediction.

These include technical challenges such as data quality issues, missing values, and the high
dimensionality of the feature space.

Additionally, the subsection examines the inherent uncertainty in predicting human behavior,
as travelers' purchasing decisions are influenced by a multitude of factors beyond historical
price trends. Strategies for mitigating these challenges, as well as potential research directions
to overcome them, are also discussed.

One of the primary challenges identified is the dynamic nature of flight pricing, influenced by
factors like Total Stops. Additionally, the need for real-time data processing and the handling
of missing data present significant hurdles.
3. METHODOLOGY

3.1. Data Collection:

This subsection describes the methodology for gathering the data needed for flight price
prediction. It involves identifying relevant sources of data, such as flight booking databases,
airline websites, online travel agencies, or publicly available datasets, and collecting
flight-related information, including departure and arrival airports, dates, times, airlines,
ticket prices, and any other relevant features. It may also address considerations such as data
privacy, data licensing agreements, and the frequency of data updates to ensure the timeliness
and legality of the collected data.

In this project, we downloaded the dataset from Kaggle. The raw dataset is uncleaned and
is not yet ready for training our model.
Fig 3.1.1

3.2. Preprocessing:

This subsection describes the preprocessing steps applied to the raw data before feeding it
into the predictive models. It includes data cleaning processes to handle missing values and
inconsistencies in the dataset. Additionally, data normalization or standardization techniques
may be employed to ensure that features are on a similar scale. The preprocessing stage may
also involve encoding categorical variables, handling datetime variables, and performing any
necessary transformations to make the data suitable for analysis. The subsection should provide
clarity on the rationale behind each preprocessing step and its impact on the quality of the data.
Fig 3.2.1: Data Preprocessing
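
The preprocessing steps named above can be sketched with pandas. This is a minimal illustration under stated assumptions: the toy records, the column names, and the choice of mode/median imputation are hypothetical and do not come from the project's dataset.

```python
import pandas as pd

# Toy records; the column names and values are illustrative assumptions.
raw = pd.DataFrame({
    "airline": ["Vistara", "Indigo", None, "Indigo"],
    "date_of_journey": ["2024-03-01", "2024-03-02", "2024-03-02", "2024-03-09"],
    "duration": [2.5, None, 3.0, 2.0],
    "price": [6000, 4500, 5200, 4300],
})

clean = raw.copy()

# 1. Missing values: mode for the categorical column, median for the numeric one.
clean["airline"] = clean["airline"].fillna(clean["airline"].mode()[0])
clean["duration"] = clean["duration"].fillna(clean["duration"].median())

# 2. Datetime handling: parse the text date and derive the day of the week.
clean["date_of_journey"] = pd.to_datetime(clean["date_of_journey"])
clean["journey_weekday"] = clean["date_of_journey"].dt.dayofweek  # Monday = 0

# 3. Categorical encoding: integer codes in alphabetical order of category.
clean["airline_code"] = clean["airline"].astype("category").cat.codes

# 4. Standardization: zero mean and unit variance for the numeric feature.
clean["duration_scaled"] = (
    (clean["duration"] - clean["duration"].mean()) / clean["duration"].std(ddof=0)
)

print(clean)
```

Each step writes to a new column, so the effect of every transformation can be inspected next to the original values.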

3.3. Exploratory Data Analysis (EDA)

• Pair plot

Here, a pair plot is used to detect outliers in the data, with price on the Y-axis and
Duration and total_stops on the X-axis.
Fig 3.3.1: Pair plot
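
A pair plot shows outliers visually; a numeric check can complement it. The sketch below applies the common 1.5 × IQR rule to made-up prices. The rule is an assumption here, since the report does not state which criterion was used, and `iqr_outliers` is a hypothetical helper.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying outside the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up ticket prices with one obviously extreme value.
prices = [3000, 4000, 4500, 5000, 5500, 6000, 60000]
print(iqr_outliers(prices))
```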

• Correlation Analysis
Used to find how the variables in the data correlate with each other.

Let us look at some of the relationships:

• When Duration increases, Price also increases.
• When Total Stops increase, Price also increases.

Fig 3.3.2
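
A correlation matrix of this kind can be computed with pandas. The values below are made up purely to illustrate the call; they are not the project's data.

```python
import pandas as pd

# Made-up values, chosen only to illustrate the computation.
flights = pd.DataFrame({
    "duration": [2.0, 2.5, 5.0, 7.5, 10.0],
    "total_stops": [0, 0, 1, 1, 2],
    "price": [3500, 3900, 5200, 6100, 8000],
})

# Pearson correlation between every pair of numeric columns.
correlation = flights.corr()
print(correlation["price"].sort_values(ascending=False))
```

A coefficient close to +1 means the two variables rise together, which is the pattern described above for Duration and Total Stops against Price.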

• Category Distribution
The most frequent category is Jet Airways.

Fig 3.3.3: Category-wise distribution


3.4. Feature Engineering

Here, the methodology for feature engineering is described, which involves selecting,
creating, or transforming the input variables to improve the predictive performance of the
models. This may include extracting relevant features from the raw data, such as day of the
week, time of day, or holiday indicators, which may influence flight prices. Feature selection
techniques, such as correlation analysis or recursive feature elimination, may be employed to
identify the most informative variables. Additionally, domain knowledge and insights from
the literature may guide the choice of features.
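
As a minimal sketch of such feature extraction, the function below derives the day of the week, a weekend flag, a time-of-day bucket, and a holiday indicator from a departure timestamp. The bucket boundaries, the one-entry holiday calendar, and the function names are assumptions made for illustration.

```python
from datetime import date, datetime

# Placeholder holiday calendar; a real one would come from an external source.
HOLIDAYS = {date(2024, 1, 26)}

def time_of_day(hour):
    """Bucket an hour (0-23) into a coarse label; the boundaries are assumed."""
    if 4 <= hour < 8:
        return "Early Morning"
    if 8 <= hour < 12:
        return "Morning"
    if 12 <= hour < 16:
        return "Afternoon"
    if 16 <= hour < 20:
        return "Evening"
    if 20 <= hour < 24:
        return "Night"
    return "Late Night"

def extract_features(departure):
    """Derive model inputs from a departure timestamp."""
    return {
        "day_of_week": departure.weekday(),       # Monday = 0 ... Sunday = 6
        "is_weekend": departure.weekday() >= 5,
        "time_of_day": time_of_day(departure.hour),
        "is_holiday": departure.date() in HOLIDAYS,
    }

features = extract_features(datetime(2024, 3, 2, 6, 30))
print(features)
```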

3.5. Model Verification

This stage involves selecting suitable predictive models for flight price prediction. It involves
evaluating various machine learning algorithms, such as linear regression, random forests, support
vector machines, and neural networks, to determine their performance on the preprocessed
dataset. Model verification criteria may include predictive accuracy, computational
efficiency, interpretability, and scalability to handle large volumes of data. Techniques such as
cross-validation and grid search may be employed to tune hyperparameters and optimize
model performance. The subsection discusses the rationale behind the choice of models and
the criteria used for model evaluation.

3.6. Deploy the Machine Learning Model


In this stage of the machine learning lifecycle, we integrate the machine learning model
into processes and applications. The ultimate aim of this stage is the proper functioning of
the model after deployment.

3.7. Monitoring
It involves putting safety measures in place to assure the proper operation of the
model during its lifecycle, so that the model can be managed properly.
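
One possible monitoring measure, sketched here as an assumption rather than as part of the implemented system, is to track the mean absolute error over the most recent predictions once the true prices become known, and to raise a flag when it exceeds a threshold. The class name, window size, and threshold are hypothetical.

```python
from collections import deque

class ErrorMonitor:
    """Track the mean absolute error over the most recent predictions."""

    def __init__(self, window=3, threshold=1000.0):
        self.errors = deque(maxlen=window)   # keeps only the last `window` errors
        self.threshold = threshold

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_attention(self):
        return self.rolling_mae() > self.threshold

monitor = ErrorMonitor(window=3, threshold=1000.0)
# Made-up (predicted, actual) price pairs.
for predicted, actual in [(5000, 5200), (6100, 6000), (4000, 7500), (4500, 8000)]:
    monitor.record(predicted, actual)

print(monitor.rolling_mae(), monitor.needs_attention())
```

A raised flag would typically prompt a review of the input data or a retraining of the model.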

Fig 3.6.1: Machine Learning Life Cycle


4. Implementation

4.1. Hardware And Software Used

All computer software needs certain hardware components or other software resources to be
present on a computer. These prerequisites are known as system requirements.

1. Hardware Requirements

• System Processor: Intel Core i3 or higher
• Hard Disk: 512 GB SSD
• RAM: 4.0 GB or higher

2. Software Requirements

• Operating System: Windows 10
• Front-end: Streamlit
• Framework: Streamlit Framework
• IDE: Colab, VS Code

Streamlit is chosen for model deployment.

Additionally, other tools and libraries used for data preprocessing, feature engineering, and
evaluation should be listed. For instance, Python libraries such as Pandas, NumPy, Scikit-
learn, and TensorFlow may be mentioned for data manipulation, machine learning, and deep
learning tasks. The section should also specify the version of Streamlit and other
dependencies used to ensure reproducibility.
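
One way to record these versions is to query the installed packages directly. The sketch below uses the standard-library `importlib.metadata` module; the helper name `report_versions` and the package list are illustrative assumptions.

```python
from importlib import metadata

def report_versions(packages):
    """Return {package: installed version}, or 'not installed' if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

# Distribution names as published on PyPI.
for name, version in report_versions(
    ["streamlit", "pandas", "numpy", "scikit-learn", "seaborn", "matplotlib"]
).items():
    print(f"{name}=={version}")
```

The printed lines follow the `name==version` form used in a requirements file.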
4.2. System Architecture

A system architecture is the conceptual model that defines the structure, behaviour, and other
views of a system. An architecture description is a formal description and representation of a
system.

Here, the architecture of the system developed for flight price prediction is described.
Streamlit, as the chosen deployment platform, plays a central role in hosting the predictive
model and providing a user-friendly interface for interacting with it. The subsection may
discuss how the predictive model is integrated into the Streamlit application, including
loading the trained model, processing user input, generating predictions, and displaying
results. It may also detail any backend services or databases used to support the application,
such as APIs for fetching real-time flight data or caching mechanisms for improving
performance.

[Figure: block diagram with the stages Collection of Data, CSV data sheet, Data preprocessing,
Applied Algorithm, Measure Performance / Evaluation, Prediction Result, and Processes the
user Data / Prediction of Flight price]

Fig 4.2.1: System Architecture


4.2.1 Streamlit Application

Streamlit is employed as the deployment platform due to its ability to create interactive and
user-friendly web applications with minimal effort. The main components of the Streamlit
application include:

4.2.2 USER INTERFACE (UI)

Streamlit's UI elements (such as sliders, date pickers, and dropdown menus) enable users
to input their flight search criteria, such as airline, date_of_journey, source,
destination, dep_time, and total_stops.

4.2.3 MODEL INTEGRATION

The trained predictive model is loaded into the Streamlit application, allowing it to be utilized
for generating flight price predictions based on user input.

4.2.4 PREDICTIVE MODEL INTEGRATION

The integration of the predictive model into the Streamlit application involves several steps:

1. Loading the Trained Model

The predictive model, trained using historical flight data, is saved and loaded into the
Streamlit application upon startup. This ensures that the model is readily available for
generating predictions.

2. Processing User Input

User inputs from the UI are collected and preprocessed to match the format expected
by the predictive model. This may involve converting categorical variables into
numerical representations, normalizing continuous variables, and ensuring all required
features are present.

3. GENERATING PREDICTIONS

The pre-processed user input is fed into the predictive model, which generates a price
prediction for the specified flight criteria.

4. DISPLAYING RESULTS

The predicted flight prices are presented to the user in an easily interpretable format,
such as tables, charts, or summary statistics.
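
The four steps above can be sketched as follows. The integer codes are the ones used in the front-end listing of Section 4.4.1; everything else is a simplification made for illustration: `StubModel` is a hypothetical stand-in for the trained regressor, and only three of the inputs are shown.

```python
# Integer codes taken from the front-end listing in Section 4.4.1.
AIRLINE_CODES = {"AirAsia": 0, "Air India": 1, "GO FIRST": 2,
                 "Indigo": 3, "SpiceJet": 4, "Vistara": 5}
STOPS_CODES = {"one": 0, "two or more": 1, "zero": 2}

class StubModel:
    """Stand-in for the trained regressor; it only mimics the predict() interface."""
    def predict(self, rows):
        return [1000.0 * len(row) for row in rows]

def load_model():
    # Step 1: in the real application this would unpickle the trained model.
    return StubModel()

def preprocess(user_input):
    # Step 2: convert the user's selections into the numeric row the model expects.
    missing = {"airline", "stops", "duration"} - user_input.keys()
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    return [AIRLINE_CODES[user_input["airline"]],
            STOPS_CODES[user_input["stops"]],
            float(user_input["duration"])]

def predict_price(model, user_input):
    # Step 3: generate the prediction for a single row.
    return model.predict([preprocess(user_input)])[0]

model = load_model()
price = predict_price(model, {"airline": "Vistara", "stops": "zero", "duration": 2.5})
# Step 4: display the result.
print(f"The predicted price is: {price} Rs")
```

Keeping the preprocessing in its own function makes it possible to check the encoding separately from the model.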

4.3. USER INTERFACE

This subsection presents the design and functionality of the user interface developed using Streamlit. It describes
the layout, features, and interactive elements provided to users for inputting query parameters
(e.g., departure airport, destination airport, date of travel) and viewing predicted flight prices.
The user interface should be intuitive, visually appealing, and responsive, with clear
instructions on how to use the application effectively.
4.4 CODING

4.4.1 Front end

import bz2
import pickle

import streamlit as st

# Integer codes for each category. They follow the alphabetical order produced
# by LabelEncoder in the training notebook.
AIRLINES = {'Vistara': 5, 'Air India': 1, 'Indigo': 3, 'GO FIRST': 2, 'AirAsia': 0, 'SpiceJet': 4}
CITIES = {'Delhi': 2, 'Mumbai': 5, 'Bangalore': 0, 'Kolkata': 4, 'Hyderabad': 3, 'Chennai': 1}
DEPARTURE_TIMES = {'Morning': 4, 'Early Morning': 1, 'Evening': 2, 'Night': 5, 'Afternoon': 0, 'Late Night': 3}
ARRIVAL_TIMES = {'Night': 5, 'Evening': 2, 'Morning': 4, 'Afternoon': 0, 'Early Morning': 1, 'Late Night': 3}
STOPS = {'one': 0, 'zero': 2, 'two or more': 1}
CLASSES = {'Economy': 1, 'Business': 0}


def decompress_pickle(file):
    # load a bz2-compressed pickle file
    with bz2.BZ2File(file, 'rb') as data:
        return pickle.load(data)


def select(label, options):
    # drop-down with a leading 'Select' placeholder
    return st.selectbox(label, ('Select', *options))


# setting up the page title, icons
st.set_page_config(page_title="Flight Price Predictor",
                   page_icon="https://hips.hearstapps.com/hmg-prod/images/gettyimages-1677184597.jpg?crop=0.668xw:1.00xh;0.167xw,0&resize=1200:*")
st.sidebar.title('MENU BAR')
choice = st.sidebar.selectbox(' ', ('Home', 'Predict'))
st.sidebar.image('https://e0.pxfuel.com/wallpapers/209/716/desktop-wallpaper-untitled-airplane-sky-aesthetic-travel.jpg')
st.sidebar.image('https://i.pinimg.com/736x/0d/1e/96/0d1e967cde176af6f8f0568af424d07b.jpg')

if choice == 'Home':
    st.title('Welcome to Flight Price Predictor')
    st.text('Hi. Want to predict your flight ticket price❓❓')
    st.text('Click the Menu bar for further details')
    st.image('https://wallpapers.com/images/featured/airport-w6v47yjhxcohsjgf.jpg')
elif choice == 'Predict':
    st.text('Kindly fill your flight details to view the predicted price')
    st.image('https://feeds.abplive.com/onecms/images/uploaded-images/2021/09/08/634259599cd6f60c24f9e67a5680c064_original.jpg')
    ch = select('Airline', AIRLINES)
    cg = select('From', CITIES)
    # the destination list leaves out the chosen source city
    cx = select('Destination', [city for city in CITIES if city != cg])
    cf = select('Departure time', DEPARTURE_TIMES)
    ci = select('Stops', STOPS)
    cs = select('Arrival time', ARRIVAL_TIMES)
    cb = select('Class', CLASSES)
    h = st.number_input('Duration')
    i = st.number_input('Days left')
    if st.button('Check'):
        if 'Select' in (ch, cg, cx, cf, ci, cs, cb):
            # without this check the encoded values would be undefined
            st.error('Please choose a value for every field')
        else:
            # feature order: airline, source, departure time, stops,
            # arrival time, destination, class, duration, days left
            row = [AIRLINES[ch], CITIES[cg], DEPARTURE_TIMES[cf], STOPS[ci],
                   ARRIVAL_TIMES[cs], CITIES[cx], CLASSES[cb], h, i]
            model = decompress_pickle('Flight.pbz2')
            pred = model.predict([row])
            st.write("The predicted price is:-", pred[0], 'Rs')
            st.header('Time to fly ✈🧳')
            st.image('https://image.cnbcfm.com/api/v1/image/106537227-1589463911434gettyimages-890234318.jpeg?v=1589463982&w=1600&h=900')
4.4.2 Prediction Page

# -*- coding: utf-8 -*-
"""Flight.ipynb

Original file is located at
    https://colab.research.google.com/drive/1jCVfWPfFdP3xGsbSiXZwEffsfwmsoyAu

Data Source: https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction
"""

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

"""(1) **Data Loading**"""

flight_data=pd.read_csv('/content/drive/MyDrive/Clean_Dataset.csv')

# reading the 1st 3 rows of the dataset


flight_data.head(3)

"""As the column Unnamed: 0 is not needed, it is dropped"""

flight_data=flight_data.drop(columns=['Unnamed: 0'])

"""**Reading the dataset**"""

# reading the 1st 3 rows of the dataset


flight_data.head(3)

# reading the last 3 rows of the dataset


flight_data.tail(3)

"""(2) **Exploratory Data Analysis**

Dimensions of the dataset


"""

flight_data.shape
"""Checking the data types for each column"""

flight_data.dtypes

print('Null values:',flight_data.isnull().any().sum())
print('NaN values:', flight_data.isna().any().sum())
# duplicated() marks repeated rows; sum() counts them
print('duplicates:',flight_data.duplicated().sum())

"""The dataset does not have any null, missing, duplicate values

a. Checking for no.of distinct values in each column in the dataset


"""

flight_data.nunique()

"""b. No.of flights per class - Economy and Business"""

sns.set(font_scale=0.7)
cl={'Economy':'green','Business':'blue'}
c=sns.countplot(data=flight_data,x='class',palette=cl)
for label in c.containers:
    c.bar_label(label) # adding the count on top of each bar

"""c. Total number of flights under each Airline and class"""

sns.set(font_scale=0.6)
plt.figure(figsize=(6,4))
col={'Economy':'red','Business':'green'}
a=sns.countplot(data=flight_data,x='airline',hue='class',palette=col)
for l in a.containers:
    a.bar_label(l)
plt.title('Flight counts per airline')
plt.xlabel('Airline')
plt.ylabel('Total number of flights')

"""1. Among the six airlines, only Vistara and Air India have both classes
Economy and Business
2. And the airline Vistara has the highest no.of flights from both classes
3. Spicejet is the airline which has lowest no.of flights

d. Plotting No.of flights per cities and class category


"""

sns.set(font_scale=0.5) # setting the font scale
plt.figure(figsize=(10,8)) # setting the chart size

plt.subplot(1,2,1) # 1st plot in the subplot
col={'Economy':'purple','Business':'green'}
ax=sns.countplot(data=flight_data,x='source_city',hue='class',palette=col)
plt.title('No.of flights per source city')
plt.xlabel('Source Cities')
plt.ylabel('No.of flights')
for label in ax.containers:
    ax.bar_label(label) # adding label to the bars

plt.subplot(1,2,2) # 2nd plot in the sub plot
col={'Economy':'purple','Business':'green'}
bx=sns.countplot(data=flight_data,x='destination_city',hue='class',palette=col)
sns.move_legend(bx,"right")
plt.title('No.of flights per destination city')
plt.xlabel('Destination Cities')
plt.ylabel('No.of flights')
for c in bx.containers:
    bx.bar_label(c)
plt.show()

"""From both charts,


* Economy class:- Delhi has the highest number, and
* Business class:- Mumbai is the city with highest no.of flights

e. Statistical info of the dataset


"""

flight_data.describe()

"""f. Viewing ticket price by each airline and class"""

flight_data[['airline','price','class']].sort_values(by='price',ascending=False)

"""Among the various airlines, Vistara charges highest price under the
business class.

g. Ticket price vs class based on different airlines


"""

sns.set(font_scale=0.7)
plt.figure(figsize=(9,9))
x=sns.barplot(data=flight_data,x='class',y='price',hue='airline',errorbar=None)
for i in x.containers:
    x.bar_label(i)
plt.xlabel('Class')
plt.ylabel('Ticket Price')
plt.title('Flight ticket price vs class based on each airline')

"""The ticket price charged by Vistara is the highest under both classes, and
AirAsia offers the lowest under Economy class.

h. Plotting No.of flights per class under different departure and arrival
time.
"""

sns.set(font_scale=0.7)
plt.figure(figsize=(8,6))

plt.subplot(2,1,1)
cl=sns.countplot(data=flight_data,x='departure_time',hue='class')
for l in cl.containers:
    cl.bar_label(l)

plt.subplot(2,1,2)
cl=sns.countplot(data=flight_data,x='arrival_time',hue='class')
for l in cl.containers:
    cl.bar_label(l)

"""This graph shows that more flights depart in the morning and more flights
arrive at night.

i. Analysing ticket price vs destination and source cities based on each class
"""

sns.set(font_scale=0.7)
plt.figure(figsize=(8,6))

plt.subplot(2,1,1)
cl=sns.barplot(data=flight_data,x='destination_city',y='price',hue='class')
for l in cl.containers:
    cl.bar_label(l)

plt.subplot(2,1,2)
cl=sns.barplot(data=flight_data,x='source_city',y='price',hue='class')
for l in cl.containers:
    cl.bar_label(l)

"""Kolkata's flight is the costliest

j. Analysing duration of flights


"""
flight_data['duration'].describe()

# Row numbers of flights with maximum duration
flight_data[flight_data['duration']== 49.830000].index

# Row numbers of flights with minimum duration
flight_data[flight_data['duration']== 0.830000].index

"""(4) Feature Engineering

1. Checking for outliers in price column


"""

sns.boxplot(data=flight_data,x='price')

"""From the boxplot, we can infer that the flight ticket price mostly falls in the
range of 0 to 100000, whereas there are a few outliers beyond the value of
120000. Since the dataset is large enough, the rows priced at 100000 or more
are removed from the data in order to develop a proper model for the prediction.

"""

f_out=flight_data[flight_data['price']>=100000].index
flight_data=flight_data.drop(index=f_out)

sns.boxplot(x=flight_data['price'])

flight_data.shape

flight_data[['destination_city','price']].groupby('destination_city').max()

flight_data[flight_data['price']==99680]

flight_data.head(2)

"""After removing the outliers, the highest ticket price is Rs 99680, charged by
Vistara in Business class for a flight from Bangalore to Mumbai with a duration of 14.42.

2. Removing unnecessary columns


"""

flight_data=flight_data.drop(columns='flight')

"""3. Encoding multiple columns containing categorical variables at once"""

from sklearn.preprocessing import LabelEncoder

df=flight_data.iloc[:,:7] # position of columns that have categorical variables

# Encoding:
enc_all_cols=df.apply(LabelEncoder().fit_transform)

# Concatenating with the remaining columns of the dataset
df_enc=pd.concat([enc_all_cols,flight_data.iloc[:,-3:]],axis=1)

# reading the first 2 rows of the dataframe, which now has encoded data and is
# ready for train test split
df_enc.head(2)

"""(5) Model Building

Train test split


"""

from sklearn.model_selection import train_test_split

X = df_enc.drop(columns='price') # feature
y=df_enc['price'] # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
random_state=0)
print('X_train size: {}, X_test size: {}'.format(X_train.shape, X_test.shape))
print('y_train size: {}, y_test size: {}'.format(y_train.shape, y_test.shape))

"""Finding the best model with the help of GridSearchCV"""

from sklearn.model_selection import GridSearchCV


from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

model_params={
    'LR':{
        'model':LinearRegression(),
        'params':{}
    },
    'KNR':{
        'model':KNeighborsRegressor(),
        'params':{
            'n_neighbors':[2,5,10]
        }
    },
    'RFR':{
        'model':RandomForestRegressor(),
        'params':{
            'n_estimators':[5,10,20]
        }
    }
}

from sklearn.model_selection import ShuffleSplit


scores=[]
cv = ShuffleSplit(n_splits=5, test_size=0.20, random_state=0)
for model,mp in model_params.items():
    clf=GridSearchCV(mp['model'],mp['params'],cv=cv,return_train_score=False)
    clf.fit(X,y)
    scores.append({
        'model':model,
        'best score':clf.best_score_,
        'best params':clf.best_params_
    })

dd=pd.DataFrame(scores,columns=['model','best score','best params'])
dd

"""Among the 3 models used, Random Forest Regressor gives the highest score.

Hence, a model with the Random Forest Regression is built and evaluated.
"""

from sklearn.model_selection import cross_val_score


cv=ShuffleSplit(n_splits=5,test_size=0.2)
s=cross_val_score(RandomForestRegressor(n_estimators=20),X,y,cv=cv)
# cross_val_score uses the regressor's default scorer (R2), so this is an average R2 score
print('Average R2 score : {}%'.format(round(sum(s)*100/len(s), 3)))

rf=RandomForestRegressor(n_estimators=20)

rf.fit(X_train,y_train)

r_pred=rf.predict(X_test)

"""evaluating the model"""

from sklearn import metrics


# r2_score expects the true values first, then the predictions
metrics.r2_score(y_test,r_pred)
"""Looking for the labels of the categorical columns- For reference (Since the
columns are encoded)"""

print('AIRLINE')
print(flight_data['airline'].value_counts())
print(X['airline'].value_counts())
print('\n')
print('SOURCE CITY')
print(flight_data['source_city'].value_counts())
print(X['source_city'].value_counts())
print('\n')
print('DEPARTURE TIME')
print(flight_data['departure_time'].value_counts())
print(X['departure_time'].value_counts())

print('STOPS')
print(flight_data['stops'].value_counts())
print(X['stops'].value_counts())
print('\n')
print('ARRIVAL TIME')
print(flight_data['arrival_time'].value_counts())
print(X['arrival_time'].value_counts())
print('\n')
print('DESTINATION CITY')
print(flight_data['destination_city'].value_counts())
print(X['destination_city'].value_counts())
print('\n')
print('CLASS')
print(flight_data['class'].value_counts())
print(X['class'].value_counts())

flight_data.sample(1)

"""Testing the model with values"""

print('Price:',rf.predict([[5,5,5,0,4,3,0,12.75,35]])) # org= 64700 - X_train
print('Price:',rf.predict([[5,5,4,0,4,3,1,24.0,48]])) # org= 3334 - X_train
print('Price:',rf.predict([[2,0,5,0,4,2,1,9.67,34]])) # org = 3826 - X_test
print('Price:',rf.predict([[1,5,0,0,4,0,0,17.33,29]])) # org = 54608 - X_test

"""As per the model evaluation, the model's score (R2) is around 99%.
Therefore, the model 'rf' is chosen for flight price prediction.

(6) Saving the model


"""
import pickle

flight_data.sample(1)

"""**"""

# the with-statement closes each file automatically
with open("rfreg.pkl", "wb") as pickle_out1:
    pickle.dump(rf, pickle_out1)

"""**"""

filename='trained_model.sav'
with open(filename,'wb') as f:
    pickle.dump(rf,f)

"""Checking/loading the model"""

with open('trained_model.sav','rb') as f:
    load=pickle.load(f)

load.predict([[5,5,4,0,4,3,1,24.0,48]])
4.5 Testing

Testing in a machine learning project is a crucial step to ensure that the model performs as
expected and generalizes well to new, unseen data. It involves several practices:

4.5.1 Type of Testing

1 – Unit Testing

Unit testing involves checking the correctness of individual components within the ML
pipeline. This could include testing data preprocessing functions, individual algorithms, or
other discrete parts of the ML system

2 - Integration Testing
Integration testing checks the combined functionality of these individual components. It
ensures that when these components work together, they produce the expected results

3 - System Testing

System testing evaluates the complete and integrated ML system to verify that it meets the
specified requirements. This includes testing the model’s performance on unseen data and
ensuring that it integrates well with other systems.
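
As an illustration of unit testing at the level of a single component, the sketch below tests a small encoding function with Python's built-in `unittest` module. The function `encode_stops` is a hypothetical helper that reuses the stop codes from the front-end listing; it is not part of the project's code.

```python
import unittest

STOPS_CODES = {"one": 0, "two or more": 1, "zero": 2}

def encode_stops(label):
    """Component under test: map a stops label to its integer code."""
    if label not in STOPS_CODES:
        raise ValueError(f"unknown stops value: {label!r}")
    return STOPS_CODES[label]

class EncodeStopsTest(unittest.TestCase):
    def test_known_labels(self):
        self.assertEqual(encode_stops("zero"), 2)
        self.assertEqual(encode_stops("one"), 0)

    def test_unknown_label_is_rejected(self):
        with self.assertRaises(ValueError):
            encode_stops("Select")

def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EncodeStopsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

result = run_tests()
print("all tests passed:", result.wasSuccessful())
```

Integration and system tests would then exercise the same function together with model loading and the user interface.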

4.7 Manual Testing


Test Case 1              Predict price based on entered data
Test Description         The user enters data for a given dataset
Requirements Verified    Yes
Test Environment         System should be connected to the internet
Expected Result          The user finds the price of flights based on the given input
Pass/Fail                Pass
Note                     Successfully executed

Fig 4.7.1
5. Results and Discussion
This section presents the results of the proposed system, which predicts the price of a flight
with higher accuracy and reliability than the existing system.

The results were obtained by evaluating several machine learning algorithms. In this project we
use the Random Forest Regressor, which gave the highest score among the models compared in Section 4.4.

The system also has an elegant interface that takes all the necessary inputs for the
evaluation and is very easy to use. The final result of our proposed system
can be viewed through the GUI.

Fig 5.1: Front end Page

Fig 5.2: All Data Selection
Fig 5.3: Send Data
Fig 5.4: Prediction price

5.1. Comparison
Here, the performance of the developed flight price prediction model(s) is compared with
existing methods or benchmarks. This could involve comparing the predictive accuracy,
computational efficiency, or other relevant metrics against baseline models or state-of-the-art
approaches reported in the literature. The subsection may also discuss how the proposed
model(s) fare against commercial flight booking websites or other publicly available prediction
services.
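
A comparison of this kind needs common metrics and a baseline. The sketch below computes the mean absolute error, root mean squared error, and R² score for a model and for a baseline that always predicts the mean price. All numbers are made up for illustration and are not results of this project.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination; the true values come first."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up prices and predictions, for illustration only.
actual = [4000, 5000, 6000, 9000]
model_pred = [4200, 4900, 6300, 8600]
baseline_pred = [sum(actual) / len(actual)] * len(actual)   # always predict the mean

for name, pred in [("model", model_pred), ("mean baseline", baseline_pred)]:
    print(name, round(mae(actual, pred), 1), round(rmse(actual, pred), 1),
          round(r2(actual, pred), 3))
```

The mean baseline scores an R² of zero by construction, so any useful model should score clearly above it.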
6. CONCLUSION

6.1. SUMMARY

This subsection provides a concise summary of the key findings and contributions of the
flight price prediction project. It recaps the objectives outlined in the introduction and
summarizes how they were addressed throughout the project. The summary may include a
brief overview of the methodology employed, the predictive models developed, and the main
results obtained. Additionally, it highlights any novel insights or advancements made in the
field of flight price prediction as a result of the project.

6.2. ACHIEVEMENTS

Here, the subsection discusses the achievements and contributions of the project. It outlines
the specific outcomes or milestones reached during the course of the project, such as the
development of accurate predictive models, the implementation of a user-friendly interface
using Streamlit, or the generation of actionable insights for stakeholders in the aviation and
travel industries. Achievements may be evaluated in terms of technical innovation, practical
utility, or societal impact, depending on the project's goals and objectives.
6.3. FUTURE WORK

This subsection explores potential avenues for future research and development based on
the findings and limitations of the flight price prediction project. It identifies areas where
further improvements or enhancements could be made to advance the state-of-the-art in flight
prediction. Future work may include refining predictive models by incorporating additional
data sources or features, exploring advanced machine learning techniques such as deep
learning or ensemble methods, or conducting longitudinal studies to evaluate model
performance over time. Additionally, opportunities for collaboration with industry partners or
academic researchers may be discussed to validate and extend the project's findings in real-
world settings.

Overall, the Conclusion section serves as a culmination of the flight price prediction project,
summarizing its main outcomes, highlighting achievements, and outlining directions for
future research and development. It provides closure to the project while laying the
groundwork for continued exploration and innovation in the field of airfare prediction.
BIOGRAPHY

Mohd Huzaif (2003820100026) is a Computer Science student. He is
currently pursuing a four-year Bachelor of Technology degree in Computer
Science and Engineering at Kamla Nehru Institute of Physical and Social
Sciences, Faridipur, Sultanpur. He is working on flight price
prediction using machine learning.
