
Final Project Report: Flight Price

Final year project report

Uploaded by

khanhuzaif348


DECLARATION

CERTIFICATE

ACKNOWLEDGEMENT

ABSTRACT

LIST OF TABLES

CHAPTER 1. INTRODUCTION
1.1. Overview
1.2. Importance
1.3. Objectives

CHAPTER 2. LITERATURE SURVEY
2.1. Previous Work
2.2. Techniques and Algorithms
2.3. Challenges

CHAPTER 3. METHODOLOGY
3.1. Data Collection
3.2. Preprocessing
3.3. EDA
3.4. Feature Engineering
3.5. Model Verification
3.6. Deploy the Machine Learning Model
3.7. Monitoring

CHAPTER 4. IMPLEMENTATION
4.1. Hardware and Software Used
4.2. System Architecture
4.3. User Interface
4.4. Coding
4.4.1. Front-end Page
4.4.2. Prediction Price
4.5. Testing
4.5.1. Types of Testing
4.6. System Testing
4.7. Manual Testing

CHAPTER 5. RESULTS AND DISCUSSION
5.1. Performance Evaluation
5.2. Comparison

CHAPTER 6. CONCLUSION
6.1. Summary
6.2. Achievements
6.3. Future Work

7. References

8. Biography
1. Introduction
Machine learning is a subfield of Artificial Intelligence (AI) that uses algorithms and technologies to extract useful information from data.

Machine learning methods are well suited to big data, because processing such vast volumes of data manually would be impossible without the support of machines.

In computer science, machine learning attempts to solve problems algorithmically rather than purely mathematically. It is therefore based on creating algorithms that allow the machine to learn from data.

There are two general groups in machine learning: supervised and unsupervised. In supervised learning, the program is trained on a labelled dataset so that it can make predictions when new data is given. In unsupervised learning, the program tries to find relationships and hidden patterns in unlabelled data.
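To make the two groups concrete, here is a minimal sketch with scikit-learn on tiny made-up numbers (not project data). The supervised model learns from inputs paired with known targets; the unsupervised one receives only the inputs and forms groups on its own. LinearRegression and KMeans are illustrative choices, not algorithms prescribed by this report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised: every input (e.g. a duration) comes with a known target (a price)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([100.0, 200.0, 300.0, 400.0])
reg = LinearRegression().fit(X, y)
predicted = reg.predict(np.array([[5.0]]))[0]  # extends the learned trend to a new input

# Unsupervised: only points, no targets; KMeans discovers two groups by itself
points = np.array([[0.0], [0.2], [0.1], [9.8], [10.0], [10.1]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = km.labels_

print(round(predicted, 2))
print(labels)
```

The cluster numbers KMeans assigns are arbitrary; only the grouping (first three points together, last three together) is meaningful.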

1.1. Overview

The Flight Price Prediction project is designed to harness the power of machine learning to forecast flight ticket prices with high accuracy. Using large datasets of historical flight information, the project aims to build a predictive model that can serve as a decision-making tool both for travellers planning their trips and for airlines managing their pricing strategies.

1.2. Importance
The ability to predict flight prices has significant implications for the travel industry. For
consumers, it means the potential to secure cost-effective travel by identifying the best times
to purchase tickets. For airlines, it represents an opportunity to fine-tune pricing models,
enhance revenue management, and stay competitive in a fluctuating market.

1.3. Objectives

• To develop a predictive model that can accurately forecast flight prices.

• To analyse historical flight data to understand pricing patterns.

• To provide a tool that can help consumers and airlines make data-driven decisions.
2. LITERATURE SURVEY

2.1. Previous Work

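This subsection summarizes existing research and projects related to flight price prediction, drawing on academic papers, industry reports, and existing systems. Previous work includes studies that employed machine learning algorithms, statistical models, or hybrid approaches to forecast airfare. For example, Rao and Thangaraj [4] compared a Random Forest Regressor with a Decision Tree Regressor for flight ticket prediction, Tziridis et al. [8] applied machine learning techniques to airfare price prediction, Liu et al. [7] proposed ACER, an adaptive context-aware ensemble regression model for airfare prediction, and Can and Alagöz [9] used deep transfer learning to predict local airfare prices. On the pricing side, Burger and Fuchs [5] describe dynamic pricing as an airline business model, and Malighetti et al. [6] analyse how Ryanair's pricing strategy changed over its 2006–2007 flights. The limitations and gaps identified in this literature are what the current project aims to address.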

2.2. Techniques and Algorithms:

This subsection reviews the techniques and algorithms commonly used in flight price prediction. Machine learning algorithms applied to the problem include linear regression, decision trees, random forests, support vector machines, and neural networks. Equally important are the feature engineering methods and data preprocessing techniques used to prepare the input data for these algorithms, as well as approaches tailored to the unique characteristics of flight price data, such as seasonality, route networks, and pricing dynamics.
2.3. Challenges

Flight price prediction involves several challenges and complexities.

Technical challenges include data quality issues, missing values, and the high dimensionality of the feature space.

There is also inherent uncertainty in predicting human behaviour, as travellers' purchasing decisions are influenced by many factors beyond historical price trends. Strategies for mitigating these challenges, as well as possible research directions for overcoming them, are also considered.

One of the primary challenges identified is the dynamic nature of flight pricing, which is influenced by factors such as the total number of stops. The need for real-time data processing and the handling of missing data present further significant hurdles.
3. METHODOLOGY

3.1. Data Collection:

This subsection describes how the data needed for flight price prediction is gathered. It involves identifying relevant data sources, such as flight booking databases, airline websites, online travel agencies, or publicly available datasets, and collecting flight-related information, including departure and arrival airports, dates, times, airlines, ticket prices, and any other relevant features. Considerations such as data privacy, data licensing agreements, and the frequency of data updates also matter for keeping the collected data timely and legal.

For this project, the dataset was downloaded from Kaggle. The raw data is not ready for training as-is, so it must be cleaned and prepared before the model can be trained on it.
Fig 3.1.1

3.2. Preprocessing:

This subsection describes the preprocessing steps applied to the raw data before it is fed into the predictive models. Data cleaning handles missing values and inconsistencies in the dataset. Normalization or standardization techniques may be used to bring features onto a similar scale. Preprocessing also covers encoding categorical variables, handling datetime variables, and any other transformations needed to make the data suitable for analysis. Each step should have a clear rationale and a visible effect on the quality of the data.
Fig 3.2.1: Data Preprocessing
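A minimal sketch of the cleaning, encoding, and scaling steps described above, using scikit-learn's ColumnTransformer. The rows are hypothetical and only the column names are borrowed from the dataset used in section 4.4. One-hot encoding is shown as an alternative; the project's notebook itself uses label encoding.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical sample rows; the last airline value is missing on purpose
raw = pd.DataFrame({
    "airline": ["Vistara", "Indigo", "Vistara", None],
    "stops": ["zero", "one", "two_or_more", "one"],
    "duration": [2.17, 5.5, 12.75, 3.0],
    "days_left": [1, 20, 35, 49],
})

# 1. Cleaning: fill the missing categorical value instead of dropping the row
clean = raw.fillna({"airline": "Unknown"})

# 2. Encoding + scaling: one-hot columns for categoricals, z-scores for numerics
pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["airline", "stops"]),
    ("num", StandardScaler(), ["duration", "days_left"]),
])
features = pre.fit_transform(clean)
print(features.shape)
```

The encoded output has one column per category (three airlines including "Unknown", three stop values) plus the two scaled numeric columns, which come last because the transformers are applied in the listed order.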

3.3. Exploratory Data Analysis (EDA)

• Pair plot

Here, a pair plot is used to detect outliers, with price on the Y-axis and duration and total_stops on the X-axis.
Fig 3.3.1: Pair plot

• Correlation analysis
Correlation analysis measures how strongly the variables are related to each other.

Some observations:

• As duration increases, the price also tends to increase.

• As the total number of stops increases, the price also tends to increase.

Fig 3.3.2
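Correlation analysis of this kind can be reproduced with pandas. The sketch below uses a few hypothetical rows (not from the project dataset) with duration, total_stops, and price columns and computes the Pearson correlation matrix. A coefficient close to +1 means price tends to rise with the other variable; it indicates association, not cause.

```python
import pandas as pd

# Hypothetical rows in which longer flights with more stops cost more
df = pd.DataFrame({
    "duration": [2.0, 2.5, 5.0, 8.0, 12.0, 15.0],
    "total_stops": [0, 0, 1, 1, 2, 2],
    "price": [3500, 4000, 6200, 7000, 9800, 11000],
})

corr = df.corr(method="pearson")
print(corr["price"].round(2))  # correlation of each column with price
```

A heatmap of `corr` (for example with seaborn) is the usual way to draw a figure like Fig 3.3.2.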

• Category distribution
Jet Airways is the most frequent airline category.

Fig 3.3.3: Category-wise distribution

3.4. Feature Engineering

This subsection describes the feature engineering methodology: selecting, creating, or transforming the input variables to improve the predictive performance of the models. This may include extracting relevant features from the raw data, such as the day of the week, the time of day, or holiday indicators, which can influence flight prices. Feature selection techniques, such as correlation analysis or recursive feature elimination, may be used to identify the most informative variables. Domain knowledge and insights from the literature also guide which features are created.
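A minimal pandas sketch of the date-derived features mentioned above (day of week, weekend flag, time of day, holiday indicator). It assumes a hypothetical `departure` timestamp column, which the dataset used in section 4.4 does not have. The `HOLIDAYS` list and the time-of-day boundaries are illustrative assumptions as well.

```python
import pandas as pd

# Hypothetical holiday list; a real project would load an official calendar
HOLIDAYS = pd.to_datetime(["2022-01-26", "2022-08-15"])


def add_date_features(df):
    """Derive day-of-week, weekend, time-of-day and holiday features from 'departure'."""
    out = df.copy()
    dep = pd.to_datetime(out["departure"])
    out["day_of_week"] = dep.dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    out["dep_hour"] = dep.dt.hour
    # Coarse time-of-day bands: [0,6) night, [6,12) morning, [12,18) afternoon, [18,24) evening
    out["time_of_day"] = pd.cut(out["dep_hour"], bins=[0, 6, 12, 18, 24],
                                labels=["night", "morning", "afternoon", "evening"],
                                right=False)
    # 1 if the departure date (time stripped) is in the holiday list
    out["is_holiday"] = dep.dt.normalize().isin(HOLIDAYS).astype(int)
    return out


sample = pd.DataFrame({"departure": ["2022-01-26 07:30", "2022-01-29 19:00"]})
feats = add_date_features(sample)
print(feats)
```

Here 26 January 2022 was a Wednesday and 29 January 2022 a Saturday, so the first row is a weekday holiday morning and the second a weekend evening.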

3.5. Model Verification

This stage involves selecting suitable predictive models for flight price prediction by evaluating various machine learning algorithms, such as linear regression, random forests, support vector machines, and neural networks, on the preprocessed dataset. Model verification criteria may include predictive accuracy, computational efficiency, interpretability, and scalability to large volumes of data. Techniques such as cross-validation and grid search may be used to tune hyperparameters and optimize model performance. This subsection also covers the rationale behind the choice of models and the criteria used to evaluate them.
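The cross-validation and grid-search procedure can be sketched with scikit-learn's GridSearchCV. This sketch runs on synthetic data from `make_regression` as a stand-in for the preprocessed flight features, and the candidate grids are illustrative assumptions. Because the synthetic target is linear, linear regression is expected to do well here; the point is the procedure, not the winner.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data; replace with the preprocessed flight features
X, y = make_regression(n_samples=300, n_features=6, n_informative=6,
                       noise=15.0, random_state=0)

# Each candidate: (estimator, hyperparameter grid to search)
candidates = {
    "linear_regression": (LinearRegression(), {"fit_intercept": [True, False]}),
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [20, 50], "max_depth": [None, 8]}),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv, scoring="r2")
    search.fit(X, y)
    # best_score_ is the mean R² over the 5 validation folds for the best grid point
    results[name] = (search.best_score_, search.best_params_)

for name, (score, params) in results.items():
    print(f"{name}: mean CV R2 = {score:.3f}, best params = {params}")
```

Using the same `KFold` splitter for every candidate keeps the comparison fair, since all models are scored on identical folds.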

3.6. Deploy the Machine Learning Model

In this stage of the machine learning lifecycle, the trained model is integrated into processes and applications. The ultimate aim of this stage is to ensure that the model functions properly after deployment.

3.7. Monitoring
Monitoring involves putting safety measures in place to ensure that the model keeps operating properly throughout its lifecycle. It also supports proper management of the deployed model.

Fig 3.6.1: Machine Learning Life Cycle
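One simple monitoring safeguard is to compare the model's error on recent, now-known prices with the error measured at training time. The sketch below is a hypothetical example: the helper names, the choice of mean absolute error (MAE), and the 1.2× tolerance are assumptions rather than part of the project.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted prices."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def needs_retraining(baseline_mae, recent_true, recent_pred, tolerance=1.2):
    """Return (flag, recent_mae); flag is True when recent MAE exceeds tolerance x baseline.

    The tolerance of 1.2 is a hypothetical choice, not a standard value.
    """
    recent = mae(recent_true, recent_pred)
    return recent > tolerance * baseline_mae, recent


# Example: baseline MAE of Rs 1000 measured on the test split at training time
ok_flag, ok_mae = needs_retraining(1000.0, [5000, 7000, 9000], [5500, 6500, 9500])
bad_flag, bad_mae = needs_retraining(1000.0, [5000, 7000, 9000], [3000, 9500, 6000])
print(ok_flag, ok_mae)    # errors of 500 each stay within tolerance
print(bad_flag, bad_mae)  # larger errors trigger a retraining flag
```

In practice such a check would run on a schedule, once actual fares for previously predicted flights become available.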

4. Implementation

4.1. Hardware and Software Used

All computer software needs certain hardware components or other software resources to be present on the computer. These prerequisites are known as system requirements.

1 – Hardware Requirements

• System Processor: Intel Core i3 or higher
• Hard Disk: 512 GB SSD
• RAM: 4.0 GB or higher

2 – Software Requirements

• Operating System: Windows 10
• Front-end: Streamlit
• Framework: Streamlit Framework
• IDE: Google Colab, VS Code

Streamlit is chosen for model deployment.

Other Python libraries are used for data preprocessing, feature engineering, and evaluation: Pandas for data manipulation, Matplotlib and Seaborn for visualisation, and Scikit-learn for label encoding, train-test splitting, hyperparameter search, and the regression models. The exact versions of Streamlit and the other dependencies should also be recorded to ensure reproducibility.
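One way to record those versions is to query the installed package metadata with the standard-library `importlib.metadata` module and emit pinned `requirements.txt` lines. The package list below is an assumption based on the libraries named in this section; adjust it to match the environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names for the libraries used by the project (assumed list)
PACKAGES = ["streamlit", "pandas", "matplotlib", "seaborn", "scikit-learn"]


def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages, skipping missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            pass  # not installed in this environment
    return lines


if __name__ == "__main__":
    print("\n".join(pinned_requirements(PACKAGES)))
```

Redirecting the output to `requirements.txt` lets the same environment be recreated later with `pip install -r requirements.txt`.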
4.2. System Architecture

A system architecture is the conceptual model that defines the structure, behaviour, and other views of a system. An architecture description is a formal description and representation of a system.

This subsection describes the architecture of the system developed for flight price prediction. Streamlit, as the chosen deployment platform, plays a central role in hosting the predictive model and providing a user-friendly interface for interacting with it. The predictive model is integrated into the Streamlit application by loading the trained model, processing user input, generating predictions, and displaying results. Backend services or databases, such as APIs for fetching real-time flight data or caching mechanisms for improving performance, could support the application but are not part of the current implementation.

Fig 4.2.1: System Architecture (components: collection of data from a CSV data sheet, data preprocessing, applied algorithm, performance evaluation on various measures, processing of user data, prediction, and flight price prediction result)

4.2.1 Streamlit Application

Streamlit is employed as the deployment platform due to its ability to create interactive and
user-friendly web applications with minimal effort. The main components of the Streamlit
application include:

4.2.2 USER INTERFACE (UI)

Streamlit's UI elements (dropdown menus and number inputs) enable users to enter their flight search criteria: airline, source city, destination city, departure time, number of stops, arrival time, class, duration, and days left before departure.

4.2.3 MODEL INTEGRATION

The trained predictive model is loaded into the Streamlit application, allowing it to be utilized
for generating flight price predictions based on user input.

4.2.4 PREDICTIVE MODEL INTEGRATION

The integration of the predictive model into the Streamlit application involves several steps:

1. Loading the Trained Model

The predictive model, trained on historical flight data, is saved as a compressed pickle file (Flight.pbz2) and loaded by the Streamlit application when a prediction is requested, so that it is available for generating predictions.

2. Processing User Input

User inputs from the UI are collected and preprocessed to match the format expected by the predictive model. Each categorical choice (airline, source and destination city, departure and arrival time, stops, and class) is converted to the same integer code that the label encoder assigned during training, and all nine required features are assembled in the training column order.

3. Generating Predictions

The preprocessed user input is fed into the predictive model, which generates a price prediction for the specified flight criteria.

4. Displaying Results

The predicted flight price is presented to the user in an easily interpretable format, as a value in rupees (Rs).
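Step 2 can be written as a single lookup-based function instead of long if/elif chains. The integer codes below are copied from the front-end code in section 4.4.1; the function name `build_features` is hypothetical. An unknown label, such as the "Select" placeholder, raises a `KeyError`.

```python
# Integer codes assigned by LabelEncoder during training (alphabetical order),
# as used by the front-end code in section 4.4.1
AIRLINE = {"AirAsia": 0, "Air India": 1, "GO FIRST": 2, "Indigo": 3, "SpiceJet": 4, "Vistara": 5}
CITY = {"Bangalore": 0, "Chennai": 1, "Delhi": 2, "Hyderabad": 3, "Kolkata": 4, "Mumbai": 5}
TIME_OF_DAY = {"Afternoon": 0, "Early Morning": 1, "Evening": 2, "Late Night": 3, "Morning": 4, "Night": 5}
STOPS = {"one": 0, "two or more": 1, "zero": 2}
CLASS = {"Business": 0, "Economy": 1}


def build_features(airline, source, departure, stops, arrival, destination,
                   travel_class, duration, days_left):
    """Return one feature row in the column order the model was trained on:
    airline, source_city, departure_time, stops, arrival_time,
    destination_city, class, duration, days_left."""
    if source == destination:
        raise ValueError("source and destination must differ")
    return [[AIRLINE[airline], CITY[source], TIME_OF_DAY[departure], STOPS[stops],
             TIME_OF_DAY[arrival], CITY[destination], CLASS[travel_class],
             float(duration), float(days_left)]]


row = build_features("Vistara", "Delhi", "Morning", "zero", "Night",
                     "Mumbai", "Economy", 2.17, 1)
print(row)
```

In the app, the result would be passed straight to the loaded model, e.g. `model.predict(row)`.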

4.3. USER INTERFACE

This section describes the design and functionality of the user interface developed using Streamlit: its layout, features, and the interactive elements provided for entering query parameters (e.g., airline, source city, destination city, and days left before departure) and for viewing predicted flight prices. The interface is designed to be intuitive, visually appealing, and responsive, with clear instructions on how to use the application.
4.4 CODING

4.4.1 Front end

```python
import bz2
import pickle

import streamlit as st

# Integer codes produced by LabelEncoder (alphabetical order) during training.
# Dict order is the order in which the options appear in each dropdown.
AIRLINES = {'Vistara': 5, 'Air India': 1, 'Indigo': 3, 'GO FIRST': 2, 'AirAsia': 0, 'SpiceJet': 4}
CITIES = {'Delhi': 2, 'Mumbai': 5, 'Bangalore': 0, 'Kolkata': 4, 'Hyderabad': 3, 'Chennai': 1}
DEPARTURE = {'Morning': 4, 'Early Morning': 1, 'Evening': 2, 'Night': 5, 'Afternoon': 0, 'Late Night': 3}
STOPS = {'one': 0, 'zero': 2, 'two or more': 1}
ARRIVAL = {'Night': 5, 'Evening': 2, 'Morning': 4, 'Afternoon': 0, 'Early Morning': 1, 'Late Night': 3}
CLASSES = {'Economy': 1, 'Business': 0}


def decompress_pickle(file):
    """Load the bz2-compressed pickled model."""
    with bz2.BZ2File(file, 'rb') as data:
        return pickle.load(data)


# Setting up the page title and icon (must be the first Streamlit call)
st.set_page_config(
    page_title="Flight Price Predictor",
    page_icon="https://hips.hearstapps.com/hmg-prod/images/gettyimages-1677184597.jpg?crop=0.668xw:1.00xh;0.167xw,0&resize=1200:*",
)
st.sidebar.title('MENU BAR')
choice = st.sidebar.selectbox(' ', ('Home', 'Predict'))
st.sidebar.image('https://e0.pxfuel.com/wallpapers/209/716/desktop-wallpaper-untitled-airplane-sky-aesthetic-travel.jpg')
st.sidebar.image('https://i.pinimg.com/736x/0d/1e/96/0d1e967cde176af6f8f0568af424d07b.jpg')

if choice == 'Home':
    st.title('Welcome to Flight Price Predictor')
    st.text('Hi. Want to predict your flight ticket price❓❓')
    st.text('Click the Menu bar for further details')
    st.image('https://wallpapers.com/images/featured/airport-w6v47yjhxcohsjgf.jpg')
elif choice == 'Predict':
    st.text('Kindly fill your flight details to view the predicted price')
    st.image('https://feeds.abplive.com/onecms/images/uploaded-images/2021/09/08/634259599cd6f60c24f9e67a5680c064_original.jpg')
    airline = st.selectbox('Airline', ('Select', *AIRLINES))
    source = st.selectbox('From', ('Select', *CITIES))
    # The destination list excludes the chosen source city
    destination = st.selectbox('Destination', ('Select', *(c for c in CITIES if c != source)))
    dep_time = st.selectbox('Departure time', ('Select', *DEPARTURE))
    stops = st.selectbox('Stops', ('Select', *STOPS))
    arr_time = st.selectbox('Arrival time', ('Select', *ARRIVAL))
    flight_class = st.selectbox('Class', ('Select', *CLASSES))
    duration = st.number_input('Duration')
    days_left = st.number_input('Days left')
    if st.button('Check'):
        chosen = (airline, source, destination, dep_time, stops, arr_time, flight_class)
        if 'Select' in chosen:
            # Without this check, unset choices would be undefined or silently miscoded
            st.warning('Please select a value for every field.')
        else:
            model = decompress_pickle('Flight.pbz2')
            # Feature order must match the training columns: airline, source_city,
            # departure_time, stops, arrival_time, destination_city, class, duration, days_left
            features = [[AIRLINES[airline], CITIES[source], DEPARTURE[dep_time],
                         STOPS[stops], ARRIVAL[arr_time], CITIES[destination],
                         CLASSES[flight_class], duration, days_left]]
            pred = model.predict(features)
            st.write("The predicted price is:-", pred[0], 'Rs')
            st.header('Time to fly ✈🧳')
            st.image('https://image.cnbcfm.com/api/v1/image/106537227-1589463911434gettyimages-890234318.jpeg?v=1589463982&w=1600&h=900')
```
4.4.2 Prediction Price

```python
# -*- coding: utf-8 -*-
"""Flight.ipynb

Original file is located at
https://colab.research.google.com/drive/1jCVfWPfFdP3xGsbSiXZwEffsfwmsoyAu

Data Source: https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction
"""

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

"""(1) Data Loading"""

# Path inside the Google Drive mounted in Colab
flight_data = pd.read_csv('/content/drive/MyDrive/Clean_Dataset.csv')

# reading the 1st 3 rows of the dataset
flight_data.head(3)

"""As the column Unnamed: 0 is not needed, it is dropped"""

flight_data = flight_data.drop(columns=['Unnamed: 0'])

"""Reading the dataset"""

# reading the 1st 3 rows of the dataset
flight_data.head(3)

# reading the last 3 rows of the dataset
flight_data.tail(3)

"""(2) Exploratory Data Analysis

Dimensions of the dataset
"""

flight_data.shape

"""Checking the data types for each column"""

flight_data.dtypes

# isnull() and isna() are aliases, so one total covers both null and NaN values
print('Null/NaN values:', flight_data.isnull().sum().sum())
print('Duplicate rows:', flight_data.duplicated().sum())

"""The dataset does not have any null, missing or duplicate values

a. Checking the no. of distinct values in each column of the dataset
"""

flight_data.nunique()

"""b. No. of flights per class - Economy and Business"""

sns.set(font_scale=0.7)
cl = {'Economy': 'green', 'Business': 'blue'}
c = sns.countplot(data=flight_data, x='class', palette=cl)
for label in c.containers:
    c.bar_label(label)

"""c. Total number of flights under each airline and class"""

sns.set(font_scale=0.6)
plt.figure(figsize=(6, 4))
col = {'Economy': 'red', 'Business': 'green'}
a = sns.countplot(data=flight_data, x='airline', hue='class', palette=col)
for l in a.containers:
    a.bar_label(l)
plt.title('Flight counts per airline')
plt.xlabel('Airline')
plt.ylabel('Total number of flights')

"""1. Among the six airlines, only Vistara and Air India have both classes,
Economy and Business
2. Vistara has the highest no. of flights in both classes
3. SpiceJet is the airline with the lowest no. of flights

d. Plotting no. of flights per city and class
"""

sns.set(font_scale=0.5)  # setting the font scale
plt.figure(figsize=(10, 8))  # setting the chart size

plt.subplot(1, 2, 1)  # 1st plot in the subplot
col = {'Economy': 'purple', 'Business': 'green'}
ax = sns.countplot(data=flight_data, x='source_city', hue='class', palette=col)
plt.title('No. of flights per source city')
plt.xlabel('Source Cities')
plt.ylabel('No. of flights')
for label in ax.containers:
    ax.bar_label(label)  # adding labels to the bars

plt.subplot(1, 2, 2)  # 2nd plot in the subplot
bx = sns.countplot(data=flight_data, x='destination_city', hue='class', palette=col)
sns.move_legend(bx, "right")
plt.title('No. of flights per destination city')
plt.xlabel('Destination Cities')
plt.ylabel('No. of flights')
for c in bx.containers:
    bx.bar_label(c)
plt.show()

"""From both charts,
* Economy class: Delhi has the highest no. of flights, and
* Business class: Mumbai has the highest no. of flights

e. Statistical info of the dataset
"""

flight_data.describe()

"""f. Viewing ticket price by each airline and class"""

flight_data[['airline', 'price', 'class']].sort_values(by='price', ascending=False)

"""Among the various airlines, Vistara charges the highest price in business class.

g. Ticket price vs class for different airlines
"""

sns.set(font_scale=0.7)
plt.figure(figsize=(9, 9))
x = sns.barplot(data=flight_data, x='class', y='price', hue='airline', errorbar=None)
for i in x.containers:
    x.bar_label(i)
plt.xlabel('Class')
plt.ylabel('Ticket Price')
plt.title('Flight ticket price vs class based on each airline')

"""The ticket price charged by Vistara is the highest in both classes, and
AirAsia offers the lowest in Economy class.

h. Plotting no. of flights per class for different departure and arrival times
"""

sns.set(font_scale=0.7)
plt.figure(figsize=(8, 6))

plt.subplot(2, 1, 1)
cl = sns.countplot(data=flight_data, x='departure_time', hue='class')
for l in cl.containers:
    cl.bar_label(l)

plt.subplot(2, 1, 2)
cl = sns.countplot(data=flight_data, x='arrival_time', hue='class')
for l in cl.containers:
    cl.bar_label(l)

"""This graph shows that more flights depart in the morning and more
flights arrive at night.

i. Analysing ticket price vs destination and source cities for each class
"""

sns.set(font_scale=0.7)
plt.figure(figsize=(8, 6))

plt.subplot(2, 1, 1)
cl = sns.barplot(data=flight_data, x='destination_city', y='price', hue='class')
for l in cl.containers:
    cl.bar_label(l)

plt.subplot(2, 1, 2)
cl = sns.barplot(data=flight_data, x='source_city', y='price', hue='class')
for l in cl.containers:
    cl.bar_label(l)

"""Kolkata's flights are the costliest

j. Analysing duration of flights
"""

flight_data['duration'].describe()

# Row numbers of flights with the minimum duration
flight_data[flight_data['duration'] == flight_data['duration'].min()].index

# Row numbers of flights with the maximum duration
flight_data[flight_data['duration'] == flight_data['duration'].max()].index
```

```python
"""(4) Feature Engineering

1. Checking for outliers in the price column
"""

sns.boxplot(data=flight_data, x='price')

"""From the boxplot, we can infer that flight ticket prices mostly fall in the
range of 0 to 100000, while a few outliers go beyond 120000. Since the dataset
is large enough, prices of 100000 and above are removed in order to develop a
proper model for the prediction.
"""

# Drop every row whose price is 100000 or more
f_out = flight_data[flight_data['price'] >= 100000].index
flight_data = flight_data.drop(index=f_out)

sns.boxplot(x=flight_data['price'])

flight_data.shape

flight_data[['destination_city', 'price']].groupby('destination_city').max()

flight_data[flight_data['price'] == 99680]

flight_data.head(2)
```
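The notebook removes outliers with a fixed cut-off of 100000. A common data-driven alternative is the interquartile-range (IQR) rule, which is also how box-plot whiskers are drawn by default (1.5 × IQR beyond the quartiles). A minimal sketch with hypothetical prices, not the project's data:

```python
import pandas as pd


def iqr_bounds(values, k=1.5):
    """Return Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical ticket prices with one extreme value
prices = pd.Series([3000, 4000, 5000, 6000, 7000, 8000, 9000, 120000])
low, high = iqr_bounds(prices)
outliers = prices[(prices < low) | (prices > high)]
kept = prices[(prices >= low) & (prices <= high)]
print(low, high)
print(outliers.tolist())
```

On a skewed variable like ticket price, the IQR rule can flag many legitimate Business-class fares, so a domain-chosen cut-off like the notebook's may be the more sensible choice; the rule is best used to inspect candidates rather than drop them blindly.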

```python
"""Vistara's Business class flight from Bangalore to Mumbai, with a duration of
14.42 hours, has the highest ticket price to Mumbai: Rs 99680.

2. Removing unnecessary columns
"""

flight_data = flight_data.drop(columns='flight')

"""3. Encoding all columns containing categorical variables at once"""

from sklearn.preprocessing import LabelEncoder

df = flight_data.iloc[:, :7]  # positions of the columns that hold categorical variables

# Encoding: fit_transform is re-run on each column, so every column gets its
# own codes, assigned in alphabetical order of that column's labels
enc_all_cols = df.apply(LabelEncoder().fit_transform)

# Concatenating with the remaining columns (duration, days_left, price)
df_enc = pd.concat([enc_all_cols, flight_data.iloc[:, -3:]], axis=1)

# reading the first 2 rows of the dataframe, which now holds encoded data
# and is ready for the train-test split
df_enc.head(2)
```
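Because `LabelEncoder` sorts the labels before numbering them, the code for each category can be read off directly, which is how the front end's hard-coded values (for example Vistara = 5, Air India = 1) line up with training. A short sketch, assuming the airline spellings in the Kaggle file are as listed below (with underscores):

```python
from sklearn.preprocessing import LabelEncoder

# Airline labels as assumed to be spelled in the dataset
airlines = ["SpiceJet", "AirAsia", "Vistara", "GO_FIRST", "Indigo", "Air_India"]

le = LabelEncoder().fit(airlines)
# classes_ is sorted, so each label's code is its position in sorted order
mapping = {str(label): int(code)
           for label, code in zip(le.classes_, le.transform(le.classes_))}
print(mapping)
```

Saving this mapping (or the fitted encoders themselves) alongside the model avoids having to copy the codes into the front end by hand.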

```python
"""(5) Model Building

Train test split
"""

from sklearn.model_selection import train_test_split

X = df_enc.drop(columns='price')  # features
y = df_enc['price']  # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
print('X_train size: {}, X_test size: {}'.format(X_train.shape, X_test.shape))
print('y_train size: {}, y_test size: {}'.format(y_train.shape, y_test.shape))

"""Finding the best model with the help of GridSearchCV"""

from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

model_params = {
    'LR': {'model': LinearRegression(), 'params': {}},
    'KNR': {'model': KNeighborsRegressor(), 'params': {'n_neighbors': [2, 5, 10]}},
    'RFR': {'model': RandomForestRegressor(), 'params': {'n_estimators': [5, 10, 20]}},
}

scores = []
cv = ShuffleSplit(n_splits=5, test_size=0.20, random_state=0)
for model, mp in model_params.items():
    clf = GridSearchCV(mp['model'], mp['params'], cv=cv, return_train_score=False)
    clf.fit(X, y)
    scores.append({
        'model': model,
        'best score': clf.best_score_,  # mean cross-validated R² (default scorer)
        'best params': clf.best_params_,
    })

dd = pd.DataFrame(scores, columns=['model', 'best score', 'best params'])
dd

"""Among the 3 models used, Random Forest Regressor gives the highest score.
Hence, a model with Random Forest Regression is built and evaluated.
"""

from sklearn.model_selection import cross_val_score

cv = ShuffleSplit(n_splits=5, test_size=0.2)
s = cross_val_score(RandomForestRegressor(n_estimators=20), X, y, cv=cv)
# For regressors the default score is R², not classification accuracy
print('Average R² score: {}%'.format(round(s.mean() * 100, 3)))

rf = RandomForestRegressor(n_estimators=20)
rf.fit(X_train, y_train)
r_pred = rf.predict(X_test)

"""Evaluating the model"""

from sklearn import metrics

# r2_score takes (y_true, y_pred) in that order
metrics.r2_score(y_test, r_pred)

"""Looking up the labels of the categorical columns, for reference
(since the columns are encoded)"""

for name in ['airline', 'source_city', 'departure_time', 'stops',
             'arrival_time', 'destination_city', 'class']:
    print(name.upper())
    print(flight_data[name].value_counts())
    print(X[name].value_counts())
    print('\n')

flight_data.sample(1)

"""Testing the model with values"""

print('Price:', rf.predict([[5, 5, 5, 0, 4, 3, 0, 12.75, 35]]))  # org = 64700 - X_train
print('Price:', rf.predict([[5, 5, 4, 0, 4, 3, 1, 24.0, 48]]))  # org = 3334 - X_train
print('Price:', rf.predict([[2, 0, 5, 0, 4, 2, 1, 9.67, 34]]))  # org = 3826 - X_test
print('Price:', rf.predict([[1, 5, 0, 0, 4, 0, 0, 17.33, 29]]))  # org = 54608 - X_test

"""As per the model evaluation, the model reaches an R² score of around 0.99,
i.e. it explains about 99% of the variance in ticket prices. Therefore, the
'rf' model is chosen for flight price prediction.

(6) Saving the model
"""

import pickle

flight_data.sample(1)

with open("rfreg.pkl", "wb") as pickle_out1:
    pickle.dump(rf, pickle_out1)

filename = 'trained_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(rf, f)

"""Checking/loading the model"""

with open('trained_model.sav', 'rb') as f:
    load = pickle.load(f)

load.predict([[5, 5, 4, 0, 4, 3, 1, 24.0, 48]])
```
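The notebook saves `rfreg.pkl` and `trained_model.sav`, while the front end in section 4.4.1 loads a bz2-compressed `Flight.pbz2`. A minimal sketch of the matching save step, using the standard-library `bz2` and `pickle` modules; the helper name `compressed_pickle` is hypothetical, and the demo pickles a small dict in a temporary folder instead of the real model.

```python
import bz2
import os
import pickle
import tempfile


def compressed_pickle(path, obj):
    """Write obj as a bz2-compressed pickle (e.g. 'Flight.pbz2')."""
    with bz2.BZ2File(path, "wb") as f:
        pickle.dump(obj, f)


def decompress_pickle(path):
    """Load an object written by compressed_pickle (same logic as the front end)."""
    with bz2.BZ2File(path, "rb") as f:
        return pickle.load(f)


# Round-trip demo; in the notebook the call would be compressed_pickle("Flight.pbz2", rf)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.pbz2")
compressed_pickle(demo_path, {"n_estimators": 20})
restored = decompress_pickle(demo_path)
print(restored)
```

Compression matters here because a random forest pickle can be large; bz2 trades a slower load for a smaller file to ship with the app.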
4.5 Testing

Testing in a machine learning project is a crucial step to ensure that the model performs as
expected and generalizes well to new, unseen data. It involves several practices:

4.5.1 Types of Testing

1 – Unit Testing

Unit testing checks the correctness of individual components within the ML pipeline, such as data preprocessing functions, individual algorithms, or other discrete parts of the ML system.

2 – Integration Testing
Integration testing checks the combined functionality of these individual components. It ensures that when the components work together, they produce the expected results.

3 – System Testing

System testing evaluates the complete, integrated ML system to verify that it meets the specified requirements. This includes testing the model's performance on unseen data and ensuring that it integrates well with other systems.
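As an illustration of unit testing, the sketch below checks one preprocessing step with Python's built-in `unittest` module. `remove_price_outliers` is a hypothetical helper that mirrors the notebook's rule of dropping prices of 100000 and above; the test data is made up.

```python
import unittest

import pandas as pd


def remove_price_outliers(df, limit=100000):
    """Hypothetical helper: keep only rows whose price is below limit."""
    return df[df["price"] < limit]


class TestPreprocessing(unittest.TestCase):
    def test_outliers_removed(self):
        df = pd.DataFrame({"price": [5000, 99680, 100000, 110000]})
        cleaned = remove_price_outliers(df)
        self.assertEqual(cleaned["price"].tolist(), [5000, 99680])

    def test_other_columns_kept(self):
        df = pd.DataFrame({"price": [5000], "airline": ["Vistara"]})
        cleaned = remove_price_outliers(df)
        self.assertEqual(list(cleaned.columns), ["price", "airline"])


if __name__ == "__main__":
    # exit=False lets the script continue after the tests run
    unittest.main(argv=["unit-tests"], exit=False)
```

Integration tests would then chain such steps (load, clean, encode, predict) and check the end-to-end output.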

4.7 Manual Testing

Test Case: 1 – Predict price based on entered data
Test Description: The user enters flight details for the features in the dataset
Requirements Verified: Yes
Test Environment: The system should be connected to the internet
Expected Result: The user obtains the predicted flight price for the given input
Pass/Fail: Pass
Note: Successfully executed

Fig 4.7.1
5. Results and Discussion
This section presents the results of the proposed system, which predicts flight prices with higher accuracy and reliability than the existing system.

The results were obtained by comparing several machine learning algorithms: Linear Regression, K-Nearest Neighbours Regression, and Random Forest Regression. The Random Forest Regressor achieved the highest score and is used for prediction in this project.

The system also has an elegant interface that takes all the necessary inputs for the prediction and is very easy to use. The final result of the proposed system can be viewed in the GUI.

Fig 5.1: Front-end Page

Fig 5.2: All Data Selection
Fig 5.3: Send Data
Fig 5.4: Prediction Price
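The notebook reports R², but a price model is often easier to judge with error measures in rupees as well. The sketch below computes mean absolute error (MAE), root mean squared error (RMSE) and R² with scikit-learn on four hypothetical prices. It also shows why the argument order of `r2_score` matters: swapping `y_true` and `y_pred` gives a different value.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted ticket prices (Rs)
y_true = np.array([3000.0, 5000.0, 7000.0, 9000.0])
y_pred = np.array([3200.0, 4800.0, 7400.0, 8600.0])

mae = mean_absolute_error(y_true, y_pred)          # average absolute error in Rs
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalises large errors more
r2 = r2_score(y_true, y_pred)                       # correct order: (y_true, y_pred)
r2_swapped = r2_score(y_pred, y_true)               # wrong order: different result

print(f"MAE = {mae:.1f} Rs, RMSE = {rmse:.1f} Rs, R2 = {r2:.4f}, swapped R2 = {r2_swapped:.4f}")
```

Here the errors are ±200 and ±400 Rs, so MAE is 300 Rs. R² is 0.98 in the correct order but about 0.978 when swapped, because R² normalises by the variance of whichever array is passed first.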

5.1. Comparison
Within this project, Linear Regression, K-Nearest Neighbours Regression, and Random Forest Regression were compared using GridSearchCV with shuffle-split cross-validation, and Random Forest Regression achieved the highest score. Beyond this internal comparison, the developed model can be compared with existing methods or benchmarks: its predictive accuracy, computational efficiency, or other relevant metrics can be measured against baseline models or state-of-the-art approaches reported in the literature, and against commercial flight booking websites or other publicly available prediction services.
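A baseline comparison can be as simple as checking the model against a predictor that always outputs the average training price. A minimal sketch using scikit-learn's `DummyRegressor` on synthetic stand-in data (not the flight dataset); the forest size is an illustrative choice.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; replace with the encoded flight features
X, y = make_regression(n_samples=400, n_features=5, n_informative=5,
                       noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: always predicts the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

baseline_r2 = r2_score(y_test, baseline.predict(X_test))
model_r2 = r2_score(y_test, model.predict(X_test))
print(f"baseline R2 = {baseline_r2:.3f}, random forest R2 = {model_r2:.3f}")
```

A constant mean predictor scores an R² of at most 0 on held-out data, so any useful model must clearly beat it; comparisons with published models or booking sites then build on this floor.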
6. CONCLUSION

6.1. SUMMARY

This project set out to develop a model that forecasts flight prices, to analyse historical flight data to understand pricing patterns, and to provide a tool that helps consumers and airlines make data-driven decisions. A Kaggle flight price dataset was explored, ticket prices of Rs 100,000 and above were removed as outliers, categorical features were label-encoded, and Linear Regression, K-Nearest Neighbours, and Random Forest regressors were compared using GridSearchCV. The Random Forest Regressor gave the highest score, with an R² of around 0.99 on the test data, and was deployed in a Streamlit web application.

6.2. ACHIEVEMENTS

The main outcomes of the project are a Random Forest model for flight price prediction, a user-friendly Streamlit interface through which users enter flight details and view the predicted price, and exploratory insights for stakeholders in the aviation and travel industries; for example, Vistara charges the highest ticket prices in both classes, while AirAsia offers the lowest Economy fares. These achievements can be judged in terms of technical innovation, practical utility, or societal impact, depending on the project's goals and objectives.
6.3. FUTURE WORK

Based on the findings and limitations of this project, several avenues for future research and development remain. The predictive models could be refined by incorporating additional data sources or features, by exploring advanced machine learning techniques such as deep learning or other ensemble methods, or by conducting longitudinal studies to evaluate model performance over time. Collaboration with industry partners or academic researchers could also help validate and extend the project's findings in real-world settings.

Overall, the project brings together data analysis, model development, and deployment for flight price prediction, and lays the groundwork for continued exploration and innovation in airfare prediction.
REFERENCES

1. Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and
prospects. Science, 349(6245), 255-260.

2. Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning:
Data mining, inference, and prediction (2nd ed.). New York: Springer.

3. Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11),
1134-1142.

4. Rao, N. S. S. V. S., & Thangaraj, S. J. J. (2023, April). Flight Ticket Prediction using Random
Forest Regressor Compared with Decision Tree Regressor. In 2023 Eighth International
Conference on Science Technology Engineering and Mathematics (ICONSTEM) (pp. 1-5). IEEE.

5. Burger, B., & Fuchs, M. (2005). Dynamic pricing—A future airline business model. Journal of
Revenue and Pricing Management, 4(1), 39-53.

6. Malighetti, P., Paleari, S., & Redondi, R. (2010). Has Ryanair's pricing strategy changed over
time? An empirical analysis of its 2006–2007 flights. Tourism Management, 31(1), 36-44.

7. Liu, T., Cao, J., Tan, Y., & Xiao, Q. (2017, December). ACER: An adaptive context-aware
ensemble regression model for airfare price prediction. In 2017 International Conference on
Progress in Informatics and Computing (PIC) (pp. 312-317). IEEE.

8. Tziridis, K., Kalampokas, T., Papakostas, G. A., & Diamantaras, K. I. (2017, August). Airfare
prices prediction using machine learning techniques. In 2017 25th European Signal Processing
Conference (EUSIPCO) (pp. 1036-1039). IEEE.

9. Can, Y. S., & Alagöz, F. (2023, October). Predicting Local Airfare Prices with Deep Transfer
Learning Technique. In 2023 Innovations in Intelligent Systems and Applications Conference
(ASYU) (pp. 1-4

BIOGRAPHY

Mohd Huzaif (2003820100026) is a Computer Science student. He is currently pursuing a
four-year Bachelor of Technology degree in Computer Science and Engineering at Kamla Nehru
Institute of Physical and Social Sciences, Faridipur, Sultanpur. He is working on flight price
prediction using machine learning.
