Final Project Report Flight Price
DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES
CHAPTER 1. INTRODUCTION
1.1. Overview
1.2. Importance
1.3. Objectives
2.3. Challenges
CHAPTER 3. METHODOLOGY
3.2. Preprocessing
3.3. EDA
3.7. Monitoring
CHAPTER 4. IMPLEMENTATION
4.4. Coding
4.5. Testing
5.2. Comparison
CHAPTER 6. CONCLUSION
6.1. Summary
6.2. Achievement
8. BIOGRAPHY
1. Introduction
Machine learning is a subfield of Artificial Intelligence (AI) that works with
algorithms and technologies to extract useful information from data.
Machine learning methods are well suited to big data, since manually processing such vast
volumes of data would be impossible without the support of machines.
Machine learning falls into two general groups: supervised and unsupervised. In supervised
learning, the program is trained on a pre-determined, labelled dataset so that it can make
predictions when new data is given. In unsupervised learning, the program tries to find
relationships and hidden patterns in the data.
1.1. Overview
The Flight Price Prediction project is designed to harness the power of machine learning to
forecast flight ticket prices with high precision. Utilizing vast datasets of historical flight
information, the project aims to construct a predictive model that can serve as a decision-making
tool for both travellers planning their trips and airlines managing their pricing strategies.
1.2. Importance
The ability to predict flight prices has significant implications for the travel industry. For
consumers, it means the potential to secure cost-effective travel by identifying the best times
to purchase tickets. For airlines, it represents an opportunity to fine-tune pricing models,
enhance revenue management, and stay competitive in a fluctuating market.
1.3. Objectives
• To provide a tool that can help consumers and airlines make data-driven decisions
2. LITERATURE SURVEY
This section focuses on summarizing the existing research and projects related to flight price
prediction. It involves a comprehensive review of academic papers, industry reports, and
existing systems that have attempted to predict flight prices. Previous work may include
studies that employed machine learning algorithms, statistical models, or hybrid approaches
to forecast airfare. The subsection should highlight the methodologies, datasets used, and key
findings of these studies. Additionally, it may discuss the limitations or gaps identified in the
literature, which the current project aims to address.
Here, the section delves into the various techniques and algorithms commonly used in flight
price prediction. It provides an overview of the different machine learning algorithms such as
linear regression, decision trees, random forests, support vector machines, and neural
networks that have been applied in predicting flight prices. Additionally, it discusses the
feature engineering methods and data preprocessing techniques utilized to prepare the input
data for these algorithms. The subsection may also explore any specific approaches or
modifications tailored to the unique characteristics of flight price data, such as seasonality,
route networks, and pricing dynamics.
2.3. Challenges
The challenges include technical issues such as data quality problems, missing values, and the
high dimensionality of the feature space.
Additionally, the subsection examines the inherent uncertainty in predicting human behavior,
as travelers' purchasing decisions are influenced by a multitude of factors beyond historical
price trends. Strategies for mitigating these challenges, as well as potential research directions
to overcome them, are also discussed.
One of the primary challenges identified is the dynamic nature of flight pricing, influenced by
factors like Total Stops. Additionally, the need for real-time data processing and the handling
of missing data present significant hurdles.
3. METHODOLOGY
This section describes the methodology for gathering the data needed for flight price
prediction. It involves identifying relevant sources of data, such as flight booking databases,
airline websites, online travel agencies, or publicly available datasets, and collecting
flight-related information, including departure and arrival airports, dates, times, airlines,
ticket prices, and any other relevant features. It may also address considerations such as data
privacy, data licensing agreements, and the frequency of data updates to ensure the timeliness
and legality of the collected data.
In this project, we downloaded the dataset from Kaggle. The raw dataset is uncleaned and is not
yet ready for training our model.
Fig 3.1.1
3.2. Preprocessing:
This subsection describes the preprocessing steps applied to the raw data before feeding it into
the predictive models. It includes data cleaning processes to handle missing values and
inconsistencies in the
dataset. Additionally, data normalization or standardization techniques may be employed to
ensure that features are on a similar scale. The preprocessing stage may also involve encoding
categorical variables, handling datetime variables, and performing any necessary
transformations to make the data suitable for analysis. The subsection should provide clarity
on the rationale behind each preprocessing step and its impact on the quality of the data.
Fig 3.2.1: Data Preprocessing
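The cleaning, encoding, and scaling steps described above can be sketched as follows. This is only a minimal illustration on a made-up four-row frame: the column names, the values, and the helper name preprocess are assumptions and not the project's actual data or code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical raw sample (assumed values); the real dataset is much larger.
raw = pd.DataFrame({
    "airline": ["Vistara", "Indigo", None, "Vistara"],
    "duration": [2.5, 3.0, 2.0, 2.5],
    "price": [5000, 4200, 3900, 5000],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete and duplicate rows, encode categories, scale duration."""
    out = df.dropna().drop_duplicates().reset_index(drop=True)
    # Label-encode the categorical column into integer codes.
    out["airline"] = LabelEncoder().fit_transform(out["airline"])
    # Standardise the numeric feature to zero mean and unit variance.
    out["duration"] = StandardScaler().fit_transform(out[["duration"]]).ravel()
    return out


clean = preprocess(raw)
print(clean)
```

The row with the missing airline and the repeated Vistara row are both removed before encoding, so the encoder and the scaler only ever see complete, unique records.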
Pair plot
Here, a pair plot is used to detect outliers in the data, with price on the y-axis and
duration and total_stops on the x-axis.
Fig 3.3.1: Pair plot
Correlation Analysis
Correlation analysis is used to find how strongly the features of the data are related to each other.
Fig.3.3.2
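A correlation matrix of this kind can be computed with pandas. The sketch below uses a small hypothetical sample (the values are assumptions chosen so that the relationships are exact) and is not the project's dataset.

```python
import pandas as pd

# Hypothetical numeric sample: price rises with duration and falls as the
# number of days left before departure grows.
df = pd.DataFrame({
    "duration": [1.0, 2.0, 3.0, 4.0],
    "days_left": [40, 30, 20, 10],
    "price": [3000, 4000, 5000, 6000],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr(numeric_only=True)
print(corr["price"].sort_values(ascending=False))
```

The resulting matrix can be passed to a plotting function such as seaborn's heatmap to obtain a visual summary.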
Categories Distribution
The most frequent airline category is Jet Airways.
Here, the methodology for feature engineering is described, which involves selecting,
creating, or transforming the input variables to improve the predictive performance of the
models. This may include extracting relevant features from the raw data, such as day of the
week, time of day, or holiday indicators, which may influence flight prices. Feature selection
techniques, such as correlation analysis or recursive feature elimination, may be employed to
identify the most informative variables. Domain knowledge and insights from the literature may
also guide these choices.
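As an illustration of the date-based features mentioned above, the sketch below derives the day, month, day of week, and a weekend flag from a journey date. The dates are made up, and the derived column names are assumptions.

```python
import pandas as pd

# Hypothetical journey dates (assumed values).
df = pd.DataFrame({"date_of_journey": ["2019-03-24", "2019-05-01", "2019-06-09"]})
dates = pd.to_datetime(df["date_of_journey"])

df["journey_day"] = dates.dt.day
df["journey_month"] = dates.dt.month
df["day_of_week"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
print(df)
```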
This subsection covers selecting suitable predictive models for flight price prediction. It
involves evaluating various machine learning algorithms, such as linear regression, random
forests, support vector machines, and neural networks, to determine their performance on the
preprocessed
dataset. Model verification criteria may include predictive accuracy, computational
efficiency, interpretability, and scalability to handle large volumes of data. Techniques such as
cross-validation and grid search may be employed to tune hyperparameters and optimize
model performance. The subsection discusses the rationale behind the choice of models and
the criteria used for model evaluation.
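A comparison of candidate models by cross-validation might look like the sketch below. It runs on synthetic data standing in for the preprocessed dataset; the data-generating formula, the 5-fold setting, and the R^2 scoring choice are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the preprocessed data: a nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 50 * np.sin(X[:, 0]) + 5 * X[:, 1] ** 2 + rng.normal(0, 1, size=200)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=20, random_state=0),
}

# Mean R^2 over 5 folds for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Averaging the score over several folds gives a steadier estimate than a single train/test split, which is why it is a common basis for choosing between models.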
3.6 Monitoring
Monitoring involves putting safeguards in place to ensure that the model operates properly
throughout its lifecycle, and it supports proper management of the deployed model.
All computer software needs certain hardware components or other software resources to be
present on a computer. These prerequisites are known as system requirements.
1 – Hardware Requirements
Additionally, other tools and libraries used for data preprocessing, feature engineering, and
evaluation should be listed. For instance, Python libraries such as Pandas, NumPy, Scikit-
learn, and TensorFlow may be mentioned for data manipulation, machine learning, and deep
learning tasks. The section should also specify the version of Streamlit and other
dependencies used to ensure reproducibility.
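One way to record the versions of these dependencies is sketched below, using the standard library's importlib.metadata. The package list is an assumption based on the libraries named above, and the helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as published on PyPI (assumed list).
PACKAGES = ["pandas", "numpy", "scikit-learn", "tensorflow", "streamlit"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

The printed lines for installed packages follow the name==version form used in a requirements file, so they can be copied into one directly.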
4.2. System Architecture
A system architecture is the conceptual model that defines the structure, behaviour, and other
views of a system. An architecture description is a formal description and representation of a
system.
Here, the architecture of the system developed for flight price prediction is described.
Streamlit, as the chosen deployment platform, plays a central role in hosting the predictive
model and providing a user-friendly interface for interacting with it. The subsection may
discuss how the predictive model is integrated into the Streamlit application, including
loading the trained model, processing user input, generating predictions, and displaying
results. It may also detail any backend services or databases used to support the application,
such as APIs for fetching real-time flight data or caching mechanisms for improving
performance.
Fig: System architecture (blocks: Data preprocessing, Applied Algorithm)
Streamlit is employed as the deployment platform due to its ability to create interactive and
user-friendly web applications with minimal effort. The main components of the Streamlit
application include:
Streamlit's UI elements (such as sliders, date pickers, and dropdown menus) enable users
to input their flight search criteria, such as airline, date_of_journey, source,
destination, dep_time, and total_stops.
The trained predictive model is loaded into the Streamlit application, allowing it to be utilized
for generating flight price predictions based on user input.
The integration of the predictive model into the Streamlit application involves several steps:
1. LOADING THE MODEL
The predictive model, trained using historical flight data, is saved and loaded into the
Streamlit application upon startup. This ensures that the model is readily available for
generating predictions.
2. PROCESSING USER INPUT
User inputs from the UI are collected and preprocessed to match the format expected
by the predictive model. This may involve converting categorical variables into
numerical representations, normalizing continuous variables, and ensuring all required
features are present.
3. GENERATING PREDICTIONS
The pre-processed user input is fed into the predictive model, which generates a price
prediction for the specified flight criteria.
4. DISPLAYING RESULTS
The predicted flight prices are presented to the user in an easily interpretable format,
such as tables, charts, or summary statistics.
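The integration steps above can be sketched without the web interface as follows. A DummyRegressor stands in for the trained model, and the three-feature input, the training values, and the helper name encode_input are assumptions for illustration only; the stops codes follow the encoding used in the application code.

```python
import pickle

from sklearn.dummy import DummyRegressor

# Encoding for one categorical input; a real application would hold one
# mapping per categorical feature, matching the training encoder.
STOPS = {"one": 0, "two or more": 1, "zero": 2}


def encode_input(stops: str, duration: float, days_left: float) -> list:
    """Turn raw UI values into the numeric row the model expects."""
    return [[STOPS[stops], duration, days_left]]


# Stand-in for the trained model: always predicts the mean training price.
model = DummyRegressor(strategy="mean").fit([[0, 2.0, 10], [2, 1.5, 30]], [6000, 4000])

blob = pickle.dumps(model)           # step 1: save ...
loaded = pickle.loads(blob)          # ... and load the model
row = encode_input("zero", 2.0, 15)  # step 2: process the user input
price = loaded.predict(row)[0]       # step 3: generate the prediction
print(f"Predicted price: Rs {price:.0f}")  # step 4: display the result
```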
This subsection covers the design and functionality of the user interface developed using
Streamlit. It describes
the layout, features, and interactive elements provided to users for inputting query parameters
(e.g., departure airport, destination airport, date of travel) and viewing predicted flight prices.
The user interface should be intuitive, visually appealing, and responsive, with clear
instructions on how to use the application effectively.
4.4 CODING
import bz2
import pickle

import streamlit as st

# Integer codes for each categorical input. They match the alphabetical codes
# that LabelEncoder assigns during training. Keys are listed in display order.
AIRLINES = {'Vistara': 5, 'Air India': 1, 'Indigo': 3, 'GO FIRST': 2, 'AirAsia': 0, 'SpiceJet': 4}
CITIES = {'Delhi': 2, 'Mumbai': 5, 'Bangalore': 0, 'Kolkata': 4, 'Hyderabad': 3, 'Chennai': 1}
TIMES = {'Morning': 4, 'Early Morning': 1, 'Evening': 2, 'Night': 5, 'Afternoon': 0, 'Late Night': 3}
STOPS = {'one': 0, 'zero': 2, 'two or more': 1}
CLASSES = {'Economy': 1, 'Business': 0}


def decompress_pickle(file):
    """Load a bz2-compressed pickle file and return the stored object."""
    with bz2.BZ2File(file, 'rb') as data:
        return pickle.load(data)


st.set_page_config(
    page_title="Flight Price Predictor",
    page_icon="https://hips.hearstapps.com/hmg-prod/images/gettyimages-1677184597.jpg?crop=0.668xw:1.00xh;0.167xw,0&resize=1200:*",
)
st.sidebar.title('MENU BAR')
choice = st.sidebar.selectbox(' ', ('Home', 'Predict'))
st.sidebar.image('https://e0.pxfuel.com/wallpapers/209/716/desktop-wallpaper-untitled-airplane-sky-aesthetic-travel.jpg')
st.sidebar.image('https://i.pinimg.com/736x/0d/1e/96/0d1e967cde176af6f8f0568af424d07b.jpg')

if choice == 'Home':
    st.title('Welcome to Flight Price Predictor')
    st.text('Hi. Want to predict your flight ticket price❓❓')
    st.text('Click the Menu bar for further details')
    st.image('https://wallpapers.com/images/featured/airport-w6v47yjhxcohsjgf.jpg')
elif choice == 'Predict':
    st.text('Kindly fill your flight details to view the predicted price')
    st.image('https://feeds.abplive.com/onecms/images/uploaded-images/2021/09/08/634259599cd6f60c24f9e67a5680c064_original.jpg')
    ch = st.selectbox('Airline', ('Select', *AIRLINES))
    cg = st.selectbox('From', ('Select', *CITIES))
    # The destination list excludes the chosen source city.
    cx = st.selectbox('Destination', ('Select', *(city for city in CITIES if city != cg)))
    cf = st.selectbox('Departure time', ('Select', *TIMES))
    ci = st.selectbox('Stops', ('Select', *STOPS))
    cs = st.selectbox('Arrival time', ('Select', *TIMES))
    cb = st.selectbox('Class', ('Select', *CLASSES))
    h = st.number_input('Duration')
    i = st.number_input('Days left')
    if st.button('Check'):
        if 'Select' in (ch, cg, cx, cf, ci, cs, cb):
            # Without this check, an unselected field left its code undefined
            # (or silently defaulted) when the prediction was requested.
            st.warning('Please select a value for every field.')
        else:
            model = decompress_pickle('Flight.pbz2')
            # Feature order: airline, source, departure time, stops,
            # arrival time, destination, class, duration, days left.
            features = [[AIRLINES[ch], CITIES[cg], TIMES[cf], STOPS[ci],
                         TIMES[cs], CITIES[cx], CLASSES[cb], h, i]]
            pred = model.predict(features)
            st.write("The predicted price is:-", pred[0], 'Rs')
            st.header('Time to fly ✈🧳')
            st.image('https://image.cnbcfm.com/api/v1/image/106537227-1589463911434gettyimages-890234318.jpeg?v=1589463982&w=1600&h=900')
4.4.3 Prediction Price
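import pickle

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import LabelEncoder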
flight_data=pd.read_csv('/content/drive/MyDrive/Clean_Dataset.csv')
flight_data=flight_data.drop(columns=['Unnamed: 0'])
flight_data.shape
"""Checking the data types for each column"""
flight_data.dtypes
print('Null values:',flight_data.isnull().any().sum())
print('NaN values:', flight_data.isna().any().sum())
print('duplicates:',flight_data.duplicated().sum())
"""The dataset does not have any null, missing, duplicate values"""
flight_data.nunique()
sns.set(font_scale=0.7)
cl={'Economy':'green','Business':'blue'}
c=sns.countplot(data=flight_data,x='class',palette=cl)
for label in c.containers:
    c.bar_label(label)
sns.set(font_scale=0.6)
plt.figure(figsize=(6,4))
col={'Economy':'red','Business':'green'}
a=sns.countplot(data=flight_data,x='airline',hue='class',palette=col)
for l in a.containers:
    a.bar_label(l)
plt.title('Flight counts per airline')
plt.xlabel('Airline')
plt.ylabel('Total number of flights')
"""1. Among the six airlines, only Vistara and Air India have both classes
Economy and Business
2. And the airline Vistara has the highest no.of flights from both classes
3. Spicejet is the airline which has lowest no.of flights
"""
flight_data.describe()
flight_data[['airline','price','class']].sort_values(by='price',ascending=False)
"""Among the various airlines, Vistara charges highest price under the
business class.
"""
sns.set(font_scale=0.7)
plt.figure(figsize=(9,9))
x=sns.barplot(data=flight_data,x='class',y='price',hue='airline',errorbar=None)
for i in x.containers:
    x.bar_label(i)
plt.xlabel('Class')
plt.ylabel('Ticket Price')
plt.title('Flight ticket price vs class based on each airline')
"""The ticket price charged by Vistara is the highest under both classes, and
AirAsia offers the lowest under Economy class.
h. Plotting No.of flights per class under different departure and arrival
time.
"""
sns.set(font_scale=0.7)
plt.figure(figsize=(8,6))
plt.subplot(2,1,1)
cl=sns.countplot(data=flight_data,x='departure_time',hue='class')
for l in cl.containers:
    cl.bar_label(l)
plt.subplot(2,1,2)
cl=sns.countplot(data=flight_data,x='arrival_time',hue='class')
for l in cl.containers:
    cl.bar_label(l)
"""This graph shows that, more morning flights are departed as well as more
night flights arrive at the airport.
i. Analysing ticket price vs destination and source cities base on each class
"""
sns.set(font_scale=0.7)
plt.figure(figsize=(8,6))
plt.subplot(2,1,1)
cl=sns.barplot(data=flight_data,x='destination_city',y='price',hue='class')
for l in cl.containers:
    cl.bar_label(l)
plt.subplot(2,1,2)
cl=sns.barplot(data=flight_data,x='source_city',y='price',hue='class')
for l in cl.containers:
    cl.bar_label(l)
sns.boxplot(data=flight_data,x='price')
"""From the boxplot, we can infer that, the flight ticket price falls in the
range of 0 to 100000 only, whereas there are few outliers that is beyond the
value of 120000. Since, the dataset is large enough, the outliers are removed
from the data in order to develop a proper model for the prediction.
"""
f_out=flight_data[flight_data['price']>=100000].index
flight_data=flight_data.drop(index=f_out)
sns.boxplot(x=flight_data['price'])
flight_data.shape
flight_data[['destination_city','price']].groupby('destination_city').max()
flight_data[flight_data['price']==99680]
flight_data.head(2)
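"""Vistara offers the highest-priced Business Class ticket, at Rs 99680, on the
flight from Bangalore to Mumbai with a duration of 14.42.
"""
flight_data=flight_data.drop(columns='flight')
# Encoding: label-encode only the non-numeric (categorical) columns so that
# duration, days_left and price keep their numeric values.
df_enc=flight_data.copy()
cat_cols=df_enc.select_dtypes(exclude='number').columns
df_enc[cat_cols]=df_enc[cat_cols].apply(LabelEncoder().fit_transform)
# reading the first 2 rows of the dataframe which now has encoded data and
# ready for train test split
df_enc.head(2)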
X = df_enc.drop(columns='price') # feature
y=df_enc['price'] # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
random_state=0)
print('X_train size: {}, X_test size: {}'.format(X_train.shape, X_test.shape))
print('y_train size: {}, y_test size: {}'.format(y_train.shape, y_test.shape))
model_params={
'LR':{
'model':LinearRegression(),
'params':{
}
},
'KNR':{
'model':KNeighborsRegressor(),
'params':{
'n_neighbors':[2,5,10]
}
},
'RFR':{
'model':RandomForestRegressor(),
'params':{
'n_estimators':[5,10,20]
}
}
}
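The listing above defines the candidate models and their parameter grids but does not show the search itself. A minimal sketch of how such a dictionary could drive a grid search with scikit-learn's GridSearchCV is given below. The synthetic data, the 5-fold setting, the R^2 scoring, and the names X_demo, y_demo, and demo_params are assumptions for illustration, not the project's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the encoded training data (assumed).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(120, 3))
y_demo = 3 * X_demo[:, 0] + X_demo[:, 1] ** 2 + rng.normal(0, 0.5, size=120)

# Same layout as model_params: one model and one parameter grid per entry.
demo_params = {
    "KNR": {"model": KNeighborsRegressor(),
            "params": {"n_neighbors": [2, 5, 10]}},
    "RFR": {"model": RandomForestRegressor(random_state=0),
            "params": {"n_estimators": [5, 10, 20]}},
}

rows = []
for name, mp in demo_params.items():
    # Try every parameter combination with 5-fold cross-validation.
    search = GridSearchCV(mp["model"], mp["params"], cv=5, scoring="r2")
    search.fit(X_demo, y_demo)
    rows.append({"model": name,
                 "best_score": search.best_score_,
                 "best_params": search.best_params_})

results = pd.DataFrame(rows)
print(results)
```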
"""Among the 3 models used, Random Forest Regressor gives the highest score.
Hence, a model with the Random Forest Regression is built and evaluated.
"""
rf=RandomForestRegressor(n_estimators=20)
rf.fit(X_train,y_train)
r_pred=rf.predict(X_test)
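The predictions can then be scored against the held-out targets. The sketch below shows two common regression metrics on hypothetical values; the numbers are made up for illustration and are not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true and predicted prices for four test flights.
y_true = np.array([5000, 6000, 7000, 8000])
y_hat = np.array([5100, 5900, 7200, 7800])

r2 = r2_score(y_true, y_hat)              # share of variance explained
mae = mean_absolute_error(y_true, y_hat)  # average error in rupees
print(f"R^2 = {r2:.3f}, MAE = {mae:.1f}")
```

In the listing, the same calls would take y_test and r_pred as arguments.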
print('AIRLINE')
print(flight_data['airline'].value_counts())
print(X['airline'].value_counts())
print('\n')
print('SOURCE CITY')
print(flight_data['source_city'].value_counts())
print(X['source_city'].value_counts())
print('\n')
print('DEPARTURE TIME')
print(flight_data['departure_time'].value_counts())
print(X['departure_time'].value_counts())
print('STOPS')
print(flight_data['stops'].value_counts())
print(X['stops'].value_counts())
print('\n')
print('ARRIVAL TIME')
print(flight_data['arrival_time'].value_counts())
print(X['arrival_time'].value_counts())
print('\n')
print('DESTINATION CITY')
print(flight_data['destination_city'].value_counts())
print(X['destination_city'].value_counts())
print('\n')
print('CLASS')
print(flight_data['class'].value_counts())
print(X['class'].value_counts())
flight_data.sample(1)
"""As per the model evaluation, the prediction is around 99% accurate.
Therefore, for flight prediction, the model 'rf' is chosen.
"""
filename='trained_model.sav'
# Save the trained model, then load it back and run a sample prediction.
with open(filename,'wb') as f:
    pickle.dump(rf,f)
with open(filename,'rb') as f:
    load=pickle.load(f)
load.predict([[5,5,4,0,4,3,1,24.0,48]])
4.5 Testing
Testing in a machine learning project is a crucial step to ensure that the model performs as
expected and generalizes well to new, unseen data. It involves several practices:
1 – Unit Testing
Unit testing involves checking the correctness of individual components within the ML
pipeline. This could include testing data preprocessing functions, individual algorithms, or
other discrete parts of the ML system
2 - Integration Testing
Integration testing checks the combined functionality of these individual components. It
ensures that when these components work together, they produce the expected results
3 - System Testing
System testing evaluates the complete and integrated ML system to verify that it meets the
specified requirements. This includes testing the model’s performance on unseen data and
ensuring that it integrates well with other systems.
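As an illustration of the unit-testing level described above, the sketch below tests a small encoding helper in isolation. The helper encode_stops and both test functions are hypothetical; the codes follow the stops encoding used in the application code.

```python
def encode_stops(label: str) -> int:
    """Map a stops label to the integer code used by the model."""
    codes = {"one": 0, "two or more": 1, "zero": 2}
    if label not in codes:
        raise ValueError(f"unknown stops label: {label!r}")
    return codes[label]


def test_encode_stops_known_labels():
    assert encode_stops("zero") == 2
    assert encode_stops("one") == 0


def test_encode_stops_rejects_unknown_label():
    try:
        encode_stops("three")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unknown label")


# Run the tests directly; a test runner such as pytest would discover them.
test_encode_stops_known_labels()
test_encode_stops_rejects_unknown_label()
print("all unit tests passed")
```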
Fig 4.7.1
5. Results and Discussion
This section presents the results of the proposed system, which can predict flight prices
accurately and with higher reliability than the existing system.
The results are obtained with various machine learning algorithms. In this project we use the
XGBoost machine learning algorithm.
The system also has an elegant interface that takes all the necessary inputs for the
evaluation and is very easy to use. The final result of our proposed system can be viewed
through the GUI.
5.1. Comparison
Here, the performance of the developed flight price prediction model(s) is compared with
existing methods or benchmarks. This could involve comparing the predictive accuracy,
computational efficiency, or other relevant metrics against baseline models or state-of-the-art
approaches reported in the literature. The subsection may also discuss how the proposed
model(s) fare against commercial flight booking websites or other publicly available prediction
services.
6. CONCLUSION
6.1. SUMMARY
This subsection provides a concise summary of the key findings and contributions of the
flight price prediction project. It recaps the objectives outlined in the introduction and
summarizes how they were addressed throughout the project. The summary may include a
brief overview of the methodology employed, the predictive models developed, and the main
results obtained. Additionally, it highlights any novel insights or advancements made in the
field of flight price prediction as a result of the project.
6.2. ACHIEVEMENTS
Here, the subsection discusses the achievements and contributions of the project. It outlines
the specific outcomes or milestones reached during the course of the project, such as the
development of accurate predictive models, the implementation of a user-friendly interface
using Streamlit, or the generation of actionable insights for stakeholders in the aviation and
travel industries. Achievements may be evaluated in terms of technical innovation, practical
utility, or societal impact, depending on the project's goals and objectives.
6.3. FUTURE WORK
This subsection explores potential avenues for future research and development based on
the findings and limitations of the flight price prediction project. It identifies areas where
further improvements or enhancements could be made to advance the state-of-the-art in flight
prediction. Future work may include refining predictive models by incorporating additional
data sources or features, exploring advanced machine learning techniques such as deep
learning or ensemble methods, or conducting longitudinal studies to evaluate model
performance over time. Additionally, opportunities for collaboration with industry partners or
academic researchers may be discussed to validate and extend the project's findings in real-
world settings.
Overall, the Conclusion section serves as a culmination of the flight price prediction project,
summarizing its main outcomes, highlighting achievements, and outlining directions for
future research and development. It provides closure to the project while laying the
groundwork for continued exploration and innovation in the field of airfare prediction.
BIOGRAPHY