Flight Delay Report

Uploaded by

MadhanDhonian

Flight Delay Prediction Based on Aviation Big Data and Machine Learning

ABSTRACT:
Accurate flight delay prediction is fundamental to establishing a more
efficient airline business. Recent studies have focused on applying
machine learning methods to predict flight delays. Most previous
prediction methods were applied to a single route or airport.
This paper explores a broader scope of factors that may potentially
influence flight delay, and compares several machine learning-based
models in generalized flight delay prediction tasks designed for this purpose.
To build a dataset for the proposed scheme, automatic dependent
surveillance-broadcast (ADS-B) messages are received, pre-processed,
and integrated with other information such as weather conditions, flight
schedules, and airport information. The designed prediction tasks
comprise several classification tasks and a regression task.
Experimental results show that long short-term memory (LSTM) is
capable of handling the obtained aviation sequence data, but an overfitting
problem occurs on our limited dataset. Compared with previous
schemes, the proposed random forest-based model obtains higher
prediction accuracy (90.2% for binary classification) and
overcomes the overfitting problem.

CHAPTER 1. INTRODUCTION

Today, the major components of any air transportation system include
passenger airlines, cargo airlines, and the air traffic control system. Over
time, nations around the world have developed numerous techniques for
improving the airline transportation system, which has brought drastic change to
airline operations. Even so, flight delays continue to inconvenience
passengers [1]. Every year approximately 20% of airline flights are canceled or
delayed, costing passengers more than 20 billion dollars in money and lost time.

1.1 Research Motivation

Average aircraft delay is regularly cited as an indication of airport capacity.

Flight delay is a prevalent problem worldwide, and the reason for a given delay is
often hard to explain. Some causes of flight delays, ranging from runway
construction to excessive traffic, are rare, whereas bad weather appears to be a common
cause. Some flights suffer reactionary delays, caused by the late
arrival of the previous flight. Delays hurt airports and airlines, and they affect a
company's marketing strategies, because companies rely on customer loyalty to
support their frequent-flyer programs.
1.2 Problem Statement

My case study covered LaGuardia Airport in New York, Logan International
Airport in Boston, San Francisco International Airport in San Francisco, and
O’Hare International Airport in Chicago, four major airports in the United
States of America. However, we focused the idea and research on LaGuardia
Airport. Compared with the data produced by all airports in the USA, the
data we gathered was very limited, but it gave us good direction on how
weather plays a part in flight delays. The goal of this project is to use exploratory
analysis and to build machine learning models to predict airline departure and
arrival delays.

1.3 Report Structure

This master project report is organized into ten chapters. The preface of the project,
research motivation, and problem statement form Chapter 1. Chapter 2 describes the
basic concepts of flight and weather data. Chapter 3 focuses on the structure of the
project. Chapters 4 and 5 explain the data collection and data exploration of the
flight data, while Chapter 6 focuses on predictive modeling implemented on the
flight data. Chapter 7 focuses on predictive modeling implemented on the weather
data. Chapter 8 starts with an introduction to the Twitter data and some tweet
exploration that helped me in the course of building the project, and then focuses on
predictive modeling of Twitter data using Random Forest and Support Vector
Machine. Chapter 9 concludes the report, and finally Chapter 10 discusses the future
scope of the project.

1.4 Related Work

Researchers and analysts are mainly concerned with predicting flight delays and
their causes, and to that end they have put effort into collecting flight and
weather data. Mohamed et al. [2] studied the pattern of arrival delay for
non-stop domestic flights at the Orlando International Airport. They focused primarily
on the cyclic variations in air travel demand and in the weather at that
particular airport.

Shervin et al. [3] propose an approach that improves operational performance
without affecting the planned cost.

Adrian et al. [4] created a data mining model that predicts flight delays by
observing weather conditions. They used WEKA and R to build their
models, trying different classifiers and choosing the one with the best results.
The machine learning techniques they used include Naïve Bayes and the Linear
Discriminant Analysis classifier.

Choi et al. [5] focused on overcoming the effects of data imbalance
during training. They used techniques such as Decision Trees, AdaBoost, and
K-Nearest Neighbors to predict individual flight delays. The model performed a
binary classification to predict whether a scheduled flight would be delayed.

Schaefer et al. [6] built the Detailed Policy Assessment Tool (DPAT), which is used
to simulate the minor changes in flight delay caused by weather changes.

Bing Liu [7] studied sentiment analysis and opinion mining, which analyze people’s
opinions and sentiments and study their behavior. The output of the research is a
feature-based opinion summary, which is also known as sentiment classification.

Using techniques such as Natural Language Processing, Naïve Bayes, and Support Vector
Machine, researchers built analysis algorithms that helped them extract features
for the model. Most of them focused on predicting overall flight delays. Our research
concentrated mainly on predicting flight delays for a particular airport over a specific
period of time. First, we used a regression model to examine the significance of each
feature, and then a feature selection approach to examine the impact of feature
combinations. These two techniques determined the features to retain in the model. Instead
of using the whole set, we sampled 5,000 records at a time to run through the different
machine learning models. The machine learning models implemented here were the
Random Forest classifier and the Support Vector Machine (SVM) classifier. Further, we
applied an approach called one-hot encoding (One-Hot-Encoder) to create a variant of
the model for evaluating potential prediction performance.
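
The one-hot encoding step mentioned above can be illustrated with a minimal,
dependency-free sketch. The carrier codes and the one_hot helper are hypothetical
illustrations and are not taken from the project; in a real pipeline a library
encoder (for example scikit-learn's OneHotEncoder) would normally be used instead.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 vector with one column per category."""
    categories = sorted(set(values))          # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                 # mark the matching category
        encoded.append(row)
    return categories, encoded


# Hypothetical carrier codes for four flights.
carriers = ["AA", "DL", "AA", "UA"]
categories, encoded = one_hot(carriers)
print(categories)   # ['AA', 'DL', 'UA']
print(encoded)      # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Each category becomes its own column, so a classifier does not read an artificial
ordering into the carrier codes.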
CHAPTER 2. LITERATURE SURVEY
CHAPTER 3. SYSTEM ANALYSIS

EXISTING SYSTEM:
Supervised machine learning classifies data inputs according to labeled outputs, whereas
unsupervised learning groups the inputs without any labeled data. Several
researchers have used machine learning algorithms to solve classification problems
in the educational domain.
With regard to identifying Flight demographic, climate, social, personal,
and other features, recent literature has shown that machine learning plays a
significant role in predictive modeling.

DISADVANTAGES OF EXISTING SYSTEM:

An innovative regression algorithm was used for grade prediction of a Flight with an
accuracy of 85%. The SVM algorithm achieved a better prediction accuracy of 96%,
compared with K-nearest neighbor, in predicting the attitude of Hungarian and Indian
Flights towards technology.
PROPOSED SYSTEM:
In this paper, we used the latest optimization techniques on the MLP and
compared the respective optimizers with dynamic testing, activation functions,
regularization parameters, etc. Afterward, we compared this optimized MLP model
with a robust SVM algorithm. All experiments were performed in the popular machine
learning software Orange 3.24.1, which is open-source, hands-on machine learning
software with a large library of algorithms. It was developed at the Flight
Database using hybrid languages. For better visualization of results, we used ...
The statistical analysis was performed with the support of the popular STAC web platform.
ADVANTAGES OF PROPOSED SYSTEM:
This research was carried out to support the Airport’s real-time Flight demographic
system.
Before deploying online in a real-time environment, we need to propose an optimized
native place predictive model in the international Flight environment.
The present promising native place identification models gained the highest prediction
accuracy on the prime data depicted in preliminary work with state-of-the-art research,
including mathematical equations and feature engineering.
CHAPTER 4. IMPLEMENTATION

MODULES:
• Data Collection
• Dataset
• Data Preparation
• Model Selection
• Analyze and Prediction
• Accuracy on test set
• Saving the Trained Model

MODULES DESCRIPTION:
Data Collection:
Collecting data is the first real step in developing a machine learning
model. It is a critical step that determines how good the model will be:
the more and better data we get, the better our model will perform.
There are several techniques for collecting data, such as web scraping and
manual intervention.

Data Preparation:

Next, we transform the data by removing some columns and getting rid of
missing data. First, we create a list of the column names that we want to
retain.

Next, we drop all columns except the ones that we want to retain.

Finally, we drop the rows that have missing values from the dataset.
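
The three preparation steps can be sketched without any third-party library, as
below. The column names and records are hypothetical, and the project itself most
likely worked on a dataframe; with pandas the same steps would typically be a
column selection followed by DataFrame.dropna().

```python
# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"carrier": "AA", "dep_hour": 9, "wind_mph": 12, "tail_no": "N1", "delayed": 0},
    {"carrier": "DL", "dep_hour": 17, "wind_mph": None, "tail_no": "N2", "delayed": 1},
    {"carrier": "UA", "dep_hour": 21, "wind_mph": 35, "tail_no": None, "delayed": 1},
]

# Step 1: list the columns to retain.
keep = ["carrier", "dep_hour", "wind_mph", "delayed"]

# Step 2: drop every column that is not in the list.
trimmed = [{col: row[col] for col in keep} for row in raw_rows]

# Step 3: drop rows that still contain a missing value.
clean = [row for row in trimmed if all(row[col] is not None for col in keep)]

print(len(raw_rows), "->", len(clean))   # 3 -> 2
```

Dropping columns before rows matters here: the third record is only missing a
value in a column that is discarded anyway, so it survives the cleaning.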

Model Selection:

When creating a machine learning model, we need two datasets, one for
training and one for testing, but we have only one. So we split
it in two with a ratio of 80:20. We also divide the dataframe into
feature columns and a label column.

To do this, we import the train_test_split function from sklearn and use it to
split the dataset. Setting test_size = 0.2 makes the split 80% training data and
20% test data.

The random_state parameter seeds the random number generator used for the
split, so the same split is produced on every run.

The function returns four datasets, which we label train_x, test_x, train_y, and
test_y (scikit-learn returns the two feature sets first, then the two label sets).
Inspecting the shape of each dataset confirms the split.
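
The sketch below is not scikit-learn's train_test_split but a small pure-Python
stand-in that shows what a seeded 80:20 split does. The split_80_20 helper, the
seed value, and the toy data are assumptions made for illustration only.

```python
import random


def split_80_20(features, labels, test_size=0.2, seed=42):
    """Shuffle row indices with a seeded generator, then cut off the test share."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)      # same seed -> same shuffle
    n_test = int(round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_x = [features[i] for i in train_idx]
    test_x = [features[i] for i in test_idx]
    train_y = [labels[i] for i in train_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, test_x, train_y, test_y


# Toy data: 10 rows with two hypothetical features each, plus a 0/1 label.
features = [[hour, hour * 2] for hour in range(10)]
labels = [hour % 2 for hour in range(10)]

train_x, test_x, train_y, test_y = split_80_20(features, labels)
print(len(train_x), len(test_x))   # 8 2
```

Because the shuffle is seeded, calling the function again with the same seed
returns exactly the same four lists, which is the role random_state plays.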

We use the Random Forest Classifier, which fits multiple decision trees to the
data. We train the model by passing train_x and train_y to the fit method.

Once the model is trained, we need to test it. For that we
pass test_x to the predict method.

Random Forest is one of the most powerful supervised learning methods and can be
used for both classification and regression problems; here it is used as a
classifier. The algorithm works in two stages: the first builds the forest of
decision trees from the given dataset, and the second makes predictions by
combining the outputs of the individual trees.
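
The two stages can be illustrated with a deliberately simplified toy: bagged
one-level decision trees ("stumps") with majority voting. This is an assumption-laden
sketch, not the project's model and not scikit-learn's RandomForestClassifier, which
grows deeper trees and also samples features at each split. The data and all helper
names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Stage 1: build the forest, one stump per bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]   # draw with replacement
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample]))
    return forest


def predict(forest, row):
    """Stage 2: every tree votes and the majority label wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy data: [departure hour, wind speed]; a flight counts as delayed (1) when wind >= 50.
rows = [[6 + 6 * (i % 2), 5 * i] for i in range(20)]
labels = [1 if wind >= 50 else 0 for _, wind in rows]

forest = fit_forest(rows, labels)
print(predict(forest, [9, 95]), predict(forest, [9, 0]))
```

Training each tree on a different bootstrap sample is what makes the trees
disagree slightly, and the vote averages those disagreements out.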

Accuracy on test set:

We obtained accuracies of 95.1%, 97.1%, 98.1%, and 96.5% on the test set.
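
For reference, accuracy is the share of test rows whose predicted label matches
the true label. The sketch below uses made-up labels, not the project's results,
and the accuracy helper is a hypothetical name.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = delayed, 0 = on time.
test_y = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

print(f"{accuracy(test_y, predicted):.1%}")   # 75.0%
```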

Saving the Trained Model:

Once you are confident enough to take your trained and tested model into a
production-ready environment, the first step is to save it into a .h5 or .pkl file;
a .pkl file can be written with a library like pickle.
pickle is part of the Python standard library, so no separate installation is needed.
Next, import the module and dump the model into a .pkl file.
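
A minimal sketch of saving and reloading with pickle follows. The dictionary
stands in for the trained model and the file name is hypothetical; a fitted
scikit-learn estimator can be dumped and loaded with the same two calls.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable Python object is handled the same way.
model = {"algorithm": "random_forest", "n_trees": 25}

path = os.path.join(tempfile.mkdtemp(), "flight_delay_model.pkl")

# Dump the model into a .pkl file.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later (for example inside the web application) load it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)   # True
```

Only unpickle files from a trusted source, because loading a pickle can execute
arbitrary code.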
CHAPTER 5. SYSTEM DESIGN

SYSTEM ARCHITECTURE:
DATA FLOW DIAGRAM:

1. The DFD is also called a bubble chart. It is a simple graphical formalism
that can be used to represent a system in terms of the input data to the system,
the various processing carried out on this data, and the output data generated
by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools.
It is used to model the system components. These components are the
system process, the data used by the process, any external entity that
interacts with the system, and the information flows in the system.
3. A DFD shows how information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that
depicts information flow and the transformations that are applied as data
moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction. A DFD
may be partitioned into levels that represent increasing information flow and
functional detail.
[Data flow diagram: Input data -> Preprocessing -> Training dataset -> Feature Extraction -> Prediction/regression (with Testing Data) -> Delay Predict]

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized
general-purpose modeling language in the field of object-oriented software
engineering. The standard is managed, and was created, by the Object
Management Group.
The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML comprises two
major components: a meta-model and a notation. In the future, some form of
method or process may also be added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying,
visualizing, constructing, and documenting the artifacts of a software system, as
well as for business modeling and other non-software systems.
The UML represents a collection of best engineering practices that have
proven successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software
and of the software development process. The UML uses mostly graphical notations
to express the design of software projects.

GOALS:

The primary goals in the design of the UML are as follows:

1. Provide users with a ready-to-use, expressive visual modeling language so that
they can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core
concepts.
3. Be independent of particular programming languages and development
processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations,
frameworks, patterns, and components.
7. Integrate best practices.
USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of
behavioral diagram defined by and created from a Use-case analysis. Its purpose
is to present a graphical overview of the functionality provided by a system in
terms of actors, their goals (represented as use cases), and any dependencies
between those use cases. The main purpose of a use case diagram is to show what
system functions are performed for which actor. Roles of the actors in the system
can be depicted.
CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a
system by showing the system's classes, their attributes, operations (or methods),
and the relationships among the classes. It explains which class contains
information.

[Class diagram: Input (Input data); Output (Features extraction, Regression, Regression & Display Result); Delay Predict]

SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in
what order. It is a construct of a Message Sequence Chart. Sequence diagrams are
sometimes called event diagrams, event scenarios, and timing diagrams.
ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise
activities and actions with support for choice, iteration and concurrency. In the
Unified Modeling Language, activity diagrams can be used to describe the
business and operational step-by-step workflows of components in a system. An
activity diagram shows the overall flow of control.
[Activity diagram: Input data -> Preprocessing -> Training -> Prediction using proposed algorithm -> Predicted Delay]

CHAPTER 6. INPUT DESIGN AND OUTPUT DESIGN

INPUT DESIGN
The input design is the link between the information system and the user. It
comprises developing specifications and procedures for data preparation, and
the steps necessary to put transaction data into a usable form for processing.
This can be achieved by instructing the computer to read data from a written or
printed document, or by having people key the data directly into the
system. The design of input focuses on controlling the amount of input required,
controlling errors, avoiding delay, avoiding extra steps, and keeping the
process simple. The input is designed so that it provides security
and ease of use while retaining privacy. Input design considers the following
things:
• What data should be given as input?
• How should the data be arranged or coded?
• The dialog to guide the operating personnel in providing input.
• Methods for preparing input validations and the steps to follow when errors
occur.

OBJECTIVES

1. Input design is the process of converting a user-oriented description of the
input into a computer-based system. This design is important to avoid errors in
the data input process and to show management the correct direction for
getting correct information from the computerized system.

2. It is achieved by creating user-friendly screens for data entry that can handle
large volumes of data. The goal of designing input is to make data entry easier and
free from errors. The data entry screen is designed so that all
data manipulations can be performed. It also provides record viewing facilities.
3. When data is entered, it is checked for validity. Data can be entered with
the help of screens. Appropriate messages are provided as and when needed so that
the user is not left confused. Thus the objective of input design is to
create an input layout that is easy to follow.
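
The validity check in objective 3 could look like the sketch below. The field names
(carrier, dep_hour), the rules, and the validate_flight_input helper are hypothetical
and only illustrate the idea of returning clear messages for each invalid field.

```python
def validate_flight_input(form):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    carrier = str(form.get("carrier", "")).strip()
    if not (len(carrier) == 2 and carrier.isalnum()):
        errors.append("carrier must be a two-character airline code")
    try:
        hour = int(form.get("dep_hour", ""))
        if not 0 <= hour <= 23:
            errors.append("dep_hour must be between 0 and 23")
    except (TypeError, ValueError):
        errors.append("dep_hour must be a whole number")
    return errors


print(validate_flight_input({"carrier": "AA", "dep_hour": "17"}))   # []
print(validate_flight_input({"carrier": "", "dep_hour": "25"}))
```

Collecting all messages instead of stopping at the first error lets the entry
screen show the user everything that needs fixing in one pass.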

OUTPUT DESIGN
A quality output is one that meets the requirements of the end user and
presents the information clearly. In any system, the results of processing are
communicated to the users and to other systems through outputs. Output design
determines how the information is to be displayed for immediate need, as well as
the hard-copy output. It is the most important and direct source of information
for the user. Efficient and intelligent output design improves the system's
relationship with the user and supports decision-making.
1. Designing computer output should proceed in an organized, well-thought-out
manner; the right output must be developed while ensuring that each output
element is designed so that people find the system easy and effective to use.
When analysts design computer output, they should identify the
specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain information produced by
the system.
The output form of an information system should accomplish one or more of the
following objectives:
• Convey information about past activities, current status, or projections of
the future.
• Signal important events, opportunities, problems, or warnings.
• Trigger an action.
• Confirm an action.

CHAPTER 7. SYSTEM REQUIREMENTS

HARDWARE REQUIREMENTS:

• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15 VGA Colour.
• Mouse : Logitech.
• RAM : 512 MB.

SOFTWARE REQUIREMENTS:

• Operating system : Windows 7.
• Coding Language : Python
• Database : MySQL
CHAPTER 8. SOFTWARE ENVIRONMENT

Python:
Python is a high-level, interpreted, interactive, and object-oriented scripting
language. Python is designed to be highly readable. It frequently uses English
keywords where other languages use punctuation, and it has fewer syntactical
constructions than other languages.
• Python is Interpreted: Python is processed at runtime by the interpreter.
You do not need to compile your program before executing it. This is
similar to Perl and PHP.
• Python is Interactive: You can sit at a Python prompt and
interact with the interpreter directly to write your programs.
• Python is Object-Oriented: Python supports the object-oriented style or
technique of programming that encapsulates code within objects.
• Python is a Beginner's Language: Python is a great language for
beginner-level programmers and supports the development of a wide range
of applications, from simple text processing to WWW browsers to games.

History of Python
Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer Science
in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++,
Algol-68, SmallTalk, and Unix shell and other scripting languages.
Python is copyrighted. Its source code is available under the Python Software
Foundation License, an open-source license that is compatible with the
GNU General Public License (GPL).
Python is now maintained by a core development team, although
Guido van Rossum has long held a vital role in directing its progress.
Python Features
Python's features include:
• Easy-to-learn: Python has few keywords, a simple structure, and a clearly
defined syntax. This allows the student to pick up the language quickly.
• Easy-to-read: Python code is clearly defined and easy on the eyes.
• Easy-to-maintain: Python's source code is fairly easy to maintain.
• A broad standard library: The bulk of Python's library is very portable
and cross-platform compatible on UNIX, Windows, and Macintosh.
• Interactive Mode: Python has support for an interactive mode, which
allows interactive testing and debugging of snippets of code.
• Portable: Python can run on a wide variety of hardware platforms and
has the same interface on all platforms.
• Extendable: You can add low-level modules to the Python interpreter.
These modules enable programmers to add to or customize their tools to be
more efficient.
• Databases: Python provides interfaces to all major commercial databases.
• GUI Programming: Python supports GUI applications that can be
created and ported to many system calls, libraries, and windowing systems,
such as Windows MFC, Macintosh, and the X Window System of Unix.
• Scalable: Python provides a better structure and support for large
programs than shell scripting.

Apart from the above-mentioned features, Python has a long list of good features.
A few are listed below:
• It supports functional and structured programming methods as well as
OOP.
• It can be used as a scripting language or can be compiled to byte-code for
building large applications.
• It provides very high-level dynamic data types and supports dynamic type
checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and
Java.

Python is available on a wide variety of platforms, including Linux and Mac
OS X. Let's understand how to set up our Python environment.
Getting Python
The most up-to-date source code, binaries, documentation, news, etc.,
are available on the official Python website: https://www.python.org.
Windows Installation
Here are the steps to install Python on a Windows machine.
• Open a web browser and go to https://www.python.org/downloads/.
• Follow the link for the Windows installer python-XYZ.msi file, where XYZ
is the version you need to install.
• To use this installer python-XYZ.msi, the Windows system must support
Microsoft Installer 2.0. Save the installer file to your local machine and
then run it to find out if your machine supports MSI.
• Run the downloaded file. This brings up the Python install wizard, which is
really easy to use. Just accept the default settings, wait until the install is
finished, and you are done.

The Python language has many similarities to Perl, C, and Java. However, there
are some definite differences between the languages.

First Python Program

Let us execute programs in different modes of programming.

Interactive Mode Programming

Invoking the interpreter without passing a script file as a parameter brings up the
following prompt:

$ python
Python 2.4.3 (#1, Nov 11 2010, 13:34:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Type the following text at the Python prompt and press Enter:

>>> print "Hello, Python!"

If you are running a newer version of Python (Python 3), print is a function and
must be called with parentheses, as in print("Hello, Python!"). In Python
version 2.4.3, the statement above produces the following result:

Hello, Python!

Script Mode Programming

Invoking the interpreter with a script parameter begins execution of the script and
continues until the script is finished. When the script is finished, the interpreter is
no longer active.
Let us write a simple Python program in a script. Python files have the extension .py.
Type the following source code in a test.py file (on Python 3, write
print("Hello, Python!") instead):

print "Hello, Python!"

We assume that you have the Python interpreter set in the PATH variable. Now, try to
run this program as follows:

$ python test.py

This produces the following result:

Hello, Python!

Flask Framework:
Flask is a web application framework written in Python. It is developed by Armin
Ronacher, who leads an international group of Python enthusiasts named
Pocoo. Flask is based on the Werkzeug WSGI toolkit and the Jinja2
template engine. Both are Pocoo projects.
The HTTP protocol is the foundation of data communication on the World Wide Web.
Different methods of data retrieval from a specified URL are defined in this
protocol.

The following table summarizes the different HTTP methods:

Sr.No.  Method   Description
1       GET      Sends data in unencrypted form to the server. Most common method.
2       HEAD     Same as GET, but without a response body.
3       POST     Used to send HTML form data to the server. Data received by the
                 POST method is not cached by the server.
4       PUT      Replaces all current representations of the target resource with
                 the uploaded content.
5       DELETE   Removes all current representations of the target resource given
                 by a URL.

By default, a Flask route responds to GET requests. However, this
preference can be altered by providing the methods argument to the route() decorator.

In order to demonstrate the use of the POST method in URL routing, first let us
create an HTML form and use the POST method to send form data to a URL.

Save the following script as login.html

<html>
  <body>
    <form action="http://localhost:5000/login" method="post">
      <p>Enter Name:</p>
      <p><input type="text" name="nm"/></p>
      <p><input type="submit" value="submit"/></p>
    </form>
  </body>
</html>

Now enter the following script in the Python shell.

from flask import Flask, redirect, url_for, request

app = Flask(__name__)

# Greets the user whose name is the variable part of the URL.
@app.route('/success/<name>')
def success(name):
    return 'welcome %s' % name

# Accepts the form from login.html by POST, or the same field by GET.
@app.route('/login', methods=['POST', 'GET'])
def login():
    if request.method == 'POST':
        user = request.form['nm']        # field sent in the request body
        return redirect(url_for('success', name=user))
    else:
        user = request.args.get('nm')    # field sent in the query string
        return redirect(url_for('success', name=user))

if __name__ == '__main__':
    app.run(debug=True)

After the development server starts running, open login.html in the browser,
enter a name in the text field, and click Submit.

Form data is POSTed to the URL in the action clause of the form tag.

http://localhost/login is mapped to the login() function. Since the server has
received the data by the POST method, the value of the 'nm' parameter is obtained
from the form data by:

user = request.form['nm']

It is passed to the '/success' URL as the variable part. The browser displays
a welcome message in the window.

Change the method parameter to 'GET' in login.html and open it again in the
browser. The data is now received on the server by the GET method. The value of
the 'nm' parameter is now obtained by:

user = request.args.get('nm')

Here, args is a dictionary object containing pairs of form parameters and
their corresponding values. The value corresponding to the 'nm' parameter is passed on
to the '/success' URL as before.
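
The difference between the two cases is only where the key=value pair travels.
The sketch below uses the standard library rather than Flask to show the parsing
that Flask performs behind request.args and request.form; the URL and the names
Alice and Bob are made-up examples.

```python
from urllib.parse import parse_qs, urlsplit

# A GET request carries the form field in the URL query string.
url = "http://localhost:5000/login?nm=Alice"
query = urlsplit(url).query
print(query)            # nm=Alice

# parse_qs returns a dict that maps each parameter to a list of values.
args = parse_qs(query)
print(args)             # {'nm': ['Alice']}

# A POST form body uses the same key=value encoding, but travels in the request body.
body = "nm=Bob"
form = parse_qs(body)
print(form["nm"][0])    # Bob
```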
What is Python?
Python is a popular programming language. It was created by Guido van
Rossum and first released in 1991.
It is used for:
• web development (server-side),
• software development,
• mathematics,
• system scripting.
What can Python do?
• Python can be used on a server to create web applications.
• Python can be used alongside software to create workflows.
• Python can connect to database systems. It can also read and modify files.
• Python can be used to handle big data and perform complex mathematics.
• Python can be used for rapid prototyping, or for production-ready software
development.
Why Python?
• Python works on different platforms (Windows, Mac, Linux, Raspberry Pi,
etc.).
• Python has a simple syntax similar to the English language.
• Python has syntax that allows developers to write programs with fewer
lines than some other programming languages.
• Python runs on an interpreter system, meaning that code can be executed as
soon as it is written. This means that prototyping can be very quick.
• Python can be treated in a procedural way, an object-oriented way, or a
functional way.
Good to know
• The most recent major version of Python is Python 3, which is used
here. Python 2 reached end of life in January 2020 and no longer
receives updates, although some legacy code still uses it.
• Here Python is written in a text editor. It is also possible to write
Python in an Integrated Development Environment, such as Thonny,
PyCharm, NetBeans, or Eclipse, which are particularly useful when
managing larger collections of Python files.
Python Syntax compared to other programming languages
• Python was designed for readability, and has some similarities to the
English language with influence from mathematics.
• Python uses new lines to complete a command, as opposed to other
programming languages, which often use semicolons or parentheses.
• Python relies on indentation, using whitespace, to define scope, such as the
scope of loops, functions, and classes. Other programming languages often
use curly brackets for this purpose.

Python Install
Many PCs and Macs will have Python already installed.

To check if you have Python installed on a Windows PC, search in the start bar
for Python or run the following on the Command Line (cmd.exe):

C:\Users\Your Name>python --version

To check if you have Python installed on Linux or Mac, open the command line
(Linux) or the Terminal (Mac) and type:

python --version

If you find that you do not have Python installed on your computer, you
can download it for free from the following website: https://www.python.org/

Python Quickstart
Python is an interpreted programming language. This means that, as a
developer, you write Python (.py) files in a text editor and then pass those files
to the Python interpreter to be executed.

The way to run a python file is like this on the command line:

C:\Users\Your Name>python helloworld.py

Where "helloworld.py" is the name of your python file.

Let's write our first Python file, called helloworld.py, which can be done in any
text editor.

helloworld.py

print("Hello, World!")

Simple as that. Save your file. Open your command line, navigate to the
directory where you saved your file, and run:

C:\Users\Your Name>python helloworld.py

The output should read:

Hello, World!

Congratulations, you have written and executed your first Python program.

The Python Command Line

To test a short amount of Python code, it is sometimes quickest and easiest
not to write the code in a file. This is possible because Python can itself be run
interactively from the command line.

Type the following on the Windows, Mac or Linux command line:

C:\Users\Your Name>python

From there you can write any Python code, including our hello world example
from earlier in the tutorial:

C:\Users\Your Name>python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Hello, World!")

Which will write "Hello, World!" in the command line:

C:\Users\Your Name>python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Hello, World!")
Hello, World!

Whenever you are done in the python command line, you can simply type the
following to quit the python command line interface:

exit()

Execute Python Syntax

As we learned earlier, Python syntax can be executed by writing directly in the
Command Line:

>>> print("Hello, World!")
Hello, World!

Or by creating a Python file, using the .py file extension, and
running it in the Command Line:

C:\Users\Your Name>python myfile.py

Python Indentations

Where in other programming languages the indentation in code is for readability
only, in Python the indentation is very important.

Python uses indentation to indicate a block of code.

Example
if 5 > 2:
  print("Five is greater than two!")

Python will give you an error if you skip the indentation:

Example

if 5 > 2:
print("Five is greater than two!")
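
The error can be observed without stopping a script by handing the source text to the built-in compile() function, which raises IndentationError for the unindented version. This is a minimal sketch; the helper name check_indentation is hypothetical, and the exact wording of the error message differs between Python versions.

```python
GOOD_SOURCE = 'if 5 > 2:\n    print("Five is greater than two!")\n'
BAD_SOURCE = 'if 5 > 2:\nprint("Five is greater than two!")\n'


def check_indentation(source):
    """Compile the source without running it and report indentation errors."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:
        return "IndentationError: " + err.msg
    return "ok"


print(check_indentation(GOOD_SOURCE))
print(check_indentation(BAD_SOURCE))
```

The first call reports that the indented block compiles, while the second reports the indentation error instead of letting it end the program.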

Comments

Python has commenting capability for the purpose of in-code documentation.

Comments start with a #, and Python will treat the rest of the line as a
comment:

Example

Comments in Python:

#This is a comment.
print("Hello, World!")

Docstrings

Python also has extended documentation capability, called docstrings.

Docstrings can be one line, or multiline.

Python uses triple quotes at the beginning and end of the docstring:

Example

A docstring is a string literal rather than a comment; placed at the top of a
file, it documents the module:

"""This is a 
multiline docstring."""
print("Hello, World!")
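
Unlike a # comment, a docstring is stored by Python and can be read back at run time. The sketch below illustrates the difference; the function greet is a hypothetical example.

```python
def greet(name):
    """Return a greeting for the given name."""
    # This comment is discarded; only the docstring above is kept.
    return "Hello, " + name + "!"


print(greet("World"))
print(greet.__doc__)   # the docstring is available as an attribute
```

The built-in help() function reads the same attribute, which is why docstrings are the usual place for documentation that users of a function should see.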

CHAPTER 9

SYSTEM STUDY

FEASIBILITY STUDY
           The feasibility of the project is analyzed in this phase, and a business
proposal is put forth with a very general plan for the project and some cost
estimates. During system analysis, the feasibility study of the proposed system
is carried out to ensure that the proposed system is not a burden
to the company. For feasibility analysis, some understanding of the major
requirements for the system is essential.

Three key considerations involved in the feasibility analysis are:

 ECONOMICAL FEASIBILITY
 TECHNICAL FEASIBILITY
 SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

                 This study is carried out to check the economic impact that the system
will have on the organization. The amount of funds that the company can invest
in the research and development of the system is limited, so the expenditures
must be justified. The developed system is well within the budget, which
was achieved because most of the technologies used are freely available.
Only the customized products had to be purchased.

TECHNICAL FEASIBILITY

             This study is carried out to check the technical feasibility, that is,
the technical requirements of the system. Any system developed must not
place a high demand on the available technical resources, as this would in turn
place high demands on the client. The developed system must have
modest requirements, as only minimal or no changes are required for
implementing this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the
user. This includes the process of training the user to use the system efficiently.
The user must not feel threatened by the system, but must instead accept it as a
necessity. The level of acceptance by the users depends solely on the methods
that are employed to educate users about the system and to make them familiar
with it. Their level of confidence must be raised so that they are also able to offer
constructive criticism, which is welcomed, as they are the final users of the
system.

CHAPTER 10

SYSTEM TESTING
    The purpose of testing is to discover errors. Testing is the process of
trying to discover every conceivable fault or weakness in a work product. It
provides a way to check the functionality of components, sub-assemblies,
assemblies and/or a finished product. It is the process of exercising software with
the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types
of test. Each test type addresses a specific testing requirement.

TYPES OF TESTS

Unit testing
      Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is
the testing of individual software units of the application. It is done after the
completion of an individual unit, before integration. This is structural testing
that relies on knowledge of the unit's construction and is invasive. Unit tests perform
basic tests at component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly
defined inputs and expected results.
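
A unit test of this kind might be sketched with Python's standard unittest module as follows. The function is_delayed and its 15-minute threshold are assumptions made for illustration only; they are not taken from the project code.

```python
import unittest


def is_delayed(delay_minutes, threshold=15):
    """Hypothetical unit under test: flag delays of `threshold` minutes or more."""
    return delay_minutes >= threshold


class IsDelayedTest(unittest.TestCase):
    """Each test exercises one decision branch with a defined input and result."""

    def test_on_time_flight(self):
        self.assertFalse(is_delayed(0))

    def test_boundary_value(self):
        self.assertTrue(is_delayed(15))

    def test_early_departure(self):
        self.assertFalse(is_delayed(-5))


# Build and run the suite explicitly so the script continues afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsDelayedTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```

Each test method states one input and its expected result, so a failure points directly at the branch of the unit that misbehaves.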

Integration testing

      Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven and is
more concerned with the basic outcome of screens or fields. Integration tests
demonstrate that, although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is correct and
consistent. Integration testing is specifically aimed at exposing the problems that
arise from the combination of components.
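
As a minimal sketch of the idea, the example below combines two small components and checks the result of the combination rather than of each part. The component names parse_times, delay_minutes and record_delay, and the record layout, are hypothetical.

```python
from datetime import datetime


def parse_times(record):
    """Component 1: turn 'HH:MM' strings into datetime objects."""
    fmt = "%H:%M"
    return (datetime.strptime(record["scheduled"], fmt),
            datetime.strptime(record["actual"], fmt))


def delay_minutes(scheduled, actual):
    """Component 2: delay in whole minutes (negative means early)."""
    return int((actual - scheduled).total_seconds() // 60)


def record_delay(record):
    """Integration point: the output of component 1 feeds component 2."""
    scheduled, actual = parse_times(record)
    return delay_minutes(scheduled, actual)


# Integration check: the combined result is verified end to end.
print(record_delay({"scheduled": "09:30", "actual": "10:05"}))  # prints 35
```

A defect in the interface, for example if parse_times returned the two values in the opposite order, would pass the unit tests of each component and only show up in a check like this one.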

Functional test

      Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system
documentation, and user manuals.

Functional testing is centered on the following items:

Valid Input        : identified classes of valid input must be accepted.
Invalid Input      : identified classes of invalid input must be rejected.
Functions          : identified functions must be exercised.
Output             : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
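
The first two items can be illustrated with a small sketch that feeds one class of valid input and one class of invalid input to a function and checks that each is handled as required. The function is_accepted_time and the choice of an "HH:MM" departure-time field are assumptions for illustration.

```python
from datetime import datetime

VALID_TIMES = ["00:00", "09:45", "23:59"]      # class of valid input
INVALID_TIMES = ["24:00", "9.45", "", "noon"]  # class of invalid input


def is_accepted_time(text):
    """Return True if text is a 24-hour 'HH:MM' time, False otherwise."""
    try:
        datetime.strptime(text, "%H:%M")
    except ValueError:
        return False
    return True


# Valid input must be accepted and invalid input must be rejected.
for value in VALID_TIMES:
    print(repr(value), "accepted:", is_accepted_time(value))
for value in INVALID_TIMES:
    print(repr(value), "rejected:", not is_accepted_time(value))
```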

      Organization and preparation of functional tests is focused on requirements,
key functions, or special test cases. In addition, systematic coverage pertaining to
identified business process flows, data fields, predefined processes, and successive
processes must be considered for testing. Before functional testing is complete,
additional tests are identified and the effective value of current tests is
determined.
System Test
   System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An
example of system testing is the configuration-oriented system integration test.
System testing is based on process descriptions and flows, emphasizing
pre-driven process links and integration points.

White Box Testing


      White Box Testing is testing in which the software tester has
knowledge of the inner workings, structure and language of the software, or at
least its purpose. It is used to test areas that cannot be reached from
a black box level.

Black Box Testing


      Black Box Testing is testing the software without any knowledge of the
inner workings, structure or language of the module being tested. Black box tests,
like most other kinds of tests, must be written from a definitive source document,
such as a specification or requirements document. It is testing in which the
software under test is treated as a black box: you cannot “see” into it. The test
provides inputs and responds to outputs without considering how the software works.

6.1 Unit Testing:

Unit testing is usually conducted as part of a combined code and unit test
phase of the software lifecycle, although it is not uncommon for coding and unit
testing to be conducted as two distinct phases.
Test strategy and approach
Field testing will be performed manually and functional tests will be
written in detail.

Test objectives
 All field entries must work properly.
 Pages must be activated from the identified link.
 The entry screen, messages and responses must not be delayed.

Features to be tested
 Verify that the entries are of the correct format
 No duplicate entries should be allowed
 All links should take the user to the correct page.
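
The first two features could be checked along the lines of the sketch below. The entry format (a two-character carrier code followed by one to four digits) and the name check_entries are assumptions for illustration; the report does not specify the fields of the entry screen.

```python
import re

# Assumed format: two-character carrier code followed by 1 to 4 digits.
FLIGHT_NUMBER = re.compile(r"[A-Z0-9]{2}\d{1,4}")


def check_entries(entries):
    """Split entries into accepted ones and (entry, reason) rejections."""
    accepted, rejected = [], []
    seen = set()
    for entry in entries:
        if not FLIGHT_NUMBER.fullmatch(entry):
            rejected.append((entry, "bad format"))
        elif entry in seen:
            rejected.append((entry, "duplicate"))
        else:
            seen.add(entry)
            accepted.append(entry)
    return accepted, rejected


accepted, rejected = check_entries(["AA100", "DL2041", "AA100", "delta 7"])
print(accepted)
print(rejected)
```

Here the second "AA100" is rejected as a duplicate and "delta 7" is rejected for its format, while the two well-formed, distinct entries are accepted.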

6.2 Integration Testing

      Software integration testing is the incremental integration testing of two or
more integrated software components on a single platform to expose failures
caused by interface defects.
      The task of the integration test is to check that components or software
applications (for example, components in a software system or, one step up, software
applications at the company level) interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
6.3 Acceptance Testing

      User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system meets the
functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects
encountered.

CHAPTER 11

CONCLUSIONS

In this project, we use flight data, weather data, and demand data to predict flight
departure delay. Our results show that the Random Forest method yields better
performance than the SVM model. The SVM model is also very time-consuming
and does not necessarily produce better results. In the end, our model
correctly predicts 91% of the non-delayed flights. However, delayed flights
are correctly predicted only 41% of the time. This suggests that there may be
additional features related to the causes of flight delay that are not captured
by our existing data sources.
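
The two percentages above can be read as per-class recall values taken from a confusion matrix. The sketch below shows the calculation; the four counts are invented solely so that the rates come out at 91% and 41%, and they are not the project's actual confusion matrix.

```python
def per_class_rates(tn, fp, fn, tp):
    """Recall for each class, plus overall accuracy, from confusion counts."""
    return {
        "non_delayed": tn / (tn + fp),   # non-delayed flights predicted correctly
        "delayed": tp / (tp + fn),       # delayed flights predicted correctly
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
    }


# Hypothetical counts, chosen only to reproduce the quoted percentages.
rates = per_class_rates(tn=910, fp=90, fn=590, tp=410)
for name, value in rates.items():
    print("{}: {:.0%}".format(name, value))
```

Reporting the two classes separately matters because overall accuracy depends on the class balance: a model can score well overall while still missing most delayed flights.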

In the second part of the project, we can see that it is possible to predict flight delay
patterns from just the volume of concurrently published tweets, together with their
sentiment and objectivity. This is not unreasonable: people tend to post about airport
delays on Twitter, so it stands to reason that these posts would become more frequent,
and more strongly emotional, as the delays get worse. Without more data, we cannot
build a robust model or determine the role of related factors and chance in these results.
However, as a proof of concept, the results show potential. It may be possible
to routinely use tweets to gain an understanding of concurrent airline delays and
traffic patterns, which could be useful in a variety of circumstances.

