
BANDWIDTH ENHANCEMENT USING MICROSTRIP PATCH ANTENNA

FOR WIRELESS COMMUNICATION APPLICATIONS

A THESIS

Submitted by

R.BALAJI
(212218483002)

in partial fulfillment for the award of the degree of

MASTER OF ENGINEERING IN
COMMUNICATION AND
NETWORKING

DEPARTMENT OF ELECTRONICS & COMMUNICATION


ENGINEERING, SAVEETHA ENGINEERING COLLEGE
ANNA UNIVERSITY, CHENNAI- 600 025

NOVEMBER 2019
ANNA UNIVERSITY, CHENNAI

BONAFIDE CERTIFICATE

Certified that this report titled "BANDWIDTH ENHANCEMENT USING MICROSTRIP
PATCH ANTENNA FOR WIRELESS COMMUNICATION APPLICATIONS" is the bonafide
work of Mr. R. BALAJI (212218483002), who carried out the work under my supervision.
Certified further that, to the best of my knowledge, the work reported herein does not form
part of any other thesis or dissertation on the basis of which a degree or award was
conferred on an earlier occasion on this or any other candidate.

SIGNATURE SIGNATURE

Dr. SRIGITHA. S. NATH, M.E., Ph.D., Dr. A.K. SHRIVASTAV, M.E., Ph.D.,


PROFESSOR, PROFESSOR,
HEAD OF THE DEPARTMENT, SUPERVISOR,
DEPARTMENT OF ELECTRONICS AND DEPARTMENT OF ELECTRONICS AND
COMMUNICATION ENGINEERING, COMMUNICATION ENGINEERING,
SAVEETHA ENGINEERING COLLEGE, SAVEETHA ENGINEERING COLLEGE,
CHENNAI. CHENNAI.

Submitted for Viva-voce Examination held on

INTERNAL EXAMINER EXTERNAL EXAMINER



ABSTRACT

The study of dual microstrip patch antennas has made great progress in recent years.
Compared with conventional antennas, dual microstrip patch antennas offer more
advantages and better prospects: they are lighter in weight, lower in volume, cost,
and profile, smaller in dimension, easy to fabricate, and conformable to mounting
surfaces. Moreover, dual microstrip patch antennas can provide dual and circular
polarization, dual-frequency operation, frequency agility, broad bandwidth, feedline
flexibility, beam scanning, and omnidirectional patterns. In this work we discuss the
microstrip antenna, types of dual microstrip antennas, feeding techniques, and
applications of dual microstrip patch antennas, along with their advantages and
disadvantages over conventional microwave antennas.

Keywords: Microstrip Antenna (MSA), Microstrip Patch Antenna (MPA), feeding techniques.

ACKNOWLEDGEMENT

The process I have taken to create and complete this project has given me a
platform to better understand my capabilities and skills, both personally and
academically. This journey of self-actualization would not have been fulfilled
without the guidance and support of several individuals.

First, to our Creator, who has provided us with the talents we use: to Him be
all the praise. I express my deep sense of gratitude to our honorable and beloved
Founder President Dr. N. M. VEERAIYAN, our President Dr. V. SAVEETHA
RAJESH, our Director Dr. S. RAJESH, and the other management members for
providing the infrastructure needed.

My sincere thanks to our Principal, Dr. R. RAMESH, M.E., Ph.D.,
Professor, Department of Electronics and Communication Engineering, for his
constant encouragement, and to our Dean, Prof. A. GANDHI, for extending
support and encouragement for the project work.

I would like to express my sincere thanks to Dr. SRIGITHA. S. NATH,
M.E., Ph.D., Head of the Department, Department of Electronics and
Communication Engineering, for her review of my thesis and her useful opinions.

My deep appreciation is extended to my project guide, Dr. A. K. SHRIVASTAV,
M.E., Ph.D., Professor, Department of Electronics and Communication
Engineering, for his wisdom, guidance, and encouragement.

I also thank Mr. G. DINESH RAM, M.E., Project Associate, for his continuous
support, and the Saveetha MEMS Design Centre for providing excellent facilities
and guidance and for its encouragement and faith in me throughout this endeavor.

Finally, I thank my family for their constant encouragement, support, and
motivation throughout my postgraduate career, and for always being there for me.
TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.


ABSTRACT i
LIST OF FIGURES x
LIST OF TABLES xi
1 INTRODUCTION 1
1.1 Machine Learning 1
1.1.1 Technical Overview 1
1.2 Signs And Symptoms 3
1.3 Type Of Breast Cancer 4
1.4 Background 5
1.4.1 Sequential Ensemble Learning (Boosting) 5
1.4.2 Parallel Ensemble Learning (Bagging) 6
1.4.3 Stacking And Blending (Voting) 6
1.5 Preparing The Dataset 7

2 LITERATURE SURVEY 9
2.1 General 9
2.1.1 Review Of Literature Survey 9

3 OUTLINE OF THE PROJECT 20


3.1 Overview Of The System 20
3.1.1 Causes 21
3.1.2 Inherited Breast Cancer 21
3.1.3 Risk Factors 22
3.1.4 Some Risk Factors For Breast Cancer 23
3.2 Menopause 26
3.3 Tumor(T) 27
3.4 Node(N) 28
3.5 Metastasis(M) 29
3.6 Cancer Stage Grouping 30
3.6.1 Recurrent 32
3.7 Prevention 32
3.7.1 Breast Self Exam 32
3.7.2 Breast Cancer Risk Reduction For Women With A High Risk 34
3.8 TNM Staging System 34
3.8.1 Significance Of The Stage Of The Cancer 35
3.9 Grade 36
3.9.1 Staging And Grading 36
3.9.2 Staging 37
3.10 Project Goals 40
3.10.1 Exploration Data Analysis Of Variable Identification 40
3.10.2 Univariate Data Analysis 40
3.10.3 Method Of Outlier Detection With Feature Engineering 40
3.11 Objectives 40
3.12 Aim 41
3.13 Scope 41

4 EXISTING AND PROPOSED SYSTEM 42


4.1 Existing System 42
4.1.1 Drawbacks 42
4.2 Proposed Systems 43
4.2.1 Exploratory Data Analysis 43
4.2.1.1 Splitting The Dataset 43
4.2.1.2 Data Wrangling 43
4.2.1.3 Data Collection 43
4.2.1.4 Preprocessing 44
4.3 Building The Classification Model 44
4.4 Construction Of A Predictive Model 44
4.5 Training The Data Set 45
4.6 Testing The Data Set 46
4.7 General Properties 46
4.8 Ensemble Learning 47
4.9 Applications Of Ensemble Methods 48
4.10 Voting Based Ensemble Learning 49
4.10.1 Advantages 49

5 HARDWARE AND SOFTWARE REQUIREMENTS 50

5.1 General 50
5.1.1 Functional Requirements 50
5.1.2 Non Functional Requirements 51
5.1.3 Environmental Requirements 51
5.2 Software Description 51
5.2.1 Anaconda Navigator 52
5.2.2 Jupyter Notebook 53
5.3 Working Process 54

6 DESIGN AND BLOCK DIAGRAMS 56


6.1 System Architecture – Phase-1 56
6.2 Design Architecture / System Architecture / Business Diagram – Phase-2 56
6.3 Work Flow Diagram 57
6.4 Use Case Diagram 58
6.5 Class Diagram 59
6.6 Activity Diagram 60
6.7 Sequence Diagram 61
6.8 Overall Protection Separation 61
6.8.1 Phase 1 Working Process 62
6.8.2 Phase 2 (Major Project) 62
6.9 Modules Of (Phase 1 Working Process) 62
6.10 Modules (For Major Project Only) 62
6.11 Phase 1 Working Description 63
6.11.1 Module -01 63
6.11.1.1 Variable Identification Process / Data Validation Process 63
6.11.1.2 Data Validation / Cleaning Process 64
6.11.1.3 Data Pre-Processing 65
6.11.2 Module-02 65
6.11.2.1 Exploration Data Analysis Of Visualization 65
6.11.2.2 Advantages Of Train Test Split 67
6.11.2.3 Advantages Of Cross Validation 67
6.11.2.4 Training The Data Set 67
6.11.2.5 Testing The Data Set 68
6.11.3 Module -03 69
6.11.3.1 Logistic Regression 69
6.11.3.2 Decision Tree 70
6.11.4 Module -04 71
6.11.4.1 Support Vector Machine (SVM) 71
6.11.4.2 Random Forest 71
6.11.5 Module -05 72
6.11.5.1 Sensitivity 75
6.11.5.2 Specificity 76
6.11.5.3 Prediction Result By Accuracy 77
6.11.6 Used Python Packages 79
6.11.6.1 Sklearn 79
6.11.6.2 Numpy 79
6.11.6.3 Pandas 79
6.11.6.4 Matplotlib 79
6.11.6.5 Tkinter 79
6.12 Sample Code 80

7 RESULT AND DISCUSSION 89


7.1 Software Involvement Steps 89

7.2 Output Screenshots 90


7.2.1 Input 90
7.2.2 Output 91

8 CONCLUSION AND FUTURE WORK 93


8.1 Conclusion 93
8.2 Future Work 93
9 REFERENCES 94
LIST OF FIGURES

S.NO FIGURE PAGE NO


1 Process of machine learning 2
2 Data flow diagram of machine learning model 25
3 Type of grade cells 37
4 Stage 1 38
5 Stage 2 38
6 Stage 3 39
7 Stage 4 39
8 Process of data flow diagram 45
9 Architecture of proposed model 46
10 Ensemble structure 48
11 Phase 1 flow diagram 56
12 Phase 2 flow diagram 57
13 Work flow diagram 58
14 Class diagram 59
15 Activity diagram 60
16 Sequence diagram 61
17 Given data frame 64
18 To validate patient ages 65
19 Categories of patient ages 66
20 Position of breast cancer 68
21 Type of patient Vs tumor size of each patient 74
22 Open the anaconda navigator 89
23 Launch the jupyter notebook plat form 90
24 Open the corresponding result folder 90
LIST OF TABLES

S.NO TABLE NO PAGE NO


1 Preparing data set 7

CHAPTER 1

INTRODUCTION

1.1 Antenna

An antenna is a transducer designed to transmit or receive electromagnetic waves.
Microstrip antennas have several advantages over conventional microwave antennas and
are therefore widely used in many practical applications. A microstrip antenna in its simplest
configuration is shown in Fig. 1. It consists of a radiating patch on one side of a dielectric
substrate (εr ≤ 10), which has a ground plane on the other side.

Microstrip antennas are characterized by a larger number of physical parameters
than are conventional microwave antennas. They can be designed to have many geometrical
shapes and dimensions [2]. All microstrip antennas can be divided into four basic categories.

1.2 Microstrip Patch Antenna

A microstrip patch antenna (MPA) consists of a conducting patch of any planar or
nonplanar geometry on one side of a dielectric substrate with a ground plane on the other
side. It is a popular printed resonant antenna for narrow-band microwave wireless links
that require semihemispherical coverage. Due to its planar configuration and ease of
integration with microstrip technology, the microstrip patch antenna has been heavily
studied and is often used as an element in arrays. A large number of microstrip patch
antennas have been studied to date, and an exhaustive list of the geometries along with their
salient features is available [1]. The rectangular and circular patches are the basic and
most commonly used microstrip antennas, serving both the simplest and the most
demanding applications. Rectangular geometries are separable in nature, and their
analysis is also simple. The circular patch antenna has the advantage of a symmetric
radiation pattern. A rectangular microstrip patch antenna in its simplest
form is shown in Figure 2.
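The dimensions of such a rectangular patch follow from the standard transmission-line-model design equations (textbook formulas, not taken from this thesis). A minimal Python sketch, assuming an illustrative 2.4 GHz design on FR-4 (εr = 4.4, h = 1.6 mm):

```python
from math import sqrt

C = 3e8  # speed of light (m/s)

def rect_patch_dimensions(f_r, eps_r, h):
    """Transmission-line-model estimate of rectangular patch width and length
    for resonant frequency f_r (Hz), relative permittivity eps_r, and
    substrate height h (m)."""
    W = C / (2 * f_r) * sqrt(2 / (eps_r + 1))                # patch width
    # effective permittivity seen by the quasi-TEM fringing fields
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / W) ** -0.5
    # fringing-field length extension on each radiating edge
    dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)) / \
         ((eps_eff - 0.258) * (W / h + 0.8))
    L = C / (2 * f_r * sqrt(eps_eff)) - 2 * dL               # physical length
    return W, L

# illustrative design: 2.4 GHz patch on FR-4 (eps_r = 4.4, h = 1.6 mm)
W, L = rect_patch_dimensions(2.4e9, 4.4, 1.6e-3)
```

The model predicts a patch of roughly 38 mm x 29 mm; a full-wave solver would then be used to fine-tune the dimensions.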

1.3 Feeding Techniques

A feedline is used to excite the antenna to radiate by direct or indirect contact. There are
many different feeding techniques; the four most popular are coaxial probe feed, microstrip
line, aperture coupling, and proximity coupling [2]. Coaxial probe feeding is a method in
which the inner conductor of the coaxial line is attached to the radiating patch of the antenna
while the outer conductor is connected to the ground plane. The advantages of coaxial
feeding are ease of fabrication, ease of matching, and low spurious radiation; its
disadvantages are narrow bandwidth and difficulty of modeling, especially for thick substrates.

Microstrip line feed is one of the easier methods to fabricate, as it is just a conducting strip
connected to the patch and can therefore be considered an extension of the patch. It is simple
to model and easy to match by controlling the inset position. However, as substrate thickness
increases, surface waves and spurious feed radiation increase, which limits the bandwidth.
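How the inset position controls the match can be sketched with the usual transmission-line-model approximation (a textbook formula, not from this thesis): the input resistance seen at an inset depth y0 scales the radiating-edge resistance by cos²(πy0/L). The edge resistance and patch length below are illustrative assumptions:

```python
from math import cos, pi

def inset_input_resistance(r_edge, y0, length):
    """Input resistance (ohm) of an inset-fed rectangular patch:
    feeding a distance y0 in from the radiating edge scales the
    edge resistance by cos^2(pi * y0 / L)."""
    return r_edge * cos(pi * y0 / length) ** 2

R_EDGE = 240.0     # assumed edge resistance (ohm); design-dependent
L_PATCH = 29.4e-3  # assumed patch length (m)

# scan inset depths in 0.1 mm steps for the value closest to 50 ohm
depths = [i * 1e-4 for i in range(148)]
y0_match = min(depths,
               key=lambda y0: abs(inset_input_resistance(R_EDGE, y0, L_PATCH) - 50.0))
```

For these assumed values the scan lands at an inset of roughly 10 mm, about a third of the patch length.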

Aperture coupled feed consists of two different substrates separated by a ground plane. On the
bottom side of the lower substrate there is a microstrip feed line whose energy is coupled to the
patch through a slot in the ground plane separating the two substrates. This arrangement allows
independent optimization of the feed mechanism and the radiating element. Normally the top
substrate is a thick, low-dielectric-constant material, while the bottom substrate is a thin,
high-dielectric-constant one. The ground plane in the middle isolates the feed from the
radiating element and minimizes the interference of spurious radiation with pattern formation
and polarization purity.

Proximity coupling has the largest bandwidth and low spurious radiation, but fabrication is
difficult. The length of the feeding stub and the width-to-length ratio of the patch are used to
control the match. Its coupling mechanism is capacitive in nature. The major disadvantage of
this feeding technique is that it is difficult to fabricate because of the two dielectric layers,
which need proper alignment. It also increases the overall thickness of the antenna.

1.4 CST Microwave Studio (CST MWS)

CST Microwave Studio (CST MWS) is based on the finite integration technique (FIT).
It allows the user to choose either a time-domain or a frequency-domain approach. Despite
the presence of transient, eigenmode, and frequency-domain solvers within CST MWS, the
transient solver, as the flagship module of CST MWS, was examined for benchmarking
further in this chapter. The Time Domain Solver calculates the broadband behavior of
electromagnetic devices in one simulation run with an arbitrarily fine frequency resolution.
The modeling of curved structures using the Perfect Boundary Approximation technique and
the modeling of thin perfectly electric conducting sheets with the Thin Sheet Technique try
to cope with the typical difficulties inherent in classical FDTD methods. The transient
analysis of the proposed antennas is done utilizing the hexahedral mesh type. The automatic
mesh generator detects the important points inside the structure (fixpoints) and locates mesh
nodes there. The user can manually add fixpoints to a structure, as well as fully control the
number of mesh lines in each coordinate with regard to the specified wavelength. Energy-based
adaptation of the mesh allows it to be refined in a predefined number of passes, providing
mesh refinement of sophisticated design features at the price of longer overall simulation
time. The analyses in this chapter use automatic direct meshing without any local settings.
Although the FIT in principle can handle material parameters changing over the defined
dielectric volumes, this is not implemented yet. CST, as a general-purpose software package
and a real competitor for HFSS, has gained popularity in the last few years. More and more
results obtained with CST for the analysis and design of planar and small antennas can be
found in the literature. A problem sometimes observed with CST is a ripple in the
frequency response when the tool settings are not appropriate. This is due to the fact that
the flagship solver of CST is inherently a time-domain solver.
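For contrast with the FIT solver described above, the classical FDTD scheme it is compared against can be illustrated with a minimal 1D leapfrog loop (an illustrative sketch in normalized units, not CST code):

```python
import numpy as np

# Minimal 1D FDTD (Yee leapfrog) update loop: free space, normalized
# units, and the 1D "magic" time step (Courant number = 1) are assumed.
N, STEPS = 200, 150
ez = np.zeros(N)   # electric field samples
hy = np.zeros(N)   # magnetic field samples, staggered half a cell

for t in range(STEPS):
    hy[:-1] += ez[1:] - ez[:-1]              # update H from the curl of E
    ez[1:] += hy[1:] - hy[:-1]               # update E from the curl of H
    ez[50] += np.exp(-((t - 30) / 8) ** 2)   # soft Gaussian source at cell 50
# the launched pulse travels one cell per step in each direction;
# ez[0] is never updated, so the left boundary acts as a perfect conductor
```

A single transient run like this contains the response over a whole band at once, which is why time-domain solvers report broadband S-parameters from one simulation.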

1.4.1 Finite Integration Technique (FIT):

CST is based on the finite integration technique, which discretizes Maxwell's equations.
Maxwell's equations are a set of coupled partial differential equations that, together with the
Lorentz force law, form the foundation of classical electromagnetism, classical optics, and
electric circuits. The equations provide a mathematical model for electric, optical,
and radio technologies, such as power generation, electric motors, wireless
communication, lenses, radar, etc. Maxwell's equations describe how electric and
magnetic fields are generated by charges, currents, and changes of the fields. An
important consequence of the equations is that they demonstrate how fluctuating
electric and magnetic fields propagate at a constant speed in a vacuum. Known as
electromagnetic radiation, these waves may occur at various wavelengths to
produce a spectrum from radio waves to γ-rays. The equations are named
after the physicist and mathematician James Clerk Maxwell, who published an
early form of the equations that included the Lorentz force law between 1861 and
1862. Maxwell first used the equations to propose that light is an electromagnetic
phenomenon.
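For reference, the four equations described above can be written compactly in their microscopic (vacuum, SI) differential form:

```latex
\begin{aligned}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\varepsilon_0}
  && \text{(Gauss's law)}\\
\nabla \cdot \mathbf{B} &= 0
  && \text{(no magnetic monopoles)}\\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}
  && \text{(Faraday's law)}\\
\nabla \times \mathbf{B} &= \mu_0 \mathbf{J}
  + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
  && \text{(Amp\`ere--Maxwell law)}
\end{aligned}
```

Combining the two curl equations in a source-free region yields wave solutions that propagate at the constant speed c = 1/sqrt(μ0 ε0) mentioned above.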

The equations have two major variants. The microscopic Maxwell equations have
universal applicability but are unwieldy for common calculations. They relate the
electric and magnetic fields to total charge and total current, including the
complicated charges and currents in materials at the atomic scale. The
"macroscopic" Maxwell equations define two new auxiliary fields that describe
the large-scale behaviour of matter without having to consider atomic scale
charges and quantum phenomena like spins. However, their use requires
experimentally determined parameters for a phenomenological description of the
electromagnetic response of materials.

The term "Maxwell's equations" is often also used for equivalent alternative
formulations. Versions of Maxwell's equations based on the electric and magnetic
potentials are preferred for explicitly solving the equations as a boundary value
problem, analytical mechanics, or for use in quantum mechanics. The covariant
formulation (on spacetime rather than space and time separately) makes the
compatibility of Maxwell's equations with special relativity manifest. Maxwell's
equations in curved spacetime, commonly used in high energy and gravitational
physics, are compatible with general relativity. In fact, Einstein developed special
and general relativity to accommodate the invariant speed of light, a consequence
of Maxwell's equations, with the principle that only relative movement has
physical consequences.

Other Types of Software:

HFSS:

The Finite Element Method (FEM) is based on solving partial
differential equations. It is most commonly formulated from a variational
expression. It subdivides space into elements, for example tetrahedra. Fields inside
these elements are expressed in terms of a number of basis functions, for example
polynomials. These expressions are inserted into the functional of the equations,
and the variation of the functional is made zero. This yields a matrix eigenvalue
equation whose solution yields the fields at the nodes. Its first formulations were
developed as matrix methods for structural mechanics. This led to the idea of
approximating solids, and Courant (1942) applied an assembly of triangular
elements and the minimum of potential energy to torsion problems. The first
paper on the application of FEM to electrical problems appeared in 1968. An
extensive review of the history of FEM in electromagnetics was published in an
issue of the Antennas and Propagation Magazine. FEM is normally formulated in
the frequency domain, i.e., for time-harmonic problems. This means that, as for
IE-MoM, the solution has to be calculated for every frequency of interest.
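The FEM recipe above (subdivide, expand in basis functions, make the variation zero) can be sketched in one dimension with linear "hat" elements; this is an illustrative toy problem, not the HFSS implementation:

```python
import numpy as np

# Lowest-order 1D FEM: solve -u'' = 1 on (0, 1) with u(0) = u(1) = 0,
# using linear hat basis functions on a uniform mesh.
n = 10               # number of elements
h = 1.0 / n          # element size
nodes = n - 1        # interior nodes (the unknowns)

# assembled stiffness matrix: the classic tridiagonal [-1, 2, -1] / h
K = (np.diag(2 * np.ones(nodes)) +
     np.diag(-np.ones(nodes - 1), 1) +
     np.diag(-np.ones(nodes - 1), -1)) / h
F = h * np.ones(nodes)        # load vector for f = 1 (integral of f * hat)

u = np.linalg.solve(K, F)     # nodal values at x = h, 2h, ..., 1 - h
```

Assembling the element contributions yields the familiar tridiagonal system; for this toy problem the nodal values coincide with the exact solution u(x) = x(1-x)/2.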

Software Tools:

In the following sections, several solvers are briefly described, with more specific
information on the implemented solution method. Commercial solvers and academic
solvers are considered. For some of the commercial solvers an application section is
given, with references to the literature in which designs based on these solvers are
reported. It must be emphasized that this overview is by no means complete; the
references given are mere illustrations. A wealth of information concerning this issue
can be found in the IEEE Antennas and Wireless Propagation Letters, where over the
years numerous designs of planar and integrated antennas can be found.

3.1 Commercial software tools

Since the cost of a commercial solver is in many cases high, the choice of the
commercial solvers considered was based on their availability to the authors, directly
or through cooperation with others. The fact that a specific solver is missing does not
represent any statement about its quality. Nevertheless, the authors believe that the
solvers selected do represent the global landscape in this area. The IE solvers are
roughly ordered according to the increasing complexity of the geometries that can be
handled; in addition, the meshing abilities of every software package are clarified.

3.1.1 Momentum [26]: IE-MoM HP-Momentum is the IE-MoM solver integrated


within the ADS system of Agilent. The integral equations are formulated in mixed
potential form and the matrix elements are evaluated completely in the spatial
domain. Momentum was originally developed to analyze planar circuitry. The
latest version of Momentum can model vertical currents per layer as well as
horizontal side-currents on thick conductors. A combination of rectangles and
triangles represents the cells of Momentum's mesh grid. The mesh reduction option
allows the shape of basis functions to incorporate some of the physics of the
design and the mesh cell can be any polygon compounded by rectangles and
triangles. Nevertheless, the modeling of finite dielectric volumes is not included,
which limits the modeling capabilities for full 3D structures. These limitations can
be resolved by running EMDS (the FE based solver included into the ADS
package) without having to run a stand-alone tool. Mesh frequency and number of
cells per wavelength are used for determining the mesh density of the entire
circuit or any single object. The edge meshing option adds a relatively dense mesh
along the edges of objects. Since the current density is higher along the edges of
objects, the edge mesh can improve the accuracy and speed of a simulation.

3.1.2 IE3D [27]: IE-MoM The integral equations are formulated with a full dyadic
Green’s function and the matrix elements are computed completely numerically in
the spatial domain. IE3D can model truly arbitrary 3D metal structures. Since
2006 also finite 3D dielectric volumes can be modeled with a VIE approach
(Volume Integral Equation). IE3D performs automatic generation of a non-
uniform mesh with both rectangular and triangular cells. The user can control the
highest meshing frequency and the number of cells per wavelength. An automatic
edge cell feature is available for accurate simulation of the currents concentrated
near the edges of metallic surfaces. IE3D has been successfully used in the design
of small antennas for mobile phones. Specific topologies can be found in [28],
[29], [30], [31], [32]. In [30] for example, an internal dualband patch antenna is
described. The antenna is simulated and measured and the agreement is excellent
for this topology. An external multi-band antenna can be found in [33]. In general,
impedance, radiation patterns, and radiation efficiency seem to be well predicted.

3.1.3 FEKO [34]: IE-MoM, in combination with other techniques FEKO is based
on the IE-MoM method, which can be combined with other techniques, like the
geometric optics approach (GO), the uniform theory of diffraction (UTD), and the
multilevel fast multipole method (MLFMM). GO and UTD are at the moment
the only practical approaches for solving a class of very large problems, the size
of which exceeds the handling capabilities of MoM, FEM or FDTD [35]. The
matrix elements are computed using a mixed-potential formulation and a spatial
domain approach. The solver can model truly arbitrary 3D structures. Dielectric
volumes can be modeled in three different ways: with a SIE approach, with a VIE
approach, or with a hybrid approach with the FE method. The surface of the
structure is discretized using a triangular mesh, while tetrahedra are used for
volumetric discretization. In order to allow flexible control of the mesh, a user can
specify different cell dimensions for any selected region, face or edge.

3.1.4 HFSS [36]: FEM Since it was one of the first tools on the market, and also
due to its generality and flexibility, HFSS is one of the tools heavily used in
industrial design environments. The purpose of HFSS is to extract parasitic
parameters (S, Y, Z), visualize 3D electromagnetic fields (near- and far-field), and
generate SPICE models, all based on a 3D FEM solution of the electromagnetic
topology under consideration. Very useful features of HFSS are its automatic
adaptive mesh generation and refinement, which in most cases free the designer
from worrying about which mesh/grid to choose. This software is extremely popular
and is used for all kinds of purposes. Specific results for small planar antenna
topologies can be found in [37], [38], [39], [40], [41], [42]. Input impedance and
radiation patterns are generally predicted very well. Few results are found about
the efficiency.

3.1.5 Empire [21]: FDTD Empire XCcel is based on the FDTD technique. Due to
adaptive on-the-fly code generation it comes with a highly accelerated kernel,
providing very fast simulations. It features the Perfect Geometry Approximation
(PGA) algorithm to yield more accurate results for curved structures, frequency
dependent loss calculation and special algorithms for modeling thin conducting
sheets. Several structure import and export formats are supported. EMPIRE
XCcel’s applicability ranges from analyzing planar, multi-layered and conformal
circuits, components and antennas to multi-pin packages, waveguides, and
SI/EMC problems including the device’s operational environment. Time signals,
scattering parameters, and field animations are generated for a broad frequency
range within only one simulation run. Monitoring and animation capabilities give
physical insight into electromagnetic wave phenomena while accurate results are
obtained with little effort.

3.1.6 CST: FIT CST Microwave Studio (CST MWS) is based on the finite
integration technique (FIT). Its solver options, meshing strategy, and known
limitations are described in Section 1.4 above.

3.2 Non-commercial software tools

3.2.1 MAGMAS 3D: IE-MoM

MAGMAS 3D is the IE-MoM code developed at the Katholieke
Universiteit Leuven, Belgium. It was developed in cooperation with the European
Space Agency for 2D, 2.5D and quasi-3D structures. Specific in comparison with
other MoM codes is that the matrix elements are computed using a hybrid dyadic-
mixed potential formulation and a combined spectral-space domain approach.
This allows to perform a large part of the computation procedure for these matrix
elements analytically in the spectral domain. This makes the code computationally
more efficient. Surface and volume currents are decomposed in horizontal and
vertical currents (= quasi 3D approximation), which are both expanded using
generalized rooftop functions. A full mesh control of combined rectangular and
triangular mesh cells is available in manual meshing mode. Exact coordinates and
dimensions can be set for every single mesh cell. A Graphical User Interface is
available.

1.5 Performance Comparison

The table below shows different types of microstrip antennas and their ability to
operate under different conditions.

Table 1 shows the characteristics of the different microstrip antennas:

S.No | Characteristic           | Microstrip Patch Antenna | Microstrip Slot Antenna                | Printed Dipole Antenna
1    | Profile                  | Thin                     | Thin                                   | Thin
2    | Fabrication              | Very easy                | Easy                                   | Easy
3    | Polarization             | Both linear and circular | Both linear and circular               | Linear
4    | Dual-frequency operation | Possible                 | Possible                               | Possible
5    | Shape flexibility        | Any shape                | Mostly rectangular and circular shapes | Rectangular and triangular
6    | Bandwidth                | 2-50%                    | 5-30%                                  | ~30%

Table 1: Characteristics of the different microstrip antennas


CHAPTER 2

LITERATURE SURVEY

2.1 GENERAL

A literature survey provides a brief overview or summary of current research on a
topic. It should be structured logically and should chronologically represent the
development of ideas in the field being researched. The length of a literature survey
depends largely on whether the purpose of the report is to complete a college
assignment or to submit for journal publication: it can review a few research papers
on a topic or be a full-length discussion of the significant work in the field to date.

Among the objectives of writing a literature survey are understanding the
fundamental definitions and concepts of the field, which helps in discovering topics
grounded in previous research.

2.1.1 REVIEW OF LITERATURE SURVEY

Title: CPW Fed Wide to Dual Band Frequency Reconfigurable Antenna for 5G
Applications

Author: Abir Zaidi, Halima, Abdelhakim Ballouk

Year: 2019

This paper presents a co-planar waveguide (CPW) fed wideband-to-dual-band frequency
reconfigurable patch antenna for 5G applications. The frequency reconfigurability is
achieved using two variable resistors, which vary the voltage level and hence disturb the
current charge distribution. The antenna is designed on a ROGERS RO3003 substrate using
the High Frequency Structure Simulator (HFSS) and compared with CST results. The
proposed antenna shows promising results, which are demonstrated by comparison with
similar works at the proposed 5G frequencies.
Over the past several decades technology has advanced rapidly and the number of users has
grown enormously; this growth demands high data rates with low latency so that users can
communicate with each other effectively [1]. 5G promises a solution to this problem. The
proposed band for 5G ranges from 5 GHz to 300 GHz [2], and some of these frequencies,
owing to their low propagation loss, have attracted researchers' interest, resulting in many
antennas proposed for those specific frequencies.

A literature survey reveals several strong candidate frequencies for 5G, including the
28 GHz and 38 GHz bands, the V-band, and the E-band [3]. In the past few years,
reconfigurable antennas have attracted researchers' attention because of their on-demand
switching from one band to another, which results in less interference with other frequency
bands than multiband and wideband antennas.

Several techniques to achieve reconfigurability are proposed in the literature; the most
common of them are PIN diodes [4,5], MEMS [6], and optical switches [7]. Recent work
shows that variable resistors can also be used to achieve reconfigurability [8,9]. In [8] a
T-shaped patch antenna is presented for 23-29 GHz applications, while in [9] a slotted
Y-shaped antenna is presented for 23-29 GHz; both antennas are single-band frequency
reconfigurable, with less gain and bandwidth. The proposed psi-shaped antenna has a wider
bandwidth in its dual bands of operation with a higher gain.
Title: Very small form factor with Ultra Wide Band Rectangular Patch
Antenna for 5G Applications

Author: Wahaj Abbas Awan

Year: 2018

A rectangular patch antenna resonating at 73 GHz is presented in this paper.
The antenna is designed on an RO4730G3 substrate, newly introduced by the
Rogers Corporation for higher-frequency antennas. The experiment shows that the
antenna covers an ultra-wide impedance bandwidth of 35.41 GHz, ranging from
50.86 GHz to 86.27 GHz for S11 < -10 dB, which covers the major portion of the
V-band, E-band and W-band. The results also show that the antenna has a good
peak gain of 6.73 dB and a very small form factor of 0.8 mm x 0.9 mm.
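As a quick check on the quoted figures, the fractional bandwidth implied by these band edges can be computed directly. This is a minimal Python sketch; the helper function name is ours, not something from the cited paper.

```python
# Fractional bandwidth of an antenna from its S11 < -10 dB band edges.
# The values used below are the edges reported for the RO4730G3 patch antenna.
def fractional_bandwidth(f_low_ghz, f_high_ghz):
    """Return (absolute bandwidth in GHz, fractional bandwidth in %)."""
    f_center = (f_low_ghz + f_high_ghz) / 2.0   # arithmetic-mean center frequency
    bw = f_high_ghz - f_low_ghz
    return bw, 100.0 * bw / f_center

bw, fbw = fractional_bandwidth(50.86, 86.27)
print(f"Bandwidth: {bw:.2f} GHz, fractional: {fbw:.1f}%")
# -> Bandwidth: 35.41 GHz, fractional: 51.6%
```

The 35.41 GHz absolute bandwidth matches the paper's claim; the resulting fractional bandwidth of roughly 52% is what justifies calling the design ultra-wideband.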

Due to advancements in the latest technologies, i.e. the Internet of Things (IoT),
the Internet of Vehicles (IoV), mobile communication and others, there is a huge
increase in demand for high-speed data transfer with a low latency rate. This
demand is met by 5G networks, which provide data transfer rates of gigabits per
second using millimetre waves. The millimetre-wave range, from 30 GHz to 300 GHz
[1], has attracted increasing attention from all fields, including wireless
communication networks [2], passive millimetre-wave imaging, automotive radar
systems [3] and biomedical applications. Many researchers are seeking solutions
for the propagation of these millimetre waves; on the other hand, many others are
looking for high-gain, large-bandwidth, small-sized antennas for 5G applications.
Experimental results show that the attenuation from atmospheric absorption at
28 GHz and 38 GHz is comparatively small at 200 meters, while the 70-100 GHz and
125-140 GHz ranges also show small attenuation, as shown in Fig. 1. Although
28 GHz and 38 GHz are strong candidate frequencies for 5G, the technologies need
larger bandwidth, so researchers are moving toward new large-bandwidth frequency
spectrums to fulfil demand and reduce transmission losses. The V-band spectrum
(40-75 GHz) and W-band spectrum (75-110 GHz) show better results and an
economical advantage over other mm-wave bands. The large typical link distance
and regulatory protection of the E-band (71-76 GHz, 81-86 GHz and 92-95 GHz) make
these frequencies potential candidates for 5G communication systems [6]. In the
V-band, E-band and W-band spectrums, wireless systems are able to use a larger
allocated spectrum compared to the 6-40 GHz allocation. Thus the V-band, E-band
and W-band can provide significant cost advantages over 6-40 GHz millimetre-wave
wireless systems, providing multi-gigabit scaling capacity without any other cost
or additional equipment.

Title: Dynamic Spectrum Sharing in 5G Wireless Networks With Full-Duplex
Technology
Author: Shree Krishna Sharma, Tadilo Endeshaw Bogale, Long Bao Le

Year: 2018

Full-duplex (FD) wireless technology enables a radio to transmit and receive on
the same frequency band at the same time, and it is considered to be one of the
candidate technologies for the fifth generation (5G) and beyond wireless
communication systems due to its advantages, including potential doubling of the
capacity and increased spectrum utilization efficiency. However, one of the main
challenges of FD technology is the mitigation of strong self-interference (SI).
Recent advances in different SI cancellation techniques, such as antenna
cancellation, analog cancellation, and digital cancellation methods, have led to the
feasibility of using FD technology in different wireless applications. Among
potential applications, one important application area is dynamic spectrum sharing
(DSS) in wireless systems particularly 5G networks, where FD can provide
several benefits and possibilities such as concurrent sensing and transmission
(CST), concurrent transmission and reception, improved sensing efficiency and
secondary throughput, and the mitigation of the hidden terminal problem. In this
direction, first, starting with a detailed overview of FD-enabled DSS, we provide a
comprehensive survey of recent advances in this domain. We then highlight
several potential techniques for enabling FD operation in DSS wireless systems.
Subsequently, we propose a novel communication framework to enable CST in
DSS systems by employing a power control-based SI mitigation scheme and carry
out the throughput performance analysis of this proposed framework. Finally, we
discuss some open research issues and future directions with the objective of
stimulating future research efforts in the emerging FD-enabled DSS wireless
systems.

Title: Millimeter-wave frequency reconfigurable T-shaped antenna for 5G networks

Author: Syeda Fizzah Jilani, Syed Muzahir Abbas, Karu P. Esselle

Year: 2015

Millimeter-wave reconfigurable antennas are predicted to be the future of next-generation
wireless networks, given the availability of wide bandwidth. A coplanar waveguide (CPW)
fed T-shaped frequency reconfigurable millimeter-wave antenna for 5G networks is presented.
The resonant frequency is varied to obtain a 10 dB return-loss bandwidth in the 23-29 GHz
range by incorporating two variable resistors. The radiation pattern contributes two
symmetrical radiation beams at approximately ±30° along the end-fire direction. The 3 dB
beamwidth remains conserved over the entire operating bandwidth. The proposed antenna
targets wireless systems operating in narrow passages, corridors, mine tunnels, and
person-to-person body-centric applications.
Title: A Millimeter-Wave Self-Mixing Array With Large Gain and Wide Angular
Receiving Range

Author: Jonas Kornprobst, Thomas J. Mittermaier, Thomas F. Eibert


Year: 2017

The concept of self-mixing antenna arrays is presented and analyzed with respect to its
beneficial behavior of large gain over a wide angular range. The large gain is attained by an
antenna array with large element spacing, where all array element signals are combined
approximately coherently over the entire angular receiving range. This functionality is
achieved by the self-mixing principle, where an exact description via an intermediate
frequency (IF) array factor is derived. For verification purposes, a 4×2 self-mixing array is
fabricated and measured in the frequency range from 34 GHz to 39 GHz. A multiple-
resonances millimeter-wave microstrip patch antenna has been especially developed to
achieve large bandwidth and a wide angular receiving range. The broad beamwidth is
achieved by two parasitic patches and suitable radiation characteristics of the resonant
modes. The self-mixing of the receive signal is realized at each antenna element by a
Schottky diode with an optimized operating point. The down-converted array element signals
are then combined and measured at the IF. The receive power is increased significantly over
a large angular range as compared to conventional array feeding techniques. The simulation
results are verified by measurements, which show very good agreement.
Title: A Wide-Angle Scanning Planar Phased Array with Pattern Reconfigurable
Magnetic Current Element

Author: Xiao Ding, You-Feng Cheng, Wei Shao, Hua Li

Year: 2017

A wide-angle scanning planar phased array with magnetic current elements is proposed. A
pattern reconfigurable technique is used to design the element, which enhances scanning gain
and decreases the sidelobe level throughout the entire scanning range. The array is comprised
of eight elements in a 2×4 arrangement with uniform spacing. The proposed phased array
operates at 5.8 GHz and can scan with its 3 dB beamwidth the entire upper elevation plane
from −90° to +90°, enabled by a two-step pattern reconfigurability mechanism
consisting of: 1) coarse-angle scanning and 2) fine-angle scanning. Significant outcomes also
include the reduced sidelobe level (less than 7.8 dB) and the particularly small fluctuation
(±0.75 dB) of the gain over a scanning range of 150° (from −75° to +75° in
the elevation plane). With the absence of any structure above the ground level, its high
efficiency, and the coverage of the entire upper half-space, this proposed antenna array is
very attractive for a variety of phased-array applications.

Title: Broadening the Beam-Width of Microstrip Antenna by the Induced Vertical Currents

Author: Guangwei Yang, Jianying Li, Du-juan Wei

Year: 2017

A method to broaden the beamwidth of an aperture-coupled-feed microstrip antenna is
investigated. Two metal walls are placed at the bilateral sides of an original
microstrip antenna in the E-plane (xoz-plane). The vertical current on the metal
walls is induced by the E-field from the horizontal current on the radiating patch
of the microstrip antenna. The beamwidth of the antenna is broadened by both the
horizontal and vertical currents. A novel wide-beamwidth aperture-coupled-feed
microstrip antenna (WBMA) is designed according to the proposed method. The
simulated and measured reflection coefficients and radiation patterns of the
WBMA are presented and compared with those of the original microstrip antenna.
Compared with the original microstrip antenna, the half-power beamwidth (HPBW)
is clearly improved: the HPBW is 236° in the E-plane and 124° in the H-plane at
9.0 GHz. Good radiation characteristics of the new antenna structure are obtained
over the whole operating frequency band.
CHAPTER 3

OUTLINE OF THE PROJECT

3.1 OVERVIEW OF THE SYSTEM


Exploratory data analysis (EDA) is an essential step that takes place after data
collection and should be completed before any modeling. This is because it is
important for a data analyst to understand the nature of the given dataset without
making assumptions. The results of exploratory analysis are extremely useful for
understanding the structure of the data, the distribution of its attributes, and
the presence of outliers and interrelationships within the dataset. EDA uses
summary statistics and visualization to understand the data quickly, find hints
about its tendencies and quality, and frame questions for further analysis. This
part of the report loads the data, checks it for cleanliness, and then trims the
given dataset of hospital patient records, recording each cleaning decision
carefully. The collected dataset may contain missing values, duplicate records,
and a number of outliers that can mislead classification, so it must be
preprocessed to improve the efficiency of the learning algorithm. Outliers must
be removed and variable transformation performed. Machine learning needs a data
repository with plenty of historical data; raw source data cannot be used
directly, so it is first preprocessed and a suitable representation is then
chosen for the method.
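The cleaning steps described above (counting missing values, dropping duplicates, and removing outliers before modeling) can be sketched in pandas. The records and column names below are illustrative stand-ins, not the project's actual dataset.

```python
import pandas as pd

# Minimal EDA/cleaning sketch for a patient-records table.
# The data and column names here are made up for illustration only.
df = pd.DataFrame({
    "age": [52, 61, None, 47, 61, 300],      # None = missing, 300 = obvious outlier
    "tumor_size_mm": [12, 25, 8, 25, 25, 14],
})

print(df.isnull().sum())           # count missing values per column
df = df.drop_duplicates()          # remove duplicate records
df = df.dropna()                   # drop rows with missing attributes

# Simple 1.5*IQR rule to remove outliers in a numeric column.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(df)                          # cleaned records ready for modeling
```

On this toy table the pipeline drops one duplicate row, one row with a missing age, and the age-300 outlier, leaving three clean records.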
3.1.1 CAUSES

Doctors know that breast cancer occurs when some breast cells begin to
grow abnormally. These cells divide more rapidly than healthy cells do and
continue to accumulate, forming a lump or mass. Cells may spread (metastasize)
through your breast to your lymph nodes or to other parts of your body. Breast
cancer most often begins with cells in the milk-producing ducts (invasive ductal
carcinoma). Breast cancer may also begin in the glandular tissue called lobules
(invasive lobular carcinoma) or in other cells or tissue within the breast.
Researchers have identified hormonal, lifestyle and environmental factors that
may increase your risk of breast cancer. But it's not clear why some people who
have no risk factors develop cancer, yet other people with risk factors never do.
It's likely that breast cancer is caused by a complex interaction of your genetic
makeup and your environment.

3.1.2 INHERITED BREAST CANCER

Doctors estimate that about 5 to 10 percent of breast cancers are linked to
gene mutations passed through generations of a family. A number of inherited
mutated genes that can increase the likelihood of breast cancer have been
identified. The most well-known are breast cancer gene 1 (BRCA1) and breast
cancer gene 2 (BRCA2), both of which significantly increase the risk of both
breast and ovarian cancer. If you have a strong family history of breast cancer or
other cancers, your doctor may recommend a blood test to help identify specific
mutations in BRCA or other genes that are being passed through your family.
Consider asking your doctor for a referral to a genetic counselor, who can review
your family health history. A genetic counselor can also discuss the benefits, risks
and limitations of genetic testing to assist you with shared decision-making.
3.1.3 RISK FACTORS

A breast cancer risk factor is anything that makes it more likely you'll get
breast cancer. But having one or even several breast cancer risk factors doesn't
necessarily mean you'll develop breast cancer. Many women who develop breast
cancer have no known risk factors other than simply being women.

Factors that are associated with an increased risk of breast cancer include:

Being female. Women are much more likely than men are to develop breast
cancer.

Increasing age. Your risk of breast cancer increases as you age.

A personal history of breast conditions. If you've had a breast biopsy that found
lobular carcinoma in situ (LCIS) or atypical hyperplasia of the breast, you have an
increased risk of breast cancer.

A personal history of breast cancer. If you've had breast cancer in one breast,
you have an increased risk of developing cancer in the other breast.

A family history of breast cancer. If your mother, sister or daughter was
diagnosed with breast cancer, particularly at a young age, your risk of breast
cancer is increased. Still, the majority of people diagnosed with breast cancer have
no family history of the disease.

Inherited genes that increase cancer risk. Certain gene mutations that increase
the risk of breast cancer can be passed from parents to children. The most well-
known gene mutations are referred to as BRCA1 and BRCA2. These genes can
greatly increase your risk of breast cancer and other cancers, but they don't make
cancer inevitable.

Radiation exposure. If you received radiation treatments to your chest as a child
or young adult, your risk of breast cancer is increased.
Obesity. Being obese increases your risk of breast cancer.

Beginning your period at a younger age. Beginning your period before age 12
increases your risk of breast cancer.

Beginning menopause at an older age. If you began menopause at an older age,
you're more likely to develop breast cancer.

Having your first child at an older age. Women who give birth to their first
child after age 30 may have an increased risk of breast cancer.

Having never been pregnant. Women who have never been pregnant have a
greater risk of breast cancer than do women who have had one or more
pregnancies.

Postmenopausal hormone therapy. Women who take hormone therapy
medications that combine estrogen and progesterone to treat the signs and
symptoms of menopause have an increased risk of breast cancer. The risk of
breast cancer decreases when women stop taking these medications.

Drinking alcohol. Drinking alcohol increases the risk of breast cancer.

3.1.4 SOME RISK FACTORS FOR BREAST CANCER

The following are some of the known risk factors for breast
cancer. However, most cases of breast cancer cannot be linked to a specific cause.
Talk to your doctor about your specific risk.

Age: The chance of getting breast cancer increases as women age. Nearly 80
percent of breast cancers are found in women over the age of 50.

Personal history of breast cancer: A woman who has had breast cancer in one
breast is at an increased risk of developing cancer in her other breast.
Family history of breast cancer: A woman has a higher risk of breast cancer if
her mother, sister or daughter had breast cancer, especially at a young age (before
40). Having other relatives with breast cancer may also raise the risk.

Genetic factors: Women with certain genetic mutations, including changes to the
BRCA1 and BRCA2 genes, are at higher risk of developing breast cancer during
their lifetime. Other gene changes may raise breast cancer risk as well.

Childbearing and menstrual history: The older a woman is when she has her
first child, the greater her risk of breast cancer. Also, at higher risk are:

o Women who menstruate for the first time at an early age (before 12)
o Women who go through menopause late (after age 55)
o Women who’ve never had children

Most cases of breast cancer cannot be linked to a specific cause, so patients
should talk to a specialist about their individual risk. The chance of getting
breast cancer increases as a woman ages, and about 80% of breast tumors are
found in older women. A woman has a higher risk if her mother, daughter or
sister had breast cancer, especially at a young age (below 40), and other
relatives with breast cancer may likewise raise the risk. Genetic factors
include certain inherited mutations, such as changes to the BRCA1 and BRCA2
genes; women carrying these mutations are at high risk of developing breast
cancer during their lifetime, and other gene changes may raise breast cancer
risk as well. The Tumor-Node-Metastasis (TNM) stage of breast cancer patients
is described using TNM staging tools at the physician's discretion. The
physician assigns the stage of the breast cancer by combining the tumor, node
and metastasis classifications.

This information determines the diagnosis for each patient, and most patients
are anxious to learn the exact stage of their breast cancer. In general, the
doctor will state the stage of the cancer once the testing after surgery is
complete, usually around 5 to 7 days after the procedure.

An expert assessment groups patients into stages such as I, IIA, IIB and III.
The project has to find the accuracy on the training dataset, the accuracy on
the testing dataset, the specificity, the false-positive rate, the precision and
the recall by comparing algorithms using Python code. The steps involved are:

o Define a problem
o Preparing data
o Evaluating algorithms
o Improving results
o Predicting results

The steps involved in building the data model are depicted below.

o Data collection (splitting into training set and test set)
o Pre-processing (outlier detection)
o Building the classification model
o Prediction (patient stages)

Fig 2: Data Flow Diagram for the Machine Learning Model
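The four stages of the data-flow diagram can be sketched end to end with scikit-learn. The bundled breast-cancer dataset and the logistic-regression classifier used here are illustrative stand-ins, not necessarily the dataset or model this project finally adopts.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection: split into training and test sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 2. Pre-processing: scale features (outlier handling omitted for brevity).
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Build the classification model.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 4. Prediction, with accuracy reported on both splits.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Reporting accuracy on both the training and testing splits, as the step list above requires, makes overfitting visible: a large gap between the two numbers signals a model that memorizes rather than generalizes.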


3.2 MENOPAUSE:

Menopause is a stage in life when a woman stops having
her monthly period. It is a normal part of aging and marks the end of a woman's
reproductive years. Menopause typically occurs in a woman's late 40s to early
50s. However, women who have their ovaries surgically removed undergo
"sudden" menopause. Natural menopause is the permanent ending of menstruation
that is not brought on by any type of medical treatment. For women undergoing
natural menopause, the process is gradual and is described in three stages:

Perimenopause or "menopause transition." Perimenopause
can begin eight to 10 years before menopause, when the ovaries gradually produce
less estrogen. It usually starts in a woman's 40s, but can start in the 30s as well.
Perimenopause lasts up until menopause, the point when the ovaries stop releasing
eggs. In the last one to two years of perimenopause, the drop-in estrogen
accelerates. At this stage, many women can experience menopause symptoms.
Women are still having menstrual cycles during this time, and can get pregnant.

Menopause. Menopause is the point when a woman no
longer has menstrual periods. At this stage, the ovaries have stopped releasing
eggs and producing most of their estrogen. Menopause is diagnosed when a
woman has gone without a period for 12 consecutive months.

Post menopause. These are the years after menopause.


During this stage, menopausal symptoms, such as hot flashes, can ease for many
women. But, as a result of a lower level of estrogen, postmenopausal women are
at increased risk for a number of health conditions, such as osteoporosis and heart
disease. Medication, such as hormone therapy and/or healthy lifestyle changes,
may reduce the risk of some of these conditions. Since every woman's risk is
different, talk to your doctor to learn what steps you can take to reduce your
individual risk.
3.3 TUMOR (T)

Using the TNM system, the “T” plus a letter or number (0 to 4) is used to
describe the size and location of the tumor. Tumor size is measured in centimeters
(cm). A centimeter is roughly equal to the width of a standard pen or pencil. Stage
may also be divided into smaller groups that help describe the tumor in even more
detail.

TX: The primary tumor cannot be evaluated.

T0: (T plus zero): There is no evidence of cancer in the breast.

Tis: Refers to carcinoma in situ. The cancer is confined
within the ducts or lobules of the breast tissue and has not spread into the
surrounding tissue of the breast. There are 2 types of breast carcinoma in situ:
Tis (DCIS): DCIS is a noninvasive cancer, but if not
removed it may develop into an invasive breast cancer later. DCIS means that
cancer cells have been found in breast ducts and have not spread past the layer of
tissue where they began.
Tis (Paget’s): Paget’s disease of the nipple is a rare form of
early, noninvasive cancer that is only in the skin cells of the nipple. Sometimes
Paget’s disease is associated with another, invasive breast cancer. If there is
another invasive breast cancer, it is classified according to the stage of the
invasive tumor.

T1: The tumor in the breast is 20 millimeters (mm) or smaller in size at its widest
area.

This is a little less than an inch. This stage is then broken into 4 substages
depending on the size of the tumor:
T1mi is a tumor that is 1 mm or smaller
T1a is a tumor that is larger than 1 mm but 5 mm or smaller

T1b is a tumor that is larger than 5 mm but 10 mm or smaller

T1c is a tumor that is larger than 10 mm but 20 mm or smaller

T2: The tumor is larger than 20 mm but not larger than 50 mm.

T3: The tumor is larger than 50 mm.

T4: The tumor falls into 1 of the following groups:


T4a means the tumor has grown into the chest wall.

T4b is when the tumor has grown into the skin.

T4c is cancer that has grown into the chest wall and the skin.

T4d is inflammatory breast cancer.
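The size thresholds above translate directly into code. The sketch below (the function name is ours) covers only the size-based categories T1mi through T3, since T4 depends on chest-wall or skin involvement rather than size.

```python
def t_stage_from_size(size_mm):
    """Map tumor size in mm (widest dimension) to the size-based T category.

    T4 is excluded: it is defined by chest-wall/skin involvement, not size.
    Thresholds follow the substage definitions given in Section 3.3.
    """
    if size_mm <= 1:
        return "T1mi"   # 1 mm or smaller
    if size_mm <= 5:
        return "T1a"    # larger than 1 mm but 5 mm or smaller
    if size_mm <= 10:
        return "T1b"    # larger than 5 mm but 10 mm or smaller
    if size_mm <= 20:
        return "T1c"    # larger than 10 mm but 20 mm or smaller
    if size_mm <= 50:
        return "T2"     # larger than 20 mm but not larger than 50 mm
    return "T3"         # larger than 50 mm

print(t_stage_from_size(18))  # prints: T1c
```

Encoding the thresholds this way makes the boundary behavior explicit: each category is closed on its upper bound (e.g. exactly 20 mm is still T1c, not T2), matching the "or smaller"/"not larger than" wording of the definitions.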

3.4 NODE (N)

The “N” in the TNM staging system stands for lymph nodes. Regional lymph
nodes include:

Lymph nodes located under the arm, called the axillary lymph nodes

Above and below the collarbone

Under the breastbone, called the internal mammary lymph nodes

Lymph nodes in other parts of the body are called distant lymph nodes. As
explained above, if the doctor evaluates the lymph nodes before surgery, based on
other tests and/or a physical examination, a letter “c” for “clinical” staging is
placed in front of the “N.”
If the doctor evaluates the lymph nodes after surgery, which is a more accurate
assessment, a letter “p” for “pathologic” staging is placed in front of the “N.” The
information below describes the pathologic staging.

NX: The lymph nodes were not evaluated.


N0: Either of the following:
No cancer was found in the lymph nodes.

Only areas of cancer smaller than 0.2 mm are in the lymph nodes.

N1: The cancer has spread to 1 to 3 axillary lymph nodes and/or the internal
mammary lymph nodes.
N2: The cancer has spread to 4 to 9 axillary lymph nodes. Or it has spread to the
internal mammary lymph nodes, but not the axillary lymph nodes.
N3: The cancer has spread to 10 or more axillary lymph nodes. Or it has spread to
the lymph nodes located under the clavicle, or collarbone. It may have also spread
to the internal mammary lymph nodes. Cancer that has spread to the lymph nodes
above the clavicle, called the supraclavicular lymph nodes, is also described as
N3.
If there is cancer in the lymph nodes, knowing how many lymph nodes are
involved and where they are helps doctors to plan treatment. The pathologist can
find out the number of axillary lymph nodes that contain cancer after they are
removed during surgery. It is not common to remove the supraclavicular or
internal mammary lymph nodes during surgery. If there is cancer in these lymph
nodes, treatment other than surgery, such as radiation therapy, chemotherapy, and
hormonal therapy are used first.

3.5 METASTASIS (M)

The “M” in the TNM system indicates whether the cancer has spread to
other parts of the body, called distant metastasis. This is no longer considered
early-stage or locally advanced cancer. For more information on metastatic breast
cancer, see the Guide to Metastatic Breast Cancer.

MX: Distant spread cannot be evaluated.


M0: The disease has not metastasized.
M0 (i+): There is no clinical or radiographic evidence of distant metastases.
Microscopic evidence of tumor cells is found in the blood, bone marrow, or other
lymph nodes that are no larger than 0.2 mm.
M1: There is evidence of metastasis to another part of the body, meaning there are
breast cancer cells growing in other organs.

3.6 CANCER STAGE GROUPING

Doctors assign the stage of the cancer by combining the T, N, and M
classifications and the tumor grade and the results of ER/PR and HER2 testing.
This information is used to help determine your prognosis (see Diagnosis). The
simpler approach to explaining the stage of breast cancer is to use the T, N, and M
classifications. This is the approach used below to describe the different stages.
Most patients are anxious to learn the exact stage of the cancer. Your doctor will
generally confirm the stage of the cancer when the testing after surgery is
finalized, usually about 5 to 7 days after surgery. When systemic or whole-body
treatment is given before surgery, called neoadjuvant therapy, the stage of the
cancer is primarily determined clinically. Doctors may refer to stage I to stage IIA
cancer as early stage, and stage IIB to stage III as locally advanced.

Stage 0: Stage zero (0) describes disease that is only in the ducts and lobules of
the breast tissue and has not spread to the surrounding tissue of the breast. It is
also called noninvasive cancer (Tis, N0, M0).

Stage IA: The tumor is small, invasive, and has not spread to the lymph nodes
(T1, N0, M0).

Stage IB: Cancer has spread to the lymph nodes and the cancer in the lymph node
is larger than 0.2 mm but less than 2 mm in size. There is either no evidence of a
tumor in the breast or the tumor in the breast is 20 mm or smaller (T0 or T1, N1,
M0).

Stage IIA: Any 1 of these conditions:


There is no evidence of a tumor in the breast, but the cancer has spread to 1 to 3
axillary lymph nodes. It has not spread to distant parts of the body. (T0, N1, M0).

The tumor is 20 mm or smaller and has spread to the axillary lymph nodes (T1,
N1, M0).

The tumor is larger than 20 mm but not larger than 50 mm and has not spread to
the axillary lymph nodes (T2, N0, M0).

Stage IIB: Either of these conditions:


The tumor is larger than 20 mm but not larger than 50 mm and has spread to 1 to 3
axillary lymph nodes (T2, N1, M0).

The tumor is larger than 50 mm but has not spread to the axillary lymph nodes
(T3, N0, M0).

Stage IIIA: The cancer of any size has spread to 4 to 9 axillary lymph nodes or to
internal mammary lymph nodes. It has not spread to other parts of the body (T0,
T1, T2 or T3, N2, M0). Stage IIIA may also be a tumor larger than 50 mm that
has spread to 1 to 3 axillary lymph nodes (T3, N1, M0).

Stage IIIB: The tumor has spread to the chest wall or caused swelling or
ulceration of the breast or is diagnosed as inflammatory breast cancer. It may or
may not have spread to up to 9 axillary or internal mammary lymph nodes. It has
not spread to other parts of the body (T4; N0, N1 or N2; M0).

Stage IIIC: A tumor of any size that has spread to 10 or more axillary lymph
nodes, the internal mammary lymph nodes, and/or the lymph nodes under the
collarbone. It has not spread to other parts of the body (any T, N3, M0).
Stage IV (metastatic): The tumor can be any size and has spread to other organs,
such as the bones, lungs, brain, liver, distant lymph nodes, or chest wall (any T,
any N, M1). Metastatic cancer found when the cancer is first diagnosed occurs
about 5% to 6% of the time. This may be called de novo metastatic breast cancer.
Most commonly, metastatic breast cancer is found after a previous diagnosis of
early breast cancer. Learn more about metastatic breast cancer.
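The T, N, and M combinations listed in this section can be collected into a single lookup. The sketch below is a simplified illustration of the groupings stated above; it omits the tumor grade and ER/PR/HER2 factors that full staging also uses, and the function name is ours.

```python
def stage_group(t, n, m):
    """Simplified stage grouping from the T, N, M combinations listed above.

    Illustrative only: real staging also incorporates tumor grade and
    ER/PR/HER2 results, so this lookup covers just the combinations stated
    in the text. "N1mi" denotes node deposits > 0.2 mm but < 2 mm.
    """
    if m == "M1":
        return "IV"     # any T, any N, M1 (metastatic)
    if n == "N3":
        return "IIIC"   # any T, N3, M0
    if t == "T4":
        return "IIIB"   # T4; N0, N1 or N2; M0
    if n == "N2" or (t == "T3" and n == "N1"):
        return "IIIA"   # T0-T3, N2, M0  or  T3, N1, M0
    if (t == "T2" and n == "N1") or (t == "T3" and n == "N0"):
        return "IIB"
    if (t in ("T0", "T1") and n == "N1") or (t == "T2" and n == "N0"):
        return "IIA"
    if t in ("T0", "T1") and n == "N1mi":
        return "IB"
    if t == "T1" and n == "N0":
        return "IA"
    if t == "Tis" and n == "N0":
        return "0"      # noninvasive (in situ) disease
    return "unclassified"

print(stage_group("T2", "N1", "M0"))  # prints: IIB
```

Ordering the checks from most to least advanced matters: M1 and N3 must be tested first because they dominate any T value, mirroring the "any T" wording in the stage IV and IIIC definitions.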

3.6.1 RECURRENT:

Recurrent cancer is cancer that has come back after treatment, and can be
described as local, regional, and/or distant. If the cancer does return, there will be
another round of tests to learn about the extent of the recurrence. These tests and
scans are often similar to those done at the time of the original diagnosis.

3.7 PREVENTION:

3.7.1 BREAST SELF-EXAM


Making changes in your daily life may help reduce your risk of breast cancer. Try
to:

Ask your doctor about breast cancer screening. Discuss with your doctor when
to begin breast cancer screening exams and tests, such as clinical breast exams
and mammograms. Talk to your doctor about the benefits and risks of screening.
Together, you can decide what breast cancer screening strategies are right for you.

Become familiar with your breasts through breast self-exam for breast
awareness. Women may choose to become familiar with their breasts by
occasionally inspecting their breasts during a breast self-exam for breast
awareness. If there is a new change, lumps or other unusual signs in your breasts,
talk to your doctor promptly.
Breast awareness can't prevent breast cancer, but it may help you to better
understand the normal changes that your breasts undergo and identify any unusual
signs and symptoms.

Drink alcohol in moderation, if at all. Limit the amount of alcohol you drink to
no more than one drink a day, if you choose to drink.

Exercise most days of the week. Aim for at least 30 minutes of exercise on most
days of the week. If you haven't been active lately, ask your doctor whether it's
OK and start slowly.

Limit postmenopausal hormone therapy. Combination hormone therapy may increase the risk of breast cancer. Talk with your doctor about the benefits and risks of hormone therapy.

Some women experience bothersome signs and symptoms during menopause and,
for these women, the increased risk of breast cancer may be acceptable in order to
relieve menopause signs and symptoms.

To reduce the risk of breast cancer, use the lowest dose of hormone therapy
possible for the shortest amount of time.

Maintain a healthy weight. If your weight is healthy, work to maintain that weight. If you need to lose weight, ask your doctor about healthy strategies to accomplish this. Reduce the number of calories you eat each day and slowly increase the amount of exercise.

Choose a healthy diet. Women who eat a Mediterranean diet supplemented with
extra-virgin olive oil and mixed nuts may have a reduced risk of breast cancer.
The Mediterranean diet focuses mostly on plant-based foods, such as fruits and
vegetables, whole grains, legumes, and nuts. People who follow the
Mediterranean diet choose healthy fats, such as olive oil, over butter and fish
instead of red meat.
3.7.2 BREAST CANCER RISK REDUCTION FOR WOMEN WITH A
HIGH RISK

If your doctor has assessed your family history and determined that you have
other factors, such as a precancerous breast condition, that increase your risk of
breast cancer, you may discuss options to reduce your risk, such as:

Preventive medications (chemoprevention). Estrogen-blocking medications, such as selective estrogen receptor modulators and aromatase inhibitors, reduce the risk of breast cancer in women with a high risk of the disease.

These medications carry a risk of side effects, so doctors reserve these medications for women who have a very high risk of breast cancer. Discuss the benefits and risks with your doctor.

Preventive surgery. Women with a very high risk of breast cancer may choose to
have their healthy breasts surgically removed (prophylactic mastectomy). They
may also choose to have their healthy ovaries removed (prophylactic
oophorectomy) to reduce the risk of both breast cancer and ovarian cancer.

3.8 TNM STAGING SYSTEM:

The most commonly used tool that doctors use to describe the stage is the
TNM system. Doctors use the results from diagnostic tests and scans to answer
these questions:

Tumor (T): How large is the primary tumor? Where is it located?


Node (N): Has the tumor spread to the lymph nodes? If so, where and how many?
Metastasis (M): Has the cancer spread to other parts of the body? If so, where
and how much?
The results are combined to determine the stage of cancer for each person.
There are 5 stages: stage 0 (zero), which is noninvasive ductal carcinoma in situ
(DCIS), and stages I through IV (1 through 4), which are used for invasive breast
cancer. The stage provides a common way of describing the cancer, so doctors can
work together to plan the best treatments. Staging can be clinical or pathological.
Clinical staging is based on the results of tests done before surgery, which may
include physical examinations, mammogram, ultrasound, and MRI scans.
Pathologic staging is based on what is found during surgery to remove breast
tissue and lymph nodes. The results are usually available several days after
surgery. In general, pathological staging provides the most information to
determine a patient’s prognosis.

3.8.1 THE SIGNIFICANCE OF THE STAGE OF THE CANCER

The stage of a cancer is a measurement of the extent of the cancer and its
spread. The standard staging system for breast cancer uses a system known as
TNM, where:

T stands for the main (primary) tumor

N stands for spread to nearby lymph nodes

M stands for metastasis (spread to distant parts of the body)

If the stage is based on removal of the cancer with surgery and review by the
pathologist, the letter p (for pathologic) may appear before the T and N letters.

The T category (T0, Tis, T1, T2, T3, or T4) is based on the size of the tumor and
whether or not it has spread to the skin over the breast or to the chest wall under
the breast. Higher T numbers mean a larger tumor and/or wider spread to tissues
near the breast. (Tis is carcinoma in situ.) Since the entire tumor must be removed
to learn the T category, this information is not given for needle biopsies. The N
category (N0, N1, N2, or N3) indicates whether the cancer has spread to lymph
nodes near the breast and, if so, how many lymph nodes are affected. Higher
numbers after the N indicate more lymph node involvement by cancer. If no
nearby lymph nodes were removed to be checked for cancer spread, the report
may list the N category as NX, where the letter X is used to mean that the
information is not available (also see next question). The M category (M0, M1) is
usually based on the results of lab and imaging tests, and is not part of the
pathology report from breast cancer surgery. In a pathology report, the M category
is often left off or listed as MX (again the letter X means that the information is
not available). Once the T, N, and M categories have been determined, this
information is combined to give the cancer an overall stage. Stages are expressed
in Roman numerals from stage I (the least advanced stage) to stage IV (the most
advanced stage). Non-invasive cancer (carcinoma in situ) is listed as stage 0.
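As a toy illustration only of how the T, N and M categories combine into an overall stage (a deliberately simplified lookup; the real AJCC staging table has many more combinations, and the function below is illustrative, not clinical guidance):

```python
# Toy sketch only: a simplified T/N/M-to-stage lookup. The real AJCC
# staging table has many more combinations than this illustration.
def overall_stage(t: str, n: str, m: str) -> str:
    if m == "M1":
        return "IV"          # any distant metastasis is stage IV
    if t == "Tis" and n == "N0":
        return "0"           # carcinoma in situ
    if n == "N3":
        return "III"         # heavy nodal involvement
    if t in ("T3", "T4") or n == "N2":
        return "III"
    if t == "T2" or n == "N1":
        return "II"
    return "I"

print(overall_stage("T1", "N0", "M0"))   # I
print(overall_stage("T2", "N1", "M0"))   # II
print(overall_stage("T0", "N0", "M1"))   # IV
```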

3.9 GRADE:

The grade of a tumour indicates what the cells look like and gives an idea of how
quickly the cancer may grow and spread. Tumours are graded between 1 and
Grading for non-invasive breast cancers such as ductal carcinoma in situ (DCIS) is different, and is defined as low, medium or high grade rather than 1, 2, or 3.

3.9.1 STAGING AND GRADING

Staging and grading are ways in which healthcare professionals describe the size
of your breast cancer, whether and how far it has spread, and how fast it may
grow (or how ‘aggressive’ it is). Knowing your cancer’s stage and/or grade helps
your breast care team plan the best treatment for you. Staging and grading usually
happens after your breast tumor has been removed by surgery, as
a pathologist will need to test the tissue in a laboratory and examine it under a
microscope. The grade of a tumor indicates what the cells look like and gives an
idea of how quickly the cancer may grow and spread. Tumors are graded between
1 and 3.
Grade 1 – the cancer cells look small and uniform like normal cells, and are
usually slow-growing compared to other grades of breast cancer

Grade 2 – the cancer cells are slightly bigger than normal cells, varying in shape
and are growing faster than normal cells

Grade 3 – the cancer cells look different to normal cells, and are usually faster-
growing than normal cells

Fig 3: Type Of Grade Cells


3.9.2 STAGING

Staging is used to assess the size of a tumor, whether it has spread and how
far it has spread. Understanding the stage of the cancer helps doctors to predict the
likely outcome and design a treatment plan for individual patients. The main
method used for defining the stage of a cancer is the TNM (tumor, nodes, metastasis) system. The TNM system is often used to categorise cancers into four stages.
Stage 1 usually means that a cancer is relatively small and contained within the
breast.

Fig 4: Stage 1

Stage 2 usually means the cancer has not started to spread into surrounding tissue
but the tumor is larger than in Stage 1. Sometimes Stage 2 means that cancer cells
have spread into lymph nodes close to the tumor.

Fig 5: Stage 2
Stage 3 usually means the cancer is larger. It may have started to spread into
surrounding tissues and there are cancer cells in the lymph nodes in the area.

Fig 6: Stage 3

Stage 4 means the cancer has spread from where it started to another body organ.
This is also called secondary or metastatic cancer.

Fig 7: Stage 4

These are special tests that the pathologist sometimes uses to help diagnose
invasive breast cancer or to identify cancer in lymph nodes. Not all cases need
these tests. Whether or not your report mentions these tests has no bearing on the
accuracy of your diagnosis. All of these are terms for non-cancerous (benign)
changes that the pathologist might see under the microscope. They are not
important when seen on a biopsy where there is invasive breast cancer.
3.10 PROJECT GOALS

3.10.1 EXPLORATORY DATA ANALYSIS OF VARIABLE IDENTIFICATION
Loading the given dataset

Importing required library packages

Analyzing the general properties

Finding duplicate and missing values

Checking unique and count values

3.10.2 UNI-VARIATE DATA ANALYSIS


Renaming, adding and dropping data

Specifying data types

Exploratory data analysis of bi-variate and multi-variate relationships

Plotting pair plots, heatmaps, bar charts and histograms

3.10.3 METHOD OF OUTLIER DETECTION WITH FEATURE ENGINEERING
Pre-processing the given dataset

Splitting the test and training dataset

Supervised machine learning algorithm.

3.11 Objectives

This analysis aims to observe which features are most helpful in predicting the breast cancer stage of a patient, and to see the general trends that may help us in model selection and hyper-parameter selection. To achieve this, we used machine learning classification methods to fit a function that can predict the discrete class of new inputs.
The repository is a learning exercise to:

 Apply the fundamental concepts of machine learning to an available dataset, and evaluate and interpret my results, justifying my interpretation based on the observed dataset.
 Create notebooks that serve as computational records, document my thought process, and investigate applications of statistics for breast cancer to analyse the dataset.

 Evaluate and analyse statistical and visualized results to find the standard patterns in the data.

3.12 AIM:
The repository is a learning exercise to:

 Apply the fundamental concepts of machine learning from an available dataset


 Evaluate and interpret my results and justify my interpretation based on
observed data set
 Create notebooks that serve as computational records and document my
thought process.

The analysis is divided into four sections, saved in Jupyter notebooks in this repository:

1. Identifying the problem and Data Sources


2. Exploratory Data Analysis
3. Pre-Processing the Data
4. Building a model to predict breast cancer stages

3.13 Scope:
The scope of this project is to investigate a dataset of medical patient records for the hospital sector using machine learning techniques, and to identify the breast cancer stage of a patient from the given dataset attributes. The prediction result is reported as the accuracy of a supervised machine learning algorithm, together with precision, recall and F1-score.
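The metrics named above can be sketched with scikit-learn (the label vectors are toy values for illustration, not results from the thesis dataset):

```python
# Sketch: accuracy, precision, recall and F1-score for a toy set of
# predicted vs. true labels (values are illustrative only).
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 1, 0, 1, 1, 0, 0]   # actual diagnoses (toy)
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]   # model predictions (toy)

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(acc, prec, rec, f1)           # each is 0.75 for these toy labels
```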
CHAPTER 4

EXISTING AND PROPOSED SYSTEM

4.1 EXISTING SYSTEM


The patterns frequently appearing in the tumors with the same label can be
regarded as a potential diagnostic rule. Subsequently, the diagnostic rules are
utilized to construct component classifiers of the Adaboost algorithm via a novel
rules combination strategy which resolves the problem of classification in
different feature spaces (PC-DFS). Finally, the AdaBoost learning is performed to
discover effective combinations and integrate them into a strong classifier. The
proposed approach has been validated using a large ultrasonic dataset of 1062
breast tumor instances (including 418 benign cases and 644 malignant cases) and
its performance was compared with several conventional approaches. The
experimental results show that the proposed method yielded the best prediction
performance, indicating a good potential in clinical applications.
Boosting is a general method of converting rough rules of thumb into a highly accurate prediction rule. Given sufficient data, a boosting algorithm can provably construct a single classifier with very high accuracy. Advantages of boosting include: 1) lower error, since it is an ensemble method; and 2) suitability even when the initial model is fairly weak.
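A minimal sketch of this boosting idea with scikit-learn's AdaBoostClassifier, which combines weak decision stumps into a stronger classifier (synthetic data stands in for the ultrasonic breast tumor dataset, which is not available here):

```python
# Sketch of boosting: AdaBoost combines weak learners (decision stumps
# by default) into a stronger classifier. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)   # test-set accuracy
print(round(score, 3))
```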

4.1.1 DRAWBACKS:

The disadvantages of using boosting while creating a classification model:

It cannot work on the top features to find the accuracy, recall, precision and confusion matrix and compare them with our earlier result.

It cannot use the other popular machine learning algorithms to find out the features' importance.
4.2 PROPOSED SYSTEM

4.2.1 EXPLORATORY DATA ANALYSIS

We will be using a Jupyter notebook to work on this dataset; we first import the necessary libraries and then load the dataset into the notebook.

4.2.1.1 SPLITTING THE DATASET

The data used is usually split into training data and test data. The training set contains a known output, and the model learns on this data in order to be generalized to other data later on. The test dataset (or subset) is held back in order to test the model's predictions, and the split is done using the train_test_split method of the scikit-learn library in Python.
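A minimal sketch of this split, assuming scikit-learn is installed (the toy arrays and the 70:30 ratio are illustrative):

```python
# Sketch: splitting toy features X and labels y into a 70:30
# train/test split with scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 toy features
y = np.array([0, 1] * 5)           # toy benign/malignant labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```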

4.2.1.2 DATA WRANGLING


In this section of the report we load in the data, check for cleanliness, and then trim and clean the given dataset for analysis. The cleaning steps are documented carefully and each cleaning decision is justified.

4.2.1.3 DATA COLLECTION


The dataset collected for predicting patient stages is split into a Training set and a Test set. Generally, a 7:3 ratio is applied to split the Training set and Test set. The data model created using a supervised machine learning algorithm is applied to the Training set and, based on the resulting accuracy, the Test set prediction is done.
4.2.1.4 PREPROCESSING
The data which was collected might contain missing values that may lead to inconsistency. To gain better results, the data needs to be preprocessed so as to improve the efficiency of the algorithm. Outliers have to be removed and variable conversion needs to be done. Based on the correlation among attributes, it was observed that the attributes that are significant individually include TNM, stage, grade and age, the last of which is the strongest among all. Some variables are not significant alone, which is strange, since by intuition they would be considered important.
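One possible sketch of these pre-processing steps with pandas (the column names and values below are illustrative, not the actual dataset):

```python
# Sketch: filling missing values and removing outliers with a simple
# IQR rule on a toy DataFrame (columns are illustrative only).
import pandas as pd

df = pd.DataFrame({"age": [45, 52, None, 60, 200],   # 200 is an outlier
                   "grade": [1, 2, 2, None, 3]})

# Impute: median for a numeric column, mode for a categorical one.
df["age"] = df["age"].fillna(df["age"].median())
df["grade"] = df["grade"].fillna(df["grade"].mode()[0])

# Remove rows whose age falls outside 1.5 * IQR of the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[mask]
print(len(clean))   # 4 rows remain; the outlier row is dropped
```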

4.3 BUILDING THE CLASSIFICATION MODEL


For the breast cancer prediction problem, a decision tree prediction model is effective for the following reasons:

 It provides better results in classification problems.

 It is robust to outliers, irrelevant variables, and a mix of continuous, categorical and discrete variables.

 It produces an out-of-bag error estimate, which has proven to be unbiased in many tests, and it is relatively easy to tune.
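A minimal sketch of such a decision tree model, here fitted on scikit-learn's built-in breast cancer dataset as a stand-in for the project's own data (the depth setting is illustrative):

```python
# Sketch: a decision tree classifier on scikit-learn's built-in
# breast cancer dataset, standing in for the project dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

tree = DecisionTreeClassifier(max_depth=4, random_state=1)  # small, tunable tree
tree.fit(X_train, y_train)
score = tree.score(X_test, y_test)   # test-set accuracy
print(round(score, 3))
```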

4.4 CONSTRUCTION OF A PREDICTIVE MODEL

Machine learning needs data gathering, i.e. a lot of past data. Data gathering provides sufficient historical and raw data. Raw data cannot be used directly: it must first be pre-processed, after which an algorithm and model are chosen. The model is then trained and tested until it works and predicts correctly with minimum error. Finally, the model is tuned from time to time to improve its accuracy.

Data Gathering → Data Pre-Processing → Choose Model → Train Model → Test Model → Tune Model → Prediction

Fig 8: Process Of Dataflow Diagram

4.5 TRAINING THE DATASET


The first line imports the iris dataset, which is predefined in the sklearn module; the raw dataset is basically a table containing information about various varieties. We then import an algorithm and the train_test_split class from sklearn, along with the NumPy module, for use in this program. The load_data() method is encapsulated in a dataset variable, and the dataset is further divided into training data and test data using the train_test_split method. The X prefix in a variable denotes feature values and the y prefix denotes target values. This method divides the dataset into training and test data randomly in a ratio of 67:33 or 70:30. Then we encapsulate the chosen algorithm and, in the next line, fit our training data into it so that the computer gets trained on this data. Now the training part is complete.
4.6 TESTING THE DATASET
Now, the dimensions of a new sample are stored in a NumPy array called 'n', and we want to predict the species of this sample using the predict method, which takes this array as input and outputs the predicted target value. Here, the predicted target value comes out to be 0. Finally, we find the test score, which is the ratio of the number of correct predictions to the total predictions made, using the accuracy score method, which compares the actual values of the test set with the predicted values.
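The training and testing steps described above can be sketched end-to-end on iris (the choice of k-nearest neighbours here is illustrative; any classifier would fit the description):

```python
# Sketch: train on iris, predict a new sample 'n', then score the
# test set, mirroring the description in the text.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

clf = KNeighborsClassifier().fit(X_train, y_train)  # training complete

n = np.array([[5.0, 3.4, 1.5, 0.2]])  # measurements of a new flower
print(clf.predict(n))                 # predicted target value: 0 (setosa)

acc = accuracy_score(y_test, clf.predict(X_test))
print(round(acc, 3))
```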

Patient Details → Data Processing → Training dataset / Test dataset → Supervised machine learning → Ensemble Model
Fig 9: Architecture Of Proposed Model

4.7 GENERAL PROPERTIES


Create cells freely to explore the given data, but do not perform too many operations in each cell. One option is to do a lot of exploration in an initial notebook; this does not have to be organized, but make sure enough comments are used to convey the purpose of each code cell. Then, after the analysis is done, create a duplicate notebook, trim the excess, and organize the steps into a flowing, cohesive report that keeps the reader informed of the steps taken in the investigation. Follow every code cell, or every set of related code cells, with a markdown cell describing what was found in the preceding cell, and try to prepare the reader for what they will see in the following cell.

4.8 ENSEMBLE LEARNING:

Ensemble learning helps improve machine learning results by combining several models. This approach allows the production of better predictive performance compared to a single model: it is the art of combining a diverse set of learners (individual models) together to improve the stability and predictive power of the model. In the world of statistics and machine learning, ensemble learning techniques attempt to make the performance of predictive models better by improving their accuracy. Ensemble learning is a process by which multiple machine learning models (such as classifiers) are strategically constructed to solve a particular problem.

Max Voting: The max voting method is generally used for classification
problems. In this technique, multiple models are used to make predictions for each
data point. The predictions by each model are considered as a ‘vote’. The
predictions which we get from the majority of the models are used as the final
prediction.

Averaging: Similar to the max voting technique, multiple predictions are made
for each data point in averaging. In this method, we take an average of predictions
from all the models and use it to make the final prediction. Averaging can be used
for making predictions in regression problems or while calculating probabilities
for classification problems.

Weighted Average: This is an extension of the averaging method. All models are
assigned different weights defining the importance of each model for prediction.
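The three combination rules above can be sketched with NumPy (the model predictions, probabilities and weights below are made up for illustration):

```python
# Sketch: max voting, averaging and weighted averaging over the
# outputs of three hypothetical models on four data points.
import numpy as np

# Class predictions from three hypothetical models (one row per model).
preds = np.array([[1, 0, 1, 1],
                  [1, 1, 0, 1],
                  [0, 0, 1, 1]])

# Max voting: the majority class in each column is the final prediction.
votes = (preds.sum(axis=0) >= 2).astype(int)
print(votes)                      # [1 0 1 1]

# Averaging: mean of predicted probabilities from the same three models.
probs = np.array([[0.9, 0.4, 0.7, 0.8],
                  [0.6, 0.5, 0.3, 0.9],
                  [0.4, 0.2, 0.8, 0.7]])
print(probs.mean(axis=0))

# Weighted average: each model's weight reflects its importance.
w = np.array([0.5, 0.3, 0.2])
print(w @ probs)                  # [0.71 0.39 0.6  0.81]
```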
Data → SVM, KNN, NB, RF, DT, LR → Vote
Fig 10: Ensemble Structure

Important points (while using ensemble model prediction):

o Better accuracy (low error)
o Higher consistency (avoids overfitting)
o Reduces bias and variance errors
o A single model may overfit
o Results worth the extra training
o It can be used for classification as well as regression

4.9 APPLICATIONS OF ENSEMBLE METHODS:

Ensemble methods can be used as overall diagnostic procedures for more conventional model building. The larger the difference in fit quality between one of the stronger ensemble methods and a conventional statistical model, the more information the conventional model is probably missing.
Ensemble methods can be used to evaluate the
relationships between explanatory variables and the response in conventional
statistical models. Predictors or basis functions overlooked in a conventional
model may surface with an ensemble approach.
With the help of the ensemble method, the selection process
could be better captured and the probability of membership in each treatment
group estimated with less bias.
One could use ensemble methods to implement the covariance adjustments inherent in multiple regression and related procedures: one would “residualize” the response and the predictors of interest with ensemble methods.

4.10 VOTING BASED ENSEMBLE LEARNING:

Voting is one of the most straightforward Ensemble learning techniques in


which predictions from multiple models are combined. The method starts with
creating two or more separate models with the same dataset. Then a Voting based
Ensemble model can be used to wrap the previous models and aggregate the
predictions of those models. After the Voting based Ensemble model is
constructed, it can be used to make a prediction on new data. The predictions
made by the sub-models can be assigned weights. Stacked aggregation is a
technique which can be used to learn how to weigh these predictions in the best
possible way.
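A sketch of such a voting ensemble with scikit-learn's VotingClassifier (the choice of sub-models and weights is illustrative; scaling is added for the logistic regression so it converges):

```python
# Sketch: three separate models wrapped by a Voting ensemble;
# soft voting averages the sub-models' class probabilities.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7)

vote = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
        ("dt", DecisionTreeClassifier(random_state=7)),
        ("rf", RandomForestClassifier(random_state=7)),
    ],
    voting="soft",       # average predicted class probabilities
    weights=[1, 1, 2],   # sub-model predictions can be weighted
)
vote.fit(X_train, y_train)
score = vote.score(X_test, y_test)
print(round(score, 3))
```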

4.10.1 ADVANTAGES

1. The goal of this problem is to predict whether or not a patient has breast cancer, based on the prediction accuracy results on the test dataset.

2. To reduce the doctor's diagnostic risk.


CHAPTER 5

HARDWARE AND SOFTWARE REQUIREMENTS

5.1 GENERAL:

Requirements are the basic constraints that are required to develop a system. Requirements are collected while designing the system. The following are the requirements to be discussed:

1. Functional requirements

2. Non-Functional requirements

3. Environment requirements

A. Hardware requirements

B. software requirements

5.1.1 FUNCTIONAL REQUIREMENTS:

The software requirements specification is a technical specification of requirements for the software product. It is the first step in the requirements analysis process and lists the requirements of a particular software system. This project relies on special libraries such as scikit-learn, pandas, NumPy, matplotlib and seaborn.

5.1.2 NON-FUNCTIONAL REQUIREMENTS:

Process of functional steps,

1. Problem define

2. Preparing data

3. Evaluating algorithms

4. Improving result
5. Predicting the result

5.1.3 ENVIRONMENTAL REQUIREMENTS:

1. Software Requirements:

Operating System : Windows

Tool : Anaconda (Jupyter Notebook)

2. Hardware requirements:

Processor : I3 Processor

Hard disk : minimum 40 GB

RAM : minimum 2 GB

5.2 SOFTWARE DESCRIPTION

Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data
science, machine learning applications, large-scale data processing, predictive
analytics, etc.), that aims to simplify package management and deployment.
Package versions are managed by the package management system “Conda”. The
Anaconda distribution is used by over 12 million users and includes more than
1400 popular data-science packages suitable for Windows, Linux, and MacOS.
So, Anaconda distribution comes with more than 1,400 packages as well as
the Conda package and virtual environment manager called Anaconda
Navigator and it eliminates the need to learn to install each library independently.
The open source packages can be individually installed from the Anaconda
repository with the conda install command or using the pip install command that

is installed with Anaconda. Pip packages provide many of the features of conda
packages and in most cases they can work together. Custom packages can be
made using the conda build command, and can be shared with others by
uploading them to Anaconda Cloud, PyPI or other repositories. The default
installation of Anaconda2 includes Python 2.7 and Anaconda3 includes Python
3.7. However, you can create new environments that include any version of
Python packaged with conda.

5.2.1 ANACONDA NAVIGATOR

Anaconda Navigator is a desktop graphical user interface (GUI) included in the Anaconda distribution that allows users to launch applications and manage conda
packages, environments and channels without using command-line commands.
Navigator can search for packages on Anaconda Cloud or in a local Anaconda
Repository, install them in an environment, run the packages and update them. It
is available for Windows, macOS and Linux.

The following applications are available by default in Navigator:

 JupyterLab
 Jupyter Notebook
 QtConsole
 Spyder
 Glueviz
 Orange
 Rstudio
 Visual Studio Code

Conda:

Conda is an open source, cross-platform, language-agnostic package manager and environment management system that installs, runs and updates packages and
their dependencies. It was created for Python programs, but it can package and
distribute software for any language (e.g., R), including multi-languages. The
Conda package and environment manager is included in all versions of Anaconda,
Miniconda, and Anaconda Repository.
5.2.2 THE JUPYTER NOTEBOOK

The Jupyter Notebook is an open-source web application that allows you to create
and share documents that contain live code, equations, visualizations and narrative
text. Uses include: data cleaning and transformation, numerical simulation,
statistical modeling, data visualization, machine learning, and much more.

Notebook document:
Notebook documents (or “notebooks”, all lower case) are documents produced by
the Jupyter Notebook App, which contain both computer code (e.g. python) and
rich text elements (paragraph, equations, figures, links, etc…). Notebook
documents are both human-readable documents containing the analysis
description and the results (figures, tables, etc.) as well as executable documents
which can be run to perform data analysis.

Jupyter Notebook App:


The Jupyter Notebook App is a server-client application that allows editing and
running notebook documents via a web browser. The Jupyter Notebook App can
be executed on a local desktop requiring no internet access (as described in this
document) or can be installed on a remote server and accessed through the
internet. In addition to displaying/editing/running notebook documents,
the Jupyter Notebook App has a “Dashboard” (Notebook Dashboard), a “control panel” showing local files and allowing users to open notebook documents or shut down their kernels.

Kernel:
A notebook kernel is a “computational engine” that executes the code contained in
a Notebook document. The ipython kernel, referenced in this guide, executes
python code. Kernels for many other languages exist (official kernels). When you
open a Notebook document, the associated kernel is automatically launched.
When the notebook is executed (either cell-by-cell or with menu Cell -> Run All),
the kernel performs the computation and produces the results. Depending on the
type of computations, the kernel may consume significant CPU and RAM. Note
that the RAM is not released until the kernel is shut-down.

Notebook Dashboard:
The Notebook Dashboard is the component which is shown first when you
launch Jupyter Notebook App. The Notebook Dashboard is mainly used to open
notebook documents, and to manage the running kernels (visualize and
shutdown). The Notebook Dashboard has other features similar to a file manager,
namely navigating folders and renaming/deleting files.

5.3 WORKING PROCESS:

1. Download and install Anaconda to get the most useful packages for machine learning in Python.

2. Load a dataset and understand its structure using statistical summaries and data visualization; create machine learning models, pick the best, and build confidence that the accuracy is reliable.

3. Python is a popular and powerful interpreted language. Unlike R, Python is a complete language and platform that you can use both for research and development and for developing production systems. There are also a lot of modules and libraries to choose from, providing multiple ways to do each task; it can feel overwhelming.

The best way to get started using Python for machine learning is to complete a
project.

It will force you to install and start the Python interpreter (at the very least).
It will give you a bird’s eye view of how to step through a small project.
It will give you confidence, maybe to go on to your own small projects.
When you are applying machine learning to your own datasets, you are working
on a project. A machine learning project may not be linear, but it has a number of
well-known steps:

o Define Problem.
o Prepare Data.
o Evaluate Algorithms.
o Improve Results.
o Present Results.

The best way to really come to terms with a new platform or tool is to work
through a machine learning project end-to-end and cover the key steps. Namely,
from loading data, summarizing data, evaluating algorithms and making some
predictions.

Here is an overview of what we are going to cover:

1. Installing the Python Anaconda platform.
2. Loading the dataset.
3. Summarizing the dataset.
4. Visualizing the dataset.
5. Evaluating some algorithms.
6. Making some predictions.
CHAPTER 6

DESIGN AND BLOCK DIAGRAMS

6.1 SYSTEM ARCHITECTURE – PHASE I

Fig 11: Phase 1 Flow Diagram

6.2 DESIGN ARCHITECTURE / SYSTEM ARCHITECTURE / BUSINESS DIAGRAM – PHASE II

Past dataset → Given attributes → Pre-processing → Train model (machine learning) → Ensemble learning method → Accuracy → Prediction results by average voting values

Fig 12: Phase 2 Flow Diagram

6.3 WORK FLOW DIAGRAM

Source Data → Data Processing and Cleaning → Training Dataset / Testing Dataset → Supervised machine learning → Find stages of TNM and grade of cells → Prediction of patient stages
Fig 13: Workflow Diagram
6.4 USE CASE DIAGRAM

Use case diagrams are used for high-level requirement analysis of a system: when the requirements of a system are analyzed, the functionalities are captured in use cases. One can therefore say that use cases are nothing but the system functionalities written in an organized manner. The second element relevant to the use cases is the actors (Patient/Doctor).
6.5 Class Diagram:

Fig 14: Class Diagram

A class diagram is basically a graphical representation of the static view of the system and represents different aspects of the application, so a collection of class diagrams represents the whole system. The name of the class diagram should be meaningful and describe the aspect of the system. Each element and its relationships should be identified in advance, and the responsibility (attributes and methods) of each class should be clearly identified. For each class, a minimum number of properties should be specified, because unnecessary properties make the diagram complicated. Use notes whenever required to describe some aspect of the diagram; at the end of the drawing, it should be understandable to the developer/coder. Finally, before making the final version, the diagram should be drawn on plain paper and reworked as many times as possible to make it correct.
6.6 ACTIVITY DIAGRAM:

Fig 15: Activity Diagram

An activity is a particular operation of the system. Activity diagrams are not only used for visualizing the dynamic nature of a system, but are also used to construct the executable system by using forward and reverse engineering techniques. The only thing missing in an activity diagram is the message part: it does not show any message flow from one activity to another. An activity diagram is sometimes considered a flow chart; although the diagram looks like a flow chart, it is not. It shows different flows, such as parallel, branched, concurrent and single.
6.7 SEQUENCE DIAGRAM:

Fig 16: Sequence Diagram

Sequence diagrams model the flow of logic within the system in a visual manner,
enabling you both to document and validate your logic, and are commonly used for
both analysis and design purposes. They are the most popular UML artifact for
dynamic modeling, which focuses on identifying behavior within the system. Other
dynamic modeling techniques include activity diagramming, communication
diagramming, timing diagramming, and interaction overview diagramming. Sequence
diagrams, along with class diagrams and physical data models, are arguably the
most important design-level models for modern business application development.
6.8 OVERALL PROJECT SEPARATION:

6.8.1 PHASE-I WORKING PROCESS:

1. Module 1
2. Module 2
3. Module 3
4. Module 4
5. Module 5

6.8.2 PHASE-II (MAJOR PROJECT) WORKING PROCESS:

1. Module 1
2. Module 2
3. Module 3
4. Module 4
5. Module 5
6. Module 6
7. Module 7

6.9 MODULES: (PHASE I - WORKING PROCESS)


 Data validation process (Module-01)
 Exploration data analysis of preprocessing technique by given attributes (Module-
02)
 Performance measurements of logistic regression and decision tree algorithms
(Module-03)
 Performance measurements of Support vector classifier and Random forest
(Module-04)
 GUI based breast cancer stages of symptoms using LR (Module-05)

6.10 MODULES: (FOR MAJOR PROJECT ONLY) – 100% WORKING PROCESS
 Data validation and pre-processing technique (Module-01)
 Exploration data analysis of visualization and training a model by given attributes
(Module-02)
 Performance measurements of logistic regression and decision tree algorithms
(Module-03)
 Performance measurements of Support vector classifier and Random forest
(Module-04)
 Performance measurements of KNN and Naive Bayes (Module-05)
 Improvisation of ML by ensemble learning method using voting classifier
(Module-06)
 GUI based predict the patient stages of breast cancer (Module-07)

6.11 PHASE I – WORKING DESCRIPTION

6.11.1 MODULE-01:

6.11.1.1 VARIABLE IDENTIFICATION PROCESS / DATA VALIDATION PROCESS:

Validation techniques in machine learning are used to estimate the error rate of
the Machine Learning (ML) model, which can be considered close to the true error
rate on the population. If the data volume is large enough to be representative
of the population, validation techniques may not be needed. However, in
real-world scenarios we usually work with samples of data that may not be a true
representative of the population of the given dataset. Validation also involves
finding missing values, duplicate values and the description of each data type,
i.e., whether it is a float or integer variable. A validation sample of data is
used to provide an unbiased evaluation of a model fit on the training dataset
while tuning model hyperparameters. The evaluation becomes more biased as skill
on the validation dataset is incorporated into the model configuration. The
validation set is used to evaluate a given model frequently; machine learning
engineers use this data to fine-tune the model hyperparameters. Data collection,
data analysis, and the process of addressing data content, quality, and
structure can add up to a time-consuming to-do list. The process of data
identification helps you understand your data and its properties; this knowledge
helps you choose which algorithm to use to build your model. For example, time
series data can be analyzed by regression algorithms, while classification
algorithms can be used to analyze discrete data. (For example, to show the data
type format of the given dataset.)

Fig 17: Given Data Frame
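A minimal sketch of these validation checks using pandas; the DataFrame and its columns below are hypothetical stand-ins, not the project's actual dataset:

```python
import pandas as pd

# Hypothetical sample of a patient dataset (not the project's real data)
df = pd.DataFrame({
    "age": [42, 55, 55, None, 63],
    "tumor_size": [1.2, 3.4, 3.4, 2.1, 0.9],
    "stage": ["I", "II", "II", "I", "III"],
})

# Variable identification: data shape and data types (float, int, object)
print(df.shape)
print(df.dtypes)

# Missing values per column and count of duplicate rows
missing = df.isnull().sum()
duplicates = df.duplicated().sum()
print(missing)
print(duplicates)
```

The same three checks (shape/dtypes, missing values, duplicates) apply unchanged to any dataset loaded from a CSV file.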

6.11.1.2 DATA VALIDATION/ CLEANING/PREPARING PROCESS:


Import the library packages and load the given dataset. Analyze variable
identification via the data shape and data types, and evaluate the missing
values and duplicate values. A validation dataset is a sample of data held back
from training your model that is used to give an estimate of model skill while
tuning the model; there are procedures you can use to make the best use of
validation and test datasets when evaluating your models. Data
cleaning/preparing includes renaming the given dataset, dropping columns, etc.,
so as to analyze the univariate, bivariate and multivariate processes. The steps
and techniques for data cleaning will vary from dataset to dataset. The primary
goal of data cleaning is to detect and remove errors and anomalies to increase
the value of data in analytics and decision making.
Fig 18: To Validate Patient Ages
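The renaming, column-dropping and duplicate-removal steps can be sketched with pandas; the column names here are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Age of Patient": [42, 55, 55],
    "unused_id": [1, 2, 3],
    "Tumor Size": [1.2, 3.4, 3.4],
})

# Rename columns to consistent, analysis-friendly names
df = df.rename(columns={"Age of Patient": "age", "Tumor Size": "tumor_size"})

# Drop a column that carries no predictive information
df = df.drop(columns=["unused_id"])

# Remove exact duplicate rows
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```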

6.11.1.3 DATA PRE-PROCESSING:

Pre-processing refers to the transformations applied to our data before feeding
it to the algorithm. Data pre-processing is a technique used to convert the raw
data into a clean data set. In other words, whenever data is gathered from
different sources it is collected in a raw format that is not feasible for
analysis. To achieve better results from the applied model, the data has to be
in a proper format for the Machine Learning method. Some Machine Learning models
need information in a specified format; for example, the Random Forest algorithm
does not support null values. Therefore, to execute the Random Forest algorithm,
null values have to be managed in the original raw data set.
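A sketch of managing null values before running Random Forest, assuming pandas and median imputation (one of several possible strategies; the data is made up):

```python
import pandas as pd

# Hypothetical raw data with null values, which Random Forest cannot consume
df = pd.DataFrame({"age": [42.0, None, 63.0, 51.0],
                   "tumor_size": [1.2, 3.4, None, 2.0]})

# One common strategy: impute numeric nulls with each column's median
df_filled = df.fillna(df.median())
print(df_filled)
```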

6.11.2 MODULE-02:

6.11.2.1 EXPLORATION DATA ANALYSIS OF VISUALIZATION:

Data visualization is an important skill in applied statistics and machine
learning. Statistics focuses on quantitative descriptions and estimations of
data; data visualization provides an important suite of tools for gaining a
qualitative understanding. This can be helpful when exploring and getting to
know a dataset, and can help with identifying patterns, corrupt data, outliers,
and much more. With a little domain knowledge, data visualizations can be used
to express and demonstrate key relationships in plots and charts that are more
visceral to stakeholders than measures of association or significance. Data
visualization and exploratory data analysis are whole fields in themselves, and
a deeper dive into some of the books mentioned at the end is recommended.

Sometimes data does not make sense until you can look at it in a visual form,
such as with charts and plots. Being able to quickly visualize data samples is
an important skill both in applied statistics and in applied machine learning.
You will discover the many types of plots that you need to know when visualizing
data in Python and how to use them to better understand your own data.

 How to chart time series data with line plots and categorical quantities with bar
charts.
 How to summarize data distributions with histograms and box plots.
 How to summarize the relationship between variables with scatter plots.

Fig 19: Categories Of Patient’s Ages

Many machine learning algorithms are sensitive to the range and distribution of
attribute values in the input data. Outliers in input data can skew and mislead
the training process of machine learning algorithms, resulting in longer
training times, less accurate models and ultimately poorer results. Even before
predictive models are prepared on training data, outliers can result in
misleading representations and, in turn, misleading interpretations of the
collected data. Outliers can skew the summary distribution of attribute values
in descriptive statistics like mean and standard deviation, and in plots such as
histograms and scatterplots, compressing the body of the data. Finally, outliers
can represent examples of data instances that are relevant to the problem, such
as anomalies in the case of fraud detection and computer security.
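One common way to flag such outliers is the interquartile-range (IQR) rule; a sketch with NumPy on made-up measurements:

```python
import numpy as np

# Hypothetical tumor-size measurements with one extreme value
values = np.array([1.1, 1.4, 1.8, 2.0, 2.2, 2.5, 2.9, 15.0])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(outliers)
```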
We cannot simply fit the model on the training data and claim that it will work
accurately on real data. For this, we must assure that our model learned the
correct patterns from the data and is not picking up too much noise.
Cross-validation is a technique in which we train our model using a subset of
the dataset and then evaluate it using the complementary subset.

The three steps involved in cross-validation are as follows:

1. Reserve some portion of the sample dataset.
2. Train the model using the rest of the dataset.
3. Test the model using the reserved portion of the dataset.
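The three steps can be sketched as a simple hold-out split; the "model" below is a toy stand-in (a mean predictor), not the project's classifier:

```python
import random

data = list(range(100))          # hypothetical sample dataset
random.seed(0)
random.shuffle(data)

# Step 1: reserve some portion (30%) of the sample dataset
reserve = data[:30]

# Step 2: train the model using the rest of the dataset
train = data[30:]
model_mean = sum(train) / len(train)   # stand-in for model fitting

# Step 3: test the model using the reserved portion
error = sum(abs(x - model_mean) for x in reserve) / len(reserve)
print(round(error, 2))
```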

6.11.2.2 ADVANTAGES OF TRAIN/TEST SPLIT:

1. A single train/test split runs K times faster than K-fold cross-validation,
because K-fold cross-validation repeats the train/test split K times.
2. It is simpler to examine the detailed results of the testing process.

6.11.2.3 ADVANTAGES OF CROSS-VALIDATION:

1. More accurate estimate of out-of-sample accuracy.


2. More “efficient” use of data as every observation is used for both training and
testing.

6.11.2.4 TRAINING THE DATASET:


 The first line imports the Iris data set, which is already predefined in the
sklearn module; the raw data set is basically a table that contains information
about various flower varieties.
 For example, import any algorithm and the train_test_split class from sklearn,
and the numpy module, for use in the program.
 Encapsulate the result of the load_data() method in a dataset variable, then
divide the dataset into training data and test data using the train_test_split
method. The X prefix in a variable denotes the feature values and the y prefix
denotes target values.
 This method divides the dataset into training and test data randomly, in a
ratio such as 67:33 or 70:30. Then we encapsulate any algorithm.
 In the next line, we fit our training data into this algorithm so that the
computer can get trained using this data. Now the training part is complete.

6.11.2.5 TESTING THE DATASET:


 Now, put the dimensions of the new features in a numpy array called ‘n’; we
want to predict the species for these features, and we do so using the predict
method, which takes this array as input and returns the predicted target value
as output.
 The predicted target value comes out to be 0. Finally, find the test score,
which is the ratio of the number of predictions found correct to the total
predictions made, using the accuracy score method, which basically compares the
actual values of the test set with the predicted values.
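The training and testing steps above can be sketched with scikit-learn's bundled Iris data; the choice of classifier here is illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Random 67:33 split into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# Fit (train) the chosen algorithm on the training data
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Predict the species of a new feature array 'n'
n = np.array([[5.1, 3.5, 1.4, 0.2]])   # measurements of one flower
predicted = model.predict(n)

# Test score: fraction of test-set predictions that are correct
score = model.score(X_test, y_test)
print(predicted, score)
```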

Fig 20: Position Of Breast Cancer


6.11.3 MODULE-03:

6.11.3.1 LOGISTIC REGRESSION:

It is a statistical method for analyzing a data set in which there are one or more
independent variables that determine an outcome. The outcome is measured with a
dichotomous variable (in which there are only two possible outcomes). The goal
of logistic regression is to find the best fitting model to describe the relationship
between the dichotomous characteristic of interest (dependent variable = response
or outcome variable) and a set of independent (predictor or explanatory) variables.
Logistic regression is a Machine Learning classification algorithm that is used to
predict the probability of a categorical dependent variable. In logistic regression,
the dependent variable is a binary variable that contains data coded as 1 (yes,
success, etc.) or 0 (no, failure, etc.).

In other words, the logistic regression model predicts P(Y=1) as a function of X.


Logistic regression Assumptions:

 Binary logistic regression requires the dependent variable to be binary.

 For a binary regression, the factor level 1 of the dependent variable should
represent the desired outcome.

 Only the meaningful variables should be included.

 The independent variables should be independent of each other; that is, the
model should have little or no multicollinearity.

 The independent variables are linearly related to the log odds.

 Logistic regression requires quite large sample sizes.

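A minimal logistic-regression sketch on scikit-learn's bundled breast cancer dataset, used here as a stand-in for the project's own data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)   # y is binary: 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# max_iter raised so the solver converges on this unscaled data
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

# Predicted P(Y=1) for the first test sample, and overall test accuracy
p_y1 = clf.predict_proba(X_test[:1])[0, 1]
lr_accuracy = clf.score(X_test, y_test)
print(round(p_y1, 3), round(lr_accuracy, 3))
```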

6.11.3.2 DECISION TREE:

The decision tree is one of the most powerful and popular algorithms. The
decision-tree algorithm falls under the category of supervised learning
algorithms. It works for both continuous and categorical output variables.
Assumptions of the decision tree:

 At the beginning, we consider the whole training set as the root.


 Attributes are assumed to be categorical for information gain and, for the
Gini index, attributes are assumed to be continuous.
 On the basis of attribute values records are distributed recursively.
 We use statistical methods for ordering attributes as root or internal node.

Decision tree builds classification or regression models in the form of a tree
structure. It breaks a data set down into smaller and smaller subsets while, at
the same time, an associated decision tree is incrementally developed. A
decision node has two or more branches, and a leaf node represents a
classification or decision. The topmost decision node in a tree, which
corresponds to the best predictor, is called the root node. Decision trees can
handle both categorical and numerical data. A decision tree utilizes an if-then
rule set which is mutually exclusive and exhaustive for classification. The
rules are learned sequentially, using the training data one rule at a time;
each time a rule is learned, the tuples covered by the rule are removed.

This process continues on the training set until a termination condition is met.
The tree is constructed in a top-down, recursive, divide-and-conquer manner. All
the attributes should be categorical; otherwise, they should be discretized in
advance. Attributes at the top of the tree have more impact on the
classification, and they are identified using the information gain concept. A
decision tree can easily be over-fitted, generating too many branches, and may
reflect anomalies due to noise or outliers.
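A short sketch of a depth-limited decision tree, again on scikit-learn's bundled breast cancer data as a stand-in; export_text prints the learned if-then rules:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# max_depth limits the tree, curbing the over-fitting discussed above
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

dt_accuracy = tree.score(X_test, y_test)
print(round(dt_accuracy, 3))

# The learned if-then rules, printed as text
print(export_text(tree, feature_names=list(load_breast_cancer().feature_names)))
```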
6.11.4 MODULE-04:

6.11.4.1 SUPPORT VECTOR MACHINES (SVM):

A Support Vector Machine is a classifier that categorizes the data set by
setting an optimal hyperplane between data points. This classifier was chosen
because it is incredibly versatile in the number of different kernel functions
that can be applied, and the model can yield a high predictability rate.
Support Vector Machines are perhaps one of the most popular and talked-about
machine learning algorithms. They were extremely popular around the time they
were developed in the 1990s and continue to be a go-to method for a
high-performing algorithm that needs little tuning.

 How to disentangle the many names used to refer to support vector


machines.
 The representation used by SVM when the model is actually stored on disk.
 How a learned SVM model representation can be used to make predictions for
new data.
 How to learn an SVM model from training data.
 How to best prepare your data for the SVM algorithm.
 Where you might look to get more information on SVM.
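A sketch of an SVM classifier on the same stand-in dataset; since SVMs are sensitive to feature scale, the data is standardized first (the RBF kernel choice is illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize features, then fit an RBF-kernel support vector classifier
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train, y_train)

svm_accuracy = svm.score(X_test, y_test)
print(round(svm_accuracy, 3))
```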

6.11.4.2 RANDOM FOREST:

Random forests or random decision forests are an ensemble learning method for
classification, regression and other tasks, that operate by constructing a multitude
of decision trees at training time and outputting the class that is the mode of the
classes (classification) or mean prediction (regression) of the individual
trees. Random decision forests correct for decision trees’ habit of over fitting to
their training set. Random forest is a type of supervised machine learning
algorithm based on ensemble learning. Ensemble learning is a type of learning
where you join different types of algorithms, or the same algorithm multiple
times, to form a more powerful prediction model. The random forest algorithm
combines multiple algorithms of the same type, i.e. multiple decision trees,
resulting in a forest of trees; hence the name "Random Forest". The random
forest algorithm can be used for both regression and classification tasks.
The following are the basic steps involved in performing the random forest
algorithm:

 Pick N random records from the dataset.


 Build a decision tree based on these N records.
 Choose the number of trees you want in your algorithm and repeat steps 1 and
2.
 In case of a regression problem, for a new record each tree in the forest
predicts a value for Y (the output); the final value can be calculated by taking
the average of all the values predicted by all the trees in the forest. In case
of a classification problem, each tree in the forest predicts the category to
which the new record belongs, and the new record is finally assigned to the
category that wins the majority vote.
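The steps above can be sketched with scikit-learn's RandomForestClassifier on the stand-in dataset (parameter values are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# n_estimators is the number of trees; each is built on random records (step 1-3)
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Classification output is the majority vote over the individual trees (step 4)
rf_accuracy = forest.score(X_test, y_test)
print(round(rf_accuracy, 3))
```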

6.11.5 MODULE-05:

Tkinter is a Python library for developing GUIs (Graphical User Interfaces). We
use the tkinter library to create the user interface of the application: the
windows and all other graphical elements. Tkinter comes with Python as a
standard package, and here it is used for the security of each user or
accountant. There are two kinds of pages: a registration page for new users and
a login page for existing users.

Parameter calculations:

Confusion matrix terms:

False Positives (FP): cases where the actual class is no but the predicted class
is yes. E.g. the model predicts that a patient has the disease when the patient
is actually healthy.

False Negatives (FN): cases where the actual class is yes but the predicted
class is no. E.g. the model predicts that a patient is healthy when the patient
actually has the disease.

True Positives (TP): correctly predicted positive values, meaning the actual
class is yes and the predicted class is also yes. E.g. the model predicts that a
patient has the disease and the patient actually does.

True Negatives (TN): correctly predicted negative values, meaning the actual
class is no and the predicted class is also no. E.g. the model predicts that a
patient is healthy and the patient actually is.

We obtained precision, recall, true positive rate (TPR) and false positive rate
(FPR) for each classification technique, as shown in the tables above, and also
obtained an interesting confusion matrix for each classification technique; the
classification performance of each classifier can be seen with the help of its
confusion matrix. We use a confusion matrix to compute the accuracy rate of each
severity class: for each class, it demonstrates how instances from that class
receive the various classifications. The next table shows the instances that are
correctly and incorrectly classified, along with the overall accuracy of each
classification technique. All classifiers perform similarly well with respect to
the number of correctly classified instances.
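A sketch of computing a confusion matrix with scikit-learn on hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = has disease, 0 = healthy)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)
```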

Comparing Algorithm with prediction in the form of best accuracy result:

It is important to compare the performance of multiple different machine
learning algorithms consistently; here we create a test harness to compare
multiple different machine learning algorithms in Python with scikit-learn. You
can use this test harness as a template for your own machine learning problems
and add more and different algorithms to compare. Each model will have different
performance characteristics. Using resampling methods like cross-validation,
you can get an estimate of how accurate each model may be on unseen data, and
you can use these estimates to choose the one or two best models from the suite
of models you have created. When you have a new dataset, it is a good idea to
visualize the data using different techniques in order to look at it from
different perspectives. The same idea applies to model selection: you should use
a number of different ways of looking at the estimated accuracy of your machine
learning algorithms in order to choose the one or two to finalize. One way to do
this is to use different visualization methods to show the average accuracy,
variance and other properties of the distribution of model accuracies.

Fig 21: Type Of Patients Vs Tumor Size Of Each Patient

In the next section you will discover exactly how you can do that in Python with
scikit-learn. The key to a fair comparison of machine learning algorithms is
ensuring that each algorithm is evaluated in the same way on the same data; this
can be achieved by forcing each algorithm to be evaluated on a consistent test
harness.

In the example below 4 different algorithms are compared:

 Logistic Regression
 Random Forest
 Decision tree
 Support Vector Machines
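A sketch of such a test harness, evaluating the four algorithms on the same 5-fold cross-validation splits (dataset and settings are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Same 5-fold cross-validation harness for every algorithm
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```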

6.11.5.1 SENSITIVITY:

Sensitivity is a measure of the proportion of actual positive cases that got


predicted as positive (or true positive). Sensitivity is also termed as Recall. This
implies that there will be another proportion of actual positive cases, which would
get predicted incorrectly as negative (and, thus, could also be termed as the false
negative). This can also be represented in the form of a false negative rate. The
sum of sensitivity and false negative rate would be 1. Let's try and understand this
with the model used for predicting whether a person is suffering from the disease.
Sensitivity is a measure of the proportion of people suffering from the disease
who got predicted correctly as the ones suffering from the disease. In other words,
the person who is unhealthy actually got predicted as unhealthy.

Mathematically, sensitivity can be calculated as the following:

Sensitivity = (True Positive) / (True Positive + False Negative)

The following are the details of the True Positive and False Negative terms used
in the above equation.

 True Positive = Persons predicted as suffering from the disease (or unhealthy)
are actually suffering from the disease (unhealthy); In other words, the true
positive represents the number of persons who are unhealthy and are predicted as
unhealthy.
 False Negative = Persons who are actually suffering from the disease (or
unhealthy) but are predicted to be not suffering from the disease (healthy). In
other words, the false negative represents the number of persons who are
unhealthy but got predicted as healthy. Ideally, we would want the model to have
low false negatives, as they might prove to be life-threatening or
business-threatening.

The higher value of sensitivity would mean higher value of true positive and
lower value of false negative. The lower value of sensitivity would mean lower
value of true positive and higher value of false negative. For healthcare and
financial domain, models with high sensitivity will be desired.

6.11.5.2 SPECIFICITY:

Specificity is defined as the proportion of actual negatives, which got predicted as


the negative (or true negative). This implies that there will be another proportion
of actual negative, which got predicted as positive and could be termed as false
positives. This proportion could also be called a false positive rate. The sum of
specificity and false positive rate would always be 1. Let's try and understand this
with the model used for predicting whether a person is suffering from the disease.
Specificity is a measure of the proportion of people not suffering from the
disease who got predicted correctly as not suffering from the disease. In other
words, the proportion of healthy persons who were actually predicted as healthy
is the specificity.

Mathematically, specificity can be calculated as the following:

Specificity = (True Negative) / (True Negative + False Positive)

The following are the details of the True Negative and False Positive terms used
in the above equation.

 True Negative = Persons predicted as not suffering from the disease (or
healthy) are actually found to be not suffering from the disease (healthy); In other
words, the true negative represents the number of persons who are healthy and are
predicted as healthy.
 False Positive = Persons predicted as suffering from the disease (or unhealthy)
are actually found to be not suffering from the disease (healthy). In other words,
the false positive represents the number of persons who are healthy and got
predicted as unhealthy.

The higher value of specificity would mean higher value of true negative and
lower false positive rate. The lower value of specificity would mean lower value
of true negative and higher value of false positive.

6.11.5.3 PREDICTION RESULT BY ACCURACY:

The logistic regression algorithm also uses a linear equation with independent
predictors to predict a value. The predicted raw value can be anywhere between
negative infinity and positive infinity, but we need the output of the algorithm
to be classified variable data. Comparing the best accuracies, the
higher-accuracy predicting result is the logistic regression model.

True Positive Rate (TPR) = TP / (TP + FN)

False Positive Rate (FPR) = FP / (FP + TN)

Accuracy: the proportion of the total number of predictions that are correct,
i.e. overall, how often the model correctly predicts positives and negatives.

Accuracy calculation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Accuracy is the most intuitive performance measure: it is simply the ratio of
correctly predicted observations to the total observations. One may think that
if we have high accuracy, our model is best. Accuracy is indeed a great measure,
but only when you have symmetric datasets where the counts of false positives
and false negatives are almost the same.

Precision: the proportion of positive predictions that are actually correct.
(When the model predicts positive, how often is it correct?)

Precision = TP / (TP + FP)

Precision is the ratio of correctly predicted positive observations to the total
predicted positive observations. The question this metric answers is: of all the
cases labeled positive, how many actually are positive? High precision relates
to a low false positive rate. We obtained a precision of 0.788, which is pretty
good.

Recall: the proportion of positive observed values correctly predicted (the
proportion of actual positives that the model will correctly predict).

Recall = TP / (TP + FN)

Recall (sensitivity) is the ratio of correctly predicted positive observations
to all observations in the actual "yes" class.

F1 Score is the weighted average of Precision and Recall; therefore, this score
takes both false positives and false negatives into account. Intuitively it is
not as easy to understand as accuracy, but F1 is usually more useful than
accuracy, especially if you have an uneven class distribution. Accuracy works
best if false positives and false negatives have similar cost; if the cost of
false positives and false negatives is very different, it is better to look at
both Precision and Recall.

General Formula:

F- Measure = 2TP / (2TP + FP + FN)

F1-Score Formula:

F1 Score = 2*(Recall * Precision) / (Recall + Precision)
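The formulas above can be checked with a small worked example on hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts
tp, tn, fp, fn = 40, 45, 5, 10

accuracy    = (tp + tn) / (tp + tn + fp + fn)    # 85 / 100 = 0.85
precision   = tp / (tp + fp)                     # 40 / 45
recall      = tp / (tp + fn)                     # sensitivity, TPR: 40 / 50 = 0.8
specificity = tn / (tn + fp)                     # 45 / 50 = 0.9
f1 = 2 * (recall * precision) / (recall + precision)
f_measure = 2 * tp / (2 * tp + fp + fn)          # algebraically equal to f1

print(accuracy, round(precision, 3), recall, specificity, round(f1, 3))
```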


CHAPTER 7

APPLICATIONS

7.1 APPLICATION AREAS:


Mobile and satellite communication application:

Mobile communication requires small, low-cost, low-profile antennas. The
microstrip patch antenna meets all these requirements, and various types of
microstrip antennas have been designed for use in mobile communication systems.
In the case of satellite communication, circularly polarized radiation patterns
are required; these can be realized using either a square or circular patch with
one or two feed points.

Global Positioning System applications:

Nowadays, microstrip patch antennas with a substrate of high-permittivity
sintered material are used for the Global Positioning System. These antennas are
circularly polarized and very compact, though quite expensive due to their
positioning requirements. It is expected that millions of GPS receivers will be
used by the general population in land vehicles, aircraft and maritime vessels
to find their position accurately.

Radio Frequency Identification (RFID):

RFID is used in different areas like mobile communication, logistics,
manufacturing, transportation and health care [2]. RFID systems generally use
frequencies between 30 Hz and 5.8 GHz, depending on the application. Basically,
an RFID system consists of a tag or transponder and a transceiver or reader.

Worldwide Interoperability for Microwave Access (WiMax):

The IEEE 802.16 standard is known as WiMax. It can theoretically cover up to a
30-mile radius with a data rate of 70 Mbps. A microstrip patch antenna (MPA)
generating three resonant modes at 2.7, 3.3 and 5.3 GHz can therefore be used in
WiMax-compliant communication equipment.

Radar Application:

Radar can be used for detecting moving targets such as people and vehicles.
Since it demands a low-profile, lightweight antenna subsystem, microstrip
antennas are an ideal choice. The fabrication technology, based on
photolithography, enables bulk production of microstrip antennas with repeatable
performance at a lower cost and in a shorter time frame compared to conventional
antennas.

Rectenna Application:

A rectenna is a rectifying antenna: a special type of antenna that is used to
convert microwave energy directly into DC power. A rectenna is a combination of
four subsystems, i.e. the antenna, pre-rectification filter, rectifier and
post-rectification filter. In rectenna applications, it is necessary to design
antennas with very highly directive characteristics to meet the demands of
long-distance links. Since the aim is to use the rectenna to transfer DC power
through wireless links over a long distance, this can only be accomplished by
increasing the electrical size of the antenna.

Telemedicine Application:

In telemedicine applications, the antenna operates at 2.45 GHz. A wearable
microstrip antenna is suitable for Wireless Body Area Networks (WBAN). The
proposed antenna achieved a higher gain and front-to-back ratio compared to the
other antennas, in addition to a semi-directional radiation pattern, which is
preferred over an omni-directional pattern to avoid unnecessary radiation into
the user's body, and satisfies the requirements for on-body and off-body
applications. An antenna having a gain of 6.7 dB and an F/B ratio of 11.7 dB
that resonates at 2.45 GHz is suitable for telemedicine applications.

Medicinal applications of patch:

It is found that, in the treatment of malignant tumors, microwave energy is said
to be the most effective way of inducing hyperthermia. The particular radiator
designed for this purpose should be lightweight, easy to handle and rugged; only
the patch radiator fulfils these requirements. The initial designs of the
microstrip radiator for inducing hyperthermia were based on printed dipoles and
annular rings designed at S-band, and later designs were based on the circular
microstrip disk at L-band. The instrument's operation is simple: two coupled
microstrip lines with a flexible separation between them are used to measure the
temperature inside the human body. A flexible patch applicator, which operates
at 430 MHz, can be seen in the figure below.
RESULT AND DISCUSSION

7.2 SOFTWARE INVOLVEMENT STEPS:


Fig 22: Open The Anaconda Navigator

Fig 23: Launch The Jupyter Notebook Platform


Fig 24: Open The Corresponding Result Folder

7.3 OUTPUT SCREENSHOTS:

7.3.1 INPUT:
7.3.2 OUTPUT:
Test-01:
Test-02:

Test-03:
CHAPTER 8

CONCLUSION AND FUTURE WORK

8.1 CONCLUSION
The analytical process started with data cleaning and processing, missing-value
treatment and exploratory analysis, and finally model building and evaluation.
The patient stage and grade were found with parameters like accuracy, the
classification report and the confusion matrix on a public test set of the given
attributes, using supervised machine learning algorithm methods.

8.2 FUTURE WORK

 The remaining supervised machine learning (SMLT) algorithms will be involved
in finding the best accuracy, applying an ensemble method to predict the patient
stages.
 The hospital wants to automate the detection of breast cancer through the
eligibility process (in real time) based on the account details.
 To automate this process by showing the prediction result in a web application
or desktop application.
 To optimize the work for implementation in an Artificial Intelligence
environment.
CHAPTER 9

REFERENCES

(1). W. A. Awan, "Very small form factor with ultra wide band rectangular patch
antenna for 5G applications," 2018 International Conference on Computing,
Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan.
(2). S. K. Sharma, T. E. Bogale, L. B. Le, S. Chatzinotas, X. Wang and B. Ottersten,
"Dynamic Spectrum Sharing in 5G Wireless Networks With Full-Duplex
Technology: Recent Advances and Research Challenges," in IEEE
Communications Surveys & Tutorials, vol. 20, no. 1, pp. 674-707, First Quarter
2018.
(3). W. A. Awan, Halima, A. Zaidi, N. Hussain, S. Khalid and A. Baghdad,
“Characterization of dual band MIMO antenna for 25 GHz and 50 GHz
applications,” 2018 International Conference on Computing, Electronic and
Electrical Engineering (ICE Cube), Quetta, 2018, pp. 1-4.
(4). W. A. Awan, A. Zaidi, N. Hussain, S. Khalid, Halima, and A. Baghdad,
"Frequency Reconfigurable patch antenna for millimeter wave applications,"
2019 International Conference on Computing, Mathematics and Engineering
Technologies (iCoMET), Sukkur, 2019, pp. 1-5
(5). M. K. Shereen, M. I. Khattak, M. Shafi and N. Saleem, "Slotted Y-shaped
millimeter wave reconfigurable antenna for 5G applications," 2018 International
Conference on Computing, Mathematics and Engineering Technologies
(iCoMET), Sukkur, 2018, pp. 1-5.
(6). CST Microwave Studio, ver. 2017, CST, Framingham, MA, USA, 2017
(7). A. Zaidi, A. Baghdad, A. Ballouk and A. Badri, "Design and optimization of an
inset fed circular microstrip patch antenna using DGS structure for applications
in the millimeter wave band," 2016 International Conference on Wireless
Networks and Mobile Communications (WINCOM), Fez, 2016, pp. 99-10
(8). James, J. R., and P. S. Hall (Eds.), Handbook of Microstrip Antennas, Peter
Peregrinus, London, UK, 1989.
(9). Ramesh Garg, Prakash Bhartia, Inder Bahl, Apisak Ittipiboon, "Microstrip
Antenna Design Handbook", 2001, pp. 1-68, 253-316, Artech House Inc.,
Norwood, MA.
(10). Stuart M. Wentworth (2005), "Fundamentals of Electromagnetics with
Engineering Applications", pp. 442-445, John Wiley & Sons, NJ, USA.
(11). J. D. Kraus, R. J. Marhefka, "Antennas for All Applications", 3rd Ed.,
McGraw-Hill, 2002.
(12). Robert A. Sainati, CAD of Microstrip Antennas for Wireless Applications,
Artech House Inc, Norwood, MA, 1996.
(13). C. A. Balanis, Antenna Theory: Analysis and Design, 2nd ed., John Wiley
& Sons, Inc., 1997.
(14). Y. Schols and G. A. E. Vandenbosch, "Separation of horizontal and vertical
dependencies in a surface/volume integral equation approach to model quasi 3-D
structures in multilayered media", IEEE Trans. Antennas Propagat., vol. 55, no.
4, pp. 1086-1094, April 2007.
(15). G. D. Braaten, R. M. Nelson, M. A. Mohammed, "Electric field integral
equations for electromagnetic scattering problems with electrically small and
electrically large regions", IEEE Trans. Antennas Propag., vol. 56, no. 1, pp.
142-150, Jan. 2008.
Conclusion

A theoretical survey of the microstrip patch antenna is presented in this paper. The
effect of some of its disadvantages can be minimized: the lower gain and low
power-handling capacity can be overcome through an array configuration. Several
factors are involved in the selection of the feeding technique. A particular microstrip
patch antenna can be designed for each application, and its various merits are
compared with those of the conventional microwave antenna.
5. CONCLUSION

A wideband E-shaped microstrip patch antenna has been designed for high-speed wireless
communication systems. The reflection coefficient is below −10 dB from 5.05 GHz to 5.88 GHz. This
performance more than meets the demanding bandwidth specification to cover the 5.15-5.825 GHz
frequency band. At the same time, the antenna is thin and compact thanks to the use of a low
dielectric constant substrate material. These features are very useful for the worldwide portability of
wireless communication equipment. The parametric study provides good insight into the effects of the
various dimensional parameters and gives guidance on the design and optimization of the E-shaped
microstrip patch antenna. By locating the feed point at the base rather than the tip of the center arm,
the resonant frequency of the second resonant mode can be tuned without affecting the resonant
frequency of the fundamental mode. The bandwidth can be easily tuned by trimming the length of the
center arm. Excellent agre
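The band edges quoted above determine the antenna's fractional impedance bandwidth, which can be checked with the usual formula BW% = 100 · (fH − fL) / fc, where fc is the centre of the −10 dB band. A quick sketch using the stated edges:

```python
def fractional_bandwidth(f_low, f_high):
    """Percentage impedance bandwidth relative to the band centre frequency."""
    f_center = (f_low + f_high) / 2
    return 100 * (f_high - f_low) / f_center

# -10 dB band edges quoted in the conclusion (GHz)
bw = fractional_bandwidth(5.05, 5.88)
print(f"{bw:.1f}%")  # about 15.2%
```

A figure in the 15% range is well above the few-percent bandwidth typical of a plain single-layer patch, which is the point of the E-shaped modification.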
