14 views63 pages

Data Science Ml

Uploaded by

laxmiparthive
Copyright
© © All Rights Reserved

DATA SCIENCE

ARTIFICIAL INTELLIGENCE
MACHINE LEARNING
DATA SCIENCE
WHAT IS DATA SCIENCE?

Data science is an interdisciplinary field that uses
statistical and computational methods to extract
insights and knowledge from data. Put simply, it is
the application of specific principles and analytic
techniques to extract information from data for use
in planning, strategy, decision making, etc.
DATA SCIENCE IS THE PROCESS OF USING DATA TO FIND SOLUTIONS OR TO PREDICT
OUTCOMES FOR A PROBLEM STATEMENT
The term “data science” combines two key elements: “data” and “science.”
1. Data: It refers to the raw information that is collected, stored, and processed. In
today’s digital age, enormous amounts of data are generated from various sources
such as sensors, social media, transactions, and more. This data can come in
structured formats (e.g., databases) or unstructured formats (e.g., text, images,
videos).
2. Science: It refers to the systematic study and investigation of phenomena using
scientific methods and principles. Science involves forming hypotheses, conducting
experiments, analyzing data, and drawing conclusions based on evidence.

When we put these two elements together, “data + science” refers to the scientific study of data.
Essentially, data science is about using scientific methods to unlock the potential of
data, uncover patterns, make predictions, and drive informed decision-making
across various domains and industries.

EXAMPLE
Imagine you’re scrolling through your favorite social media platform, and you
notice that certain types of posts always seem to grab your attention. Maybe it’s
cute animal videos, delicious food recipes, or inspiring travel photos.
Now, from the platform’s perspective, they want to keep you engaged and coming
back for more. This is where data science comes into play. They collect a ton of
information about what you like, share, and comment on. They use data science
techniques to analyze all this information to understand your preferences better.
For instance, they might notice that you spend more time watching animal videos
than looking at food recipes. Armed with this knowledge, they can then customize
your feed to show you more of what you love – adorable pets! They might even
predict what type of pet video you’re likely to enjoy next based on your past
behavior.
In this scenario, data science is like the magic behind the scenes that helps social
media platforms understand your interests and tailor your experience to keep you
engaged. It’s all about using data to make your online experience more
personalized and enjoyable.
Real-world Applications of Data Science

1. In Search Engines
Search engines are one of the most familiar applications of data science. When we want
to find something on the internet, we mostly use search engines such as Google, Yahoo,
DuckDuckGo, and Bing, and data science helps them return relevant results quickly.
For example, when we search for “Data Structure and algorithm courses”, the first links
we see are typically the most visited websites for that topic. This happens because many
people visit those sites for information about data structure courses and related
computer subjects. That ranking analysis is done using data science.

2. In Transport
Data science has also entered real-time fields such as transport, for example in driverless cars,
which can help reduce the number of accidents.
For example, training data is fed into a driverless car's algorithms and analyzed with
data science techniques to learn things such as the speed limits on highways, busy streets,
and narrow roads, and how to handle different situations while driving.
3. In E-Commerce

E-commerce websites such as Amazon and Flipkart use data science to create a better
user experience with personalized recommendations.
For example, when we search for something on an e-commerce website, we get
suggestions similar to our past choices, as well as recommendations based on the
most bought, most rated, and most searched products.
This is all done with the help of data science.

4. Data Science in Gaming

In most games where a user plays against a computer opponent, data science
concepts are used together with machine learning so that the computer improves its
performance using past data. Many games, such as chess and EA Sports titles,
use data science concepts.

5. Airline Route Planning

Data science also helps the airline sector, for example by making it easier to
predict flight delays. It also helps airlines decide whether to fly directly to
the destination or to make a halt in between; for example, a flight from Delhi to
the U.S.A. can take a direct route or halt on the way before reaching its destination.

LIFE CYCLE

BENEFITS

• BETTER DECISION MAKING
• IMPROVED CUSTOMER EXPERIENCE
• INCREASED REVENUE
• BETTER FRAUD DETECTION
• IMPROVED HEALTHCARE OUTCOMES
• BETTER ENVIRONMENTAL PROTECTION
• INCREASED EFFICIENCY

DATA SCIENCE AND ARTIFICIAL INTELLIGENCE

[Diagram relating DATA SCIENCE, ML/DL, and AI]

“Data science produces insights. Machine learning produces predictions”

ARTIFICIAL INTELLIGENCE

WHAT IS ARTIFICIAL INTELLIGENCE (AI)?

Artificial intelligence (AI) is the simulation of human
intelligence in machines that are programmed to think and
act like humans. Learning, reasoning, problem-solving,
perception, and language comprehension are all examples
of the abilities involved.
Artificial intelligence is a method of making a computer, a
computer-controlled robot, or a piece of software think intelligently
like the human mind. AI is accomplished by studying the
patterns of the human brain and by analyzing the cognitive
process.

TYPES OF ARTIFICIAL INTELLIGENCE

1. Purely Reactive
These machines do not have any memory or past data to work with and specialize in just
one field of work. For example, in a chess game, the machine observes the moves
and makes the best possible decision to win.

2. Limited Memory
These machines collect previous data and keep adding it to their memory.
They have enough memory or experience to make proper decisions, but that memory
is minimal. For example, such a machine can suggest a restaurant based on the
location data that has been gathered.

3. Theory of Mind
This kind of AI would understand thoughts and emotions and interact socially.
However, a machine of this type is yet to be built.

4. Self-Aware
Self-aware machines are the future generation of this technology. They would
be intelligent, sentient, and conscious.

ADVANTAGES
• It reduces human error
• It never sleeps, so it’s available 24x7
• It never gets bored, so it easily handles repetitive tasks
• It’s fast

DISADVANTAGES
• It’s costly to implement
• It can’t duplicate human creativity
• It is likely to replace some jobs, leading to unemployment
• People can become overly reliant on it

APPLICATIONS OF AI
• SPEECH RECOGNITION
• IMAGE RECOGNITION
• FRAUD DETECTION
• AUTOMATION
• RECOMMENDATIONS

EXAMPLES
ChatGPT
ChatGPT is an advanced language model developed by OpenAI, capable of
generating human-like responses and engaging in natural language conversations.
It uses deep learning techniques to understand and generate coherent text,
making it useful for customer support, chatbots, and virtual assistants.

Google Maps
Google Maps utilizes AI algorithms to provide real-time navigation, traffic updates,
and personalized recommendations. It analyzes vast amounts of data, including
historical traffic patterns and user input, to suggest the fastest routes, estimate
arrival times, and even predict traffic congestion.

Smart Assistants
Smart assistants like Amazon's Alexa, Apple's Siri, and Google Assistant employ AI
technologies to interpret voice commands, answer questions, and perform tasks.
These assistants use natural language processing and machine learning
algorithms to understand user intent, retrieve relevant information, and carry out
requested actions.

Self-Driving Cars
Self-driving cars rely heavily on AI for perception, decision-making, and control.
Using a combination of sensors, cameras, and machine learning algorithms, these
vehicles can detect objects, interpret traffic signs, and navigate complex road
conditions autonomously, enhancing safety and efficiency on the roads.

HOW IS AI USED TODAY?

Machines today can learn from experience, adapt to new inputs,
and even perform human-like tasks with help from artificial
intelligence (AI). Today's AI examples, from
chess-playing computers to self-driving cars, rely heavily
on deep learning and natural language processing. AI software
is already part of daily life, including
voice assistants, face recognition for unlocking mobile phones,
and machine learning-based financial fraud detection. Such
software is typically obtained by downloading an AI-capable
application from an internet marketplace, with no additional
hardware required.

3 TYPES OF AI
1. Artificial Narrow Intelligence (ANI): Also known as Weak AI, it specializes in
performing specific tasks and lacks general cognitive abilities.

2. Artificial General Intelligence (AGI): Refers to Strong AI capable of
understanding, learning, and applying knowledge across various domains, similar
to human intelligence.

3. Artificial Superintelligence (ASI): Hypothetical AI surpassing human intelligence
in all aspects, potentially capable of solving complex problems and making
advancements beyond human comprehension.
WHAT ARE THE 7 MAIN AREAS OF AI?

1. Machine Learning (ML):
• Supervised Learning: Algorithms that learn from labeled data to make predictions.
• Unsupervised Learning: Algorithms that find hidden patterns in unlabeled data.
• Reinforcement Learning: Algorithms that learn optimal actions through trial and error.

2. Natural Language Processing (NLP):
• Text Analysis: Understanding and processing human language in text form.
• Speech Recognition: Converting spoken language into text.
• Machine Translation: Translating text from one language to another.

3. Computer Vision:
• Image Recognition: Identifying objects, people, or scenes in images.
• Image Generation: Creating new images from scratch or based on existing
ones.
• Video Analysis: Understanding and processing video content.

4. Expert Systems:
• Rule-Based Systems: Using predefined rules to make decisions or solve problems.
• Knowledge Representation: Storing and manipulating knowledge about the world.

5. Neural Networks and Deep Learning:
• Convolutional Neural Networks (CNNs): Primarily used in image processing tasks.
• Recurrent Neural Networks (RNNs): Used for sequence data like time series or natural
language.

6. Planning and Scheduling:
• Task Scheduling: Allocating resources and planning tasks to achieve specific goals.
• Path Planning: Finding the optimal route or sequence of actions to reach a destination.

7. Robotics:
• Autonomous Navigation: Enabling robots to navigate environments without human
intervention.
• Manipulation and Control: Allowing robots to interact with and manipulate objects.
• Human-Robot Interaction: Enhancing the ways robots and humans work together.
MACHINE LEARNING
WHAT IS MACHINE LEARNING (ML)?
MACHINE LEARNING IS AN APPLICATION OF AI THAT PROVIDES SYSTEMS THE ABILITY TO
AUTOMATICALLY LEARN AND IMPROVE FROM EXPERIENCE WITHOUT BEING EXPLICITLY
PROGRAMMED
[Diagram: an ORDINARY SYSTEM WITH AI becomes MACHINE LEARNING, which LEARNS, PREDICTS, and IMPROVES]

MACHINE LEARNING PROCESS

INPUT DATA → ANALYZE DATA → FIND PATTERNS → PREDICTIONS/DECISION → LEARNS FROM THE FEEDBACK
TYPES OF MACHINE LEARNING

• SUPERVISED LEARNING: THE MACHINE LEARNS FROM THE TRAINING DATA THAT IS LABELED
  • CLASSIFICATION: KNN, DECISION TREE, LOGISTIC REGRESSION, RANDOM FOREST, SUPPORT VECTOR MACHINE
  • REGRESSION: LINEAR REGRESSION
• UNSUPERVISED LEARNING: NON-LABELED TRAINING DATA
  • CLUSTERING: K-MEANS
• REINFORCEMENT LEARNING: THE MACHINE LEARNS ON ITS OWN
• SEMI-SUPERVISED LEARNING: A BRANCH OF ML THAT COMBINES SUPERVISED & UNSUPERVISED LEARNING BY
USING BOTH LABELED AND UNLABELED DATA TO TRAIN AI FOR CLASSIFICATION & REGRESSION

SUPERVISED LEARNING
[Diagram: KNOWN DATA with a KNOWN RESPONSE (“THESE ARE APPLES”) trains the MODEL; given NEW DATA, the model gives a NEW RESPONSE (“IT’S AN APPLE”)]
UNSUPERVISED LEARNING
[Diagram: INPUT DATA goes into the MODEL, which reports “I CAN SEE A PATTERN”]
REINFORCEMENT LEARNING
[Diagram: for an INPUT, the model's RESPONSE is “IT’S A MANGO”; the FEEDBACK is “WRONG! IT’S AN APPLE”; the model LEARNS (“NOTED!”); for the next INPUT, the REINFORCED RESPONSE is “IT’S AN APPLE”]
SUPERVISED VS UNSUPERVISED
SUPERVISED: LABELED DATA, DIRECT FEEDBACK, PREDICT OUTPUT
UNSUPERVISED: NON-LABELED DATA, NO FEEDBACK, FIND HIDDEN STRUCTURE IN DATA

TYPES OF SUPERVISED LEARNING

1. CLASSIFICATION - USED WHEN THE OUTPUT IS CATEGORICAL, LIKE “YES” OR “NO”.
IT IS LIKE SORTING ITEMS INTO DIFFERENT CATEGORIES OR CLASSES. THE OUTPUT IS TYPICALLY A LABEL OR CATEGORY.
DISCRETE – ONLY A FINITE NUMBER OF PREDICTIONS CAN BE MADE
EXAMPLES : SEX - FEMALE OR MALE; ABOVE AVERAGE, AVERAGE, OR BELOW AVERAGE

A. K NEAREST NEIGHBOURS (KNN)

THE KNN ALGORITHM ASSIGNS A NEW DATA POINT TO THE
NEIGHBOUR GROUP IT IS MOST SIMILAR TO.
IN KNN, K IS A POSITIVE INTEGER: THE NUMBER OF NEIGHBOURS CONSULTED. FOR EVERY NEW DATA POINT
WE WANT TO CLASSIFY, WE FIND ITS K CLOSEST NEIGHBOURS AND ASSIGN IT TO THE GROUP
MOST COMMON AMONG THEM.
EXAMPLE:
[Two scatter plots of the balls, with DURABILITY on the x-axis and COST on the y-axis]
CONSIDER THERE ARE 3 CLUSTERS:
1. FOOTBALL
2. BASKETBALL
3. TENNIS BALL
NOW WE HAVE A NEW DATA POINT (BLACK BALL). WE WILL NOW TRY TO CLASSIFY THIS USING KNN.
LET K = 5. THE GLOWING ONES ARE THE CLOSEST 5 BALLS TO OUR NEW DATA POINT (BLACK BALL).
MOST OF THEM ARE TENNIS BALLS, SO WE CLASSIFY THE BLACK BALL AS A TENNIS BALL.
THUS OUR NEW DATA POINT IS A TENNIS BALL
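
The KNN vote described above can be sketched in a few lines of Python. This is only an illustration: the (durability, cost) numbers are made up, and `knn_classify` is a hypothetical helper name, not something from the slides.

```python
import math
from collections import Counter


def knn_classify(train, query, k=5):
    """Return the most common label among the k training points nearest to query."""
    # Sort the labelled points by Euclidean distance to the new point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The label that appears most often among the k neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (durability, cost) measurements, invented for this sketch.
balls = [
    ((1.0, 1.0), "tennis ball"), ((1.5, 1.2), "tennis ball"),
    ((1.2, 1.6), "tennis ball"), ((1.8, 1.5), "tennis ball"),
    ((5.0, 5.0), "football"), ((5.5, 4.8), "football"), ((4.8, 5.6), "football"),
    ((8.0, 2.0), "basketball"), ((8.5, 2.4), "basketball"), ((7.8, 1.6), "basketball"),
]

black_ball = (2.0, 2.0)  # the new, unlabelled data point
predicted = knn_classify(balls, black_ball, k=5)
print(predicted)
```

With K = 5, four of the five nearest neighbours in this toy data are tennis balls, so the vote goes to that class.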
B. DECISION TREE

A DECISION TREE IS A GRAPH THAT USES A BRANCHING METHOD TO ILLUSTRATE
EVERY POSSIBLE OUTCOME OF A DECISION

IS IT SUNNY?
  YES → GO SWIM
  NO → IS IT RAINING?
    YES → STAY INDOORS
    NO → WALK THE DOG
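
Read as code, a decision tree is a chain of nested questions. The sketch below hard-codes the sunny/raining tree from this slide; the function name `decide_activity` is an assumption made for illustration, and a learned tree would derive its questions from data instead.

```python
def decide_activity(is_sunny: bool, is_raining: bool) -> str:
    """Walk the sunny/raining tree from the root question down to a leaf."""
    if is_sunny:        # root node: IS IT SUNNY?
        return "go swim"
    if is_raining:      # second node, reached only when it is not sunny
        return "stay indoors"
    return "walk the dog"


print(decide_activity(is_sunny=False, is_raining=True))
```

Each `if` is one internal node and each `return` is one leaf, so every path through the function corresponds to one outcome of the decision.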

D. LOGISTIC REGRESSION
LOGISTIC REGRESSION IS USED FOR BINARY CLASSIFICATION. IT USES THE SIGMOID
FUNCTION, WHICH TAKES A LINEAR COMBINATION OF THE INDEPENDENT VARIABLES AS INPUT AND PRODUCES A
PROBABILITY VALUE BETWEEN 0 AND 1.
For example, with two classes, Class 0 and Class 1, if the value of the logistic
function for an input is greater than 0.5, the input belongs to Class 1; otherwise it belongs
to Class 0. It is called regression because it is an extension of linear
regression, but it is mainly used for classification problems.
EXAMPLE 1 : WHETHER A DISEASE IS PRESENT OR NOT?
WE COULD STUDY THE INFLUENCE OF AGE, GENDER, AND SMOKING.
SO, WITH THE HELP OF LOGISTIC REGRESSION, WE CAN DETERMINE
WHAT HAS AN INFLUENCE ON WHETHER A CERTAIN DISEASE IS PRESENT.
EXAMPLE 2 : WHETHER A PERSON BUYS OR DOESN’T BUY A PARTICULAR PRODUCT? (YES / NO)

KEY POINTS:
•LOGISTIC REGRESSION PREDICTS THE OUTPUT OF A CATEGORICAL DEPENDENT
VARIABLE. THEREFORE, THE OUTCOME MUST BE A CATEGORICAL OR DISCRETE VALUE.
[A discrete value is an exact, countable value, such as the number of students in a classroom, the roll of a die, or the number of
cars in a parking lot.]
•IT CAN BE EITHER YES OR NO, 0 OR 1, TRUE OR FALSE, ETC. BUT INSTEAD OF GIVING
THE EXACT VALUE AS 0 OR 1, IT GIVES PROBABILISTIC VALUES WHICH LIE
BETWEEN 0 AND 1.
•IN LOGISTIC REGRESSION, INSTEAD OF FITTING A REGRESSION LINE, WE FIT AN “S”
SHAPED LOGISTIC FUNCTION, WHOSE OUTPUT STAYS BETWEEN THE TWO LIMITING VALUES 0 AND 1.

TYPES OF LOGISTIC REGRESSION

ON THE BASIS OF THE CATEGORIES, LOGISTIC REGRESSION CAN BE CLASSIFIED INTO
THREE TYPES:
1. BINOMIAL: IN BINOMIAL LOGISTIC REGRESSION, THERE CAN BE ONLY TWO POSSIBLE
TYPES OF THE DEPENDENT VARIABLE, SUCH AS 0 OR 1, PASS OR FAIL, ETC.
2. MULTINOMIAL: IN MULTINOMIAL LOGISTIC REGRESSION, THERE CAN BE 3 OR MORE
POSSIBLE UNORDERED TYPES OF THE DEPENDENT VARIABLE, SUCH AS “CAT”, “DOG”,
OR “SHEEP”.
3. ORDINAL: IN ORDINAL LOGISTIC REGRESSION, THERE CAN BE 3 OR MORE POSSIBLE
ORDERED TYPES OF THE DEPENDENT VARIABLE, SUCH AS “LOW”, “MEDIUM”, OR “HIGH”.
LET'S SAY OUR GOAL IS TO PREDICT WHETHER A PERSON HAS HEART DISEASE OR
NOT BASED ON ONE INPUT FEATURE, WHICH IS AGE
[Figure: heart disease (0 to 1, with 0.5 marked) plotted against age (25, 50, 75), first with a fitted straight line and then with a fitted curve]
So how can we make new predictions based on this data set? One thing we can do is draw an
approximate straight line that fits the data and treat the value of that line as the
probability that the person has heart disease. For a new input, say a person aged 75, we
go to 75 on the x-axis (pink mark), read off the corresponding value on the y-axis (pink
mark), and use that value as the probability (blue mark). If this probability is above 0.5
we classify the person as 1; if it is below 0.5 we classify the person as 0.
But the straight line does not fit this data set properly; it leaves a lot of gaps. Also,
if we want to predict whether a person aged 50 has heart disease, this method gives the
wrong value, because it classifies that person as 0. How do we overcome this? One thing we
can do is draw a curve that fits the data set more appropriately. This curve fits the data
better and is more likely to give us the right predictions.
SIGMOID CURVE
The sigmoid function is a mathematical function that
has an “S”-shaped curve (sigmoid curve). It is
widely used in various fields, including machine
learning, statistics, and artificial intelligence,
particularly for its smooth and bounded nature.
The sigmoid function is often used to introduce
non-linearity in models, especially in neural
networks.

FORMULA FOR THIS CURVE:
$$\sigma(y) = \frac{1}{1 + e^{-y}}$$

STRAIGHT LINE FORMULA:
$$y = a_0 + a_1 x$$

APPLYING THE STRAIGHT LINE ON THE SIGMOID CURVE:
$$p = \sigma(a_0 + a_1 x) = \frac{1}{1 + e^{-(a_0 + a_1 x)}}$$
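
To make the S-shaped curve concrete, here is a minimal Python sketch that feeds a straight line into the sigmoid function and applies the 0.5 threshold described earlier. The coefficients `A0` and `A1` are invented for illustration only; a real model would learn them from the heart-disease data.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(age, a0, a1):
    """Pass the straight line a0 + a1 * age through the sigmoid."""
    return sigmoid(a0 + a1 * age)


def classify(age, a0, a1, threshold=0.5):
    """Class 1 if the probability is above the threshold, otherwise class 0."""
    return 1 if predict_probability(age, a0, a1) > threshold else 0


# Hypothetical coefficients (assumed, not fitted to any data in the slides).
A0, A1 = -4.5, 0.1

for age in (25, 50, 75):
    print(age, round(predict_probability(age, A0, A1), 3), classify(age, A0, A1))
```

Because the sigmoid never leaves the range between 0 and 1, its output can be read as a probability, which a straight line cannot guarantee.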

E. RANDOM FOREST
EXAMPLE : You are a student who wants to choose a college course
right after high school, and you are confused about which course to take
based on your strengths and interests. So you ask for advice from different people,
such as teachers, parents, degree students, and working people. One person, with
their own way of thinking, might focus on one aspect while others look at other
factors. This group of people, each providing their own individual opinion,
can help you make a more balanced and reliable decision.

THE RANDOM FOREST ALGORITHM IS LIKE THAT GROUP OF PEOPLE: IT COMBINES THE
INDIVIDUAL OPINIONS OF A COLLECTIVE GROUP AND, BASED ON THESE, MAKES A BETTER
PREDICTION

TRAINING SET
→ TRAINING DATA 1, TRAINING DATA 2, ..., TRAINING DATA n
→ DECISION TREE 1, DECISION TREE 2, ..., DECISION TREE n
→ VOTING (classification problem: take the majority class) or
  AVERAGING (regression problem: take the average of the trees' predictions)
→ PREDICTION

STEPS:

[Diagram: Tree-1, Tree-2, and Tree-3 each predict a class; two trees predict CLASS-A and one predicts CLASS-B]
MAJORITY-VOTING → FINAL-CLASS
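
The combining step of a random forest can be sketched without building any trees. The Python below assumes each tree has already produced its prediction; which tree voted for which class, the regression estimates, and the helper names are all assumptions made for illustration. It also shows one common way to draw "TRAINING DATA 1 ... n" from the training set: sampling rows with replacement (bootstrapping).

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one tree's training data."""
    return rng.choices(rows, k=len(rows))


def majority_vote(labels):
    """Classification: the class predicted by the most trees wins."""
    return Counter(labels).most_common(1)[0][0]


def average_prediction(values):
    """Regression: average the numeric predictions of all trees."""
    return mean(values)


# One bootstrap sample of a toy training set (row ids only).
sample = bootstrap_sample([1, 2, 3, 4, 5], random.Random(0))

# Assumed outputs of three trees, matching the two-to-one vote on the slide.
tree_votes = ["CLASS-A", "CLASS-B", "CLASS-A"]
final_class = majority_vote(tree_votes)

# Hypothetical numeric outputs of three regression trees.
tree_estimates = [2.0, 3.0, 4.0]
final_value = average_prediction(tree_estimates)

print(final_class, final_value)
```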

STRENGTHS
• It takes less training time compared to many other algorithms.
• It predicts output with high accuracy, and it runs efficiently even on large datasets.
• It can also maintain accuracy when a large proportion of the data is missing.

WEAKNESSES
• When used for regression, random forests cannot predict beyond the range of the training
data, and they may over-fit datasets that are particularly noisy.
• The models created by random forests may be very large. They
may take hundreds of megabytes of memory and may be slow to evaluate.

F. SUPPORT VECTOR MACHINE

• Support vector machine (SVM) is one of the most popular supervised learning
algorithms. It is used for classification as well as regression problems.
• However, it is primarily used for classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes, so that we can easily put
a new data point in the correct category in the future.
• The SVM algorithm can be used for face detection, image classification, text
categorization, etc.

KEY CONCEPTS OF SVM:

SUPPORT VECTORS - The data points that are closest to the hyperplane are called
support vectors. The separating line is defined with the help of these data points.
HYPERPLANE - It is the decision plane or boundary that divides a set of
objects having different classes.
MARGIN - It may be defined as the gap between the two lines through the closest data
points of different classes. It can be calculated as the perpendicular distance
from the separating line to the support vectors. A LARGE MARGIN is considered a GOOD
MARGIN and a SMALL MARGIN is considered a BAD MARGIN
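
The perpendicular distance used to measure the margin can be computed directly. The sketch below assumes the hyperplane is written as w·x + b = 0; the line x1 + x2 - 5 = 0 and the four points are invented for illustration and are not the output of a trained SVM.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


def closest_distance(w, b, points):
    """Distance from the hyperplane to its nearest points (the support vectors)."""
    return min(distance_to_hyperplane(w, b, p) for p in points)


# Hypothetical separating line x1 + x2 - 5 = 0 and a few labelled-by-position points.
w, b = (1.0, 1.0), -5.0
points = [(1.0, 2.0), (2.0, 1.0), (4.0, 4.0), (5.0, 3.0)]

print(round(closest_distance(w, b, points), 3))
```

Here (1, 2) and (2, 1) are the nearest points, so they play the role of support vectors; an SVM searches for the w and b that make this closest distance as large as possible.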

TYPES OF SVM
• LINEAR SVM - Linear SVM is used for linearly separable data:
if a dataset can be classified into two classes by
using a single straight line, the data is termed linearly
separable and the classifier used is called a LINEAR SVM
CLASSIFIER
• NON-LINEAR SVM - Non-linear SVM is used for non-linearly
separable data: if a dataset cannot be classified by
using a straight line, the data is termed non-linear
and the classifier used is called a NON-LINEAR SVM CLASSIFIER
[Figure: two panels, DATA IS LINEARLY SEPARABLE and DATA IS NOT LINEARLY SEPARABLE]

IN THIS CASE WE CANNOT FIND A STRAIGHT LINE TO SEPARATE THE DATA.
WE WILL USE THE KERNEL TRICK

KERNEL

IT CONVERTS A LOW-DIMENSIONAL SPACE INTO A HIGH-DIMENSIONAL SPACE.
A KERNEL IS USED TO SEPARATE NON-LINEAR DATA
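
The idea of moving to a higher dimension can be shown with a tiny example. The Python sketch below uses an explicit mapping from 1-D to 2-D on made-up data; this is a simplification, since a real kernel achieves the same effect implicitly without constructing the new coordinates, and the helper name `lift` is an assumption.

```python
def lift(x):
    """Map a 1-D point to 2-D by adding its square as a second coordinate."""
    return (x, x * x)


# Hypothetical 1-D data: class 1 sits between the class-0 points,
# so no single cut-off on x can separate the two classes.
points = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 1, 1, 0, 0]

lifted = [lift(x) for x in points]

# In the lifted space the straight line x2 = 2.5 separates the classes.
predicted = [1 if x2 < 2.5 else 0 for _, x2 in lifted]

print(predicted == labels)
```

After the mapping, a straight line does the job that was impossible in the original space, which is why kernels let an SVM handle non-linear data.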

2. REGRESSION – USED WHEN A VALUE NEEDS TO BE PREDICTED, LIKE “STOCK PRICES”.
IT IS LIKE ESTIMATING A QUANTITY OR A VALUE
CONTINUOUS – NUMERIC, AND COULD TAKE INFINITELY MANY VALUES
EXAMPLE : AGE – 0.5 YEAR
[Figure: example prices $289,00, $ 284,00, and $304,00]

A. LINEAR REGRESSION

LINEAR REGRESSION IS USED TO PREDICT THE VALUE OF A VARIABLE BASED ON
THE VALUE OF ANOTHER VARIABLE. THE VARIABLE YOU WANT TO PREDICT IS
CALLED THE DEPENDENT VARIABLE. THE VARIABLE YOU ARE USING TO PREDICT
THE OTHER VARIABLE'S VALUE IS CALLED THE INDEPENDENT VARIABLE
REGRESSION IS A PROCESS OF DETERMINING A RELATIONSHIP BETWEEN ONE OR
MORE INDEPENDENT VARIABLES AND ONE DEPENDENT OR OUTPUT VARIABLE

THE GOAL OF LINEAR REGRESSION IS TO FIND THE STRAIGHT LINE THAT BEST FITS THE DATA

EXAMPLE

• LET US CONSIDER AN EXAMPLE WHERE FIVE WEEKS OF SALES DATA ARE GIVEN
AS SHOWN IN THE TABLE
• APPLY THE LINEAR REGRESSION TECHNIQUE TO PREDICT THE 7TH AND 12TH WEEK
SALES

x (Week)    y (Sales in Thousands)
1           1.2
2           1.8
3           2.6
4           3.2
5           3.8

[Scatter plot of the table: the x-axis is the week (independent variable, 1 to 5) and the y-axis is sales (dependent variable, 1 to 4)]
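LINEAR REGRESSION EQUATION IS GIVEN BY :
$$y = a_0 + a_1 x$$

WHERE,
$$a_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$

x (Week)    y (Sales in Thousands)    x^2     x*y
1           1.2                       1       1.2
2           1.8                       4       3.6
3           2.6                       9       7.8
4           3.2                       16      12.8
5           3.8                       25      19
SUM         15        12.6            55      44.4
AVERAGE     3         2.52            11      8.88

$$\bar{x} = 3, \quad \bar{y} = 2.52, \quad \overline{xy} = 8.88, \quad \overline{x^2} = 11$$

$$a_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} = \frac{8.88 - 3 \times 2.52}{11 - 3^2} = 0.66$$

$$a_0 = \bar{y} - a_1 \bar{x} = 2.52 - 0.66 \times 3 = 0.54$$

REGRESSION EQUATION IS:
$$y = 0.54 + 0.66x$$

THE PREDICTED 7TH WEEK SALE (WHEN X = 7) IS,
$$y = 0.54 + 0.66 \times 7 = 5.16$$

THE PREDICTED 12TH WEEK SALE (WHEN X = 12) IS,
$$y = 0.54 + 0.66 \times 12 = 8.46$$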
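
The same calculation can be checked in Python. This sketch uses the five-week sales table from the example and the averages-based formulas for the slope and intercept; the function name `fit_line` is an assumption made for illustration.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    # Slope: (mean of xy - mean x * mean y) / (mean of x^2 - (mean x)^2)
    a1 = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    # Intercept: the line passes through the point (mean x, mean y)
    a0 = mean_y - a1 * mean_x
    return a0, a1


weeks = [1, 2, 3, 4, 5]
sales = [1.2, 1.8, 2.6, 3.2, 3.8]  # in thousands, from the example table

a0, a1 = fit_line(weeks, sales)
week_7 = a0 + a1 * 7
week_12 = a0 + a1 * 12

print(round(a0, 2), round(a1, 2), round(week_7, 2), round(week_12, 2))
# prints: 0.54 0.66 5.16 8.46
```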

2 TYPES OF LINEAR REGRESSION

1. SIMPLE LINEAR REGRESSION : SIMPLE LINEAR
REGRESSION USES ONLY ONE INDEPENDENT VARIABLE TO
INFER OR PREDICT THE DEPENDENT VARIABLE
EXAMPLE : PREDICTING SALARY (DEPENDENT VARIABLE) FROM WORKING HOURS (1 INDEPENDENT VARIABLE)

2. MULTIPLE LINEAR REGRESSION : IN MULTIPLE LINEAR
REGRESSION, SEVERAL INDEPENDENT VARIABLES ARE USED
TO INFER OR PREDICT THE DEPENDENT VARIABLE
EXAMPLE : PREDICTING SALARY (DEPENDENT VARIABLE) FROM WORKING HOURS AND AGE (2 INDEPENDENT VARIABLES)
REGRESSION IS NOT REVERSIBLE : A REGRESSION
EQUATION USED TO PREDICT THE VALUE OF Y FROM A GIVEN VALUE OF
X CANNOT BE USED TO PREDICT THE VALUE OF X FROM A GIVEN
VALUE OF Y

INDEPENDENT/EXPLANATORY/PREDICTOR VARIABLE
USUALLY DENOTED BY X (THE VARIABLE WE USE FOR PREDICTIONS).
IT HELPS IN PREDICTING THE VALUE OF THE OTHER VARIABLE

DEPENDENT/RESPONSE/PREDICTED VARIABLE
USUALLY DENOTED BY Y (THE VARIABLE WE WANT TO INFER OR
PREDICT). IT IS THE ONE WHICH IS BEING PREDICTED BY THE OTHER
VARIABLE

EXAMPLE:
• THE YIELD OF WHEAT DEPENDS ON THE AMOUNT OF FERTILIZER
USED. HERE FERTILIZER IS THE INDEPENDENT VARIABLE AND
YIELD OF WHEAT IS THE DEPENDENT VARIABLE.
• THE SALES OF A PRODUCT DEPEND ON THE AMOUNT OF
ADVERTISING. HERE ADVERTISING IS THE INDEPENDENT
VARIABLE AND SALES IS THE DEPENDENT VARIABLE

TYPES OF UNSUPERVISED LEARNING
CLUSTERING : THE TASK OF GROUPING DATA POINTS
BASED ON THEIR SIMILARITY WITH EACH OTHER IS CALLED
CLUSTERING.

A. K-MEANS CLUSTERING

• K-MEANS CLUSTERING IS AN UNSUPERVISED LEARNING ALGORITHM
WHICH GROUPS AN UNLABELED DATASET INTO DIFFERENT CLUSTERS.
HERE K DEFINES THE NUMBER OF PRE-DEFINED CLUSTERS THAT NEED
TO BE CREATED IN THE PROCESS: IF K = 2 THERE WILL BE TWO
CLUSTERS, IF K = 3 THERE WILL BE THREE CLUSTERS, AND SO ON.
• IT IS A CENTROID-BASED ALGORITHM, WHERE EACH CLUSTER IS
ASSOCIATED WITH A CENTROID

K-MEANS CLUSTERING ALGORITHM

STEP – 1 : Select the number K to decide the number of clusters.
STEP – 2 : Select K random points as centroids (they need not come from the
input data set).
STEP – 3 : Assign each data point to its closest centroid, which will form
the predefined K clusters.
STEP – 4 : Calculate the mean of each cluster and place a new centroid there.
STEP – 5 : Repeat the third step, which means reassign each data point to
the new closest centroid.
STEP – 6 : If any reassignment occurred, go to step 4; otherwise finish.
STEP – 7 : The model is ready.
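
The steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function name `k_means` is an assumption, the data are the ten (height, weight) rows from the example in this deck, and the starting centroids are fixed to rows 1 and 2 instead of being chosen at random so that the run is repeatable.

```python
import math


def k_means(points, initial_centroids, max_iter=100):
    """Batch k-means: assign every point, then move every centroid, until stable."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign each point to the index of its closest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop when no point changed cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of the points assigned to it.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# (height, weight) rows from the worked example.
people = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72),
          (188, 77), (180, 71), (180, 70), (183, 84), (180, 88)]

final_centroids, clusters = k_means(people, [(185, 72), (170, 56)])
print(final_centroids, clusters)
```

With these starting centroids the assignments stop changing after the second pass, leaving rows 2 and 3 in one cluster and the other eight rows in the other.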
Example
K = 2 (WE CAN TAKE ANY NUMBER)

NO    HEIGHT    WEIGHT
1     185       72
2     170       56
3     168       60
4     179       68
5     182       72
6     188       77
7     180       71
8     180       70
9     183       84
10    180       88

The distance between two points is
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Initial centroids: cluster 1 = (185,72) (row 1) and cluster 2 = (170,56) (row 2).
Each point joins the cluster whose centroid is at the smaller distance.

Row 3, (168,60):
Distance to cluster 1: $\sqrt{(185 - 168)^2 + (72 - 60)^2} = 20.808$
Distance to cluster 2: $\sqrt{(170 - 168)^2 + (56 - 60)^2} = 4.47$
Row 3 joins cluster 2.
New centroid for cluster 1: (185,72)
New centroid for cluster 2: $\left(\frac{170 + 168}{2}, \frac{56 + 60}{2}\right) = (169,58)$

Row 4, (179,68):
Distance to cluster 1: $\sqrt{(185 - 179)^2 + (72 - 68)^2} = 7.211$
Distance to cluster 2: $\sqrt{(169 - 179)^2 + (58 - 68)^2} = 14.14$
Row 4 joins cluster 1.
New centroid for cluster 1: $\left(\frac{185 + 179}{2}, \frac{72 + 68}{2}\right) = (182,70)$
New centroid for cluster 2: (169,58)

Row 5, (182,72):
Distance to cluster 1: $\sqrt{(182 - 182)^2 + (70 - 72)^2} = 2$
Distance to cluster 2: $\sqrt{(169 - 182)^2 + (58 - 72)^2} = 19.10$
Row 5 joins cluster 1.
New centroid for cluster 1: $\left(\frac{185 + 179 + 182}{3}, \frac{72 + 68 + 72}{3}\right) = (182, 70.67)$
New centroid for cluster 2: (169,58)

Do the rest………
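
The worked example places one row at a time and updates the affected centroid straight away. The remaining rows can be processed the same way with a short Python sketch; `sequential_clusters` is a hypothetical helper name, and the data are the ten (height, weight) rows of the example table.

```python
import math


def sequential_clusters(points, k=2):
    """Seed clusters with the first k rows, then place the remaining rows one by one."""
    clusters = [[p] for p in points[:k]]
    centroids = [tuple(float(v) for v in p) for p in points[:k]]
    for p in points[k:]:
        # Pick the cluster whose current centroid is closest to this row.
        nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        clusters[nearest].append(p)
        # Recompute that cluster's centroid as the mean of its members.
        members = clusters[nearest]
        centroids[nearest] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, clusters


rows = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72),
        (188, 77), (180, 71), (180, 70), (183, 84), (180, 88)]

centroids, clusters = sequential_clusters(rows, k=2)
print(centroids)
print(clusters)
```

Rows 6 to 10 all end up closer to cluster 1, so cluster 2 keeps only rows 2 and 3.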
Thank You
