Data Science Ml
ARTIFICIAL INTELLIGENCE
MACHINE LEARNING
DATA SCIENCE
The term “data science” combines two key elements: “data” and “science.”
1. Data: It refers to the raw information that is collected, stored, and processed. In
today’s digital age, enormous amounts of data are generated from various sources
such as sensors, social media, transactions, and more. This data can come in
structured formats (e.g., databases) or unstructured formats (e.g., text, images,
videos).
2. Science: It refers to the systematic study and investigation of phenomena using
scientific methods and principles. Science involves forming hypotheses, conducting
experiments, analyzing data, and drawing conclusions based on evidence.
EXAMPLE
Imagine you’re scrolling through your favorite social media platform, and you
notice that certain types of posts always seem to grab your attention. Maybe it’s
cute animal videos, delicious food recipes, or inspiring travel photos.
Now, from the platform’s perspective, they want to keep you engaged and coming
back for more. This is where data science comes into play. They collect a ton of
information about what you like, share, and comment on. They use data science
techniques to analyze all this information to understand your preferences better.
For instance, they might notice that you spend more time watching animal videos
than looking at food recipes. Armed with this knowledge, they can then customize
your feed to show you more of what you love – adorable pets! They might even
predict what type of pet video you’re likely to enjoy next based on your past
behavior.
In this scenario, data science is like the magic behind the scenes that helps social
media platforms understand your interests and tailor your experience to keep you
engaged. It’s all about using data to make your online experience more
personalized and enjoyable.
Real-world Applications of Data Science
1. In Search Engines
Search engines are one of the most familiar applications of data science. When we want
to search for something on the internet, we usually use a search engine such as Google,
Yahoo, DuckDuckGo, or Bing. Data science is used to return relevant results quickly.
For example, when we search for “Data Structure and algorithm courses,” the first links
we get are typically those of the most visited websites. This happens because those
websites are visited most often for information about data structure courses and
computer-related subjects. This analysis is done using data science, which is how the
most visited web links end up at the top.
2. In Transport
Data science has also entered real-time fields such as transport, for example in
driverless cars. Driverless cars can help reduce the number of accidents.
For example, in driverless cars the training data is fed into the algorithm, and data
science techniques are used to analyze it: what the speed limit is on highways, busy
streets, narrow roads, etc., and how to handle different situations while driving.
3. In E-Commerce
E-commerce websites like Amazon, Flipkart, etc. use data science to create a better
user experience with personalized recommendations.
For example, when we search for something on an e-commerce website, we get
suggestions similar to our earlier choices, based on our past data. We also get
recommendations based on the most bought, most rated, and most searched products.
This is all done with the help of data science.
LIFE CYCLE
BENEFITS
[Figure: DATA SCIENCE, ML/DL, and AI]
2. Limited Memory
These machines collect previous data and continue adding it to their memory.
They have enough memory or experience to make proper decisions, but memory
is minimal. For example, this machine can suggest a restaurant based on the
location data that has been gathered.
3. Theory of Mind
This kind of AI can understand thoughts and emotions, as well as interact socially.
However, a machine based on this type is yet to be built.
4. Self-Aware
Self-aware machines are the future generation of this new technology. They will
be intelligent, sentient, and conscious.
ADVANTAGES
•It reduces human error
•It never sleeps, so it’s available 24x7
•It never gets bored, so it easily handles repetitive tasks
•It’s fast
DISADVANTAGES
•It’s costly to implement
•It can’t duplicate human creativity
•It will definitely replace some jobs, leading to unemployment
•People can become overly reliant on it
APPLICATIONS OF AI
SPEECH RECOGNITION
IMAGE RECOGNITION
FRAUD DETECTION
AUTOMATION
RECOMMENDATIONS
EXAMPLES
ChatGPT
ChatGPT is an advanced language model developed by OpenAI, capable of
generating human-like responses and engaging in natural language conversations.
It uses deep learning techniques to understand and generate coherent text,
making it useful for customer support, chatbots, and virtual assistants.
Google Maps
Google Maps utilizes AI algorithms to provide real-time navigation, traffic updates,
and personalized recommendations. It analyzes vast amounts of data, including
historical traffic patterns and user input, to suggest the fastest routes, estimate
arrival times, and even predict traffic congestion.
Smart Assistants
Smart assistants like Amazon's Alexa, Apple's Siri, and Google Assistant employ AI
technologies to interpret voice commands, answer questions, and perform tasks.
These assistants use natural language processing and machine learning
algorithms to understand user intent, retrieve relevant information, and carry out
requested actions.
Self-Driving Cars
Self-driving cars rely heavily on AI for perception, decision-making, and control.
Using a combination of sensors, cameras, and machine learning algorithms, these
vehicles can detect objects, interpret traffic signs, and navigate complex road
conditions autonomously, enhancing safety and efficiency on the roads.
3 TYPES OF AI
1. Artificial Narrow Intelligence (ANI): Also known as Weak AI, it specializes in
performing specific tasks and lacks general cognitive abilities.
Computer Vision:
•Image Recognition: Identifying objects, people, or scenes in images.
•Image Generation: Creating new images from scratch or based on existing
ones.
•Video Analysis: Understanding and processing video content.
Expert Systems:
Robotics:
•Autonomous Navigation: Enabling robots to navigate environments without human
intervention.
•Manipulation and Control: Allowing robots to interact with and manipulate objects.
•Human-Robot Interaction: Enhancing the ways robots and humans work together.
MACHINE LEARNING
WHAT IS MACHINE LEARNING (ML)?
MACHINE LEARNING IS AN APPLICATION OF AI THAT GIVES SYSTEMS THE ABILITY TO
AUTOMATICALLY LEARN AND IMPROVE FROM EXPERIENCE WITHOUT BEING EXPLICITLY
PROGRAMMED.
[Figure: MACHINE + LEARNING. An ordinary system with AI becomes a machine learning system that LEARNS, PREDICTS, and IMPROVES: it makes PREDICTIONS/DECISIONS and LEARNS FROM THE FEEDBACK.]
TYPES OF MACHINE LEARNING
SUPERVISED LEARNING: THE MACHINE LEARNS FROM TRAINING DATA THAT IS LABELED.
•CLASSIFICATION: KNN, DECISION TREE, LOGISTIC REGRESSION, RANDOM FOREST, SUPPORT VECTOR MACHINE
•REGRESSION: LINEAR REGRESSION
UNSUPERVISED LEARNING: THE MACHINE LEARNS FROM NON-LABELED TRAINING DATA.
•CLUSTERING: K-MEANS
REINFORCEMENT LEARNING: THE MACHINE LEARNS ON ITS OWN.
SEMI-SUPERVISED LEARNING: A BRANCH OF ML THAT COMBINES SUPERVISED & UNSUPERVISED
LEARNING BY USING BOTH LABELED AND UNLABELED DATA TO TRAIN AI FOR CLASSIFICATION &
REGRESSION.
SUPERVISED LEARNING
[Figure: KNOWN DATA with a KNOWN RESPONSE (“THESE ARE APPLES”) trains the MODEL; given NEW DATA, the MODEL returns a NEW RESPONSE (“IT’S AN APPLE”).]
UNSUPERVISED LEARNING
[Figure: INPUT DATA is fed to the MODEL, which responds “I CAN SEE A PATTERN.”]
REINFORCEMENT LEARNING
[Figure: for an INPUT, the first RESPONSE is “IT’S A MANGO”; the FEEDBACK is “WRONG! IT’S AN APPLE”; the machine LEARNS (“NOTED!”); for the same INPUT again, the REINFORCED RESPONSE is “IT’S AN APPLE.”]
SUPERVISED VS UNSUPERVISED
SUPERVISED
•LABELED DATA
•DIRECT FEEDBACK
•PREDICT OUTPUT
UNSUPERVISED
•NON-LABELED DATA
•NO FEEDBACK
•FIND HIDDEN STRUCTURE IN DATA
A. K NEAREST NEIGHBOURS (KNN)
[Figure: scatter plots of balls; axes: COST and DURABILITY]
CONSIDER THERE ARE 3 CLUSTERS:
1. FOOTBALL
2. BASKETBALL
3. TENNIS BALL
NOW WE HAVE A NEW DATA POINT (BLACK BALL). WE WILL NOW TRY TO CLASSIFY THIS USING KNN.
LET K = 5.
SO THESE GLOWING ONES ARE THE CLOSEST 5 BALLS TO OUR NEW DATA POINT (BLACK BALL).
MOST OF THESE 5 NEIGHBOURS ARE TENNIS BALLS, THUS WE WILL CLASSIFY THE BLACK BALL AS A
TENNIS BALL.
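The KNN vote on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the (cost, durability) coordinates and the `knn_classify` helper are made up for the example and are not taken from the slide.

```python
from collections import Counter
from math import dist

# Hypothetical (cost, durability) points; the numbers are illustrative only.
training_data = [
    ((9.0, 9.0), "football"),
    ((8.5, 8.0), "football"),
    ((9.5, 8.5), "football"),
    ((6.0, 6.5), "basketball"),
    ((6.5, 6.0), "basketball"),
    ((5.5, 6.0), "basketball"),
    ((2.0, 2.5), "tennis ball"),
    ((2.5, 2.0), "tennis ball"),
    ((3.0, 3.0), "tennis ball"),
    ((1.5, 2.0), "tennis ball"),
]

def knn_classify(point, data, k=5):
    """Label `point` by majority vote among its k nearest neighbours."""
    # Sort the labelled points by Euclidean distance to the new point.
    nearest = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

black_ball = (2.5, 2.8)  # the new, unlabelled data point
print(knn_classify(black_ball, training_data, k=5))  # prints: tennis ball
```

With K = 5, four of the five closest points in this made-up data are tennis balls, so the majority vote labels the black ball a tennis ball.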
B. DECISION TREE
IS IT SUNNY?
•YES: GO SWIM
•NO: IS IT RAINING?
  •YES: STAY INDOORS
  •NO: WALK THE DOG
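The same tree can be written as nested conditions, which is one way to see that a decision tree is just a sequence of questions ending in a leaf. The function name `decide_activity` and its boolean parameters are assumptions made for this sketch.

```python
def decide_activity(is_sunny: bool, is_raining: bool) -> str:
    """Walk the tree from the root question down to a leaf."""
    if is_sunny:            # root question: is it sunny?
        return "go swim"
    if is_raining:          # second question, asked only when it is not sunny
        return "stay indoors"
    return "walk the dog"

print(decide_activity(is_sunny=False, is_raining=True))  # prints: stay indoors
```

In practice a decision tree algorithm learns these questions from training data instead of having them written by hand.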
D. LOGISTIC REGRESSION
LOGISTIC REGRESSION IS USED FOR BINARY CLASSIFICATION. IT USES THE SIGMOID
FUNCTION, WHICH TAKES THE INDEPENDENT VARIABLES AS INPUT AND PRODUCES A
PROBABILITY VALUE BETWEEN 0 AND 1.
For example, suppose we have two classes, Class 0 and Class 1. If the value of the logistic
function for an input is greater than 0.5, the input belongs to Class 1; otherwise it belongs
to Class 0. It is referred to as regression because it is an extension of linear
regression, but it is mainly used for classification problems.
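A minimal sketch of the sigmoid function and the 0.5 threshold rule follows. The weight (0.08) and bias (-4.0) are hypothetical values picked for illustration; in a real model they would be learned from training data.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Return (class, probability) for one input."""
    # Linear combination of the independent variables, as in linear regression.
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return (1 if probability > threshold else 0), probability

# Hypothetical one-feature model: the feature is a person's age.
label, probability = predict_class([75.0], weights=[0.08], bias=-4.0)
print(label, round(probability, 3))
```

The linear part is the same as in linear regression; the sigmoid is what turns it into a probability that can be compared with 0.5.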
EXAMPLE 1: WHETHER A DISEASE IS PRESENT OR NOT?
SO, WITH THE HELP OF LOGISTIC REGRESSION, WE CAN DETERMINE WHAT (FOR EXAMPLE,
SMOKING) HAS AN INFLUENCE ON WHETHER A CERTAIN DISEASE IS PRESENT.
EXAMPLE 2: WHETHER A PERSON BUYS OR DOESN’T BUY A PARTICULAR PRODUCT?
THE OUTCOME IS YES OR NO.
KEY POINTS:
•LOGISTIC REGRESSION PREDICTS THE OUTPUT OF A CATEGORICAL DEPENDENT
VARIABLE. THEREFORE, THE OUTCOME MUST BE A CATEGORICAL OR DISCRETE VALUE.
[A discrete value is an exact, countable value, e.g., the number of students in a
classroom, the roll of a die, or the number of cars in a parking lot.]
•IT CAN BE EITHER YES OR NO, 0 OR 1, TRUE OR FALSE, ETC. BUT INSTEAD OF GIVING
THE EXACT VALUE AS 0 OR 1, IT GIVES PROBABILISTIC VALUES WHICH LIE
BETWEEN 0 AND 1.
•IN LOGISTIC REGRESSION, INSTEAD OF FITTING A REGRESSION LINE, WE FIT AN “S”
SHAPED LOGISTIC FUNCTION, WHOSE OUTPUT IS BOUNDED BY THE TWO LIMITING VALUES 0 AND 1.
[Figure: heart disease (0 or 1, with the 0.5 threshold marked) plotted against age (25, 50, 75)]
Suppose we fit a straight line to this data set and treat that line as the probability
that a person has heart disease. Based on this line, we can predict whether a person has
heart disease for new input features. For example, to find out whether a 75-year-old
person has the disease, we go to age 75 on the x-axis and read off the corresponding
value on the y-axis. This value serves as the probability that the person has heart
disease. If this probability is above 0.5, we classify it as 1; if it is below 0.5, we
classify it as 0. But you can see that the straight line cannot actually fit this data
set properly; you will see a lot of gaps. Also, suppose you want to predict whether a
50-year-old person ...
E. RANDOM FOREST
EXAMPLE: You are a student who wants to choose a college graduation course
right after high school, and you are confused about which course to take
based on your strengths and interests. So you ask for advice from different people,
such as teachers, parents, degree students, and working people. One person, with
their own way of thinking, might focus on one aspect while others look at other
factors. This group of people, each providing their own individual opinion,
can help you make a more balanced and reliable decision.
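[Figure: the TRAINING SET is split into TRAINING DATA 1, TRAINING DATA 2, ..., TRAINING DATA n; each subset trains its own tree: DECISION TREE 1, DECISION TREE 2, ..., DECISION TREE n.]
STEPS:
[Figure: Tree-1, Tree-2, and Tree-3 each predict a class (two predict CLASS-A, one predicts CLASS-B); MAJORITY-VOTING over these predictions gives the FINAL-CLASS.]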
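The two ideas in these diagrams, giving each tree its own training data and combining the trees by majority vote, can be sketched as follows. Drawing each subset by sampling with replacement is an assumption of this sketch (it is the usual choice for random forests, but the slide does not say how the subsets are made), and both function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_samples(training_set, n_trees, seed=0):
    """Draw one random sample (with replacement) of the training set per tree."""
    rng = random.Random(seed)
    return [rng.choices(training_set, k=len(training_set)) for _ in range(n_trees)]

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

samples = bootstrap_samples(["row1", "row2", "row3", "row4"], n_trees=3)
print(len(samples))                                      # prints: 3
print(majority_vote(["CLASS-A", "CLASS-A", "CLASS-B"]))  # prints: CLASS-A
```

Each sample would be used to train one decision tree; the training step itself is left out here to keep the sketch short.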
STRENGTHS
•It takes less training time compared to other algorithms.
•It predicts output with high accuracy, and it runs efficiently even on large datasets.
•It can also maintain accuracy when a large proportion of data is missing.
WEAKNESSES
•When used for regression, random forests cannot predict beyond the range in the training
data, and they may over-fit datasets that are particularly noisy.
•The sizes of the models created by random forests may be very large. They may take
hundreds of megabytes of memory and may be slow to evaluate.
TYPES OF SVM
LINEAR SVM - Linear SVM is used for linearly separable data.
If a dataset can be classified into two classes by
using a single straight line, then such data is termed linearly
separable data, and the classifier used is called a LINEAR SVM
CLASSIFIER.
NON-LINEAR SVM - Non-linear SVM is used for non-linearly
separable data. If a dataset cannot be classified by
using a straight line, then such data is termed non-linear data,
and the classifier used is called a NON-LINEAR SVM CLASSIFIER.
[Figure: two panels, LINEARLY SEPARABLE DATA and NON-LINEARLY SEPARABLE DATA]
KERNEL
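The idea behind kernels is that data which no straight line can separate may become separable after it is mapped into another space. The sketch below uses an explicit, hypothetical feature map (x to x squared) on made-up one-dimensional points; a real non-linear SVM would use a kernel function that works with such a mapping implicitly.

```python
def feature_map(x):
    """Hypothetical mapping from a 1-D input to the 2-D point (x, x squared)."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if a single threshold puts one class entirely below the other."""
    zeros = [v for v, label in zip(values, labels) if label == 0]
    ones = [v for v, label in zip(values, labels) if label == 1]
    return max(zeros) < min(ones) or max(ones) < min(zeros)

# Made-up data: class 1 lies on both sides of class 0, so no single
# threshold on x can split the two classes.
points = [-3, -2, -1, 0, 1, 2, 3]
labels = [1, 1, 0, 0, 0, 1, 1]

squared = [feature_map(x)[1] for x in points]
print(separable_by_threshold(points, labels))   # prints: False
print(separable_by_threshold(squared, labels))  # prints: True
```

A threshold on the squared value is a straight horizontal line in the (x, x squared) plane, so the mapped data is linearly separable even though the original data is not.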
A. LINEAR REGRESSION
LET US CONSIDER AN EXAMPLE WHERE FIVE WEEKS’ SALES DATA IS GIVEN
AS SHOWN IN THE TABLE.
APPLY THE LINEAR REGRESSION TECHNIQUE TO PREDICT THE 7TH AND 12TH WEEK
SALES.
x (Week)    y (Sales in Thousands)
1           1.2
2           1.8
3           2.6
4           3.2
5           3.8
[Figure: scatter plot of the five data points; x-axis: week (independent variable), y-axis: sales in thousands (dependent variable)]
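LINEAR REGRESSION EQUATION IS GIVEN BY:
$y = a_0 + a_1 x$
WHERE,
$a_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$ AND $a_0 = \bar{y} - a_1 \bar{x}$
x (Week)    y (Sales in Thousands)    x^2     xy
1           1.2                       1       1.2
2           1.8                       4       3.6
3           2.6                       9       7.8
4           3.2                       16      12.8
5           3.8                       25      19
SUM         15          12.6          55      44.4
AVERAGE     3           2.52          11      8.88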
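$a_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} = \frac{8.88 - 3 \times 2.52}{11 - 3^2} = 0.66$
$a_0 = \bar{y} - a_1 \bar{x} = 2.52 - 0.66 \times 3 = 0.54$
REGRESSION EQUATION IS:
$y = 0.54 + 0.66x$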
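The calculation can be checked with a short Python sketch that recomputes the slope and intercept from the five-week sales data and then predicts the 7th and 12th weeks. The helper names `mean` and `fit_line` are made up for this example.

```python
weeks = [1, 2, 3, 4, 5]
sales = [1.2, 1.8, 2.6, 3.2, 3.8]  # sales in thousands

def mean(values):
    return sum(values) / len(values)

def fit_line(x, y):
    """Return (a0, a1) for the line y = a0 + a1 * x using the averages formula."""
    mean_x, mean_y = mean(x), mean(y)
    mean_xy = mean([xi * yi for xi, yi in zip(x, y)])
    mean_x2 = mean([xi * xi for xi in x])
    a1 = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)  # slope
    a0 = mean_y - a1 * mean_x                                   # intercept
    return a0, a1

a0, a1 = fit_line(weeks, sales)
for week in (7, 12):
    print(week, round(a0 + a1 * week, 2))
```

With this data the slope is 0.66 and the intercept is 0.54, so the predicted sales are 5.16 thousand for week 7 and 8.46 thousand for week 12.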
REGRESSION IS NOT REVERSIBLE: A REGRESSION
EQUATION USED TO PREDICT THE VALUE OF Y FROM A GIVEN VALUE OF
X CANNOT BE USED TO PREDICT THE VALUES OF X FROM GIVEN
VALUES OF Y.
INDEPENDENT/EXPLANATORY/PREDICTOR VARIABLE
USUALLY DENOTED BY X (THE VARIABLE WE USE FOR PREDICTIONS).
IT HELPS IN PREDICTING THE VALUE OF OTHER VARIABLES.
DEPENDENT/RESPONSE/PREDICTED VARIABLE
USUALLY DENOTED BY Y (THE VARIABLE WE WANT TO INFER OR
PREDICT). IT IS THE ONE THAT IS PREDICTED BY THE OTHER
VARIABLE.
EXAMPLE:
THE YIELD OF WHEAT DEPENDS ON THE AMOUNT OF FERTILIZER
USED. HERE FERTILIZER IS THE INDEPENDENT VARIABLE AND
YIELD OF WHEAT IS THE DEPENDENT VARIABLE.
SALES OF A PRODUCT DEPEND ON THE AMOUNT OF
ADVERTISING. HERE ADVERTISING IS THE INDEPENDENT
VARIABLE AND SALES IS THE DEPENDENT VARIABLE.
TYPES OF UNSUPERVISED
LEARNING
CLUSTERING : THE TASK OF GROUPING DATA POINTS
BASED ON THEIR SIMILARITY WITH EACH OTHER IS CALLED
CLUSTERING.
Distance between (185, 72) and (179, 68):
$\sqrt{(185 - 179)^2 + (72 - 68)^2} = 7.211$
New centroid for cluster 1 $= \left(\frac{185 + 179}{2}, \frac{72 + 68}{2}\right) = (182, 70)$
For the point (182, 72): distance to centroid 1 $= 2$, distance to centroid 2 $= 19.10$
Do the rest………
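One step of this clustering procedure (measure distances, assign the point to the nearest centroid, recompute the centroid) can be sketched in Python. The helper names are made up, and the second centroid (170, 56) is a hypothetical value used only to demonstrate the assignment step.

```python
from math import dist

def centroid(points):
    """Mean of the x values and mean of the y values of a cluster."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    distances = [dist(point, c) for c in centroids]
    return distances.index(min(distances))

print(round(dist((185, 72), (179, 68)), 3))   # prints: 7.211
cluster_1 = centroid([(185, 72), (179, 68)])
print(cluster_1)                              # prints: (182.0, 70.0)
print(dist((182, 72), cluster_1))             # prints: 2.0

# (170, 56) is a hypothetical second centroid, not a value from the slide.
print(nearest_centroid((182, 72), [cluster_1, (170, 56)]))  # prints: 0
```

Repeating these steps for every remaining point, and again until the assignments stop changing, gives the final clusters.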
Thank You