DA_Unit_2
Pallavi Shukla
Assistant Professor
UCER
Regression
• Regression analysis is a statistical method to model the relationship
between a dependent (target) variable and one or more independent
(predictor) variables.
• Helps us to understand how the value of the dependent variable
changes with respect to one independent variable while the other
independent variables are held fixed.
• Regression searches for relationships among variables.
• For example, you can observe several employees of some company
and try to understand how their salaries depend on the features, such
as experience, level of education, role, city they work in, and so on.
Regression
• In regression, we fit a line or curve that best describes the given
datapoints.
• Using this fitted line or curve, the machine learning model can make
predictions about new data.
• In simple words, "Regression shows a line or curve that passes through
the datapoints on the target-predictor graph in such a way that the
vertical distances between the datapoints and the regression line are
minimized."
• The distance between the datapoints and the line tells whether the model
has captured a strong relationship or not (a minimal fitting sketch is given below).
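As a concrete illustration of fitting such a line, here is a minimal sketch (assuming NumPy is available; the salary/experience numbers are made up for illustration) that fits a straight line by minimizing the squared vertical distances:

```python
import numpy as np

# Hypothetical data: years of experience vs. salary (in thousands)
experience = np.array([1, 2, 3, 4, 5, 6, 7, 8])
salary = np.array([30, 35, 42, 48, 55, 61, 68, 75])

# Fit a straight line (degree-1 polynomial) that minimizes the
# sum of squared vertical distances between the points and the line
slope, intercept = np.polyfit(experience, salary, deg=1)

# Use the fitted line to predict the salary for a new employee
predicted = slope * 9 + intercept
print(f"salary = {slope:.2f} * experience + {intercept:.2f}")
print(f"predicted salary for 9 years of experience: {predicted:.2f}")
```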
Examples
• Underfitting and Overfitting: If our algorithm works well on the training
dataset but not on the test dataset, the problem is called overfitting.
If our algorithm does not perform well even on the training dataset,
the problem is called underfitting (a quick check is sketched below).
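A minimal sketch of how the gap between training and test scores can be used to spot these problems (assuming scikit-learn is available; the data here is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: a noisy linear relationship
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Low scores on both sets suggest underfitting; a high training score
# with a much lower test score suggests overfitting.
print("train R^2:", model.score(X_train, y_train))
print("test  R^2:", model.score(X_test, y_test))
```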
Common Regression Algorithms
The most common regression algorithms are:
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Multivariate adaptive regression splines
• Logistic Regression
• Maximum likelihood estimation (least squares)
Linear Regression:
• Linear regression models the relationship between the dependent variable and the independent variable(s) with a straight line, e.g. y = b0 + b1·x.
Ex – max of sin θ = 1: it means the maximum value that sin θ attains is 1.
Ex – argmax of sin θ = 90°: it means sin θ attains its maximum value at θ = 90°.
BRUTE FORCE BAYESIAN CONCEPT LEARNING
• Also called Brute Force Algorithm.
P(h|D) = P(D|h) · P(h) / P(D)
• hMAP = argmax over h in H of P(h|D)
• Let P(h) = 1 / |H| for all h in H.
• h = a single hypothesis, H = the set of all hypotheses
• H = {h1, h2, h3, ..., hn}
• Now, P(h) = Probability of hypothesis (h)
• P(D|h) = 1, if di = h(xi) for every example in D; 0, otherwise
P(D|h) = conditional probability of the data (D) given the hypothesis (h)
di = target value of the i-th example
xi = input value of the i-th example
P(h|D) = (1 · 1/|H|) / P(D)
• But P(D) = |VS_H,D| / |H|
• Now, putting this value in the above equation,
• P(h|D) = 1 / |VS_H,D| for every hypothesis h consistent with D (and 0 otherwise)
• where VS_H,D is the version space: the subset of hypotheses in H that are consistent with the data D, and |VS_H,D| is its size. A short numerical sketch is given below.
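A short numerical sketch of the brute-force calculation (the hypothesis space and data below are made up for illustration): every hypothesis consistent with D receives posterior 1/|VS_H,D|, and every inconsistent hypothesis receives 0.

```python
# Hypothetical hypothesis space: each hypothesis is a candidate boolean
# concept over a tiny input domain (names and data are illustrative only).
def h1(x): return x >= 3
def h2(x): return x >= 5
def h3(x): return x % 2 == 0

H = [h1, h2, h3]

# Training data D: pairs (xi, di)
D = [(2, False), (4, True), (6, True)]

# P(D|h) = 1 if h is consistent with every example, 0 otherwise
def likelihood(h, D):
    return 1.0 if all(h(x) == d for x, d in D) else 0.0

prior = 1.0 / len(H)                                   # P(h) = 1/|H|
evidence = sum(likelihood(h, D) * prior for h in H)    # P(D) = |VS_H,D| / |H|

for h in H:
    posterior = likelihood(h, D) * prior / evidence    # P(h|D)
    print(h.__name__, posterior)                       # consistent h -> 1/|VS_H,D|
```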
BAYES OPTIMAL CLASSIFIER
• It is a “Probabilistic Model” which makes the most probable prediction for a new
example.
• Equation: P(vj | D) = Σ over hi in H of P(vj | hi) · P(hi | D), and the Bayes optimal classification of a new instance is the value vj that maximizes this sum.
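A small sketch of how this equation combines the hypotheses' predictions (the posterior values and predicted classes below are illustrative, not taken from the slides):

```python
# Illustrative numbers: posterior probabilities of three hypotheses and
# the class each one predicts for a new example.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "+", "h2": "-", "h3": "-"}

# P(vj|D) = sum over hypotheses of P(vj|hi) * P(hi|D),
# where P(vj|hi) is 1 if hi predicts vj and 0 otherwise.
scores = {v: sum(p for h, p in posteriors.items() if predictions[h] == v)
          for v in set(predictions.values())}

print(scores)                                                    # '+': 0.4, '-': 0.6
print("Bayes optimal prediction:", max(scores, key=scores.get))  # '-'
```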
• Question: We have been given a dataset of weather conditions with two
columns: one holds the weather condition (Outlook) and the other records
whether the player went out to play or not.
Find the probability of the player going out to play on a sunny day.
No. Outlook Play
0. Rainy Yes
1. Sunny Yes
2. Overcast Yes
3. Overcast Yes
4. Sunny No
5. Rainy Yes
6. Sunny Yes
7. Overcast Yes
8. Rainy No
9. Sunny No
10. Sunny Yes
11. Rainy No
12. Overcast Yes
13. Overcast Yes
Solution: Frequency Table
Weather Yes No
Overcast 5 0
Rainy 2 2
Sunny 3 2
Total 10 4
Make Likelihood Table:
Weather P(Weather|Yes) P(Weather|No) P(Weather)
Overcast 5/10 0/4 5/14
Rainy 2/10 2/4 4/14
Sunny 3/10 2/4 5/14
P(Yes) = 10/14, P(No) = 4/14
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny) = (3/10 · 10/14) / (5/14) = 3/5 = 0.6
So the probability that the player goes out to play on a sunny day is 0.6.
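The same result can be reproduced directly from the data table above with a few lines of Python (a sketch using only the standard library):

```python
# The Outlook/Play data from the table above
data = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"),  ("Rainy", "Yes"), ("Sunny", "Yes"),    ("Overcast", "Yes"),
    ("Rainy", "No"),  ("Sunny", "No"),  ("Sunny", "Yes"),    ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

total = len(data)                                                  # 14
yes_total = sum(1 for _, play in data if play == "Yes")            # 10
sunny_total = sum(1 for outlook, _ in data if outlook == "Sunny")  # 5
sunny_yes = sum(1 for outlook, play in data
                if outlook == "Sunny" and play == "Yes")           # 3

# Bayes' theorem: P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = (sunny_yes / yes_total) * (yes_total / total) / (sunny_total / total)
print("P(Yes|Sunny) =", p_yes_given_sunny)                         # ≈ 0.6
```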
Support Vector Machine (SVM)
• The SVM classifier is a frontier that best segregates the two classes (a hyperplane/line).
Support Vector Machine Terminology
1. Hyperplane: The hyperplane is the decision boundary used to
separate the data points of different classes in a feature space. In
the case of linear classification, it is a linear equation, i.e.
wx + b = 0 (a small sketch of this decision rule follows this list).
2. Support Vectors: Support vectors are the data points closest to
the hyperplane; they play a critical role in deciding the
hyperplane and the margin.
3. Margin: The margin is the distance between the support vectors and
the hyperplane. The main objective of the support vector machine
algorithm is to maximize this margin; a wider margin indicates
better classification performance.
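A tiny sketch of the linear decision rule w·x + b (the weights and bias here are arbitrary illustrative values, not learned from data):

```python
import numpy as np

# Arbitrary illustrative weights and bias for a 2-D hyperplane w·x + b = 0
w = np.array([2.0, -1.0])
b = -3.0

def classify(x):
    # Points with w·x + b > 0 lie on one side of the hyperplane, < 0 on the other
    return 1 if np.dot(w, x) + b > 0 else -1

print(classify(np.array([3.0, 1.0])))   # 2*3 - 1*1 - 3 =  2  -> +1
print(classify(np.array([0.0, 2.0])))   # 2*0 - 1*2 - 3 = -5  -> -1
```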
Support Vector Machine Terminology
1. Kernel: A kernel is a mathematical function used in SVM to map the original
input data points into a high-dimensional feature space, so that a separating
hyperplane can be found even when the data points are not linearly separable
in the original input space. Common kernel functions include linear,
polynomial, radial basis function (RBF), and sigmoid.
2. Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane
is a hyperplane that properly separates the data points of different categories
without any misclassifications.
3. Soft Margin: When the data is not perfectly separable or contains outliers, SVM
permits a soft-margin technique. The soft-margin SVM formulation introduces a
slack variable for each data point, which relaxes the strict margin requirement
and permits some misclassifications or violations. It finds a compromise between
widening the margin and reducing violations (see the sketch after this list).
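A minimal sketch putting these pieces together with scikit-learn (assuming it is installed; the data is synthetic and the parameter values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly-separable data (two interleaving half-moons)
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the points into a higher-dimensional space;
# C controls the soft margin (smaller C allows more margin violations).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("support vectors per class:", clf.n_support_)
print("test accuracy:", clf.score(X_test, y_test))
```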
Types of SVM
• Linear SVM
• Non-Linear SVM
PROPERTIES OF SVM
• Flexibility in choosing a similarity function
• Sparseness of solution when dealing with large data sets - only support vectors are
used to specify the separating hyperplane
• Ability to handle large feature spaces - complexity does not depend on the
dimensionality of the feature space
• Overfitting can be controlled by soft margin approach
• Nice math property: a simple convex optimization problem which is guaranteed to
converge to a single global solution
• Feature Selection
Advantages of SVM
• Handling high-dimensional data: SVMs are effective in handling high-
dimensional data, which is common in many applications such as image
and text classification.
• Handling small datasets: SVMs can perform well with small datasets,
as they only require a small number of support vectors to define the
boundary.
• Modeling non-linear decision boundaries: SVMs can model non-linear
decision boundaries by using the kernel trick, which maps the data into a
higher-dimensional space where the data becomes linearly separable.
Advantages of SVM
• Robustness to noise: SVMs are robust to noise in the data, as the decision boundary is determined
by the support vectors, which are the closest data points to the boundary.
• Generalization: SVMs have good generalization performance, which means that they are able to
classify new, unseen data well.
• Versatility: SVMs can be used for both classification and regression tasks, and they can be applied
to a wide range of applications such as natural language processing, computer vision, and
bioinformatics.
• Sparse solution: SVMs have sparse solutions, which means that they only use a subset of the
training data to make predictions. This makes the algorithm more efficient and less prone to
overfitting.
• Regularization: SVMs can be regularized, which means that the algorithm can be modified to avoid
overfitting.
Disadvantages of SVM
• Computationally expensive: SVMs can be computationally expensive for large
datasets, as the algorithm requires solving a quadratic optimization problem.
• Choice of kernel: The choice of kernel can greatly affect the performance of an
SVM, and it can be difficult to determine the best kernel for a given dataset.
• Sensitivity to the choice of parameters: SVMs can be sensitive to the choice of
parameters, such as the regularization parameter, and it can be difficult to
determine the optimal parameter values for a given dataset.
• Memory-intensive: SVMs can be memory-intensive, as the algorithm requires
storing the kernel matrix, which can be large for large datasets.
Disadvantages of SVM
• Limited to two-class problems: SVMs are primarily used for two-class
problems, although multi-class problems can be solved by using one-versus-
one or one-versus-all strategies.
• Lack of probabilistic interpretation: SVMs do not provide a probabilistic
interpretation of the decision boundary, which can be a disadvantage in some
applications.
• Not suitable for large datasets with many features: SVMs can be very slow and
can consume a lot of memory when the dataset has many features.
• Not suitable for datasets with missing values: SVMs require complete datasets
with no missing values; they cannot handle missing values directly.
Applications of SVM
1. Face detection – SVMs are used to detect faces in images according to
the trained classifier and model.
2. Text and hypertext categorization – The categorization technique is used
to find the important (required) information for organizing text and
hypertext documents.
3. Image classification – SVMs are also used to classify and group images
by comparing pieces of information and acting accordingly.
Applications of SVM
1. Bioinformatics – SVMs are also used in medical science, for example in laboratory work,
DNA analysis, and research.
2. Handwriting recognition – SVMs are used to recognize handwritten characters.
3. Protein fold and remote homology detection – SVMs are used to classify proteins
into functional and structural classes given their amino acid sequences; this is
one of the standard problems in bioinformatics.
4. Generalized predictive control (GPC) – SVMs are also used in generalized predictive
control for prediction, which relies on predictive control using a multilayer
feed-forward network as the plant's linear model.
Applications of SVM
5. Facial Expression Classification – Support vector machines (SVMs) are a
binary classification technique. A facial expression classification model
determines the precise facial expression by modeling the differences between
two facial images. Validation techniques include the leave-one-out
method and the K-fold test method.
6. Speech Recognition – The transcription of speech into text is called
speech recognition. Mel Frequency Cepstral Coefficient (MFCC)-based
features are used to train support vector machines (SVMs), which are then
used to recognize speech. Speech recognition is a challenging classification
problem that is addressed using a variety of mathematical techniques,
including support vector machines and other pattern recognition techniques.