SUBJECT CODE : 410242
Choice Based Credit System
SAVITRIBAI PHULE PUNE UNIVERSITY - 2019 SYLLABUS
B.E. (Computer) Semester - VII

MACHINE LEARNING
(For END SEM Exam - 70 Marks)

Iresh A. Dhotre
M.E. (Information Technology)
Ex-Faculty, Sinhgad College of Engineering, Pune.

• Written by Popular Authors of Text Books of Technical Publications
• Covers Entire Syllabus
• Question & Answer Format
• Exact Answers and Solutions
• Solved Model Question Paper (As Per 2019 Pattern)
SOLVED SPPU QUESTION PAPERS : March - 2019, June - 2022
A Guide For Engineering Students

MACHINE LEARNING (For END SEM Exam - 70 Marks)
SUBJECT CODE : 410242
B.E. (Computer Engineering) Semester - VII

Copyright with Technical Publications. All publishing rights (printed and ebook version) are reserved with Technical Publications. No part of this book should be reproduced in any form, electronic, mechanical, photocopy or any information storage and retrieval system, without prior permission in writing from Technical Publications, Pune.

Published by : TECHNICAL PUBLICATIONS, Amit Residency, Office No. 1, 412, Shaniwar Peth, Pune - 411030, M.S. INDIA
Ph. : +91-020-24495496/97, Email : [email protected], Website : www.technicalpublications.in
Printer : Yogiraj Printers & Binders, Sr. No. 10/1A, Ghule Industrial Estate, Nanded Village Road, Tal. - Haveli, Dist. - Pune - 411041
ISBN 978-93-8585-241-0 (9789355852410)

SYLLABUS
Machine Learning - (410242)
Credits : 03    Examination Scheme : End-Sem (Paper) : 70 Marks

Unit III : Supervised Learning : Regression
Bias, Variance, Generalization, Underfitting, Overfitting, Linear regression, Regression : Lasso regression, Ridge regression, Gradient descent algorithm. Evaluation Metrics : MAE, RMSE, R2. (Chapter - 3)

Unit IV : Supervised Learning : Classification
Classification : K-nearest neighbour, Support vector machine. Ensemble Learning : Bagging, Boosting, Random Forest, Adaboost. Binary-vs-Multiclass Classification, Balanced and Imbalanced Multiclass Classification Problems, Variants of Multiclass Classification : One-vs-One and One-vs-All. Evaluation Metrics and Score : Accuracy, Precision, Recall, F-score, Cross-validation, Micro-Average Precision and Recall, Micro-Average F-score, Macro-Average Precision and Recall, Macro-Average F-score. (Chapter - 4)

Unit V : Unsupervised Learning
K-Means, K-medoids, Hierarchical and Density-based Clustering, Spectral Clustering. Outlier analysis : introduction of isolation factor, local outlier factor. Evaluation metrics and score : elbow method, extrinsic and intrinsic methods. (Chapter - 5)

Unit VI : Introduction To Neural Networks
Artificial Neural Networks : Single Layer Neural Network, Multilayer Perceptron, Back Propagation Learning, Functional Link Artificial Neural Network and Radial Basis Function Network, Activation functions, Introduction to Recurrent Neural Networks and Convolutional Neural Networks. (Chapter - 6)

TABLE OF CONTENTS

Chapter - 3  Supervised Learning : Regression
3.1  Bias and Variance
3.2  Underfitting and Overfitting
3.3  Linear Regression
3.4  Regression : Lasso Regression, Ridge Regression
3.5  Gradient Descent Algorithm
3.6  Evaluation Metrics : MAE, RMSE, R2

Chapter - 4  Supervised Learning : Classification
4.1  Classification : K-Nearest Neighbour
4.2  Support Vector Machine
4.3  Ensemble Learning : Bagging, Boosting, Random Forest, Adaboost
4.4  Binary-vs-Multiclass Classification
4.5  Variants of Multiclass Classification : One-vs-One and One-vs-All
4.6  Evaluation Metrics and Score
4.7  Cross-Validation
4.8  Micro-Average
4.9  Macro-Average

Chapter - 5  Unsupervised Learning
5.1  Introduction to Clustering
5.2  K-Means and K-Medoids
5.3  Hierarchical Clustering
5.4  Density-Based Clustering
5.5  Outlier Analysis
5.6  Evaluation Metrics and Score

Chapter - 6  Introduction to Neural Networks
6.1  Artificial Neural Networks
6.2  Multilayer Perceptron
6.3  Back Propagation Learning
6.4  Functional Link Artificial Neural Network
6.5  Radial Basis Function Network
6.6  Activation Functions
6.7  Introduction to Recurrent Neural Networks
6.8  Convolutional Neural Networks

Solved Model Question Paper

Supervised Learning : Regression

3.1 : Bias and Variance

Q.1 What is bias in machine learning ?
Ans. :
• Bias is a phenomenon that skews the result of an algorithm in favour of or against an idea.
• Bias is considered a systematic error that occurs in the machine learning model itself due to incorrect assumptions in the ML process. Bias is the difference between the average prediction of our model and the correct value which we are trying to predict.
• Fig. Q.1.1 shows how bias and variance vary with model complexity.
[Fig. Q.1.1 : Prediction error on the training sample and test sample versus model complexity; high bias / low variance at low complexity, low bias / high variance at high complexity]
• Bias and variance are components of reducible error. Reducing errors requires selecting models that have appropriate complexity and flexibility, as well as suitable training data.
• Low bias : A low bias model will make fewer assumptions about the form of the target function.
• High bias : A model with high bias makes more assumptions and becomes unable to capture the important features of our dataset. A high bias model also cannot perform well on new data.
• Examples of machine learning algorithms with low bias are decision trees, k-nearest neighbours and support vector machines.
• Algorithms with high bias are linear regression, linear discriminant analysis and logistic regression.

Q.2 How to reduce the high bias ?
Ans. :
• If the average predicted values are far off from the actual values then the bias is high. High bias causes the algorithm to miss the relevant relationship between the input and output variables.
• When a model has high bias it implies that the model is too simple and does not capture the complexity of the data, thus underfitting the data. Low variance means there is a small variation in the prediction of the target function with changes in the training data set, while high variance shows a large variation in the prediction of the target function with changes in the training dataset.
• High bias can be identified when we have high training error and the validation or test error is the same as the training error.
• The following methods are used to reduce high bias :
1. Increase the input features, as the model is underfitted.
2. Decrease the regularization term.
3. Use more complex models, such as including some polynomial features.

Q.3 Define variance. Explain low and high variance. How to reduce high variance ?
Ans. :
• Variance indicates how much the estimate of the target function will alter if different training data were used. In other words, variance describes how much a random variable differs from its expected value.
• Variance is based on a single training set. Variance measures the inconsistency of different predictions using different training sets; it is not a measure of overall accuracy.
• Low variance means there is a small variation in the prediction of the target function with changes in the training data set. High variance shows a large variation in the prediction of the target function with changes in the training dataset.
• Variance comes from highly complex models with a large number of features.
1. Models with high bias will have low variance.
2. Models with high variance will have low bias.
• The following methods are used to reduce high variance :
1. Reduce the input features or the number of parameters, as the model is overfitted.
2. Do not use a very complex model.
3. Increase the training data.
4. Increase the regularization term.

Q.4 Explain bias-variance trade off.
Ans. :
• In experimental practice we observe an important phenomenon called the bias-variance dilemma.
• In supervised learning, the class value assigned by the learning model built on the training data may differ from the actual class value. This error in learning can be of two types : error due to 'bias' and error due to 'variance'.
1. Low-bias, low-variance : The combination of low bias and low variance shows an ideal machine learning model. However, it is not possible practically.
2. Low-bias, high-variance : With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns with a large number of parameters and hence leads to overfitting.
3. High-bias, low-variance : With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses a small number of parameters. It leads to underfitting problems in the model.
4. High-bias, high-variance : With high bias and high variance, predictions are inconsistent and also inaccurate on average.

Q.5 What is difference between bias and variance ?
Ans. :
1. Bias is the difference between the average prediction and the correct value. Variance is the amount that the prediction will change if different training data sets were used.
2. A high-bias model is incapable of locating patterns in the dataset it was trained on and produces inaccurate results for both seen and unseen data. A high-variance model recognizes the majority of the dataset's patterns and can even learn from the noise, or from data that isn't vital to its operation.
3. Low bias models : k-nearest neighbours, decision trees and support vector machines. Low variance models : linear regression and logistic regression.
4. High bias models : linear regression and logistic regression. High variance models : k-nearest neighbours, decision trees and support vector machines.
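The bias-variance behaviour described above can also be checked numerically. The short Python sketch below is only an illustration : the quadratic true function, the noise level, the sample sizes and the polynomial degrees are assumed here and are not part of the syllabus example. It refits the same model class on many resampled training sets and estimates squared bias (average prediction versus truth) and variance (spread of predictions across training sets).

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return 1.5 * x ** 2 - x          # assumed ground-truth function for this demo

x_test = np.linspace(-1, 1, 50)

def bias_variance(degree, n_sets=200, n_points=20, noise=0.3):
    """Estimate squared bias and variance of a polynomial fit of a given degree."""
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        x = rng.uniform(-1, 1, n_points)              # a fresh training set each time
        y = true_f(x) + rng.normal(0, noise, n_points)
        coeffs = np.polyfit(x, y, degree)              # least-squares polynomial fit
        preds[i] = np.polyval(coeffs, x_test)
    avg_pred = preds.mean(axis=0)
    bias_sq = np.mean((avg_pred - true_f(x_test)) ** 2)   # (average prediction - truth)^2
    variance = np.mean(preds.var(axis=0))                  # spread across training sets
    return bias_sq, variance

for d in (1, 2, 8):
    b, v = bias_variance(d)
    print(f"degree {d}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Typically the degree-1 model shows the largest bias (it is too simple for the quadratic target) and the degree-8 model the largest variance, matching points 2 and 3 of Q.4.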
3.2 : Underfitting and Overfitting

Q.6 What is overfitting and underfitting in machine learning model ? Explain with example. [SPPU : June-22, Marks 6]
Ans. :
• Fig. Q.6.1 shows underfitting and overfitting.
[Fig. Q.6.1 : (a) Underfitting (b) Overfitting - fitted values plotted against time]
• Underfitting occurs when the model is unable to match the input data to the target data. This happens when the model is not complex enough to match all the available data, and it performs poorly even on the training dataset.
• Such models are too simple to capture the complex patterns in the data, e.g. linear and logistic regression.
• Underfitting examples :
1. The learning time may be prohibitively large and the learning stage was prematurely terminated.
2. The learner did not use a sufficient number of iterations.
3. The learner tries to fit a straight line to a training set whose examples exhibit a quadratic nature.
• Overfitting relates to instances where the model tries to match non-existent (noisy) data. This occurs with highly complex models, where the model matches almost all the given data points and performs well on the training dataset. However, the model is not able to generalize to the points in the test data set and predict the outcome accurately.
• These models have low bias and high variance. They are very complex models, like decision trees, which are prone to overfitting.
• Reasons for overfitting are noisy data, a training data set that is too small, and a large number of features.

Q.7 How to avoid overfitting and underfitting model ?
Ans. :
1. The following methods are used to avoid overfitting :
• Cross validation
• Training with more data
• Removing features
• Early stopping the training
• Regularization
• Ensembling
2. The following methods are used to avoid underfitting :
• Increasing the training time of the model.
• Increasing the number of features.

Q.8 How do we know if we are underfitting or overfitting ?
Ans. :
1. If by increasing capacity we decrease the generalization error, then we are underfitting; otherwise we are overfitting.
2. If the error in representing the training set is relatively large and the generalization error is large, then we are underfitting.
3. If the error in representing the training set is relatively small and the generalization error is large, then we are overfitting.
4. Overfitting is also likely when there are many features but a relatively small training set.

Q.9 Explain the Fig. Q.9.1 (a), (b) and (c).
Ans. :
• The given Fig. Q.9.1 is related to overfitting and underfitting.
[Fig. Q.9.1 : Price versus size fitted with models of increasing complexity, panels (a), (b) and (c)]
Underfitting (high bias and low variance) :
• A statistical model or a machine learning algorithm is said to have underfitting when it cannot capture the underlying trend of the data. It usually happens when we have less data to build an accurate model and also when we try to build a linear model with non-linear data.
[Fig. Q.9.2 : Price versus size fitted with θ₀ + θ₁x (underfit, high bias), θ₀ + θ₁x + θ₂x² (good fit) and a higher-order polynomial (overfit, high variance)]
• In such cases the rules of the machine learning model are too easy and flexible to be applied on such minimal data, and therefore the model will probably make a lot of wrong predictions.
• Underfitting can be avoided by using more data and also by reducing the number of features through feature selection.
Overfitting (high variance and low bias) :
• A statistical model is said to be overfitted when we train it with a lot of data. When a model gets trained with so much data, it starts learning from the noise and the inaccurate data entries in our data set.
• Then the model does not categorize the data correctly, because of too many details and noise.
• The causes of overfitting are the non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model based on the dataset and therefore they can build unrealistic models.
• A solution to avoid overfitting is using a linear algorithm if we have linear data, or using parameters like the maximal depth if we are using decision trees.
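As a concrete illustration of the fits sketched in Fig. Q.9.2, the following Python sketch fits polynomials of degree 1, 2 and 9 to noisy quadratic data and compares training and test error. The data-generating function, noise level and degrees are assumed for this demonstration; they are not taken from the figure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy "price versus size" style data generated from an assumed quadratic trend
f = lambda x: 2 + 3 * x - 2.5 * x ** 2
x_train = np.sort(rng.uniform(0, 1, 15))
x_test = np.sort(rng.uniform(0, 1, 100))
y_train = f(x_train) + rng.normal(0, 0.15, x_train.size)
y_test = f(x_test) + rng.normal(0, 0.15, x_test.size)

for degree in (1, 2, 9):                      # underfit, good fit, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")

# Degree 1 shows high error on both sets (underfitting, like Q.8 point 2);
# degree 9 shows very low training error but larger test error (overfitting, Q.8 point 3).
```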
Q.10 Explain difference between overfitting and underfitting.
Ans. :
1. Overfitting : very low training error. Underfitting : high training error.
2. Overfitting : the model is too complex. Underfitting : the model is too simple.
3. Overfitting : high variance and low bias. Underfitting : low variance and high bias.
4. Overfitting : typically involves a larger number of features. Underfitting : typically involves a smaller number of features.
5. Overfitting : needs more regularization. Underfitting : needs less regularization.
6. Overfitting : training error much lower than the test error. Underfitting : training error close to the test error.

Q.11 What is goodness of fit ?
Ans. :
• The goodness of fit of a model explains how well it matches a set of observations. Usually, goodness-of-fit indicators summarize the disparity between the observed values and the model's predicted values.
• In a machine learning algorithm, a good fit is when both the training data error and the test data error are minimal. As the algorithm learns, the error in the training data for the model decreases over time, and so does the error on the test dataset.
• If we train for too long, the training dataset error may keep decreasing while the model overfits and learns the irrelevant detail and noise in the training dataset. At the same time, the test set error begins to rise again as the ability of the model to generalize decreases.
• The errors on the test dataset start increasing, so the point just before the test error begins to rise is the good point, and we can stop there to achieve a good model.

3.3 : Linear Regression

Q.12 Define and explain regression with its model.
Ans. :
• Regression finds correlations between dependent and independent variables. If the desired output consists of one or more continuous variables, then the task is called regression.
• Therefore, regression algorithms help predict continuous variables such as house prices, market trends, weather patterns, oil and gas prices etc.
• Fig. Q.12.1 shows regression.
[Fig. Q.12.1 : Regression - data points and the line of regression, dependent variable versus independent variable]
• When the targets in a dataset are real numbers, the machine learning task is known as regression and each sample in the dataset has a real-valued output or target.
• Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship between variables and for modelling the future relationship between them.
• The two basic types of regression are linear regression and multiple linear regression.

Q.13 Explain univariate regression.
Ans. :
• Univariate data is the type of data in which the result depends on only one variable. If there is only one input variable, we call it 'single variable linear regression' or 'univariate linear regression'.
• The function that we are trying to develop looks like this :
hθ(x) = θ₀ + θ₁x,  i.e.  y = mx + b
• That is because linear regression is essentially the algorithm for finding the line of best fit for a set of data.
• The algorithm finds the values of θ₀ and θ₁ that best fit the inputs and outputs given to the algorithm. This is called univariate linear regression because the θ parameters only go up to θ₁.
• The univariate linear regression algorithm is much simpler than the one for multivariate regression.
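The θ₀ and θ₁ of Q.13 can be obtained directly from the closed-form least-squares formulas used later in Q.16. The sketch below shows this in Python; the small dataset (hours studied versus marks) is assumed purely for illustration.

```python
import numpy as np

# Assumed toy data: hours studied (x) versus marks obtained (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates for h(x) = theta0 + theta1 * x
x_mean, y_mean = x.mean(), y.mean()
theta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
theta0 = y_mean - theta1 * x_mean
print(f"fitted line: y = {theta0:.3f} + {theta1:.3f} x")

# Prediction for a new input
x_new = 6.0
print("prediction at x = 6:", theta0 + theta1 * x_new)
```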
Q.14 When is it suitable to use linear regression over classification ?
Ans. :
• Linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables.
• The objective of a linear regression model is to find a relationship between the input variables and a target variable.
1. One variable, denoted x, is regarded as the predictor, explanatory or independent variable.
2. The other variable, denoted y, is regarded as the response, outcome or dependent variable.
• Regression models predict a continuous variable, such as the sales made on a day or the temperature of a city. Imagine that we fit a line to the training points that we have. If we want to add another data point, then to fit it we need to change the existing model.
• This will happen with each data point that we add to the model; hence linear regression is not good for classification problems. Regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.
• Classification predicts categorical labels (classes), while prediction models continuous-valued functions. Classification is considered to be supervised learning : it classifies data based on the training set and the values in a classifying attribute, and uses them in classifying new data. Prediction models continuous-valued functions, i.e. it predicts unknown or missing values.

Q.15 Why do we need to regularize in regression ? Explain.
Ans. :
• A regression model may fail to generalize on unseen data. This can happen when the model tries to accommodate all kinds of changes in the data, including those belonging to both the actual pattern and the noise.
• As a result, the model ends up becoming a complex model having significantly high variance due to overfitting, thereby impacting the model performance (accuracy, precision, recall, etc.) on unseen data.
• Regularization is needed for reducing overfitting in the regression model. Regularization techniques are used to calibrate the coefficients of multi-linear regression models in order to minimize the adjusted loss function.
[Fig. Q.15.1 : Regression model having high variance and low bias]
[Fig. Q.15.2 : Regression model after regularization, having balanced bias-variance]
• Regularization methods provide a means to control our regression coefficients, which can reduce the variance and decrease the out-of-sample error.
• The goal is to reduce the variance while making sure that the model does not become biased (underfitting). After applying the regularization technique, the balanced model shown above can be obtained.

Q.16 Consider the following data for 5 students. Each Xᵢ (i = 1 to 5) represents the score of the iᵗʰ student in standard X and the corresponding Yᵢ (i = 1 to 5) represents the score of the iᵗʰ student in standard XII.
i) What linear regression equation best predicts the standard XII score ?
ii) Find the regression line that fits best for the given sample data.
iii) How to interpret the regression equation ?
iv) If a student's score is 80 in std X, then what is his expected score in XII standard ?

Score in X standard (Xᵢ)  : 95  85  80  70  60
Score in XII standard (Yᵢ) : 85  95  70  65  70

Ans. : We need the mean of the x values (X̄), the mean of the y values (Ȳ), the standard deviations Sx and Sy, and the deviation products :

Xᵢ    Yᵢ    XᵢYᵢ    xᵢ - X̄   yᵢ - Ȳ   (xᵢ - X̄)²   (yᵢ - Ȳ)²   (xᵢ - X̄)(yᵢ - Ȳ)
95    85    8075     17        8        289          64          136
85    95    8075      7       18         49         324          126
80    70    5600      2       -7          4          49          -14
70    65    4550     -8      -12         64         144           96
60    70    4200    -18       -7        324          49          126
ΣXᵢ = 390, ΣYᵢ = 385, ΣXᵢYᵢ = 30500, Σ(xᵢ - X̄)² = 730, Σ(yᵢ - Ȳ)² = 630, Σ(xᵢ - X̄)(yᵢ - Ȳ) = 470

X̄ = 390 / 5 = 78,  Ȳ = 385 / 5 = 77
Sx = √[ Σ(xᵢ - X̄)² / (n - 1) ] = √(730 / 4) ≈ 13.51
Sy = √[ Σ(yᵢ - Ȳ)² / (n - 1) ] = √(630 / 4) ≈ 12.55
The regression equation is a linear equation of the form Ŷ = b₀ + b₁X.
First we solve for the regression coefficient (slope) :
b₁ = Σ[(xᵢ - X̄)(yᵢ - Ȳ)] / Σ(xᵢ - X̄)² = 470 / 730 = 0.644
Once we know b₁, we can solve for the intercept :
b₀ = Ȳ - b₁X̄ = 77 - 0.644 × 78 = 26.77
Therefore the regression equation is Ŷ = 26.77 + 0.644 X.
Interpretation : every additional mark in standard X is associated with about 0.644 additional marks in standard XII; the intercept 26.77 is the predicted XII score when X = 0.
iv) For a student with X = 80 : Ŷ = 26.77 + 0.644 × 80 ≈ 78.3, so the expected score in standard XII is about 78.

Q.17 Consider the following data :
i) Find the values of β₀ and β₁ w.r.t. the linear regression model which best fits the given data.
ii) Interpret and explain the equation of the regression line.
iii) If a new person rates "Bahubali - Part I" as 3, then predict the rating of the same person for "Bahubali - Part II".
X = rating of the movie "Bahubali Part - I" by the iᵗʰ person; Y = rating of the movie "Bahubali Part - II" by the iᵗʰ person.
[Table : ratings Xᵢ and Yᵢ given by six persons, with ΣXᵢ = 18, ΣYᵢ = 18, ΣXᵢ² = 64 and ΣYᵢ² = 64]
Ans. : With n = 6,
X̄ = 18 / 6 = 3,  Ȳ = 18 / 6 = 3
Σ(xᵢ - X̄)² = 10,  Σ(yᵢ - Ȳ)² = 10,  Σ(xᵢ - X̄)(yᵢ - Ȳ) = 3
Sx = √(10 / 5) ≈ 1.41,  Sy = √(10 / 5) ≈ 1.41
The regression equation is a linear equation of the form Ŷ = β₀ + β₁X.
β₁ = Σ[(xᵢ - X̄)(yᵢ - Ȳ)] / Σ(xᵢ - X̄)² = 3 / 10 = 0.3
β₀ = Ȳ - β₁X̄ = 3 - 0.3 × 3 = 2.1
Therefore the regression equation is Ŷ = 2.1 + 0.3 X.
Interpretation : the predicted rating for Part II increases on average by 0.3 for every 1-point increase in the rating of Part I.
iii) For X = 3 : Ŷ = 2.1 + 0.3 × 3 = 3.0, so the predicted rating for "Bahubali - Part II" is 3.
4. Suppose two campaigns are run on TV and Radio in parallel : linear regression can capture the isolated as well as the combined impact of running these ads together.
Also refer Q.12.

3.4 : Regression : Lasso Regression, Ridge Regression

Q.19 What do you mean by logistic regression ? Explain with example. [SPPU : June-22, Marks 6]
Ans. :
• Logistic regression is a form of regression analysis in which the outcome variable is binary or dichotomous. It is a statistical method used to model dichotomous or binary outcomes using predictor variables.
• Logistic component : instead of modelling the outcome Y directly, the method models the log odds of Y using the logistic function.
• Regression component : methods used to quantify the association between an outcome and predictor variables. It can be used to build predictive models as a function of predictors. Simple logistic regression is logistic regression with one predictor variable.
Logistic regression : ln[ p / (1 - p) ] = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ
Linear regression : Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
• With logistic regression, the response variable is an indicator of some characteristic, that is, a 0/1 variable. Logistic regression is used to determine whether other measurements are related to the presence of some characteristic; for example, whether certain blood measures are predictive of having a disease.
• If analysis of covariance can be said to be a t-test adjusted for other variables, then logistic regression can be thought of as a chi-square test for homogeneity of proportions adjusted for other variables. While the response variable in a logistic regression is a 0/1 variable, the logistic regression equation, which is a linear equation, does not predict the 0/1 variable itself.
• Fig. Q.19.1 shows the sigmoid curve for logistic regression.
[Fig. Q.19.1 : Linear fit versus the logistic (sigmoid) curve]
• The linear and logistic probability models are :
Linear regression : p = a₀ + a₁X₁ + a₂X₂ + ... + aₖXₖ
Logistic regression : ln[ p / (1 - p) ] = b₀ + b₁X₁ + b₂X₂ + ... + bₖXₖ
• The linear model assumes that the probability p is a linear function of the regressors, while the logistic model assumes that the natural log of the odds p/(1 - p) is a linear function of the regressors.
• The major advantage of the linear model is its interpretability. In the linear model, if a₁ is 0.05, that means a one-unit increase in X₁ is associated with a 5 percentage point increase in the probability that Y is 1.
• The logistic model is less interpretable. In the logistic model, if b₁ is 0.05, that means a one-unit increase in X₁ is associated with a 0.05 increase in the log odds that Y is 1. And what does that mean ? Very few people have any direct intuition for log odds.
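The log-odds interpretation in Q.19 becomes clearer with a tiny numerical sketch. The coefficient values below are hypothetical and chosen only to illustrate how the logistic function converts log odds to probabilities; they do not come from any fitted model in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed coefficients of a fitted simple logistic model: log-odds = b0 + b1 * x
b0, b1 = -4.0, 0.05          # hypothetical values for illustration

for x in (40, 80, 120):       # e.g. a blood measurement
    log_odds = b0 + b1 * x
    p = sigmoid(log_odds)
    print(f"x = {x:3d}: log-odds = {log_odds:+.2f}, P(Y = 1) = {p:.3f}")

# A one-unit increase in x always adds b1 to the log odds, i.e. it multiplies
# the odds p / (1 - p) by exp(b1), which is about 1.051 here.
```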
Q.20 Explain in detail the ridge regression and the lasso regression. [SPPU : March-20, In Sem, Marks 8]
Ans. :
• Ridge regression and the lasso are two forms of regularized regression. These methods seek to improve the consequences of multicollinearity.
1. When variables are highly correlated, a large coefficient in one variable may be alleviated by a large coefficient in another variable which is negatively correlated to the former.
2. Regularization imposes an upper threshold on the values taken by the coefficients, thereby producing a more parsimonious solution and a set of coefficients with smaller variance.
Ridge regression :
• Ridge estimation produces a biased estimator of the true parameter β :
E[β̂_ridge | X] = (XᵀX + λI)⁻¹ XᵀX β
= (XᵀX + λI)⁻¹ (XᵀX + λI - λI) β
= [I - λ(XᵀX + λI)⁻¹] β
= β - λ(XᵀX + λI)⁻¹ β ≠ β
• Ridge regression shrinks the regression coefficients by imposing a penalty on their size. The ridge coefficients minimize a penalized residual sum of squares.
• Ridge regression protects against the potentially high variance of gradients estimated in the short directions.
Lasso :
• One significant problem of ridge regression is that the penalty term will never force any of the coefficients to be exactly zero. Thus the final model will include all p predictors, which creates a challenge in model interpretation. A more modern machine learning alternative is the lasso.
• The lasso works in a similar way to ridge regression, except that it uses a different penalty term that shrinks some of the coefficients exactly to zero.
• Lasso is a regularized regression machine learning technique that avoids overfitting of the training data.
• The lasso is a shrinkage method like ridge, with subtle but important differences. The lasso estimate is defined by
β̂_lasso = argmin over β of Σᵢ₌₁ⁿ ( yᵢ - β₀ - Σⱼ₌₁ᵖ βⱼ xᵢⱼ )²   subject to   Σⱼ₌₁ᵖ |βⱼ| ≤ t

3.5 : Gradient Descent Algorithm

• Gradient descent minimizes a function F by repeatedly stepping in the direction of the negative gradient, xₙ₊₁ = xₙ - γ∇F(xₙ), where the step size γ > 0 is a small number that forces the algorithm to make small jumps.
Limitations of gradient descent :
• Gradient descent is relatively slow close to the minimum : technically, its asymptotic rate of convergence is inferior to many other methods.
• For poorly conditioned convex problems, gradient descent increasingly 'zigzags' as the gradients point nearly orthogonally to the shortest direction to a minimum point.

Q.24 Explain steepest descent method.
Ans. :
• Steepest descent is also known as the gradient method.
• This method is based on a first-order Taylor series approximation of the objective function. It is also called the saddle point method. Fig. Q.24.1 shows the steepest descent method.
[Fig. Q.24.1 : Steepest descent iterations on the contours of f, starting from x₀]
• Steepest descent is the simplest of the gradient methods. The choice of direction is where f decreases most quickly, which is the direction opposite to ∇f(xᵢ). The search starts at an arbitrary point x₀ and then goes down the gradient until it reaches close to the solution.
• The method of steepest descent is the discrete analogue of gradient descent, but the best move is computed using a local minimization rather than computing a gradient. It is typically able to converge in few steps, but it is unable to escape local minima or plateaus in the objective function.
• The gradient is everywhere perpendicular to the contour lines. After each line minimization the new gradient is always orthogonal to the previous step direction. Consequently, the iterates tend to zig-zag down the valley in a very inefficient manner.
• The method of steepest descent is simple, easy to apply and each iteration is fast. It is also very stable; if the minimum points exist, the method is guaranteed to locate them, at least after an infinite number of iterations.

3.6 : Evaluation Metrics : MAE, RMSE, R2

Q.25 Define and explain Squared Error (SE) and Mean Squared Error (MSE) w.r.t. regression.
Ans. :
• The most common measurement of overall error is the sum of the squares of the errors, or SSE (sum of squared errors). The line with the smallest SSE is called the least-squares regression line.
• Mean Squared Error (MSE) is calculated by taking the average of the squares of the differences between the original and predicted values of the data. It can also be called the quadratic cost function.
• The value of MSE is always positive or zero. A value close to zero represents better quality of the estimator/predictor. An MSE of zero means that the predictor is a perfect predictor.
MSE = (1/N) Σᵢ (actual valueᵢ - predicted valueᵢ)²
[Fig. Q.25.1 : Representation of MSE - residual errors between the data points and the best-fit regression line]
• Here N is the total number of observations/rows in the dataset. The summation runs over every i from 1 to N, so the difference between actual and predicted values is taken for each observation.
• Mean squared error is the most commonly used loss function for regression. MSE is sensitive towards outliers; given several examples with the same input feature values, the optimal prediction will be their mean target value. This should be compared with Mean Absolute Error, where the optimal prediction is the median. MSE is thus good to use if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it is important to penalize outliers extra much.
• MSE incorporates both the variance and the bias of the predictor. MSE also gives more weight to larger differences : the bigger the error, the more it is penalized.
• Example : You want to predict future house prices. The price is a continuous value, and therefore we want to do regression. MSE can here be used as the loss function.

Q.26 How the performance of a regression function is measured ?
Ans. : The following performance metrics are used for evaluating a regression model :
a) Mean Absolute Error (MAE)  b) Mean Squared Error (MSE)  c) Root Mean Squared Error (RMSE)  d) R-squared  e) Adjusted R-squared
1. MAE : MAE is the sum of absolute differences between our target and predicted variables, averaged over the dataset. It measures the average magnitude of errors in a set of predictions, without considering their directions.
MAE = (1/N) Σᵢ |yᵢ - ŷᵢ|
2. Mean Squared Error (MSE) :
MSE = (1/N) Σᵢ (yᵢ - ŷᵢ)²
3. Root Mean Square Error (RMSE) : RMSE is a standard way to measure the error of a model in predicting quantitative data.
RMSE = √[ (1/N) Σᵢ (yᵢ - ŷᵢ)² ]
4. R-squared : R-squared is also known as the coefficient of determination. This metric gives an indication of how well a model fits a given dataset. It indicates how close the regression line is to the actual data values. The R-squared value lies between 0 and 1, where 0 indicates that the model does not fit the given data and 1 indicates that the model fits the dataset perfectly.
R² = 1 - (residual sum of squares / total sum of squares) = 1 - [ Σᵢ (yᵢ - ŷᵢ)² / Σᵢ (yᵢ - ȳ)² ]
5. Adjusted R-squared : The adjusted R-squared shows whether adding additional predictors improves a regression model or not.
Adjusted R² = 1 - [ (1 - R²)(N - 1) / (N - p - 1) ]
Q.27 For a given data having 100 examples, if squared errors SE₁, SE₂ and SE₃ are 13.33, 3.33 and 4.00 respectively, calculate Mean Squared Error (MSE). State the formula for MSE.
Ans. :
MSE = (Squared Error₁ + Squared Error₂ + ... + Squared Errorₙ) / Number of data samples
Mean Squared Error = (13.33 + 3.33 + 4.00) / 100 = 20.66 / 100 = 0.2066
END...

Supervised Learning : Classification

4.1 : Classification : K-Nearest Neighbour

Q.1 What are neighbors ? Why is it necessary to use nearest neighbor while classifying ?
Ans. :
• The idea is to find a predefined number of training samples closest in distance to the new point and to predict the label from these. The number of samples can be a user-defined constant (k-nearest neighbour learning) or can vary based on the local density of points (radius-based neighbour learning). The distance can, in general, be any metric measure : the standard Euclidean distance is the most common choice.
• In the nearest neighbour algorithm, we classify a new data point by calculating its distance to all the existing data points, then assigning it the same label as the closest labelled data point.
• Despite its simplicity, nearest neighbours has been successful in a large number of classification and regression problems, including handwritten digits and satellite image scenes.
• Being a non-parametric method, it is often successful in classification situations where the decision boundary is very irregular.
• Neighbours-based classification is a type of instance-based learning or non-generalizing learning : it does not attempt to construct a general internal model, but simply stores instances of the training data.
• Classification is computed from a simple majority vote of the nearest neighbours of each point : a query point is assigned the data class which has the most representatives within the nearest neighbours of the point.
• The basic nearest neighbours classification uses uniform weights : that is, the value assigned to a query point is computed from a simple majority vote of the nearest neighbours.
• Under some circumstances, it is better to weight the neighbours such that nearer neighbours contribute more to the fit. This can be accomplished through the weights keyword. The default value, weights = 'uniform', assigns uniform weights to each neighbour; weights = 'distance' assigns weights proportional to the inverse of the distance from the query point. Alternatively, a user-defined function of the distance can be supplied, which is used to compute the weights.

Q.2 Explain KNN algorithm with its advantages and disadvantages.
Ans. :
• The k-nearest neighbour (KNN) algorithm is a classical classification method that requires no training effort; it critically depends on the quality of the distance measure among examples.
• It is one of the simplest machine learning algorithms, based on the supervised learning technique. The KNN algorithm assumes similarity between the new data and the available data and puts the new data into the category that is most similar to the available categories.
• The k-nearest neighbour classifier is one of the most popular distance-based algorithms. Classification is based on measuring the distances between the test sample and the training samples to determine the final classification output. The traditional k-NN classifier works naturally with numerical data. Fig. Q.2.1 shows KNN.
• KNN stores all available data and classifies a new point based on similarity. The algorithm can also be used for regression, but it is mostly used for classification problems.
• KNN is a non-parametric algorithm, which means it does not make any assumption about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
[Fig. Q.2.1 : KNN - a query point x₀ surrounded by points of Class A and Class B]
• In KNN, k is the number of nearest neighbours. The number of neighbours is the core deciding factor. k is generally an odd number if the number of classes is 2. When k = 1, the algorithm is known as the nearest neighbour algorithm.
• The KNN algorithm gives the user the flexibility to choose the distance metric while building the k-NN model :
a) Euclidean distance  b) Hamming distance  c) Manhattan distance  d) Minkowski distance
• The performance of the KNN algorithm is influenced by three main factors :
1. The distance function or distance metric used to determine the nearest neighbours.
2. The decision rule used to derive a classification from the k nearest neighbours.
3. The number of neighbours used to classify the new example.
Advantages :
1. The KNN algorithm is very easy to implement.
2. Nearly optimal in the large sample limit.
3. Uses local information, which can yield highly adaptive behaviour.
4. Lends itself very easily to parallel implementations.
Disadvantages :
1. Large storage requirements.
2. Computationally intensive recall.
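The uniform and distance weighting options discussed in Q.1 are exposed directly by scikit-learn's KNeighborsClassifier. The sketch below is a minimal usage example on an assumed toy dataset; the feature values and labels are invented for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny assumed dataset: two features per sample, two classes
X = [[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]

# Majority vote over the 3 nearest neighbours with uniform weights
clf = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X, y)
print(clf.predict([[5, 5]]))          # assigned the class of the nearer cluster

# Distance weighting lets nearer neighbours contribute more to the vote
clf_w = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)
print(clf_w.predict_proba([[5, 5]]))
```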
¢ The nonlinear classification is performed using the kernel function. In nonlinear classification, the kernels are homogenous polynomial, complex polynomial, Gaussian radial basis function, and hyperbolic tangent function. © SVM finds a hyperplane to separate the inputs into separate groups. There can be many hyperplanes that successfully divide the input vectors. To optimize the solution and find the optimum hyperplane, Support vectors are used. © The points closest to the hyperplane are known as support vectors. In SVM, the hyperplane having maximal distance from support vectors is chosen as the output hyperplane. The hyperplane and support vectors are shown in Fig. Q.4.1. ‘Support vectors Fig. 0.4.1 A Guide for Engineering Students ’ lassigi tia a 6 So as it's a far 2d area so by using just USiNE @ Straight ling, ble to without difficulty separate those instructions. But there’ multiple lines that may separate those lessons. Consider the ia picture + Supervised Learning ; Machine Learning Fig. Q.5.2 Linear SVM understanding © Hence, the SVM algorithm helps to discover the first-rate line o selection boundary; this exceptional boundary or region is known 352) hyperplane. SVM algorithm unearths the nearest point of the traces from each of the lessons. These points are referred to as guide vectors. Tht distance between the vectors and the hyperplane is called the magi And the purpose of SVM is to maximise this margin. The hyperlét* With maximum margin is known as the most suitable hyperplane. 6 Explain non linear SVM with examples, c3°[SPPU : May-19, Marks 4 Ans. : Non-linear SVM : Nonlinear SVM is used for niet Separated facts, because of this if a dataset can't be labelled throvgt ! usage of a directly lin, then such facts is called as non-linear informs!” | and classifier used is known as non-linear SVM classifier. | — ng st A Guide for Engineet"6 Machine Learning 4-10 Supervised Learning : Classification © If information is linearly arranged, then we can separate it through using a directly line, however for non-linear information, we can not draw an unmarried directly line. Consider the beneath picture : x Fig. Q.6.1 Non linear SVM © So to separate these data points, we need to feature one greater size. For linear information, we've used dimensions x and y, so for non-linear information, we will upload a 3°4 dimension z. It can be calculated as : 2= % +y2 © By adding the third measurement, the sample area will become as below photograph : a . Fig. Q.6.2 Non linear SVM with third measurement A Guide for Engineering Students oN 4-11 Supervised Learning : Cias,, Sificati ‘On Machine Learning So now, SVM will divide the datasets into instructions Within following way. Consider the under photo : the Fig. @.6.3 Datasets representation Q7 Explain key properties of support vector machine. Ans. : 1, Use a single hyperplane which subdivides the space into two half-spaces, one which is occupied by-Class 1 and the other by Class a 2. They maximize the margin of the decision boundary using quadratic optimization techniques which find the optimal hyperplane. 3. Ability to handle large feature spaces. 4. Overfitting can be controlled by soft margin approach + When used in practice, SVM approaches frequently map the examples to a higher dimensional space and find margin maximal hyperplanes i the mapped space, obtaining decision boundaries which are mt hyperplanes in the original space. 6. 
The most popular versions of SVMs use non-linear kernel func and map the attribute space into a higher dimensional spa? facilitate finding "good" linear decision boundaries in the modified space. de ‘A Guide for Enginecrtne 5 Machine Learning 4-12 Supervised Learning : Classification Q8 Explain applications and limitations of SVM. Ans. : SVM applications * SVM has been used successfully in many real-world problems, 1, Text (and hypertext) categorization 2. Image classification 3. Bioinformatics (Protein classification, Cancer classification) 4, Hand-written character recognition 5. Determination of SPAM email. Limitations of SVM 1. It is sensitive to noise. 2. The biggest limitation of SVM lies in the choice of the kemel. 3. Another limitation is speed and size. 4. . The optimal design for multiclass SVM classifiers is also a research area. 4.3 : Ensemble Learning : Bagging, Boosting, Random Forest, Adaboost Q9 Explain ensemble learning. Ans, : © The idea of ensemble learning is to employ multiple leamers and combine their predictions. If we have a committee of M models with uncorrelated errors, simply by averaging them the average error of a model can be reduced by a factor of M. ® Unfortunately, the key assumption that the errors due to the individual models are uncorrelated is unrealistic; in practice, the errors. are typically highly correlated, so the reduction in overall error is generally small. © Ensemble modeling is the process of running two or more related but different analytical models and then synthesizing the results into a single score or spread in order to improve the accuracy of predictive analyties and data mining applications. © Ensemble of classifiers is a set of classifiers whose individual decisions combined in some way to classify new examples. * A Guide for Engineering Students
