MACHINE LEARNING NOTES
MOST IMPORTANT QUESTIONS OF MACHINE LEARNING AKTU - ENGINEER BEING

MODULE 1 PART-I

Ques1. What is Learning?
Ans. Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences. Learning is any process by which a system improves its performance from experience.

Ques2. What is Machine Learning? 2020-21 2M
Ans. Machine learning (ML) is defined as a discipline of artificial intelligence (AI) that provides machines the ability to automatically learn from data and past experiences to identify patterns and make predictions with minimal human intervention.
"Machine learning enables a machine to automatically learn from data, improve performance from experiences, and predict things without being explicitly programmed."

Ques3. Difference between ML, AI, Deep Learning? 2020-21 2M
Ans.
Artificial Intelligence: AI is the broadest concept of all, and gives a machine the ability to imitate human behaviour.
Machine Learning: Machine learning uses algorithms and techniques that enable machines to learn from past experience/trends and predict the output based on that data; their performance improves as they are exposed to more data over time.
Deep Learning: a subset of machine learning in which multilayered neural networks learn from vast amounts of data.
The main difference between machine learning and deep learning technologies is the presentation of data. Machine learning uses structured/unstructured data for learning, while deep learning uses neural networks for learning models.

Ques4. Why is machine learning important? What are the applications of ML?
Ans. Machine learning is important because it gives enterprises a view of trends in customer behavior and business operational patterns, as well as supports the development of new products. Many of today's leading companies, such as Facebook, Google and Uber, make machine learning a central part of their operations; machine learning has become a significant competitive differentiator for many companies.
Applications of ML:
1. Image recognition:
a. Image recognition is the process of identifying and detecting an object or a feature in a digital image or video.
b. This is used in many applications like systems for factory automation, toll booth monitoring, and security surveillance.
2. Speech recognition:
a. Speech Recognition (SR) is the translation of spoken words into text.
b. It is also known as Automatic Speech Recognition (ASR), computer speech recognition, or Speech To Text (STT).
c. In speech recognition, a software application recognizes spoken words.
3. Product recommendation: Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendation to the user. Whenever we search for some product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser, and this is because of machine learning.
4. Email Spam and Malware Filtering: Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox with the important symbol and spam emails in our spam box, and the technology behind this is machine learning.
5. Stock Market trading: Machine learning is widely used in stock market trading.
In the stock market, there is always a risk of ups and downs in shares, so machine learning's long short-term memory (LSTM) neural network is used for the prediction of stock market trends.

Types of Machine Learning:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning

Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data, machines predict the output. Labelled data means some input data is already tagged with the correct output. Ex: risk assessment, image classification, fraud detection, spam filtering, etc.
Types of supervised learning:
i. Classification: A classification problem is when the output variable is a category, such as "red" or "blue", "disease" and "no disease", Yes-No, Male-Female, True-False, etc.
ii. Regression: A regression problem is when the output variable is a real value, such as forecasting sales, weather forecasting, etc.

Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision. The goal of unsupervised learning is to find the underlying structure of the dataset, group that data according to similarities, and represent the dataset in a compressed format. The output is dependent upon the coded algorithms.
- Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
- Association: Association rule learning is a kind of unsupervised learning technique that tests for the reliance of one data element on another data element, designed appropriately so that it can be more cost-effective. It tries to discover interesting relations or associations between the variables of the dataset.

Semi-supervised learning sits between the supervised and unsupervised learning families. Semi-supervised models use both labeled and unlabeled data for training.

Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
The main elements of an RL system are:
- The agent or the learner
- The environment the agent interacts with
- The policy that the agent follows to take actions
- The reward signal that the agent observes upon taking actions

GENETIC ALGORITHM | TRADITIONAL ALGORITHM
A genetic algorithm is a search-based algorithm used for solving optimization problems in machine learning. | A traditional algorithm refers to the general algorithms we use to solve problems: a methodical procedure to solve a given problem. There can be several algorithms to solve a problem.
More advanced | Not as advanced
Used in ML, AI | Used in programming, math, etc.

Issues in Machine Learning:
1) Process Complexity of Machine Learning
The machine learning process is very complex, which is another major issue faced by machine learning engineers and data scientists. There is a majority of hit-and-trial experiments, hence the probability of error is higher than expected. Further, it also includes analyzing the data, removing data bias, training data, applying complex mathematical calculations, etc., making the procedure more complicated and quite tedious.
2) Getting bad recommendations
A machine learning model operates under a specific context, which can result in bad recommendations and concept drift in the model. Suppose at a specific time a customer is looking for some gadgets; the customer's requirements change over time, but the machine learning model keeps showing the same recommendations even though the customer's expectations have changed. This incident is called Data Drift. We can overcome it by regularly updating and monitoring the data according to the expectations.
3) Overfitting and Underfitting
Overfitting: Overfitting is one of the most common issues faced by machine learning engineers and data scientists. Whenever a machine learning model is trained with a huge amount of data, it starts capturing noise and inaccurate data from the training data set.
Underfitting: Underfitting is just the opposite of overfitting. Whenever a machine learning model is trained with too little data, it produces incomplete and inaccurate predictions and destroys the accuracy of the machine learning model.
4) Inadequate Training Data
The major issue that comes while using machine learning algorithms is the lack of quality as well as quantity of data. Although data plays a vital role in the processing of machine learning algorithms, many data scientists claim that inadequate data, noisy data, and unclean data are extremely exhausting for the machine learning algorithms. For example, a simple task requires thousands of sample data, and an advanced task such as speech or image recognition needs millions of sample data examples. Further, data quality is also important for the algorithms to work ideally, but the absence of data quality is also found in machine learning applications.
5) Monitoring and maintenance
As we know, generalized output data is mandatory for any machine learning model. Hence, regular monitoring and maintenance become compulsory for the same. Different results for different actions require data changes; hence editing of code as well as resources for monitoring also become necessary.

Decision Tree
Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but mostly it is preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome. In a decision tree, there are two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.
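For illustration, here is a minimal sketch of fitting a CART-style decision tree with scikit-learn. The library, the iris dataset, and the hyperparameters are assumptions for the example; the notes do not prescribe them.

```python
# Hypothetical sketch: a CART-style tree on the iris dataset (assumed for illustration).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="gini" uses the Gini index; "entropy" would use information gain instead
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```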
Machine learning methods covered in these notes:
- ANN
- Clustering
- Reinforcement Learning
- Decision Tree Learning
- Bayesian Networks
- SVM (Support Vector Machines)
- Genetic Algorithms

Ques. Explain Artificial Neural Networks (ANN). 2020-21 10M
Ans. The term "Artificial Neural Network" is derived from biological neural networks that develop the structure of a human brain. Similar to the human brain, which has billions of interconnected neurons, an artificial neural network has neurons that are interconnected in various layers. The correspondence between a biological network and an artificial neural network is: dendrites correspond to inputs, the synapse (interconnection) corresponds to weights, and the axon corresponds to the output.

In a neural network, there are three essential layers:
Input Layer: The input layer is the first layer of an ANN that receives the input information in the form of various texts, numbers, audio files, image pixels, etc.
Hidden Layers: In the middle of the ANN model are the hidden layers. There can be a single hidden layer or multiple hidden layers.
Output Layer: In the output layer, we obtain the result through the rigorous computations performed by the middle layers.

Artificial Neural Networks: application problems
Following are the important Artificial Neural Network applications:
Handwritten Character Recognition: ANNs are used for handwritten character recognition. Neural networks are trained to recognize handwritten characters, which can be in the form of letters or digits.
Facial Recognition: In order to recognize faces based on the identity of the person, we make use of neural networks. They are most commonly used in areas where users require security access.
Speech Recognition: ANNs play an important role in speech recognition. The earlier models of speech recognition were based on statistical models like Hidden Markov Models. With the advent of deep learning, various types of neural networks are the absolute choice for obtaining an accurate classification.
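The layer structure described above can be sketched with scikit-learn's MLPClassifier. This is purely illustrative (the notes name no library), and the hidden-layer size is an arbitrary choice.

```python
# Illustrative sketch: a feed-forward ANN with one hidden layer on the digits dataset.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# input layer: 64 pixel features; one hidden layer of 32 neurons; output: 10 digit classes
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
ann.fit(X_train, y_train)
print("test accuracy:", ann.score(X_test, y_test))
```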
Ques. Explain SVM. 2020-21 10M (UNIT 2)
Ans. SVM or Support Vector Machine is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. According to the SVM algorithm, we find the points closest to the separating line from both classes. These points are called support vectors. We compute the distance between the line and the support vectors; this distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum is the optimal hyperplane. Thus SVM tries to make a decision boundary in such a way that the separation between the two classes is as wide as possible.

Ques. What is clustering? Explain with an example. 2020-21 10M
Ans. Clustering:
- A way of grouping the data points into different clusters consisting of similar data points. The objects with possible similarities remain in a group that has few or no similarities with another group.
- It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with an unlabeled dataset.
- After applying a clustering technique, each cluster or group is provided with a cluster-ID. An ML system can use this ID to simplify the processing of large and complex datasets.
- The clustering technique is commonly used for statistical data analysis.
Example: the clustering technique with the real-world example of a mall. When we visit any shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another; similarly, in the vegetable section, apples, bananas, mangoes, etc., are grouped separately so that we can easily find things. The clustering technique works in the same way.

Classification and Regression
Regression and classification algorithms are supervised learning algorithms. Both are used for prediction in machine learning and work with labeled datasets. The difference between the two is in how they are used for different machine learning problems.

Classification | Regression
Classification algorithms are used to predict/classify discrete values such as Male or Female, True or False, Spam or Not Spam, etc. | Regression algorithms are used to predict continuous values such as price, salary, age, etc.
The task of the classification algorithm is to map the input value (x) to a discrete output variable (y). | The task of the regression algorithm is to map the input value (x) to a continuous output variable (y).
Classification algorithms are used with discrete data. | Regression algorithms are used with continuous data.
The classification algorithms can be divided into Binary Classifiers and Multi-class Classifiers. | The regression algorithms can be further divided into Linear and Non-linear Regression.
Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc. In email spam detection, the model is trained on the basis of millions of emails with different parameters, and whenever it receives a new email, it identifies whether the email is spam or not; if it is spam, it is moved to the Spam folder. | Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc. In weather prediction, the model is trained on past data, and once the training is completed, it can easily predict the weather for future days.

Ques. What is a well-defined learning problem? 2021-22 2M
Ans. A learning problem is said to be well defined if it has three features: the class of tasks, the measure of performance to be improved, and the source of experience.
Ex: A checkers learning problem
- Task T: playing checkers
- Performance measure P: percent of games won against opponents
- Training experience E: playing practice games against itself

Data Science vs Machine Learning:
"Data Science is a field of deep study of data that includes extracting useful insights from the data, and processing that information using different tools, statistical models, and machine learning algorithms." Machine learning allows computers to learn from past experiences on their own; it uses statistical methods to improve performance and predict the output without being explicitly programmed.
Ques. Design a learning system / Describe the final design of a checkers learning program. 2021-22 10M
Ans. Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences. Learning is any process by which a system improves its performance from experience.

Designing a Learning System in Machine Learning:
Step 1) Choosing the Training Experience: The very first and most important task is to choose the training data or training experience which will be fed to the machine learning algorithm. Three attributes are used:
1. Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
- Direct training examples in learning to play checkers consist of individual checkers board states and the correct move for each.
- Indirect training examples in the same game consist of the move sequences and final outcomes of various games played, in which information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost (the credit assignment problem).
2. The degree to which the learner controls the sequence of training examples. Example:
- The learner might rely on the teacher to select informative board states and to provide the correct move for each.
- The learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
- Or the learner may have complete control over the board states and (indirect) classifications, as it does when it learns by playing against itself with no teacher present.
3. How well the training experience represents the distribution of examples over which the final system performance will be measured. This basically means the more diverse the set of training experiences, the better the performance can get. Example: if the training experience in playing checkers consists only of games played against itself, the learner might never encounter certain crucial board states that are very likely to be played by the human checkers champion.

Step 2) Choosing the Target Function: To determine what type of knowledge will be learned and how this will be used by the performance program. Example: in playing checkers, it needs to learn to choose the best move among the legal moves.

Step 3) Choosing a Representation for the Target Function: Once done with choosing the target function, we have to choose a representation of this target function. When the machine algorithm has a complete list of all permitted moves, it may pick the best one using any format, such as linear equations, hierarchical graph representation, tabular form, and so on. Out of these moves, the NextMove function will pick the target move, which will increase the success rate. For example, if a chess machine has four alternative moves, the computer will select the most optimal move that will lead to victory.

Step 4) Choosing a Function Approximation Algorithm: In this step, we choose a learning algorithm that can approximate the target function chosen. This step further consists of two sub-steps: a. estimating the training value, and b. adjusting the weights.

[Figure: the final design, with its four modules (Experiment Generator, Performance System, Critic, Generalizer) connected by New Problem, Solution Trace, Training Examples, and Hypothesis.]
The final design consists of four modules, as described in the picture.
1. The Performance System: The performance system solves the given performance task.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU - ENGINEER BEING
MODULE 2 PART-I

Ques. Explain SVM. 2020-21 10M / Or: Discuss support vectors in SVM. 2020-21 2M
Ans. SVM or Support Vector Machine is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. It tries to classify data by a hyperplane that maximizes the margin between the classes in the training data. Hence, SVM is an example of a large margin classifier.
The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes. According to the SVM algorithm, we find the points closest to the line from both classes. These points are called support vectors. We compute the distance between the line and the support vectors; this distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum is the optimal hyperplane. Thus SVM tries to make a decision boundary in such a way that the separation between the two classes is as wide as possible.

SVM KERNELS
- SVM can work well in non-linear data cases using the kernel trick.
- The function of the kernel trick is to map the low-dimensional input space into a higher-dimensional space.
- In simple words, a kernel converts non-separable problems into separable problems by adding more dimensions.
- It makes SVM more powerful, flexible and accurate.
THREE TYPES OF KERNEL
1) Linear Kernel: A linear kernel can be used as the normal dot product of any two given observations. The equation for the kernel function:
K(x, xi) = sum(x * xi)
2) Polynomial Kernel: It is a more generalized form of the linear kernel and can distinguish curved or non-linear input space. It is popular in image processing. Following is the formula for the polynomial kernel:
K(x, xi) = (1 + sum(x * xi))^d, where d is the degree of the polynomial.
3) Gaussian Radial Basis Function (RBF) Kernel: The RBF kernel, mostly used in SVM classification, maps the input space into an indefinite-dimensional space. It is a general-purpose kernel, used when there is no prior knowledge about the data. The following formula explains it mathematically:
K(x, xi) = exp(-gamma * sum((x - xi)^2)), with gamma = 1 / (2 * sigma^2)

APPLICATIONS OF KERNELS
- Face detection: SVM classifies parts of the image as face and non-face and creates a square boundary around the face.
- Handwriting recognition: we use SVMs to recognize handwritten characters; this is used widely.
- Texture classification using SVM: in this SVM application, we use images of certain textures and use that data to classify whether the surface is smooth or not.
- Steganography detection in digital images: using SVM, we can find out if an image is pure or adulterated. This could be used in security-based organizations to uncover secret messages; messages can be encrypted in high-resolution images. In high-resolution images there are more pixels, hence the message is harder to find. We can segregate the pixels, store the data in various datasets, and analyze those datasets using SVM.

PROPERTIES OF SVM:
1. Flexibility in choosing a similarity function.
2. Sparseness of solution when dealing with large data sets: only support vectors are used to specify the separating hyperplane.
3. Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space.
4. Overfitting can be controlled by the soft margin approach (we let some data points enter our margin intentionally).
5. A simple convex optimization problem which is guaranteed to converge to a single global solution.

DISADVANTAGES OF SVM:
1. The SVM algorithm is not suitable for large data sets because the required training time is higher.
2. SVM does not perform very well when the data set has more noise, i.e., the target classes are overlapping.
3. In cases where the number of features for each data point exceeds the number of training data samples, the SVM will underperform.
4. SVMs with the 'wrong' kernel: for SVMs, choosing the right kernel function is key. For example, using the linear kernel when the data are not linearly separable results in the algorithm performing poorly.

Ques. What is Regression? 2020-21 2M
Ans. Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the causal-effect relationship between variables.
Some examples of regression:
- Prediction of rain using temperature and other factors
- Determining market trends
- Prediction of road accidents due to rash driving
It is used to find the trends in data. By performing regression, we can confidently determine the most important factor, the least important factor, and how each factor affects the other factors.
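As a hedged sketch of the three kernels above, the following compares them with scikit-learn's SVC on a non-linearly-separable toy dataset; the dataset and all settings are assumptions for illustration, not part of the notes.

```python
# Illustrative comparison of linear, polynomial and RBF kernels on "two moons" data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)  # not linearly separable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel, degree=3, gamma="scale")  # degree is only used by "poly"
    clf.fit(X_train, y_train)
    print(kernel, "accuracy:", clf.score(X_test, y_test),
          "| support vectors:", len(clf.support_vectors_))
```

On data like this the RBF kernel typically scores highest, matching the point above that kernels let SVM handle non-linear cases.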
Linear Regression | Logistic Regression
Linear regression is a supervised regression model. | Logistic regression is a supervised classification model.
In linear regression, we predict the value as a real number. | In logistic regression, we predict the value as 1 or 0.
It is based on least squares estimation. | It is based on maximum likelihood estimation.
When we plot the training dataset, a straight line can be drawn that touches the maximum number of points. | Any change in a coefficient leads to a change in both the direction and the steepness of the logistic function: positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.
Linear regression is used to estimate the dependent variable in case of a change in the independent variables, e.g., predicting the price of houses. | Logistic regression is used to calculate the probability of an event, e.g., classifying whether tissue is benign or malignant.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU - ENGINEER BEING
MODULE 2 PART-II

Ques. Differentiate Bayesian Networks and Neural Networks. 2020-21 2M
Ans. The vertices and edges in a Bayesian network have meaning: the network structure itself gives you important information about the conditional dependence between the variables. With neural networks, the network structure does not tell you anything of the sort. A similarity between an ANN and a Bayesian network is that they both use directed graphs.

Ques. State and explain Bayes theorem. 2021-22 10M
Ans. Bayes theorem is one of the most popular machine learning concepts; it helps to calculate the probability of one event occurring, with uncertain knowledge, given that another event has already occurred. Bayes theorem is a way of finding a probability when we know certain other probabilities:

P(X|Y) = P(Y|X) * P(X) / P(Y)

which tells us how often X happens given that Y happens, written P(X|Y), when we know: how often Y happens given that X happens, written P(Y|X); how likely X is on its own, written P(X); and how likely Y is on its own, written P(Y).
The above equation is called Bayes Rule or Bayes Theorem.
- P(X|Y) is called the posterior, which we need to calculate. It is defined as the updated probability after considering the evidence.
- P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
- P(X) is called the prior probability: the probability of the hypothesis before considering the evidence.
- P(Y) is called the marginal probability. It is defined as the probability of the evidence under any consideration.
Hence, Bayes theorem can be written as: posterior = likelihood * prior / evidence.

EXAMPLE: Dangerous fires are rare (1%), but smoke is fairly common (10%) due to barbecues, and 90% of dangerous fires make smoke. We can then discover the probability of a dangerous fire when there is smoke:
P(Fire|Smoke) = P(Fire) * P(Smoke|Fire) / P(Smoke) = (1% * 90%) / 10% = 9%

Naive Bayes Classifier Algorithm
The Naive Bayes algorithm is a supervised learning algorithm, based on Bayes theorem and used for solving classification problems. It is mainly used in text classification that involves a high-dimensional training dataset. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Some popular examples of the Naive Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.
The distinction between Bayes theorem and Naive Bayes is that Naive Bayes assumes conditional independence where Bayes theorem does not. This means the relationships between all input features are assumed independent.

Working of the Naive Bayes Classifier:
The working of the Naive Bayes classifier can be understood with the help of the example below. Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the player play or not?

Dataset (row: Outlook, Play): 0 Rainy Yes; 1 Sunny Yes; 2 Overcast Yes; 3 Overcast Yes; 4 Sunny No; 5 Rainy Yes; 6 Sunny Yes; 7 Overcast Yes; 8 Rainy No; 9 Sunny No; 10 Sunny Yes; 11 Rainy No; 12 Overcast Yes; 13 Overcast Yes.

Frequency table:
Weather  | Yes | No
Overcast |  5  |  0
Rainy    |  2  |  2
Sunny    |  3  |  2
Total    | 10  |  4

Likelihood table:
Weather  | Yes        | No
Overcast | 5/10       | 0/4
Rainy    | 2/10       | 2/4
Sunny    | 3/10       | 2/4
All      | 10/14=0.71 | 4/14=0.29
P(Sunny) = 5/14 = 0.35

Applying Bayes theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3, P(Sunny) = 0.35, P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5, P(No) = 0.29, P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny). Hence, on a sunny day, the player can play the game.
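The hand calculation above can be checked in plain Python. Note one assumption: the last row of the table is garbled in the source and is taken here as Overcast/Yes, which makes the totals match the 10/14 and 4/14 figures the notes use.

```python
# Reproducing the P(Yes|Sunny) calculation from the 14-row weather dataset.
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
p_yes = play.count("Yes") / n              # 10/14, about 0.71
p_sunny = outlook.count("Sunny") / n       # 5/14, about 0.35
p_sunny_given_yes = sum(o == "Sunny" and p == "Yes"
                        for o, p in zip(outlook, play)) / play.count("Yes")  # 3/10

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))         # 0.6, so the player plays
```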
Ques 3) What problem does the EM algorithm solve? 10M 2021-22 / Or: What is the task of the E-step in the EM algorithm? 2M 2020-21
Ans. The Expectation-Maximization (EM) algorithm is defined as a combination of unsupervised machine learning steps used to determine the local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for unobservable variables in statistical models. It is a technique to find maximum likelihood estimates when latent variables are present. It is also referred to as the latent variable model. A latent variable model consists of both observable and unobservable variables: observable variables can be measured, while unobserved (latent) variables are inferred from the observed variables.

Steps in the EM Algorithm
The EM algorithm is completed mainly in 4 steps: the Initialization Step, Expectation Step, Maximization Step, and Convergence Step.
1st Step: The very first step is to initialize the parameter values. The system is provided with incomplete observed data, with the assumption that the data is obtained from a specific model.
2nd Step: This step is known as the Expectation or E-step, which is used to estimate or guess the values of the missing or incomplete data using the observed data. The E-step primarily updates the (latent) variables.
3rd Step: This step is known as the Maximization or M-step, where we use the complete data obtained from the 2nd step to update the parameter values. The M-step primarily updates the hypothesis.
4th Step: The last step is to check whether the values of the latent variables are converging or not. If "yes", stop the process; else, repeat from step 2 until convergence occurs.
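As an illustration of these four steps (not part of the notes), scikit-learn's GaussianMixture fits a Gaussian mixture model by running exactly this E-step/M-step loop until convergence; here the latent variable is which mixture component generated each point.

```python
# Hedged illustration: EM fitting a two-component Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Observed data drawn from two hidden components centred at -2 and 3
data = np.concatenate([rng.normal(-2, 1, 300),
                       rng.normal(3, 1, 300)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(data)  # E/M steps inside
print("estimated means:", gm.means_.ravel())   # close to -2 and 3
print("converged:", gm.converged_, "after", gm.n_iter_, "iterations")
```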
MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU - ENGINEER BEING
MODULE 3 PART-I

Overfitting and Underfitting in Decision Trees
If we depend too much on the training data while growing the decision tree, there is a possibility that the tree will overfit. That is, a particular hypothesis will work well on the training data but will not work well on testing or real-world data. Such a tree is called overfitted.
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. In the case of underfitting, the model is not able to learn enough from the training data, and hence it reduces the accuracy and produces unreliable predictions.

Ques. Explain the issues in decision tree learning:
a. overfitting the data
b. handling continuous-valued attributes
c. handling missing attribute values
d. handling attributes with different costs
Ans:
a. Overfitting the data: If we depend too much on the training data while growing the decision tree, there is a possibility that the tree will overfit: a particular hypothesis works well on the training data but not on testing or real-world data. This overfitting can be addressed with two techniques: reduced-error pruning and post-rule pruning.
b. Handling continuous-valued attributes: The decision tree works well with problems where we have a fixed number of attributes and a discrete number of possibilities for each attribute. If a particular attribute has continuous values, we cannot apply the decision tree directly. First, we need to convert the attributes with continuous values into discrete possibilities; only then can we apply decision tree learning.
c. Handling missing attribute values: If some attribute values are missing, we need to fill in those missing attributes with proper values; only then can we use this learning. If a particular attribute does not have a value, we need to find a suitable value and fill it in properly.
d. Handling attributes with different costs: Whenever we apply the decision tree algorithm, every attribute is given equal importance. But sometimes, for a given problem definition, a particular attribute may have more importance or be given more weight. In such cases we cannot use the core decision tree learning; we need to handle this issue with an additional calculation.

Attribute Selection Measures
In a decision tree, the major challenge is identifying the attribute for the root node at each level. Two popular attribute selection measures are:
1. Information Gain
2. Gini Index

Information Gain
When we use a node in a decision tree to partition the training instances into smaller subsets, the entropy changes. Information gain is a measure of this change in entropy:
Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv| / |S|) * Entropy(Sv)
Entropy(S) = -P(yes) * log2 P(yes) - P(no) * log2 P(no)
[The source includes a worked dataset and calculation table at this point, but it is too garbled to reproduce.]
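Since the notes' own worked table is garbled, here is a small sketch computing Entropy(S) and Gain(S, A) from the formulas above, reusing the weather dataset from the Naive Bayes example (an assumption; the garbled table may have used different data).

```python
# Entropy and information gain for the Outlook attribute of the weather dataset.
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(attribute, labels):
    n = len(labels)
    gain = entropy(labels)
    for v in set(attribute):                       # partition S by attribute value v
        subset = [l for a, l in zip(attribute, labels) if a == v]
        gain -= len(subset) / n * entropy(subset)  # subtract |Sv|/|S| * Entropy(Sv)
    return gain

outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

print("Entropy(S) =", round(entropy(play), 3))                      # about 0.863
print("Gain(S, Outlook) =", round(information_gain(outlook, play), 3))
```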
MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU - ENGINEER BEING
MODULE 3 PART-II

Ques. What is instance-based learning, and how is it different from a radial basis function network? 2020-21 10M
Ans. Instance-based learning refers to a family of techniques for classification and regression which produce a class label/prediction based on the similarity of the query to its nearest neighbor(s) in the training set. Some of the instance-based learning algorithms are:
1. K Nearest Neighbor (KNN)
2. Locally Weighted Learning (LWL)
3. Case-Based Reasoning

Locally Weighted Regression
Locally weighted linear regression is a non-parametric algorithm; that is, the model does not learn a fixed set of parameters as is done in ordinary linear regression. Rather, parameters are computed individually for each query point. Locally weighted regression (LWR) is a memory-based method that performs a regression around a point of interest using only training data that are "local" to that point. Locally weighted linear regression is a supervised learning algorithm. There exists no training phase; all the work is done during the testing phase, while making predictions. Locally weighted regression methods are a generalization of k-Nearest Neighbour.
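A compact NumPy sketch of the idea: for each query point, solve a Gaussian-weighted least-squares problem, so the parameters are recomputed per query with no training phase, as stated above. The bandwidth tau and the toy data are assumptions for illustration.

```python
# Locally weighted linear regression: one weighted least-squares fit per query.
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))   # nearby points weigh more
    A = np.column_stack([np.ones_like(X), X])            # design matrix [1, x]
    W = np.diag(w)
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)    # weighted normal equations
    return theta[0] + theta[1] * x_query

rng = np.random.default_rng(0)
X = np.linspace(0, 6, 80)
y = np.sin(X) + rng.normal(0, 0.1, X.size)
print(lwr_predict(3.0, X, y))   # close to sin(3.0), roughly 0.14
```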
Incremental learning : A CBL system can be put into operation with a minimal set solved cases furnishing the case bases The case base will besfilled with new cases increasing the system’s problem-solving ability Ease of maintenance : This is particularly due to the fact that CBL systems can adapt to many changes in the problem domain and the relevant environment, merely by acquiringEase of explanation: The results ofa CBL system can be justified based upon the similarity of the current problem to the reirieved case.CBL are easily traceable to precedent cases, it is also easier to analyse failures of the system. For example, CASEY for classification of auditory impairments, CASCADE for classification of software failures 2021-22 The inductive bias (also known as leamning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs of given inputs that it has not encountered. In machine learning, one aims to construct algorithms that are able to learn to predict a certain target output. Inductive learning methods require a certain number of training examples to generalize accurately. Analytical learning stems fromthe idea that when not enough training examples are provided, it may be possible to “replace” the “missing” examples by prior knowledge and deductive reasoning. 2021-22request is made. on of the target provided training + Lazy learningING AKTU INEER BEING Perceptrons are the buildin learning algorithm of binary The perceptron consists of 4 parts. 1. Input values or One input layer 2, Weights and Bias 3. Net sum 4. Activation Function a. All wil bAc. Apply that weighted sum to the correct Activation Function. Weights shows the strength of the particular node. A bias value allows you to shift the activation function curve up or down. In short, the activation functions are used to map the input between the required values like (0, 1) or (-1, 1), Perceptron 1s usually used to classify the data into two parts. Iheretore, tt 1s also known as a Linear Binary Classifier. ‘Ques 2)What is Gradient descent? 2021-22 2M Gradient descent is an optimization algorithm which is commonly-used to train machine learning models and neural networks, to find a local minimum/maximum of a given function This method is commonly used in machine learning (ML) and deep tearning(DL) to minimize a cost/loss function. 2021-22 2M In machine learning, the delta rule is a gredient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. Itis a special case ot the more general backpropagation algorithm. ‘Ques:4) Describe BPN algorithm in ANN along with a suitable example. 2020-21 10MBack-propagation is used for the (raining of neural network. U C The Backpropagation algorithm looks for the minimum value of the error function in weight space using a technique called the delta rule or gradient descent. Tau aitificial neural nctwork, the values of weights aud Liases ave randuuily initialized. Due to random initialization, the neural network probably has errors in giving the correct output. We need to reduce error values as much as possible. So, for reducing these error values, we need a nen that can compare the desired output ofthe neural network withthe.n e a a and biases su For this, we and biases.Backpropagation is a short form for "backward propagation of errors." It is a standard method of training artificial neural networks. Backpropagation Algorithm: Step 1: Inputs X, arrive through the preconnected path. 
Step 2: The input is modeled using true weights W. Weights are usually chosen randomly. Step 3: Calculate the output of cach neuron fiom the input layer dhe hidden layer to the output layer. Step 4: Calculate the error in the outputs Backpropagation Error Actual Output — Desired Output Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce the error. Step 6: Repeat the process until the desired output is achieved. Why We Need Backpropagation? Most prominent advantages of Backpropzgation are: + Backpropagation is fast, simple and easy to program + It isa flexible method as it does not require prior knowledge about the network + It is a standard method that generally works well + It does not need any special mention of the features of the function to be leamed, ‘Types of Backpropagation Networks Two Types of Backpropagation Networks are: + Static Back-propagation + Recurrent Backpropagation The output two runs of @ neural network compete among themselves to become active. Several output neurons may be active, but in competitive only single output neuron is active at one time2020-21 10M. Self OrganizingTo determine the best matchin and calculate the Euclidean dist i and current input vector. The node with tor closest to the inpt tagged as the winning neuron. Step 4: Find the new weight between input vector sample and winning output Neuron. New Weights = Old Weights + Learning Rate (Input Vector — Old Weights) Step 5: Repeat st e eel weight are similar to old we map stop clConvolutional Neural Networks (CNNs) are specially designed to wo images. Convolutional Neural Networks (CNNs) are specially desigr with images. An image consists of pixels. In deep learning, images are represented as arrays of pixel values. There are three main types of layers in a CNN: ¢ Convolutional layers ° 5 Pooling layers Tn additic ti er and fully connThere are four main types of operations in a CNN: Convolution operation, Pooling operation, Flatten operation and Classification (or other relevant) operation. Convolutional layers and convolution operation: The first layer in a CNN is a convolutional layer. It takes the images as the input and begins to process. ‘There are three elements in the convolutional layer: Input image, Filters and Feature map Section (3x3) Convolution operation between the image and filter, of spots 2}/0/1/0 4/3 o[1[3]2 ales 1) tts 3 |} 2 0};0);0)1 3)1 ol1[afo Feature map (axa) Input image Convolutional (6x6) operation Fil : This is also called Kernel or Feature Detector. Image section: The size of the image section should be equal to the size of the filter(s) we choose. The number of image sections depends on the Stride. Feature map: The feature map stores the outputs of different convolution operations between different image sections and the filter(s). ‘The number of steps (pixels) that we shift the filter over the input image is called Stride.Padding adds additional pixels with zero values to each side of the image. That helps to get the feature map of the same size as the input. Padding=t Pooling layers and pooling operation Pooling layers are the second type of layer used in a CNN. There can be multiple pooling layers in a CNN. Each convolutional layer is followed by a Padded Input mage pooling layer. So, convolution and pooling layers are es) used together as pairs. It Reduce the dimensionality (number of pixels) of the output returned from previous convolutional layers. 
There are three elements in the pooling layer: Feature map, Filter and Pooled feature map There are two types of pooling operations. +) Max pooling: Get the maximum value in the area where the filter is, applied. + Average pooling: Get the average of the values in the area where the filter is applied. Then, we can flatten a pooled feature map that contains multiple channels. Fully connected (dense) layers2020-21 10M.Size of kernel or filter is 3*3 hence the size of image section is also 3*3 (lle PAN foi O- (t-kits 1D4 TXL4 LXE t0K0 OX( 1 OKO + 1KO. = |IMOtKl +ixt “LnputYou Tube ENGINEER BEINGMOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU MODULE 5 PART-I REINFORCEMENT LEARNING — Introduction to Reinforcement Learning , Learning Task, Example of Reinforcement Learning in Practice, Learning Models for Reinforcement — (Markov Decision process , Q Learning - Q Learning function, Q Learning Algorithm ), Application of Reinforcement Learning, Introduction to Deep Q Learning. GENETIC ALGORITHMS: Introduction, Components, GA eyele of reproduction, Crossover, Mutation, Genetic Programming, Models of Evolution and Learning, Applications. Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in an environment by performing the actions and seeing the results of actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or penalty. The elements of reinforcement leaming are: Agent, Environment, Action, State, Policy, Reward. Leaming Models in RL: © Markov Decision Process © Q-Learning Algorithm © Deep Q LeamingThe Markov Property state that : “Future is Independent of the past iven the present” Mathematically we ean express this statement as : P[Si+1 | Si] = P[S#i1 | Si, .- 15 It says that "If the agent is present in the current state S1, performs an action al and move to the state s2, then the state transition from s1 to s2 only depends on the current state and future action and states do not depend on past actions, rewards, or states”. MDP is a framework that can solve most Reinforcement Learning problems with discrete actions With the Markov Decision Process, an agent can artive at an optimal policy for maximum rewards over time. Markov Process is the memory less random process i.e. a sequence of a random state S[1],S[2],....S[n] with a Markov Property. Markov decision process has 5 tuples(S,A,P.R, 5): *. Sis the set of states. ¢ Ais the set of action. « P(S, A,S’)is the probability that action A in the state S at time T will lead to state S’ at time T+ 1. * R(S, A, S’) is the immediate reward received after a transition from State S to S dash due to action A.* Discount Factor (x): It determines how much importance is to be given to the immediate reward and future rewards. It has a value between 0 and 1 Quearning algorithm © Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation. © The main objective of Q-leaming is to lear the policy which can infarm the agent that what actions © The goal of the agent in Q-learning is to maximize the value of Q. * Qsstands for quality in Q-learning, which means it specifies the quality of ari Gétion taken by the agent should be taken for maximizing the toward under what circumstances. * A Q-Table is used to find the best action for each state in the environment. We use the Bellman Equation at each state to get the expected future state and reward and save it in a table to compare with other states. 
Bellman Equation V(s) ~ max [R@a) +yV@')] Where, ‘V(s)= value calculated at a particular point.R(s, a) = Reward at a particular states by performing an action. y = Discount factor Q-Leamning algorithm works like this: Initialize all Q-values, e.g., with zeros Choose an action a in the current state s based on the current best Q-value Perform this action a and observe the outcome (new state s’). Measure the reward R after this action Update Q with an update formula that is called the Bellman Equation. Repeat steps 2 to 5 until the learning no longer improves EXAMPLE: An example of Q-learning is an Advertisement recommendation system, In a normal ad recommendation system, the ads you get are based on your previous purchases or websites you may have visited, If you’ ve bought a TV, you will get recommended TVs of different brands. Using Q-learning, we can optimize the ad recommendation system to recommend products that are frequently bought together. The reward will be if the user clicks on the suggested product. DEEP Q-LEARNING MODEL. ° O-Learning approach is practical for very small environments‘and quickly, loses it’s feasibility when the number of states and actions in the environment inereases: The solution for the above leads us to Deep Q Learning which uses a deep neural network to approximate the values Deep Q Leaming uses the Q-leaming idea and takes it one step further. Instead of-using a Q-table;weusea Neural Network thattakesia state and approximates the Q-values for each action based on that state The basic working'step for Deep Q-Learningiis that the initial stateiis fed into the neural network and it retums the Q-value of all possible actions as an output.CT able” => Q Value State Deep Q Learning eRe) Le) State => | Nel ae =p Q Value Action2 ™D Q Value Action3 The difference between Q-Learning and Deep Q-Learning can be illustrated as follows: + [aaa ‘amasInstead of using a Q-table, we use a Neural Network that takes a state and approximates the Q-values for each action based on that state Deep Neural Network Ques 6) What are the applications of reinforcement learning? Following are the applications of reinforcement learning : Robotics for industrial automation. 2, Business Strategy planning. 3. Machine leaming and data processing. 4. Ithelps us to create training systems that provide custom instruction and materials according to the requirement of students. 5. Aircraft control and robot motion control.INEER BEING foreg on Diabetic Retinopathy, Bung anna speaker, Selderving ea REINFORCEMENT LEARNING ~ Inodiction to Reinfrsement Leaning Learning Task, Example of Reinforement Learning ia Practice, Learning Modes fr Reinforeement~ (Markov Decision proces, Learing» Q Leasing funtion, Q Leaming Algom .Applition ef Reinforcement Leaning, odin to DepThis algorithm reflects the process of natural selection where the fittest individuals are selected in order to produce offspring of the next generation. The process of natural selection starts with the selection of fittest individuals from a population. © They produce offspring which inherit the characteristics of the parents and will be added to the next generation. * Ifparents have better fitness, their offspring will be better than parents and have a better chance at surviving. This process keeps on iterating and at the end, a generation with the fittest individuals will be found. © This notion can be applied for a search problem. 
The genetic algorithm is a method for solving both constrained and unconstrained optimization problems that is based on natural selection, the process that drives biological evolution. The genetic algorithm repeatedly modifies a population of individual solutions. Five phases are considered in a genetie algorithm. 1 Initial population Nv Fitness function 318 SBIBAtion 4. Crossover 5. Mutation