ML Unit 1

Uploaded by

mba department


Unit I: Introduction: Artificial Intelligence, Machine Learning, Deep Learning, Types of Machine Learning Systems, Main Challenges of Machine Learning. Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss, Tradeoffs in Statistical Learning, Estimating Risk Statistics, Sampling Distribution of an Estimator, Empirical Risk Minimization.

The evolution of machine learning from 1950 is depicted in Figure 1.1.

[Figure 1.1: Evolution of machine learning. Legible milestones: nearest neighbour algorithm created; Kaggle, a website for machine learning competitions, launched (2010); Watson beats two human champions in Jeopardy; Google's AlphaGo program beats an unhandicapped professional human player (2016).]

INTRODUCTION: ARTIFICIAL INTELLIGENCE, MACHINE LEARNING, DEEP LEARNING

• Artificial Intelligence is the concept of creating smart, intelligent machines.
• Machine Learning is a subset of artificial intelligence that helps you build AI-driven applications.
• Deep Learning is a subset of machine learning that uses vast volumes of data and complex algorithms to train a model.

What is Artificial Intelligence?

"Artificial intelligence is the capability of a computer system to mimic human functions such as learning and problem-solving. Through AI, a computer system uses maths and logic to simulate the reasoning that people use to learn from new information and make decisions."

"Artificial intelligence, commonly referred to as AI, is the process of imparting data, information, and human intelligence to machines. The main goal of Artificial Intelligence is to develop self-reliant machines that can think and act like humans. These machines can mimic human behavior and perform tasks by learning and problem-solving.
Most AI systems simulate natural intelligence to solve complex problems."

Let's have a look at an example of an AI-driven product: Amazon Echo.

[Figure: Amazon Echo and Alexa Voice Services handling a spoken question about the current temperature in Chicago.]

Capabilities of AI and machine learning

Companies in almost every industry are discovering new opportunities through the connection between AI and machine learning. These are just a few capabilities that have become valuable in helping companies transform their processes and products:

• Predictive analytics: predict trends and behavioural patterns by discovering cause-and-effect relationships in data.
• Recommendation engines: With recommendation engines, companies use data analysis to recommend products that someone might be interested in.
• Speech recognition and natural language understanding: Speech recognition enables a computer system to identify words in spoken language, and natural language understanding recognizes meaning in written or spoken language.
• Image and video processing: These capabilities make it possible to recognise faces, objects, and actions in images and videos, and to implement functionalities such as visual search.
• Sentiment analysis: A computer system uses sentiment analysis to identify and categorise positive, neutral, and negative attitudes that are expressed in text.

Types of Artificial Intelligence

• Reactive Machines: These are systems that only react. These systems don't form memories, and they don't use any past experiences for making new decisions.
• Limited Memory: These systems reference the past, and information is added over a period of time. The referenced information is short-lived.
• Theory of Mind: This covers systems that are able to understand human emotions and how they affect decision making. They are trained to adjust their behavior accordingly.
• Self-awareness: These systems are designed and created to be aware of themselves. They understand their own internal states, predict other people's feelings, and act appropriately.

Examples of AI applications:

• Machine translation, such as Google Translate
• Self-driving vehicles, such as Google's Waymo
• AI robots, such as Sophia and Aibo
• Speech recognition applications, like Apple's Siri or OK Google

Applications of AI and machine learning

Companies in several industries are building applications that take advantage of the connection between artificial intelligence and machine learning. These are just a few ways that AI and machine learning are helping companies transform their processes and products:

• Retail: Retailers use AI and machine learning to optimise their inventories, build recommendation engines, and enhance the customer experience with visual search.
• Healthcare: Health organizations put AI and machine learning to use in applications such as image processing for improved cancer detection and predictive analytics for genomics research.
• Banking and finance: In financial contexts, AI and machine learning are valuable tools for purposes such as detecting fraud, predicting risk, and providing more proactive financial advice.
• Sales and marketing: Sales and marketing teams use AI and machine learning for personalised offers, campaign optimisation, sales forecasting, sentiment analysis, and prediction of customer churn.
• Cybersecurity: AI and machine learning are powerful weapons for cybersecurity, helping organisations protect themselves and their customers by detecting anomalies.
• Customer service: Companies in a wide range of industries use chatbots and cognitive search to answer questions, gauge customer intent, and provide virtual assistance.
• Transportation: AI and machine learning are valuable in transportation applications, where they help companies improve the efficiency of their routes and use predictive analytics for purposes such as traffic forecasting.
• Manufacturing: Manufacturing companies use AI and machine learning for predictive maintenance and to make their operations more efficient than ever.

An introduction to Machine Learning

Arthur Samuel, an early American leader in the field of computer gaming and artificial intelligence, coined the term "Machine Learning" in 1959 while at IBM. He defined machine learning as "the field of study that gives computers the ability to learn without being explicitly programmed". However, there is no universally accepted definition for machine learning. Different authors define the term differently. We give below two more definitions.

What is Machine Learning?

"Machine learning is an application of AI. It's the process of using mathematical models of data to help a computer learn without direct instruction. This enables a computer system to continue learning and improving on its own, based on experience."

Machine learning is programming computers to optimize a performance criterion using example data or past experience. We have a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience. The model may be predictive to make predictions in the future, or descriptive to gain knowledge from data.

The field of study known as machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.

Definition of learning: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks T, as measured by P, improves with experience E.
Examples

1. Handwriting recognition learning problem
   Task T: Recognizing and classifying handwritten words within images
   Performance P: Percent of words correctly classified
   Training experience E: A dataset of handwritten words with given classifications

2. A robot driving learning problem
   Task T: Driving on highways using vision sensors
   Performance P: Average distance travelled before an error
   Training experience E: A sequence of images and steering commands recorded while observing a human driver

Definition: A computer program which learns from experience is called a machine learning program or simply a learning program.

How Does Machine Learning Work?

Machine learning accesses vast amounts of data (both structured and unstructured) and learns from it to predict the future. It learns from the data by using multiple algorithms and techniques. The diagram below shows how a machine learns from data.

[Figure: the machine learns from past data and predicts the output.]

Why Use Machine Learning?

Consider how you would write a spam filter using traditional programming techniques (see the figure below):

1. First you would look at what spam typically looks like. You might notice that some words or phrases (such as "4U," "credit card," "free," and "amazing") tend to come up a lot in the subject. Perhaps you would also notice a few other patterns in the sender's name, the email's body, and so on.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns are detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.

Figure: The traditional approach

Since the problem is not trivial, your program will likely become a long list of complex rules, which is pretty hard to maintain.

In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (see the figure below). The program is much shorter, easier to maintain, and most likely more accurate.

Figure: Machine Learning approach

Moreover, if spammers notice that all their emails containing "4U" are blocked, they might start writing "For U" instead. A spam filter using traditional programming techniques would need to be updated to flag "For U" emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.

In contrast, a spam filter based on Machine Learning techniques automatically notices that "For U" has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention.

Figure: Automatically adapting to change

Finally, Machine Learning can help humans learn (see the figure below): ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). For instance, once the spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem. Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
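The idea of learning spam predictors from word frequencies, as described above, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any particular spam filter: the function names (`word_counts`, `spam_indicators`, `is_spam`), the example messages, the add-one smoothing, and the thresholds `min_ratio` and `threshold` are all hypothetical choices made for this sketch.

```python
from collections import Counter


def word_counts(messages):
    """Count, for each lowercase word, how many messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_indicators(spam, ham, min_ratio=2.5):
    """Return words that are unusually frequent in spam compared to ham.

    min_ratio is a hypothetical cut-off chosen for this sketch.
    """
    spam_counts, ham_counts = word_counts(spam), word_counts(ham)
    indicators = {}
    for word, n_spam in spam_counts.items():
        # Add-one smoothing so that words never seen in ham do not divide by zero.
        spam_rate = (n_spam + 1) / (len(spam) + 2)
        ham_rate = (ham_counts[word] + 1) / (len(ham) + 2)
        ratio = spam_rate / ham_rate
        if ratio >= min_ratio:
            indicators[word] = ratio
    return indicators


def is_spam(text, indicators, threshold=2):
    """Flag a message when it contains at least `threshold` indicator words."""
    hits = sum(1 for word in set(text.lower().split()) if word in indicators)
    return hits >= threshold


# Hypothetical labeled examples (spam versus ham).
spam_examples = ["amazing offer 4u", "free credit card 4u", "amazing free credit"]
ham_examples = ["meeting at noon", "your credit report is attached", "lunch at noon"]

indicators = spam_indicators(spam_examples, ham_examples)
print(sorted(indicators))
print(is_spam("amazing free prize", indicators))
```

Because the indicator list is recomputed from the labeled examples, adding newly flagged spam (for instance messages containing "For U") to `spam_examples` and rerunning `spam_indicators` would update the filter without anyone writing a new rule by hand. The `indicators` dictionary can also be inspected directly, which mirrors the point about seeing what the filter has learned.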
Figure: Machine Learning can help humans learn

Machine Learning is great for:

• Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
• Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.

TYPES OF MACHINE LEARNING (or) CLASSIFICATION OF MACHINE LEARNING

Machine learning implementations are classified into four major categories, depending on the nature of the learning "signal" or "response" available to a learning system, which are as follows:

A. Supervised learning:

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. The given data is labeled. Both classification and regression problems are supervised learning problems.

Example: Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients, and each patient is labeled as "healthy" or "sick".

gender   age   label
M        48    sick
M        67    sick
F        53    healthy
M        49    sick
F        32    healthy
M        34    healthy
M        21    healthy

In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.

Figure: A labeled training set for supervised learning (e.g., spam classification)

A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.

B. Unsupervised learning:

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. In unsupervised learning algorithms, classification or categorization is not included in the observations.

Example: Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients.

gender   age
M        48
M        67
F        33
M        49
F        34
M        28

As a kind of learning, it resembles the methods humans use to figure out that certain objects or events are from the same class, such as by observing the degree of similarity between objects. Some recommendation systems that you find on the web in the form of marketing automation are based on this type of learning.

C. Reinforcement learning:

Reinforcement learning is the problem of getting an agent to act in the world so as to maximize its rewards. A learner is not told what actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them.

For example, consider teaching a dog a new trick: we cannot tell it what to do or what not to do, but we can reward or punish it if it does the right or wrong thing. When watching the video, notice how the program is initially clumsy and unskilled but steadily improves with training until it becomes a champion.

Reinforcement learning is an area of Machine Learning. It is about taking suitable action to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that, in supervised learning, the training data comes with the answer key, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer and the reinforcement agent decides what to do to perform the given task.
In the absence of a training dataset, it is bound to learn from its experience.

Example: We have an agent and a reward, with many hurdles in between. The agent is supposed to find the best possible path to reach the reward. The following example illustrates the problem.

[Figure: a robot, a diamond, and fire.]

The image shows the robot, the diamond, and fire. The goal of the robot is to get the reward, which is the diamond, and to avoid the hurdles, which are fire. The robot learns by trying all the possible paths and then choosing the path which gives it the reward with the least hurdles. Each right step gives the robot a reward and each wrong step subtracts from the robot's reward. The total reward is calculated when it reaches the final reward, which is the diamond.

D. Semi-supervised learning:

Here an incomplete training signal is given: a training set with some (often many) of the target outputs missing. There is a special case of this principle known as transduction, where the entire set of problem instances is known at learning time, except that part of the targets are missing. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning and supervised learning.

Example: Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos.

Figure: Semi-supervised learning

Introduction to Deep Learning

What is Deep Learning?
"Deep learning is a branch of machine learning which is completely based on artificial neural networks. As a neural network mimics the human brain, deep learning is also a kind of mimic of the human brain. In deep learning, we don't need to explicitly program everything."

Types of Deep Neural Networks

DNN and ANN: Deep Learning is a subset of Machine Learning that is based on artificial neural networks (ANNs) with multiple layers, also known as deep neural networks (DNNs). These neural networks are inspired by the structure and function of the human brain, and they are designed to learn from large amounts of data in an unsupervised or semi-supervised manner. Deep Learning models are able to automatically learn features from the data, which makes them well suited for tasks such as image recognition, speech recognition, and natural language processing.

The most widely used architectures in deep learning are feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs).

FNN: Feedforward neural networks (FNNs) are the simplest type of ANN, with a linear flow of information through the network. FNNs have been widely used for tasks such as image classification, speech recognition, and natural language processing.

CNN: Convolutional Neural Networks (CNNs) are a special type of FNN designed specifically for image and video recognition tasks. CNNs are able to automatically learn features from the images, which makes them well suited for tasks such as image classification, object detection, and image segmentation.

The major difference between deep learning and machine learning is the way data is presented to the machine. Machine learning algorithms usually require structured data, whereas deep learning networks work on multiple layers of artificial neural networks.

This is what a simple neural network looks like:

[Figure: a simple neural network with an input layer, a hidden layer, and an output layer.]

The network has an input layer that accepts inputs from the data.
The hidden layer is used to find any hidden features from the data. The output layer then provides the expected output.

RNN: Recurrent Neural Networks (RNNs) are a type of neural network able to process sequential data, such as time series and natural language. RNNs are able to maintain an internal state that captures information about the previous inputs, which makes them well suited for tasks such as speech recognition, natural language processing, and language translation.

The human brain contains approximately 100 billion neurons, and each neuron is connected to thousands of its neighbours. The question here is how we recreate these neurons in a computer. We create an artificial structure called an artificial neural net, where we have nodes or neurons. We have some neurons for input values and some for output values, and in between there may be lots of neurons interconnected in the hidden layer.

Difference between Machine Learning and Deep Learning:

Machine Learning | Deep Learning
Works on a small amount of data for accuracy. | Works on a large amount of data.
Dependent on low-end machines. | Heavily dependent on high-end machines.
Divides the task into sub-tasks, solves them individually, and finally combines the results. | Solves the problem end to end.
Takes less time to train. | Takes longer to train.
Testing time may increase. | Takes less time to test the data.

Here is an example of a neural network that uses large sets of unlabeled data of eye retinas. The network model is trained on this data to find out whether or not a person has diabetic retinopathy.

How Does Deep Learning Work?

1. Calculate the weighted sums.
2. The calculated weighted sum is passed as input to the activation function.
3. The activation function takes the "weighted sum of input" as the input to the function, adds a bias, and decides whether the neuron should be fired or not.
4. The output layer gives the predicted output.
5. The model output is compared with the actual output. After training the neural network, the model uses the backpropagation method to improve the performance of the network. The cost function helps to reduce the error rate.

• Automatic Text Generation: A corpus of text is learned, and from this model new text is generated, word-by-word or character-by-character. The model is then capable of learning how to spell, punctuate, and form sentences, and it may even capture the style.
• Healthcare: Helps in diagnosing various diseases and treating them.
• Automatic Machine Translation: Certain words, sentences, or phrases in one language are transformed into another language (Deep Learning is achieving top results in the areas of text and images).
• Image Recognition: Recognizes and identifies people and objects in images, and also understands content and context. This area is already being used in gaming, retail, tourism, etc.
• Predicting Earthquakes: Teaches a computer to perform viscoelastic computations, which are used in predicting earthquakes.

Deep learning has a wide range of applications in various fields such as computer vision, speech recognition, natural language processing, and many more. Some of the most common applications include:

• Image and Video Recognition: Deep learning models are used to automatically classify images and videos, detect objects, and identify faces. Applications include image and video search engines, self-driving cars, and surveillance systems.
• Speech Recognition: Deep learning models are used to transcribe and translate speech in real time, which is used in voice-controlled devices, such as virtual assistants, and in accessibility technology for people with hearing impairments.
• Natural Language Processing: Deep learning models are used to understand, generate, and translate human languages. Applications include machine translation, text summarization, and sentiment analysis.
• Robotics: Deep learning models are used to control robots and drones, and to improve their ability to perceive and interact with the environment.
• Healthcare: Deep learning models are used in medical imaging to detect diseases, in drug discovery to identify new treatments, and in genomics to understand the underlying causes of diseases.
• Finance: Deep learning models are used to detect fraud, predict stock prices, and analyze financial data.
• Gaming: Deep learning models are used to create more realistic characters and environments, and to improve the gameplay experience.
• Recommender Systems: Deep learning models are used to make personalized recommendations to users, such as product recommendations, movie recommendations, and news recommendations.
• Social Media: Deep learning models are used to identify fake news, to flag harmful content, and to filter out spam.
• Autonomous systems: Deep learning models are used in self-driving cars, drones, and other autonomous systems to make decisions based on sensor data.

Types of Machine Learning Systems

[Figure: types of machine learning systems.]

Main Challenges of Machine Learning

In short, since your main task is to select a learning algorithm and train it on some data, the two things that can go wrong are "bad algorithm" and "bad data." Let's start with examples of bad data.

1. Insufficient Quantity of Training Data

For a toddler to learn what an apple is, all it takes is for you to point to an apple and say "apple" (possibly repeating this procedure a few times). Now the child is able to recognize apples in all sorts of colors and shapes. Genius.

Machine Learning is not quite there yet; it takes a lot of data for most Machine Learning algorithms to work properly. Even for very simple problems you typically need thousands of examples, and for complex problems such as image or speech recognition you may need millions of examples (unless you can reuse parts of an existing model).
The Unreasonable Effectiveness of Data

In a famous paper published in 2001, Microsoft researchers Michele Banko and Eric Brill showed that very different Machine Learning algorithms, including fairly simple ones, performed almost identically well on a complex problem of natural language disambiguation once they were given enough data (as you can see in Figure 1-20).

Figure 1-20. The importance of data versus algorithms

As the authors put it: "these results suggest that we may want to reconsider the tradeoff between spending time and money on algorithm development versus spending it on corpus development."

2. Nonrepresentative Training Data

In order to generalize well, it is crucial that your training data be representative of the new cases you want to generalize to. This is true whether you use instance-based learning or model-based learning.

For example, the set of countries we used earlier for training the linear model was not perfectly representative; a few countries were missing. Figure 1-21 shows what the data looks like when you add the missing countries.

Figure 1-21. A more representative training sample (life satisfaction versus GDP per capita)

If you train a linear model on this data, you get the solid line, while the old model is represented by the dotted line. As you can see, not only does adding a few missing countries significantly alter the model, but it makes it clear that such a simple linear model is probably never going to work well. It seems that very rich countries are not happier than moderately rich countries (in fact they seem unhappier), and conversely some poor countries seem happier than many rich countries.

3. Poor-Quality Data

Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well.
It is often well worth the effort to spend time cleaning up your training data. The truth is, most data scientists spend a significant part of their time doing just that. For example:

• If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually.
• If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it, and so on.

4. Irrelevant Features

As the saying goes: garbage in, garbage out. Your system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones. A critical part of the success of a Machine Learning project is coming up with a good set of features to train on. This process, called feature engineering, involves:

• Feature selection: selecting the most useful features to train on among existing features.
• Feature extraction: combining existing features to produce a more useful one (as we saw earlier, dimensionality reduction algorithms can help).
• Creating new features by gathering new data.

5. Overfitting the Training Data

Say you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all taxi drivers in that country are thieves. Overgeneralizing is something that we humans do all too often, and unfortunately machines can fall into the same trap if we are not careful. In Machine Learning this is called overfitting: it means that the model performs well on the training data, but it does not generalize well.
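The definition of overfitting above can be made concrete with a small numerical sketch. It is an illustration under stated assumptions rather than a reproduction of the life satisfaction example: the data points are made up, the underlying relationship is assumed to be y = 2x, and the overly flexible model is a nearest-neighbour "memorizer" (swapped in here for a high-degree polynomial because it needs no external libraries). The function names `fit_line`, `memorize`, and `mean_squared_error` are hypothetical.

```python
def fit_line(points):
    """Fit y = intercept + slope * x by ordinary least squares (simple model)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def memorize(points):
    """Very flexible model: return the label of the nearest training point."""
    def predict(x):
        _, nearest_y = min(points, key=lambda p: abs(p[0] - x))
        return nearest_y
    return predict


def mean_squared_error(model, points):
    """Average squared difference between labels and model predictions."""
    return sum((y - model(x)) ** 2 for x, y in points) / len(points)


# Hypothetical noisy training data scattered around the line y = 2x.
train = [(1, 2.5), (2, 3.5), (3, 6.8), (4, 7.4), (5, 10.9), (6, 11.3)]
# Hypothetical new cases that follow y = 2x exactly.
test = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8), (4.6, 9.2), (5.4, 10.8)]

line = fit_line(train)
memorizer = memorize(train)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "train MSE:", round(mean_squared_error(model, train), 3),
          "test MSE:", round(mean_squared_error(model, test), 3))
```

The memorizer reproduces every training label, so its training error is zero, yet it does worse than the straight line on the new cases because it has also memorized the noise. That gap between training performance and performance on new data is the signature of overfitting.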
Figure 1-22 shows an example of a high-degree polynomial life satisfaction model that strongly overfits the training data. Even though it performs much better on the training data than the simple linear model, would you really trust its predictions?

Figure 1-22. Overfitting the training data (life satisfaction versus GDP per capita)

Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. The possible solutions are:

• To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model
• To gather more training data
• To reduce the noise in the training data (e.g., fix data errors and remove outliers)

6. Underfitting the Training Data

As you might guess, underfitting is the opposite of overfitting: it occurs when your model is too simple to learn the underlying structure of the data. For example, a linear model of life satisfaction is prone to underfit; reality is just more complex than the model, so its predictions are bound to be inaccurate, even on the training examples.

The main options to fix this problem are:

• Selecting a more powerful model, with more parameters
• Feeding better features to the learning algorithm (feature engineering)
• Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)

7. Stepping Back

By now you already know a lot about Machine Learning. However, we went through so many concepts that you may be feeling a little lost, so let's step back and look at the big picture:

• Machine Learning is about making machines get better at some task by learning from data, instead of having to explicitly code rules.
• There are many different types of ML systems: supervised or not, batch or online, instance-based or model-based, and so on.
• In an ML project you gather data in a training set, and you feed the training set to a learning algorithm. If the algorithm is model-based, it tunes some parameters to fit the model to the training set (i.e., to make good predictions on the training set itself), and then hopefully it will be able to make good predictions on new cases as well. If the algorithm is instance-based, it just learns the examples by heart and generalizes to new instances by comparing them to the learned instances using a similarity measure.
• The system will not perform well if your training set is too small, or if the data is not representative, is noisy, or is polluted with irrelevant features (garbage in, garbage out). Lastly, your model needs to be neither too simple (in which case it will underfit) nor too complex (in which case it will overfit).

Statistical Learning: Introduction

There are two major goals for modeling data: 1) to accurately predict some future quantity of interest, given some observed data, and 2) to discover unusual or interesting patterns in the data. To achieve these goals, one must rely on knowledge from three important pillars of the mathematical sciences.

Function approximation: Building a mathematical model for data usually means understanding how one data variable depends on another data variable. The most natural way to represent the relationship between variables is via a mathematical function or map. We usually assume that this mathematical function is not completely known, but can be approximated well given enough computing power and data. Thus, data scientists have to understand how best to approximate and represent functions using the least amount of computer processing and memory.
Optimization: Given a class of mathematical models, we wish to find the best possible model in that class. This requires some kind of efficient search or optimization procedure. The optimization step can be viewed as a process of fitting or calibrating a function to observed data. This step usually requires knowledge of optimization algorithms and efficient computer coding or programming.

Probability and Statistics: In general, the data used to fit the model is viewed as a realization of a random process or numerical vector, whose probability law determines the accuracy with which we can predict future observations. Thus, in order to quantify the uncertainty inherent in making predictions about the future, and the sources of error in the model, data scientists need a firm grasp of probability theory and statistical inference.

Supervised and Unsupervised Learning:
Feature and response: Given an input or feature vector x, one of the main goals of machine learning is to predict an output or response variable y. For example,
• x could be a digitized signature and y a binary variable that indicates whether the signature is genuine or false.
• x could represent the weight and smoking habits of an expectant mother and y the birth weight of the baby.

Prediction function: A function g that takes x as input and outputs a guess g(x) for y (denoted by $\hat{y}$, for example).

Regression: The response variable y can take any real value.

Classification: When y can only lie in a finite set, say $y \in \{0, \ldots, c-1\}$, then predicting y is conceptually the same as classifying the input x into one of c categories, and so prediction becomes a classification problem.

Loss function: We can measure the accuracy of a prediction $\hat{y}$ with respect to a given response y by using some loss function $\mathrm{Loss}(y, \hat{y})$. In a regression setting the usual choice is the squared-error loss $(y - \hat{y})^2$. In the case of classification, the zero-one (also written 0-1) loss function $\mathrm{Loss}(y, \hat{y}) = \mathbb{1}\{y \neq \hat{y}\}$
is often used, which incurs a loss of 1 whenever the predicted class $\hat{y}$ is not equal to the class y. We will encounter various other useful loss functions, such as the cross-entropy and hinge loss functions.

Error is often used as a measure of distance between a "true" object y and some approximation $\hat{y}$ thereof. If y is real-valued, the absolute error $|y - \hat{y}|$ and the squared error $(y - \hat{y})^2$ are both well-established error concepts, as are the norm $\|y - \hat{y}\|$ and squared norm $\|y - \hat{y}\|^2$ for vectors. The squared error $(y - \hat{y})^2$ is just one example of a loss function.

Supervised learning: One tries to learn the functional relationship between the feature vector x and response y in the presence of a teacher who provides n examples. It is common to speak of "explaining" or predicting y on the basis of x, where x is a vector of explanatory variables. An example of supervised learning is email spam detection.

Unsupervised learning: Unsupervised learning makes no distinction between response and explanatory variables, and the objective is simply to learn the structure of the unknown distribution of the data. In other words, we need to learn f(x). In this case the guess g(x) is an approximation of f(x) and the risk is of the form $\ell(g) = \mathbb{E}\,\mathrm{Loss}(f(X), g(X))$.

Training and Test Loss:
Given an arbitrary prediction function g, it is typically not possible to compute its risk $\ell(g)$. However, using the training sample $\mathcal{T}$, we can approximate $\ell(g)$ via the empirical (sample average) risk
$\ell_{\mathcal{T}}(g) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Loss}(Y_i, g(X_i))$,
which we call the training loss. For a given prediction function g, the training loss is thus an unbiased estimator of the risk (the expected loss), based on the training data. To approximate the optimal prediction function $g^*$ (the minimizer of the risk $\ell(g)$) we first select a suitable collection of approximating functions $\mathcal{G}$ and then take our learner to be the function in $\mathcal{G}$ that minimizes the training loss; that is,
$g_{\mathcal{T}}^{\mathcal{G}} = \operatorname{argmin}_{g \in \mathcal{G}} \ell_{\mathcal{T}}(g)$.
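As a hedged illustration of the training loss and of the learner as its minimizer, the sketch below uses the squared-error loss and takes the function class to be all straight lines g(x) = b0 + b1 x. The data-generating rule (y = 2x + 1 plus Gaussian noise), the sample size, the competing line, and all function names are assumptions made for this example; none of them come from the notes.

```python
import random

def training_loss(g, sample):
    """Sample-average squared-error loss of prediction function g on a sample."""
    return sum((y - g(x)) ** 2 for x, y in sample) / len(sample)

def fit_line(sample):
    """Least-squares line: the minimizer of the training loss over all lines."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return lambda x: b0 + b1 * x

rng = random.Random(0)

def draw(n):
    """Hypothetical data-generating process: y = 2x + 1 + Gaussian noise."""
    sample = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        sample.append((x, 2 * x + 1 + rng.gauss(0, 0.3)))
    return sample

tau = draw(50)                     # one fixed training set
learner = fit_line(tau)            # minimizer of the training loss in the class
other = lambda x: 1.5 * x + 1.2    # an arbitrary competing line from the class

print("training loss of learner:", training_loss(learner, tau))
print("training loss of other  :", training_loss(other, tau))
```

By construction, no other line can have a smaller training loss on this sample than the fitted one; whether the fitted line also predicts well on new data is a separate question.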
The prediction accuracy on new pairs of data is measured by the generalization risk of the learner. For a fixed training set $\tau$ it is defined as
$\ell(g_\tau^{\mathcal{G}}) = \mathbb{E}\,\mathrm{Loss}(Y, g_\tau^{\mathcal{G}}(X))$,
where (X, Y) is distributed according to f(x, y). In the discrete case the generalization risk is therefore
$\ell(g_\tau^{\mathcal{G}}) = \sum_{x,y} \mathrm{Loss}(y, g_\tau^{\mathcal{G}}(x))\, f(x, y)$
(replace the sum with an integral for the continuous case). The situation is illustrated in Figure 2.1, where the distribution of (X, Y) is indicated by the red dots. The training set (points in the shaded regions) determines a fixed prediction function shown as a straight line. Three possible outcomes of (X, Y) are shown (black dots). The amount of loss for each point is shown as the length of the dashed lines. The generalization risk is the average loss over all possible pairs (x, y), weighted by the corresponding f(x, y).

Figure 2.1: The generalization risk for a fixed training set is the weighted-average loss over all possible pairs (x, y).

For a random training set $\mathcal{T}$, the generalization risk is thus a random variable that depends on $\mathcal{T}$ (and $\mathcal{G}$). If we average the generalization risk over all possible instances of $\mathcal{T}$, we obtain the expected generalization risk:
$\mathbb{E}\,\ell(g_{\mathcal{T}}^{\mathcal{G}}) = \mathbb{E}\,\mathrm{Loss}(Y, g_{\mathcal{T}}^{\mathcal{G}}(X))$,
where (X, Y) in the expectation above is independent of $\mathcal{T}$. In the discrete case, we have
$\mathbb{E}\,\ell(g_{\mathcal{T}}^{\mathcal{G}}) = \sum_{x, y, x_1, y_1, \ldots, x_n, y_n} \mathrm{Loss}(y, g_\tau^{\mathcal{G}}(x))\, f(x, y)\, f(x_1, y_1) \cdots f(x_n, y_n)$.
Figure 2.2 gives an illustration.

Figure 2.2: The expected generalization risk is the weighted-average loss over all possible pairs (x, y) and over all training sets.

For any outcome $\tau$ of the training data, we can estimate the generalization risk without bias by taking the sample average
$\ell_{\mathcal{T}'}(g_\tau^{\mathcal{G}}) := \frac{1}{n'} \sum_{i=1}^{n'} \mathrm{Loss}(Y_i', g_\tau^{\mathcal{G}}(X_i'))$,
which is called the test loss. Here $\{(X_1', Y_1'), \ldots, (X_{n'}', Y_{n'}')\} =: \mathcal{T}'$ is a so-called test sample. The test sample is completely separate from $\mathcal{T}$, but is drawn in the same way as $\mathcal{T}$.

Table 2.1: Summary of definitions for supervised learning.
$x$ : Fixed explanatory (feature) vector.
$X$ : Random explanatory (feature) vector.
$y$ : Fixed (real-valued) response.
$Y$ : Random response.
$f(x, y)$ : Joint pdf of X and Y, evaluated at (x, y).
$f(y \mid x)$ : Conditional pdf of Y given X = x, evaluated at y.
$\tau$ : Fixed training data $\{(x_i, y_i), i = 1, \ldots, n\}$.
$\mathcal{T}$ : Random training data $\{(X_i, Y_i), i = 1, \ldots, n\}$.
$\mathbf{X}$ : Matrix of explanatory variables, with n rows $x_i^\top$, $i = 1, \ldots, n$, and dim(x) feature columns; one of the features may be the constant 1.
$\mathbf{y}$ : Vector of response variables $(y_1, \ldots, y_n)^\top$.
$g$ : Prediction (guess) function.
$\mathrm{Loss}(y, \hat{y})$ : Loss incurred when predicting response y with $\hat{y}$.
$\ell(g)$ : Risk for prediction function g; that is, $\mathbb{E}\,\mathrm{Loss}(Y, g(X))$.
$g^*$ : Optimal prediction function; that is, $\operatorname{argmin}_g \ell(g)$.
$g^{\mathcal{G}}$ : Optimal prediction function in function class $\mathcal{G}$; that is, $\operatorname{argmin}_{g \in \mathcal{G}} \ell(g)$.
$\ell_\tau(g)$ : Training loss for prediction function g; that is, the sample average estimate of $\ell(g)$ based on a fixed training sample $\tau$.
$\ell_{\mathcal{T}}(g)$ : The same as $\ell_\tau(g)$, but now for a random training sample $\mathcal{T}$.
$g_\tau^{\mathcal{G}}$ or $g_\tau$ : The learner: $\operatorname{argmin}_{g \in \mathcal{G}} \ell_\tau(g)$. That is, the optimal prediction function based on a fixed training set $\tau$ and function class $\mathcal{G}$. We suppress the superscript $\mathcal{G}$ if the function class is implicit.
$g_{\mathcal{T}}^{\mathcal{G}}$ or $g_{\mathcal{T}}$ : The learner, where we have replaced $\tau$ with a random training set $\mathcal{T}$.

Tradeoffs in Statistical Learning:
To study the relation between model complexity, computational simplicity, and estimation accuracy, it is useful to decompose the generalization risk into several parts, so that the tradeoffs between these parts can be studied. We will consider two such decompositions: the approximation-estimation tradeoff and the bias-variance tradeoff.

We can decompose the generalization risk into the following three components:
$\ell(g_\tau^{\mathcal{G}}) = \ell^* + \big(\ell(g^{\mathcal{G}}) - \ell^*\big) + \big(\ell(g_\tau^{\mathcal{G}}) - \ell(g^{\mathcal{G}})\big)$,
where $\ell^* = \ell(g^*)$ is the irreducible risk, the second term is the approximation error, and the third term is the statistical error.

For the squared-error loss, the decomposition can be interpreted as follows.
1. The first component, $\ell^* = \mathbb{E}(Y - g^*(X))^2$, is the irreducible error, as no prediction function will yield a smaller expected squared error.
2. The second component, the approximation error $\ell(g^{\mathcal{G}}) - \ell(g^*)$, is equal to $\mathbb{E}(g^{\mathcal{G}}(X) - g^*(X))^2$. We leave the proof (which is similar to that of Theorem 2.1) as an exercise; see Exercise 2.
Thus, the approximation error (defined as a risk difference) can here be interpreted as the expected squared error between the optimal predicted value and the optimal predicted value within the class $\mathcal{G}$.
3. For the third component, the statistical error $\ell(g_\tau^{\mathcal{G}}) - \ell(g^{\mathcal{G}})$, there is no direct interpretation as an expected squared error unless $\mathcal{G}$ is the class of linear functions; that is, $g(x) = x^\top \beta$ for some vector $\beta$. In this case we can write (see Exercise 3) the statistical error as
$\ell(g_\tau^{\mathcal{G}}) - \ell(g^{\mathcal{G}}) = \mathbb{E}(g_\tau^{\mathcal{G}}(X) - g^{\mathcal{G}}(X))^2$.

Thus, when using a squared-error loss, the generalization risk for a linear class $\mathcal{G}$ can be decomposed as:
$\ell(g_\tau^{\mathcal{G}}) = \mathbb{E}(g_\tau^{\mathcal{G}}(X) - Y)^2 = \ell^* + \mathbb{E}(g^{\mathcal{G}}(X) - g^*(X))^2 + \mathbb{E}(g_\tau^{\mathcal{G}}(X) - g^{\mathcal{G}}(X))^2$.
Note that in this decomposition the statistical error is the only term that depends on the training set.

The errors in a machine learning model can be broken down into two parts:
1. Reducible Error
2. Irreducible Error
Irreducible errors are errors that cannot be reduced no matter which machine learning model you use. Reducible error, on the other hand, is further broken down into squared bias and variance. The balance between bias and variance determines whether the machine learning model overfits or underfits the given data.

What exactly is Bias?
Bias is the inability of a machine learning model to capture the true relationship between the data variables. It is caused by erroneous assumptions that are inherent to the learning algorithm. For example, in linear regression, the relationship between the X and the Y variable is assumed to be linear, when in reality the relationship may not be perfectly linear.
In general, high bias indicates more assumptions in the learning algorithm about the relationships between the variables, and low bias indicates fewer assumptions in the learning algorithm.

What is the Variance Error?
This is nothing but the concept of the model overfitting on a particular dataset.
If the model learns to fit very closely to the points of a particular dataset, then when it is used to predict on another dataset it may not predict as accurately as it did on the first. Variance is the difference in the fits between different datasets.
Generally, nonlinear machine learning algorithms like decision trees have a high variance. It is even higher if the branches are not pruned during training.
Low-variance ML algorithms: Linear Regression, Logistic Regression, Linear Discriminant Analysis
High-variance ML algorithms: Decision Trees, K-NN, and Support Vector Machines

Bias-Variance Tradeoff
[Figure: bias-variance illustration; recoverable panel labels: Low Variance, High Variance, High Bias]
• If a model uses a simple machine learning algorithm, such as a linear model, it will have high bias and low variance (underfitting the data).
• If a model uses a complex machine learning algorithm, it will have high variance and low bias (overfitting the data).

Let's summarize:
You need to find a good balance between the bias and variance of the model you use. This tradeoff in complexity is what is referred to as the bias-variance tradeoff. A model with an optimal balance of bias and variance neither overfits nor underfits the data.
This tradeoff applies to all forms of supervised learning: classification, regression, and structured output learning.

How to fix bias and variance problems?
Fixing High Bias
• Adding more input features will help the model fit the data better.
• Add more polynomial features to increase the complexity of the model.
• Decrease the regularization term to have a balance between bias and variance.
Fixing High Variance
• Reduce the input features: use only the features with the highest feature importance to reduce overfitting the data.
• Getting more training data will help in this case, because a high-variance model will not work well on an independent dataset if you have very little data.
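The contrast between a high-bias and a high-variance learner can be made concrete with a small simulation. The sketch below refits two learners on many independent training sets and summarizes their predictions at a single point. Everything specific here is an assumption chosen for illustration: the "true" relationship sin(2πx), the noise level, the evaluation point x0 = 0.25, and the use of a constant predictor and a 1-nearest-neighbour predictor as stand-ins for a "too simple" and a "very flexible" model.

```python
import math
import random
import statistics

rng = random.Random(1)

def true_f(x):
    """Assumed true relationship between x and the mean response."""
    return math.sin(2 * math.pi * x)

def draw(n):
    """One hypothetical training set: y = true_f(x) + Gaussian noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, 0.3)) for x in xs]

def fit_constant(train):
    """Very simple model: always predict the mean training response."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y

def fit_1nn(train):
    """Very flexible model: copy the response of the nearest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def bias_and_variance(fit, x0, n_sets=200, n=30):
    """Refit on many training sets and summarize the predictions at x0."""
    preds = [fit(draw(n))(x0) for _ in range(n_sets)]
    bias = statistics.fmean(preds) - true_f(x0)
    return bias, statistics.pvariance(preds)

x0 = 0.25
bias_c, var_c = bias_and_variance(fit_constant, x0)
bias_k, var_k = bias_and_variance(fit_1nn, x0)
print(f"constant: bias={bias_c:.3f}, variance={var_c:.3f}")
print(f"1-NN:     bias={bias_k:.3f}, variance={var_k:.3f}")
```

In this setup the constant predictor changes little from one training set to the next but is systematically far from the true value at x0, while the 1-NN predictor is close on average but fluctuates with the noise of whichever training point happens to be nearest.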
Estimating Risk Statistics:
Different methods of estimating risk measures:
1. In-Sample Risk
2. Cross-Validation

1. In-Sample Risk: Due to the phenomenon of overfitting, the training loss of the learner, $\ell_\tau(g_\tau)$, is not a good estimate of the generalization risk $\ell(g_\tau)$ of the learner... To simplify the analysis, suppose that we wish to estimate the average accuracy of the predictions of the learner $g_\tau$ at the n feature vectors $x_1, \ldots, x_n$ (these are part of the training set $\tau$). In other words, we wish to estimate the in-sample risk of the learner $g_\tau$:
$\ell_{\mathrm{in}}(g_\tau) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\,\mathrm{Loss}(Y_i', g_\tau(x_i))$,
where each response $Y_i'$ is drawn from $f(y \mid x_i)$, independently. Even in this simplified setting, the training loss of the learner will be a poor estimate of the in-sample risk.

2. Cross-Validation: The idea is to make multiple identical copies of the data set, and to partition each copy into different training and test sets, as illustrated in the figure below. Here, there are four copies of the data set (consisting of response and explanatory variables). Each copy is divided into a test set (colored blue) and training set (colored pink). For each of these sets, we estimate the model parameters using only training data and then predict the responses for the test set. The average loss between the predicted and observed responses is then a measure for the predictive power of the model.

Figure: An illustration of four-fold cross-validation, representing four copies of the same data set. The data in each copy is partitioned into a training set (pink) and a test set (blue). The darker columns represent the response variable and the lighter ones the explanatory variables.

In particular, suppose we partition a data set $\mathcal{T}$ of size n into K folds $C_1, \ldots, C_K$ of sizes $n_1, \ldots, n_K$ (hence, $n_1 + \cdots + n_K = n$). Typically $n_k \approx n/K$, $k = 1, \ldots, K$.
Let $\ell_{C_k}$ be the test loss when using $C_k$ as test data and all remaining data, denoted $\mathcal{T}_{-k}$, as training data.
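The fold construction just described can be sketched as follows: each fold serves once as test data while the learner is fitted on the remaining folds, and the per-fold test losses are then combined in a size-weighted average. This is a minimal sketch under stated assumptions: the data are hypothetical (y = 2x + 1 plus noise), the learner is a least-squares line, the loss is the squared error, and the function names `fit_line` and `fold_test_losses` are invented for this example.

```python
import random

def fit_line(train):
    """Least-squares line fitted to (x, y) pairs; returns a prediction function."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

def fold_test_losses(data, K, fit, loss):
    """Return one (fold size, test loss) pair per fold."""
    results = []
    for k in range(K):
        # Fold k holds every K-th observation; the rest is training data.
        test = [d for i, d in enumerate(data) if i % K == k]
        train = [d for i, d in enumerate(data) if i % K != k]
        g = fit(train)  # the learner never sees the test fold
        fold_loss = sum(loss(y, g(x)) for x, y in test) / len(test)
        results.append((len(test), fold_loss))
    return results

rng = random.Random(3)
data = []
for _ in range(40):
    x = rng.uniform(0, 1)
    data.append((x, 2 * x + 1 + rng.gauss(0, 0.3)))  # hypothetical data

results = fold_test_losses(data, K=4, fit=fit_line, loss=squared_error)
n = len(data)
cv_loss = sum(n_k / n * loss_k for n_k, loss_k in results)  # size-weighted average
print("per-fold (size, test loss):", results)
print("size-weighted average     :", cv_loss)
```

Assigning observations to folds by index is only reasonable here because the simulated data arrive in random order; with real data one would normally shuffle before splitting.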
Each $\ell_{C_k}$ is an unbiased estimator of the generalization risk for training set $\mathcal{T}_{-k}$; that is, for $\ell(g_{\mathcal{T}_{-k}})$.
The K-fold cross-validation loss is the weighted average of these risk estimators:
$\mathrm{CV}_K = \sum_{k=1}^{K} \frac{n_k}{n}\, \ell_{C_k}(g_{\mathcal{T}_{-k}}) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Loss}(y_i, g_{\mathcal{T}_{-\kappa(i)}}(x_i))$,
where the function $\kappa : \{1, \ldots, n\} \to \{1, \ldots, K\}$ indicates to which of the K folds each of the n observations belongs. As the average is taken over varying training sets $\{\mathcal{T}_{-k}\}$, it estimates the expected generalization risk $\mathbb{E}\,\ell(g_{\mathcal{T}})$, rather than the generalization risk $\ell(g_\tau)$ for the particular training set $\tau$.

Sampling distribution of an estimator:
In statistics, the sampling distribution is the probability distribution of a given statistic computed from a random sample. It provides a general route to statistical inference. The estimator is the general mathematical rule used to calculate a sample statistic. An estimate is the result of the estimation.
The sampling distribution of an estimator depends on the sample size, so the effect of changing the sample size has to be determined. An estimate is a single numerical value and is hence called a point estimate.
There are various estimators, such as the sample mean, sample standard deviation, proportion, variance, range, etc.

Sampling distribution of the mean: This is the distribution of the sample mean over repeated samples drawn from the population. For all sample sizes, it is normal if the population distribution is normal. The mean of the sampling distribution of the mean is equal to the population mean.
The sampling distribution of the mean has the standard deviation
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$,
where $\sigma_{\bar{x}}$ is the standard deviation of the sampling distribution of the mean, $\sigma$ is the population standard deviation and n is the sample size.
As the size of the sample increases, the spread of the sampling distribution of the mean decreases. But the mean of the distribution remains the same and is not affected by the sample size.
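The behaviour of the sampling distribution of the mean can be checked by simulation. The sketch below draws many samples of a given size from a normal population and compares the spread of the resulting sample means with σ/√n. The population parameters (mean 10, standard deviation 2), the sample sizes, and the number of repetitions are arbitrary assumptions for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)
mu, sigma = 10.0, 2.0  # assumed population mean and standard deviation

def sample_means(n, reps=5000):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return their means."""
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

for n in (4, 16, 64):
    means = sample_means(n)
    print(f"n={n:2d}  "
          f"mean of sample means={statistics.fmean(means):.3f}  "
          f"sd of sample means={statistics.stdev(means):.3f}  "
          f"sigma/sqrt(n)={sigma / math.sqrt(n):.3f}")
```

The simulated standard deviation of the sample means should track σ/√n closely, shrinking as n grows, while the mean of the sample means stays near the population mean.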
The standard deviation of the sampling distribution of the standard deviation is the standard error of the standard deviation. It is defined (approximately) as:
$\sigma_s = \frac{\sigma}{\sqrt{2n}}$
Here, $\sigma_s$ is the standard error of the standard deviation. The sampling distribution of the standard deviation is positively skewed for small n, but it becomes approximately normal for sample sizes greater than 30.

Empirical Risk Minimization:
Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family of learning algorithms and is used to give theoretical bounds on their performance. The core idea is that we cannot know exactly how well an algorithm will work in practice (the true "risk") because we don't know the true distribution of data that the algorithm will work on, but we can instead measure its performance on a known set of training data (the "empirical" risk).
In general, the risk $R(h)$ cannot be computed because the distribution $P(x, y)$ is unknown to the learning algorithm (this situation is referred to as agnostic learning). However, we can compute an approximation, called empirical risk, by averaging the loss function on the training set; more formally, by computing the expectation with respect to the empirical measure:
$R_{\mathrm{emp}}(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i)$.
The empirical risk minimization principle states that the learning algorithm should choose a hypothesis $\hat{h}$ which minimizes the empirical risk:
$\hat{h} = \operatorname{argmin}_{h \in \mathcal{H}} R_{\mathrm{emp}}(h)$.
Thus the learning algorithm defined by the ERM principle consists in solving the above optimization problem.
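For a finite hypothesis class, the ERM principle can be carried out by direct search. The sketch below is a hedged illustration: the hypothesis class (threshold classifiers on a grid of 21 thresholds), the data-generating rule (label 1 when x > 0.6, flipped with probability 0.1), and all function names are assumptions made for this example, and the 0-1 loss plays the role of L.

```python
import random

rng = random.Random(7)

def draw(n):
    """Hypothetical data: label is 1 when x > 0.6, flipped with probability 0.1."""
    sample = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        y = int(x > 0.6)
        if rng.random() < 0.1:
            y = 1 - y  # label noise
        sample.append((x, y))
    return sample

def zero_one_loss(prediction, y):
    """L(h(x), y): 1 for a wrong prediction, 0 for a correct one."""
    return int(prediction != y)

def empirical_risk(h, sample):
    """Average loss of hypothesis h over the training set."""
    return sum(zero_one_loss(h(x), y) for x, y in sample) / len(sample)

def threshold_classifier(t):
    """Hypothesis h_t(x) = 1 if x > t, else 0."""
    return lambda x: int(x > t)

# Finite hypothesis class: thresholds 0.00, 0.05, ..., 1.00.
thresholds = [i / 20 for i in range(21)]
train = draw(200)

# ERM: pick the hypothesis with the smallest empirical risk on the training set.
best_t = min(thresholds,
             key=lambda t: empirical_risk(threshold_classifier(t), train))
best_risk = empirical_risk(threshold_classifier(best_t), train)
print("selected threshold:", best_t, "empirical risk:", best_risk)
```

The selected threshold minimizes the risk measured on this particular training set only; its true risk under the data distribution remains unknown to the algorithm, which is exactly the gap the ERM theory tries to bound.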
