ML PYQs
MACHINE LEARNING - IN-SEMESTER EXAMINATION

1. Which of the following is a disadvantage of the k-Nearest Neighbors algorithm? [0.5 M]
   (a) Low accuracy  (b) Computationally expensive  (c) Insensitive to outliers  (d) Needs very little memory
2. The instance-based learner is a ___. [0.5 M]
   (a) Lazy learner  (b) Eager learner  (d) None of these
3. Which of the following is not a supervised learning method? [0.5 M]
   (a) Naive Bayesian  (b) Linear Regression  (c) Principal Component Analysis  (d) Decision Tree
4. The Euclidean distance between the data points A and B is ___. [0.5 M]

   ID | Variable 1 | Variable 2 | Variable 3 | Variable 4
   A  | 0.7        | 0.6        | 0.8        | 0.8
   B  | 0.4        | 0.5        | 0.5        | 0.0

   (a) 0.17  (b) 0.38  (c) 0.91  (d) 0.9
5. The ___ processes are powerful, non-parametric tools that can be used in supervised learning, namely in regression but also in classification problems. [0.5 M]
   (a) Stochastic  (b) Markov  (c) Gaussian  (d) Statistical
6. Machine Learning uses the theory of ___ in building mathematical models, because the core task is making inference from a sample. [0.5 M]
   (a) Statistics  (b) Mathematics  (c) Optimization  (d) Physics
7. A ___ is a function that separates the examples of different classes. [0.5 M]
   (a) Determinant  (b) Discriminant  (c) Random Process  (d) Optimization Problem
8. If the values of two variables move in the opposite direction, the correlation is ___. [0.5 M]
   (b) Weak  (c) Positive  (d) Negative
9. The ___ is a model assessment technique used to evaluate a machine learning algorithm's performance when making predictions on new datasets it has not been trained on. This is done by partitioning a dataset and using one subset to train the algorithm and the remaining data for testing. [0.5 M]
   (a) Correlation  (b) Cross-Validation  (c) Generalization  (d) Normalization
10. The function used to characterize a random variable associates probabilities with its values. If the random variable is discrete, this assignment of probabilities to a finite set of values is called the ___. [0.5 M]
    (a) Probability Mass Function  (b) Probability Density Function  (d) None of these

11. The following table describes a CARS data set.
    (i) Display, in a tabular format, the count of cars, average MPG, minimum weight and maximum displacement of cars with 8 cylinders and with 4 cylinders (columns: COUNT, AVERAGE MPG, MINIMUM WEIGHT, MAXIMUM DISPLACEMENT).
    (ii) Calculate the coefficient of correlation between the attributes in the above data set.

12. What are outliers? Mention any two strategies to deal with outliers in datasets. [3 M]
    Ans: Noise or an outlier is a random error or variance in a measured variable. Strategies to deal with outliers include the following (a short sketch follows the list):
    - Rule of thumb: a value more than 1.5 * IQR above Q3 or below Q1 is an outlier; a value more than 2 standard deviations away from the mean can also be treated as an outlier.
    - Binning: smooths a sorted data value by consulting its neighborhood; the sorted values are distributed into a number of buckets or bins. Also called local smoothing.
    - Regression: data can be smoothed by fitting the data to a function.
    - Clustering: outliers can be detected by clustering.
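As a rough illustration of the 1.5 * IQR rule of thumb above, the following minimal Python sketch flags values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; the helper name and sample values are invented for the example:

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

data = np.array([7.9, 8.6, 6.8, 8.2, 8.7, 9.6, 45.0])  # 45.0 is an obvious outlier
print(iqr_outliers(data))  # only the last entry is flagged True
```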
13. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem.
    Ans: First, determine the pattern of your missing data. [0.5 M]
    There are three types of missing data:
    - Missing Completely at Random: there is no pattern in the missing data on any variable. This is the best you can hope for.
    - Missing at Random: there is a pattern in the missing data, but not on your primary dependent variables (such as likelihood to recommend).
    - Missing Not at Random: there is a pattern in the missing data that affects your primary dependent variables. For example, lower-income participants are less likely to respond and thus affect your conclusions about income and likelihood to recommend. Missing not at random is the worst-case scenario; proceed with caution.
    Methods for handling missing values (illustrated in the sketch after this answer):
    (a) Replace a missing value with the most commonly occurring value for that attribute, or
    (b) with the most probable value based on statistics.
    (c) Replace missing values with the mean.
    (d) Replace missing values with the median.
    (e) Replace missing values with an interpolated estimate.
    (f) Replace missing values with a constant.
    (g) Replace missing values using imputation. Imputation is a way of using features to model each other; when one value is missing, the others can be used to fill in the blank in a reasonable way.
    (h) Replace missing values with a dummy value and create an indicator variable for "missing". When a missing value really means that the feature is not applicable, that fact can be highlighted: filling in a dummy value that is clearly different from actual values, such as a negative rank, is one way to do this; another is to create a new true/false feature tracking whether the original value is missing.
    (i) Replace missing values with 0. A missing numerical value can mean zero.
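A hedged sketch of a few of these strategies using pandas; the column names and values are invented purely for illustration:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"income": [52.0, np.nan, 61.5, 48.0, np.nan],
                   "segment": ["A", "B", np.nan, "B", "A"]})

# (c)/(d) mean or median imputation for a numeric column
df["income_mean"] = df["income"].fillna(df["income"].mean())
df["income_median"] = df["income"].fillna(df["income"].median())

# (a) most frequent value (mode) for a categorical column
df["segment_mode"] = df["segment"].fillna(df["segment"].mode()[0])

# (e) interpolation between neighbouring observations
df["income_interp"] = df["income"].interpolate()

# (h) indicator variable that tracks where the value was missing
df["income_missing"] = df["income"].isna()

print(df)
```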
14. What is Regularization? What is the main application of a Regularizer on the cost function of a Machine Learning model? [2 M]
    Ans: Regularization optimizes predictive models by preventing overfitting.
    The performance of a machine learning model can be evaluated through a cost function. Generally, a cost function is represented by the sum of squares of the differences between the actual and predicted values; this is also called the "sum of squared residuals" or "sum of squared errors". A predictive model, when being trained, attempts to fit the data in a manner that minimizes this cost function.
    A model begins to overfit when it passes through all the data points. In such instances, although the value of the cost function is equal to zero, the model, having considered the noise in the dataset, does not represent the actual function. Under such circumstances the error calculated on the training data is low, but on the test data the error remains large.
    Essentially, a model overfits the data by employing highly complex curves whose terms have large degrees of freedom, with a coefficient for each term that gives it weight. For higher degrees of freedom the test-set error is large compared to the train-set error.
    Regularization is a concept by which machine learning algorithms can be prevented from overfitting a dataset. A penalty term is added to the cost function, and lambda is a hyperparameter that controls the strength of this penalty. As the value of the penalty increases, the coefficients shrink in order to minimize the cost function. Since these coefficients also act as weights for the polynomial terms, shrinking them reduces the weight assigned to those terms and ultimately reduces their impact: the coefficients of the higher-degree terms shrink to the point where those terms no longer affect the model as severely as before, leaving a simpler curve.
    Regularization is therefore an effective technique to prevent a model from overfitting. It allows us to reduce the variance in a model without a substantial increase in its bias, and it lets us develop a more generalized model even if only a few data points are available in our dataset.
    Ridge regression helps to shrink the coefficients of a model whose parameters or features are already known. In contrast, lasso regression can be effective at excluding insignificant variables from the model's equation; in other words, lasso regression can help in feature selection.
    Overall, regularization is an important technique that can substantially improve the performance of a model.
    Marking: Regularization definition 0.5 M; Overfitting definition 0.5 M; Any one cost function (e.g. SSE) 0.5 M; Impact of regularization on that cost function 0.5 M.
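As an illustrative (not prescriptive) sketch of the ridge/lasso behaviour described above, scikit-learn's Ridge and Lasso estimators add exactly this squared-error-plus-penalty objective; the toy data below is generated only for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)  # only 2 informative features

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # alpha plays the role of lambda
lasso = Lasso(alpha=0.1).fit(X, y)

print("OLS   coefficients:", np.round(ols.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # shrunk towards zero
print("Lasso coefficients:", np.round(lasso.coef_, 3))  # uninformative ones driven to zero
```

Ridge shrinks all coefficients smoothly, while Lasso can zero out uninformative ones, which is the feature-selection behaviour mentioned in the answer.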
MANIPAL INSTITUTE OF TECHNOLOGY, MANIPAL
(A constituent unit of MAHE, Manipal)
IV SEMESTER, DSE END-SEMESTER EXAMINATION (MAY '23)
SUBJECT: Machine Learning [DSE 2254]
Duration: 3 hr                         Max Marks: 50
Instructions to Candidates: Answer ALL the questions. Missing data may be suitably assumed.

Q1. (i) How can we use k-fold cross validation to find the optimal value of k in a KNN model?
(ii) For the given dataset, construct the dendrogram using the Single Linkage method in agglomerative clustering. Explain each step in the process. (5 marks)

Name | Variable 1 | Variable 2
A | 7.9 | 8.6
B | 6.8 | 8.2
C | 8.7 | 9.6
D | 6.1 | 7.3
E | 1.5 | 2.0
F | 3.7 | 4.3
G | 7.2 | 8.5
H | 8.5 | 9.7

Q2. (i) What are the benefits of pruning in decision tree induction?
(ii) Explain different approaches to tree pruning. (3 marks)

Q3. (i) What is Regularization?
(ii) What is the main application of a Regularizer on cost functions of a Machine Learning model? (3 marks)

Q4. (i) What are accuracy measures in classification problems?
(ii) What are the three errors that contribute to the prediction error?
(iii) What is the Bias-Variance tradeoff in Machine Learning problems? (4 marks)

Q5. A classifier has made a total of 100 predictions for classes A (27) and B (73). The model makes only 79 correct predictions. The model has predicted "A" 32 times and "B" 68 times. In 9 instances an actual "A" was wrongly labeled as "B".
(i) Prepare the confusion matrix for this situation.
(ii) Calculate the precision, recall, accuracy and F-score. (3 marks)

Q6. What are the steps involved in the expectation maximization algorithm? (3 marks)

Q7. A leading fashion store chooses to predict the willingness of a customer to buy a shirt of a particular price category based on the customers' data. The company strongly believes that the willingness of a customer to buy depends on 3 factors: gender, the type of car used by the customer and the shirt price category.
(i) Use the Naive Bayesian classifier method to determine if a customer would buy a shirt if gender = "Male", car type = "Sports" and shirt price category = "Expensive".
(ii) Explain the steps in the model formation. (4 marks)

Customer ID | Gender | Car Type | Shirt price category | Will Buy?
1  | Male   | Sports | Cheap     | No
2  | Male   | Sports | Expensive | Yes
3  | Male   | Family | Cheap     | Yes
4  | Male   | Family | Expensive | No
5  | Male   | Sports | Cheap     | Yes
6  | Male   | Sports | Expensive | Yes
7  | Male   | Family | Cheap     | Yes
8  | Male   | Family | Expensive | No
9  | Female | Sports | Cheap     | No
10 | Female | Family | Cheap     | No
11 | Female | Sports | Expensive | No
12 | Female | Family | Expensive | No

Q8. Derive the equation for the margin width in a Support Vector Machine. (3 marks)

Q9. (i) Write the equation for Laplace smoothing.
(ii) With an example, explain the need for this smoothing. (3 marks)

Q10. (i) Apply PCA to reduce the following dataset into only ONE principal component.
(ii) Explain each step in the process. (2 marks)

citric acid | fixed acidity
8.7  | 76.5
10.5 | 78.6
10.5 | 80.4
12.7 | 80.9
12.8 | 82.7
14.7 | 84.5

Q11. (i) With an example, explain the Instance-based learning method.
(ii) How do Instance-based learning methods differ from model-based learning methods? (5 marks)

Q12. A Bagging model is used to improve the performance of machine learning models. Explain with an example. (3 marks)

Q13. (i) Why is Gini Impurity sometimes preferred to entropy for estimating Information Gain?
(ii) An IT firm is hiring candidates into its product development units; the unit/domain a candidate is hired into is based on some of their professional features. Apply a decision tree classifier to identify the first best probable feature for decision making by using the Gini measure. (Use the following dataset.) (4 marks)

Major | Experience | Skill | Hiring Domain
CS  | >= 2 year | high | Backend
CS  | < 2 year  | high | Testing
EEE | >= 2 year | high | Backend
SE  | >= 2 year | low  | Backend
SE  | < 2 year  | high | Network Security
DSE | >= 2 year | low  | Backend
EEE | < 2 year  | low  | Backend
EEE | < 2 year  | high | Tech Support
EEE | >= 2 year | high | Backend
CS  | < 2 year  | high | Testing

Q14. (i) Write any four prominent characteristics possessed by a Perceptron.
(ii) How is the net input computed at every neuron in an Artificial Neural Network?
(iii) Write the corresponding formula and explain the variables or terms involved in it. (4 marks)

Q15. (i) What are the two major limitations of the k-means clustering algorithm?
(ii) How are these limitations addressed? (2 marks)

Type: DES

Q1. (i) Draw a Gaussian distribution and explain its properties.
(ii) What is a Normal distribution? Write and explain the formula used for the Normal distribution. Write about the core parameters on which a normal distribution works.
(iii) Explain the types of skewness in a probability distribution.
(iv) What are the three types of skewness? Illustrate with an example. (5)

Q2. (i) What is a Markov Chain?
(ii) Write about the components of a Markov chain with a sample probabilistic graphical model. (3)

Q3. (i) How can dimensionality be reduced using a subset selection procedure?
(ii) Write about the working principles of Factor Analysis and Principal Component Analysis (PCA). (2)

Q4. (i) Apply hierarchical clustering on the given dataset to construct the dendrogram using the Single Linkage method. Use the Squared Euclidean Distance measure to compute the distances.
(ii) Write about various methods followed in hierarchical data clustering. (5)

Transaction_Id | Credit Score
MIT2023001 | 27.8
MIT2023002 | 33.9
MIT2023003 | 10.5
MIT2023004 | 74
MIT2023005 | 19.8

Q5. (i) Use the given data to compute the Recall and Precision values for the Setosa and Versicolor classes.
(ii) Also calculate the model accuracy.
(Confusion matrix of Predicted vs. Actual values.)

Q6. What is K-fold cross validation? Mention any two real-time applications of its usage in solving Machine Learning problems. (2)

Q7. Use K-Means clustering to cluster the following data into TWO groups. Consider C and H as the initial cluster centers (use the Squared Euclidean distance measure).

Name | Variable 1 | Variable 2
A | 7.9 | 8.6
B | 6.8 | 8.2
C | 8.7 | 9.6
D | 6.1 | 7.3
E | 1.5 | 2.0
F | 3.7 | 4.3
G | 7.2 | 8.5
H | 8.5 | 9.7
I | 2.0 | 4.4
J | 1.3 | 2.6

Q8. (i) What is Model Selection and Generalization? With an example, illustrate their usage in solving Machine Learning problems.
(ii) What is Generalization error? Explain with an example. (3)

Q9. What is the objective of SVM? Explain with an example, diagrammatically. (2)
Q10. (i) Use the KNN classification algorithm on the following dataset to predict the class for (P1 = 3, P2 = 7). Use the Squared Euclidean distance measure to perform the distance calculations (assume K = 2).

S.NO | P1 | P2 | CLASS
1 | 7 | 7 | FALSE
2 | 7 | 4 | FALSE
3 | 3 | 4 | TRUE
4 | 1 | 4 | TRUE

Q11. Derive the equation for estimating the posterior probabilities for continuous attributes, assuming that the attributes follow a Gaussian distribution. (3)

Q12. (i) What is the Cascading strategy in ensembling?
(ii) Give TWO use cases where cascading is useful. (2)

Q13. Apply Simple Linear Regression to the data below.
(i) Calculate the regression line equation.
(ii) Use the loss function to calculate the loss in prediction for each data point in the given data.
(iii) Evaluate the model by calculating its R2 value.

Name  | Weight (kg) 'X' | Blood Pressure (mmHg) 'Y'
Robin | 85 | 107
-     | 83 | 75
Rakul | 68 | 105
Tina  | 71 | 86
Abhay | 85 | 94
Atul  | 72 | 74

Q14. Discuss the following in the context of a Bayesian Belief Network:
(i) Mathematical representation of a Bayesian Belief Network using the full joint probability distribution to identify the conditional dependence or independence among the attributes.
(ii) Conditional Probability Tables (CPT) and their usage or application in solving a Bayesian Belief Network (BBN) problem. (3)

Q15. Discuss any FOUR factors that cause bias in Machine Learning. (2)

MANIPAL INSTITUTE OF TECHNOLOGY, MANIPAL
(A constituent unit of MAHE, Manipal)
IV SEMESTER, DSE END-SEMESTER EXAMINATION (MAY '23)
SUBJECT: Machine Learning [DSE 2254]
Duration: 3 hr                         Max Marks: 50
Instructions to Candidates: Answer ALL the questions. Missing data may be suitably assumed.

Q1. (i) How can we use k-fold cross validation to find the optimal value of k in a KNN model?
(ii) For the given dataset, construct the dendrogram using the Single Linkage method in agglomerative clustering. Explain each step in the process. (5 marks)

Name | Variable 1 | Variable 2
A | 7.9 | 8.6
B | 6.8 | 8.2
C | 8.7 | 9.6
D | 6.1 | 7.3
E | 1.5 | 2.0
F | 3.7 | 4.3
G | 7.2 | 8.5
H | 8.5 | 9.7

Ans: Marking - k-fold cross validation definition: 1 mark; distance matrix: 1.5 marks; first distance matrix update: 1.5 marks; drawing the dendrogram: 0.5 mark; drawing the clusters formed on the 2-D plane: 0.5 mark.
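For part (i), the idea is to evaluate each candidate value of k with k-fold cross-validation and keep the k with the best average validation score. A minimal scikit-learn sketch; the data here is random and purely illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy labels

scores = {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation; the mean accuracy estimates generalization for this k
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("best k:", best_k, "mean CV accuracy:", round(scores[best_k], 3))
```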
(ii) Single linkage uses the distance between the closest members of the two clusters. Complete linkage uses the distance between the members that are farthest apart. Average linkage looks at the distances between all pairs and averages all of these distances; it is also called Unweighted Pair Group Mean Averaging.

Distance computation - Iteration 1 (Euclidean distances between all pairs of points):

   |    A    |    B    |    C    |    D    |    E    |    F    |    G
B  | 1.17047 |
C  | 1.28060 | 2.36008 |
D  | 2.22036 | 1.14017 | 3.47131 |
E  | 9.19348 | 8.15659 | 10.46900| 7.01783 |
F  | 6.01082 | 4.98196 | 7.28628 | 3.84187 | 3.18276 |
G  | 0.70711 | 0.50000 | 1.86010 | 1.62788 | 8.64523 | 5.46717 |
H  | 1.25299 | 2.26715 | 0.22360 | 3.39411 | 10.40624| 7.22495 | 1.76918

Find the minimum of all the distances in the entire matrix: 0.22360. This distance is between the data points C and H, therefore the first cluster formed is {C, H}.

To update the distance matrix, single linkage keeps the minimum of the distances to the members of the new cluster:
MIN{dist({C,H}, A)} = MIN{dist(C,A), dist(H,A)} = MIN{1.28060, 1.25299} = 1.25299
MIN{dist({C,H}, B)} = MIN{dist(C,B), dist(H,B)} = MIN{2.36008, 2.26715} = 2.26715
MIN{dist({C,H}, D)} = MIN{dist(C,D), dist(H,D)} = MIN{3.47131, 3.39411} = 3.39411
MIN{dist({C,H}, E)} = MIN{dist(C,E), dist(H,E)} = MIN{10.46900, 10.40624} = 10.40624
MIN{dist({C,H}, F)} = MIN{dist(C,F), dist(H,F)} = MIN{7.28628, 7.22495} = 7.22495
MIN{dist({C,H}, G)} = MIN{dist(C,G), dist(H,G)} = MIN{1.86010, 1.76918} = 1.76918

The process is repeated, one merge at a time, to draw the dendrogram and the corresponding cluster formation among all the data points.
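The same agglomeration can be reproduced with SciPy, which is a convenient way to check the hand computation; the eight points above are used as input:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

points = np.array([[7.9, 8.6], [6.8, 8.2], [8.7, 9.6], [6.1, 7.3],
                   [1.5, 2.0], [3.7, 4.3], [7.2, 8.5], [8.5, 9.7]])
labels = list("ABCDEFGH")

# pdist gives the condensed pairwise Euclidean distance matrix;
# 'single' linkage merges the two clusters with the smallest member-to-member distance first
Z = linkage(pdist(points), method="single")
print(Z[:3])  # row 0 joins C (index 2) and H (index 7) at distance ~0.2236

# dendrogram(Z, labels=labels) can then be drawn with matplotlib if a plot is needed
```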
Q2. (i) What are the benefits of pruning in decision tree induction? (ii) Explain different approaches to tree pruning. (3 marks)

Ans: (i) When decision trees are built, many of the branches may reflect noise or outliers in the training data. Tree pruning methods address this problem of overfitting the data: pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.
Marking: (i) 1 mark; (ii) decision tree definition 0.5 mark, sample diagram with the types of nodes 0.5 mark, pre-pruning approach 1 mark, post-pruning approach 1 mark.

(ii) Decision Trees are a non-parametric supervised learning method that can be used for classification and regression tasks. The goal is to build a model that can predict the value of a target variable by learning simple decision rules inferred from the data features. A decision tree is made up of:
- Root node: the very top of the decision tree, representing the ultimate decision to be made.
- Internal nodes: these branch off from the root node and represent different options.
- Leaf nodes: attached at the end of the branches, representing possible outcomes for each action.

Like other machine learning algorithms, decision trees are susceptible to overfitting. Overfitting occurs when a model fits the training data completely and struggles or fails to generalize to test data; the model memorizes noise in the training data and misses the essential patterns. One of the techniques used to reduce overfitting in decision trees is pruning. Pruning removes the parts of the decision tree that do not provide the power to classify instances. A decision tree trained to its full depth will very likely overfit the training data, which is why pruning is important. In simpler terms, the aim of decision tree pruning is to construct a tree that performs slightly worse on the training data but generalizes better on test data; tuning the hyperparameters of a decision tree model can therefore improve the model considerably.

There are two types of pruning: pre-pruning and post-pruning.

Pre-pruning: tuning the hyperparameters prior to the training pipeline. It uses a heuristic known as "early stopping", which stops the growth of the decision tree before it reaches its full depth and avoids producing branches with small samples. During each stage of the splitting of the tree, the cross-validation error is monitored; if the value of the error does not decrease any more, growth of the tree is stopped. The hyperparameters that can be tuned for early stopping and preventing overfitting are max_depth, min_samples_leaf and min_samples_split. The same parameters can also be tuned to get a robust model, but one should be cautious, because early stopping can also lead to underfitting.

Post-pruning: the opposite of pre-pruning. The decision tree model is allowed to grow to its full depth; once it has, branches are removed to prevent the model from overfitting. A fully grown tree keeps partitioning the data into smaller subsets until the final subsets are similar in terms of the outcome variable, so its leaves may consist of only a few data points and the tree effectively memorizes the training data; a new data point that differs from the training data may then not be predicted well. The hyperparameter that can be tuned for post-pruning is ccp_alpha, the cost-complexity parameter; a higher value of ccp_alpha leads to more nodes being pruned. The main steps of cost-complexity pruning are: train a decision tree to its full depth, compute the candidate ccp_alpha values from the pruning path, fit one pruned tree per candidate value, and select the value that gives the best validation performance (sketched below).
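A minimal scikit-learn sketch of that post-pruning workflow; the dataset is synthetic and the loop is illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) compute the pruning path of a fully grown tree to get candidate ccp_alpha values,
# 2) refit one pruned tree per alpha, 3) keep the alpha that does best on held-out data
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    score = tree.score(X_te, y_te)
    if score > best_score:
        best_alpha, best_score = alpha, score

print("chosen ccp_alpha:", round(best_alpha, 4), "test accuracy:", round(best_score, 3))
```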
Q3. (i) What is Regularization?
Ans: While training a machine learning model, the model can easily be overfitted or underfitted. To avoid this, regularization is used to properly fit the model; regularization techniques reduce the chance of overfitting and help us obtain an optimal model.
Marking: regularization definition and generic usage 0.5 + 0.5 mark.

(ii) What is the main application of a Regularizer on the cost function of a Machine Learning model? (2 marks)
Ans: A regularizer constrains the cost function so that the model's weights cannot grow beyond a certain limit, and it can be used to reduce overfitting, with a hyperparameter such as lambda controlling the penalty. The regularizer applies penalty terms to the weights (the layer weights of a neural network, or the coefficients in ridge/lasso regression models), which helps prevent overfitting.

Ridge regression:
MSE  = (1/n) * sum_i (y_i - y_hat_i)^2
Loss = sum_i (y_i - y_hat_i)^2 + lambda * sum_j (beta_j)^2
where lambda is the tuning parameter and beta_j are the model coefficients.

Marking: regularizer applications 1 mark; explanation or equation of any regularizer technique such as Lasso or Ridge 0.5 + 0.5 mark.

Q4. (i) What are accuracy measures in classification problems?
Ans: Marking - confusion matrix content; explanation of the 3 errors 0.5 * 3 = 1.5 marks; error due to bias 1 mark; error due to variance 1 mark.

Accuracy simply measures how often the classifier predicts correctly. We can define accuracy as the ratio of the number of correct predictions to the total number of predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Confusion matrix (rows: predicted values, columns: actual values):

                       | Actual Positive (1) | Actual Negative (0)
Predicted Positive (1) | TP                  | FP
Predicted Negative (0) | FN                  | TN

- True Positive: we predicted positive and it is true (e.g. we predicted that a woman is pregnant and she actually is).
- True Negative: we predicted negative and it is true (we predicted that a man is not pregnant and he actually is not).
- False Positive (Type 1 error): we predicted positive and it is false (we predicted that a man is pregnant but he actually is not).
- False Negative (Type 2 error): we predicted negative and it is false.

Precision = TP / (TP + FP)

(ii) What are the three errors that contribute to the prediction error?
Ans: Prediction error quantifies one of two things: in regression analysis it is a measure of how well the model predicts the response variable; in classification it is a measure of how well samples are classified into the correct category. Prediction error can be quantified in several ways depending on where it is used; in general, its behaviour can be analyzed in terms of bias and variance. In statistics, the root-mean-square error (RMSE) aggregates the magnitudes of prediction errors, and the Rao-Blackwell theory can estimate prediction error as well as improve the efficiency of initial estimators. In machine learning, cross-validation (CV) assesses prediction error and trains the prediction rule; a second method, the bootstrap, begins by estimating the prediction rule's sampling distribution (or that distribution's parameters) and can also quantify prediction error and other aspects of the prediction rule.

(iii) What is the Bias-Variance tradeoff in Machine Learning problems?
Ans:
- Error due to bias: the error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value we are trying to predict. Of course, you only have one model, so talking about expected or average predictions might seem strange; however, imagine you could repeat the whole model-building process more than once, each time gathering new data and running a new analysis to create a new model. Due to randomness in the underlying data sets, the resulting models will have a range of predictions. Bias measures how far off, in general, these models' predictions are from the correct value.
- Error due to variance: the error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you could repeat the entire model-building process multiple times; the variance is how much the predictions for a given point vary between different realizations of the model.
Bias and variance can be plotted as four cases representing combinations of high and low bias and variance (Fig. 1: graphical illustration of bias and variance).

Q5. A classifier has made a total of 100 predictions for classes A and B. The model makes only 79 correct predictions. The model has predicted "A" 32 times and "B" 68 times. In 9 instances an actual "A" was wrongly labeled as "B".
(i) Prepare the confusion matrix for this situation. (ii) Calculate the precision, recall, accuracy and F-score. (3 marks)

Ans: Marking - confusion matrix and keywords 1 mark; precision, recall, accuracy and F-score 0.5 mark each.

Taking class A as the positive class:
- True Positive (TP) = 20 (actual A, predicted A)
- False Positive (FP) = 12 (actual B, predicted A)
- False Negative (FN) = 9 (actual A, predicted B)
- True Negative (TN) = 59 (actual B, predicted B)

Predicted \ Actual | A  | B  | Total
A                  | 20 | 12 | 32
B                  | 9  | 59 | 68
Total              | 29 | 71 | 100

Precision = TP / (TP + FP) = 20/32 = 0.625
Recall (sensitivity) = TP / (TP + FN) = 20/29 = 0.69
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 79/100 = 0.79
F-score = 2 * (Precision * Recall) / (Precision + Recall) = 0.66
#Yestuples=5 and #No tuples = 7 Target Attribute = Will Buy {“yes”, “ 2 no”) Plves) =5/12= 41.66; P(No) = 7/12 = 58.33 Attributes: Gender, Car type, Shirt Price Category; Customer I) ] Gender ] Car Type category i Male | Sports Cheap No 2 Male | Sports Expensive | Yes 3 Male _| Family Cheap Yes 4 Male [Family | Expensive _| No 5 ‘Male | Sports Cheap Yes 6 Male__| Sports Expensive _| Yes 7 Male | Family | Cheap Yes 8 Male |Family | Expensive _| No 9 Female [Sports| Cheap No 10 Female |Family | Cheap No 1 Female | Sports | Expensive | No 12 Female [Family [Expensive [No Tabulation of target values and probability computations: 1.5 Marks Tabulation of attribute prior probabil Computation of 2 posterior probabil 0.5 * 3= 1.5 Marks fies: = 1+ 1 = 2 Marks Computation of Prior Probabilities Page 10 of 20u ay 3/5 3/7 2/5 4/7 100% [ 100% P(Yes) =5/12 = 41.66; P(No) = 7/12 = 58.33 Given New Instance: gender="Male”, car type = "sports”, and the shirt price category ="expensive”. We need to compute the following posterior probabilities: (i) PCyes”) * P(gender="Male”|"yes”) * P(car type = “sports"|"yes”) * P( the shirt price category ="expensive’|"yes") = (5/12) * (5/5)* (3/5) * (2/5) = 150/1500 = 0.1 (i) P((No") * P(gender="Male"|' No”) * P(car type = "sports'['no") * P( the shirt price category ="expensive’|'no") = (7/12) * (3/7) * (3/7) * (4/7) = 252/41 16 = 0.06122 Since 0.1 > 0.06122 The probability of the customer buying the shirt is “yes”. Page 11 of 20Q8. Derive the equation for Margin Width in Support Vector Machine 8 man, Ans: 3 Base Line Equations: 0.5 * 3 = 1.5Marks Derivation of Margin Width = 1.5 Marks Computing the margin width | * Vector w=[w, w, ...] is perpendicular to the boundaries + Choose x- st f(x") = -1; let x* be the closest point with f(x") aetrew * Closest two points on the margin also satisfy wer tb =I wert t+b=41 wer tw) tha a Ryo Ma)= 41 = rel? + we => rijwi?-1 >r a M = (r+ - 5 pate Soe Region 1 Ter" = Ta Q9. (i) Write the equation for Laplace smoothing. (ii) With an example, explain the need for this smoothing. (2 marks) Equation & keywords 1 marks Explanation 0.5 marks Example 0.5 marks The equation for Laplace smoothing is as follows: P(word|category) = (count(word, category) + 1) / (count(category) + V) 5 countiword, calegory) represents mumber of occurrences of the word in a specific category. © count(category) represents the total count of all words in that category. ° Vrepresems the size of the vocabulary, ie, the total number Of unique words in the dataset. [aplace smoothing is used to address the problem of zero probabilities or probability estimates of zero in probabilistic models, particularly in the context of text classification or language modeling. It is necessary when a word or feature has not been observed in a particular category during training. | resulting in a probability of zero for that word given the category. However, zero probabilities can | cause issues when using these models for prediction or evaluatian, ‘An example to ilustrate the need for Laplace smoothing * Let's say we have a text classification problem where we want to classify movie reviews #8 Positive or negative based on the words in the review During training, we count the Page 12 of 20occurrences. only in of a partic ‘The PCA (Pi of a dataset while preserving the most important information, of words in each category. However, itis possible that certain words are present postive reviews and not in negative reviews or vice versa. 
In such cases, the count sular word given a category can be zero. “Let's assume it appears in 100 positive reviews but has not eviews. Without smoothing, the probability of "amazing" tiven the negative category would be zero. When using the model for prediction or a ieation ifwe encounter a new review containing the word "amazing" and try to calculate the probability ofthat review being negative, it would result in a zero probability. Laplace smoothing addresses this issue by adding a small constant (1 in this case) to both the numerator and denominator. This ensures that no probability estimate becomes zero. In the example, Laplace smoothing would assign a small probability to the word "amazing" in the negative category, even though it was not observed during training. Consider the word "amazing. been observed in any negative n ponent. Q10. () Apply PCA to reduce the following dataset into only ONE Principal comy (5 marks) (i) Explain each step in the process. critic acid] fixed acidity 87 76.5 105 78.6 10.5 80.4 12.7 80.9 28 82.7 (147 | 84.5 | Steps explanation 1 marks Centered data 0.5 marks Covariance matrix 1 marks Figen value & Normalized Eigen vector 1.5 marks 1 marks PC calculation (Formula & Correct answer) .cipal Component Analysis) process involves several steps to reduce the dimensionality = Data Preprocessing: Before applying PCA, its important to preprocess the data by standardizing or normalizing it. This step ensures that all variables have the same scale, preventing any single variable from dominating the PCA results. ce Matrix Calculation: Covariance matrix is computed from the preprocessed jonships between different variables in the dataset and provides insights into their linear dependencies. « Figenvalue-Eigenvector Decomposition: The covariance matrix is decomposed into its cigenvalues and eigenvectors, Eigenvalues represent the variance of the data along the Corresponding eigenvector (principal component), indicating the amount of information captured by each principal component. «Principal Component Selection: The eigenvalues are sorted in descending order, and the corresponding eigenvectors are arranged accordingly. The principal components with the highest eigenvalues capture the most significant variance in the data, The goal is to select a subset of principal components that explain a significant portion of the total variance. he Data ‘A=[8.7, 10.5, 10.5, 12.7, 12.8, 14.7] Bel 76.5, 78.6, 80.4, 80.9, 82.7, 84.5] Page 13 of 20eae pA = (8.7 + 10.5 + 10.5 + 12.7 + 12.8 + 14.7)/6 = 11.05 Perea ren 6+ 804 + 80.9 + 82.7 + 845)/6 80.45 ean o 5 Centered A~A-HA 5, -0.5, 1.6, 1.7, 3.6] Centered B= B - B= [-3.9, -1.8, 0.9, 0.4, 2.2, 4.0] Covarience matrix Var (A)=9.8 19.8, 8.0) Var (B) = 9.6 (8.0, 9.6) cova, B)= 8.0 | v2) covin,y) IA-All=0 10 r= bi Expand, Simplify and solve for eigenvalues (3): A°2 - 19.50h + 84.95 = 0 ‘ Using the quadratic formula, : 4 = (-b + V(b”2 - 4ac)) / 2a 1.5 18.26 oF A= 0.24 Fork 18.26: eigenvector associated with 4 = 18.26 : v1 = [0.9, 0.4] For i= 0.24: eigenvector associated with A ~ 0.24 : v2 = [-0.7, 0.7] Normalize Principal eigen vector To normalize vector v1 = [0.924, 0.383], divide each element of the vector by its magnitude. ‘magnitude (length) of v1: |v1| = V(0.924"2 + 0.3832) = V(0.854176 + 0.146689) = VI = 1 Divide each element of v1 by its magnitude: Normalized v1 = v1 /|v1|=[0.924/ 1, 0.383 / 1] = (0.924, 0.383] Normalized vector v1 is [0.9, 0.4]. 
Compute PC PC=a,X,4+a,X, Principal component 1 =X - vI Principal component 2 = X - v2 Principal component 1 = [-2.6, -0.9, -0.1, 1.6, 2.8, 6.0] Principal component 2 = [-1.7, 0.3, 0.4, -1.3, 14, -I.l] Q11. (i) With an example, explain Instance-based learning method. (3 Marks) ‘Ans: Definition of Instance Based Learning Method: 1 Mark Definition of Model Based Learning Method: | Mark Comparison between Instance Based and Model Based Learning Methods: 1 Mark The Machine Learning systems which are categorized as instance-based learning are the systems that learn the training examples by heart and then generalizes to new instances based on som? similarity measure 'tis called instance-based because it builds the hypotheses from the training instances. Page 14 0f20OO til a ing or lazy-learning (because they delay processing un also known as memory-based learning OF tem instance must be classified). ‘The time complexity of this algorithm i Each time whenever a new query Is encountered, its pre assign to a target function value for the new instance. Examples: KNN, Self-Organizing Maps. i ining data. depends upon the size of training / / nN i viously stores data is examined. And i i | ing methods (ii) How do Instance-based learning methods differ from model-based learning mi Ans: | ; ‘Model-based learning (also known as structure-based or eager learning) takes a different approach by constructing models from the training data that can generalize better than instance- based methods. This involves using algorithms like linear regression, logistic regression, random forest, etc. trees to create an underlying model from which predictions can be made for new data points. The model based learning approach has several benefits over instance-based methods, such as faster processing speeds and better generalization capabilities due to its use of an underlying model rather than relying solely on memorized examples. However, this approach requires more time and effort to develop and tune the model for ‘optimal performance on unseen data sets. instance-based learning and model-based learning are two broad categories of machine learning algorithms, There are several key dferences between these two types of algorithms, including: * Generalization: in model-based learning, the goal is to learn a generalizable model that can be used to make predictions on new data, This means that the model is trained on a dataset and then tested on a separate, unseen dataset to evaluate Contrast, instance-based learning algorithms simply memorize the training examples and use them to make predictions on new data. This means that instance-based learning algorithms don’t try to learn a generalizable model, and their performance on new data ic not as reliable as model-based algorithms. * Scalability: Because instance-based learning algorithms simply memorize the training examples, they can be very slow and memory-intensive when working with large datasets This is because the model has to store all ofthe training examples in memory and compare new data points to each of the stored examples. In contrast, model-based learning algorithms can be more scalable because they don't have to store all of the training examples. instead, they learn a model that can be used to make predictions without s the training data + Interpretability: Model-based learning algorithms often produce models that are easier to interpret than instance-based learning algorithms. 
This is because the model-based algorithms learn a set of rules or parameters that can be inspected to understand how the model is making predictions, In contrast, instance-based learning algorithms simply store the training examples and use them as a basis for making predictions, which can make it difficult to understand how the predictions are being made. while instance-based learning algorithms can be effective for small or medium-sized its performance. In toring verall, a sets, they are generally not as scalable or interpretable as model-based learning algorithms. jatasets, erefore, model-based learning is often preferred for larger, more complex datasets Therefore, Page 15 of 20the performance of machine learning 1, Q12, A Bagging model is used to improve Explain with ee mata mt Example I marks 4 Bagong (Bootstrap Aggrezating) model i a technique used to improve the performance sn, senate: of machine Tearing models, particulaly in the context of reducing variance an; It involves training multiple instances of the same base model on differen; addressing overfitting : apse ofthe traning data and combining their predictions to make a final decision Dataset Split: We randomly divide our dataset into multiple subsets, typically of equal size using a process called bootstrapping. «Base Model Training: We train a base model, such as a decision tree or a random forest. on each of the subsets independently. Each base model is trained on a different subset of the data, resulting in different models with potentially different biases. «Prediction Aggregation: Once the base models are trained, we can use them to make predictions on new, unseen data (or even on the training data itself). For classification tasks, a common way to combine predictions is by majority voting. Each base model independently predicts the class of the email (spam or not spam), and the class with the majority of votes becomes the final prediction. Final Decision: The final prediction is made based on the aggregated results of the base models. For example, if three out of five base models predict an email as spam, and two predict it as non-spam, the majority voting mechanism will label it as spam. Q13. (i) Why is sometimes Gini Impurity preferred to estimate Information Gain instead of entropy? (ii) One IT firm is hiring candidates to its product development units. A candidate would be hired to which univdomain is based on some of their professional features. Apply decision tree classifer to identify the first best probable feature for decision making by using Gini measure. (Use the following dataset). (5 marks) Page 16 of 20 _—_—_-cs _ [p= 2yeat high _[Testin cs _le2yeat we aackend hig - (eee p= 2year nes [Backend >e2yeat_| low re ayear _|_high [Network Security | oe oeayear | low [Backend I ore {ezyear | low [Backend [eee [<2year__|_ high [Tech Support ‘eee p=2year_| high [Backend ini 1.5 marks Gini vs entropy Gini for each variable 3 marks (1 marks each) (Formula, steps & Correct answer) Final Correct answer 0.5 marks Gini for Skill = High Gini(s) = Gini for Skill = Low 2 Gini(s) = 1- © + 0.694 Gini for Skill Weighted Average) 3 7 ig | + 0.694 = 722 GGa) +9644 (%) = e406 Gini Index for Skill : 0.49; Gini =1- (p,)? fa ) + +@i= gain, depends on the specific requirements and chareaeriaie oe eee Gi reasons why Gini Impurity may be preferred Computational Efficiency: Calculatin compared to entropy. Gini Impurity ont node and their squared values, in situations where efficiency is a i simpler and faster computation. 
i impurit characteristics of the Problem at hand. Here ar sae over entropy in certain scenarios: sates more efficient i lass label ‘ulating logarithmic functions, Therefore Gini Impurity can be preferred due to ity sion trees that are mor } produce better results for the majority class, Te robst 0 lass natn Interpretabilty: Gini Impurity is sometimes considered. more intuit, compared 10 entropy. Gini Impurity measures the probably of misery ee iy teePret element ina node, while entony quantifies the average amount of for required i mao identity the Page 17 of 20 ee eeStraightforward interpretation of the impurity or disorder in a node. Q14, (i) Write any four prominent characteristics possessed by a Perceptron. (3 marks) (ii) How is the net input computed at every neuron in an Artificial Neural Network? (iii) Write the corresponding formula and explain the variables or terms involved in it Ans: Four prominent characteristics possessed by a Perceptron: 1 Mark Net input computation method: 1.5 Mark Formula and variable explanations”: 1 Mark (i) A perceptron works by taking in some numerical inputs along with what is known as weights and a bias. It then multiplies these inputs with the respective weights (this is known as the weighted sum), These products are then added together along with the bias. The activation function takes the weighted sum and the bias as inputs and returns a final output. A perceptron consists of four parts: input values, weights and a bias, a weighted sum, and activation function. Assume we have a single neuron and three inputs x1, x2, x3 multi respectively as shown below: by the weights wi, w2, w3 The idea is simple, give 'n the numerical value of the there is a function, inside the inputs and the weights, neuron, that will produce an output -the 1 Row is, whut is this function? pun ‘One function may look like ee a ae Ee | ee Page 1802? class of a randomly selected element. In certain contexts, Gini Impurity may provide a more \— this function is called the weighted sum because itis th -ause it is the sum of the weights and inputs. This looks all into a certain range say 0 tot like a good function, but what if we wanted the outputs to E We ean do this by using something known as an activation fonction, jon is a function that converts the input given (the input in activation funcl be the weighted sum) into a certain output based on a set of - (x0 | | this case, would rules. \s /2 Pd ” There are different kinds of activation functions that exist, for example: 1 Hiyperbolic Tangent: used {0 ous ut number from-1 to 1. used t m 0 to 1. o output a number frot duced. So the final neuron 2. Logistic Function: ‘the bias a threshold the perceptron must reach before the output is Pro equation looks li /) + bias elow yg weight x input the bias is represented near the inputs) as shown Bs Represented visually we see (where typically 2) 2 XnW a ml Notice that the activation func output. Using the Logistical F tion takes in the weighted sum plus the bias as inputs €0 CF ‘unction this output will be between 0 and 2 eate a single Page 19 of 20QIS. (i) What are the two major limitations of k-means clustering algorithm? | (ii) How are these limitations addressed (2 marks) | Two limitations 1 marks. (0.5 marks eah) Corrections Imarks (0.5 marks eah) | The k-means clustering algorithm has a couple of major limitations: Dependency on Initial Centroids: The performance of the k-means algorithm is highly sensitive to the initial placement of centroids. 
Different initializations can lead to different final cluster assignments and outcomes. In some cases, poor initialization may result in suboptimal clustering solutions or the algorithm getting stuck in local optima. * Assumption of Spherical Clusters and Equal Variance: K-means assumes that clusters are spherical and have equal variance. It tries to minimize the within-cluster sum of squares CWCSS) by assigning data points to the nearest centroid. However, this assumption is often unrealistic for complex datasets where clusters have irregular shapes, different sizes, or varying densities. As a result, k-means may strugele to capture the true underlying structure of such datasets, leading to suboptimal or inaccurate clustering results, ‘The limitations ofthe k-means clustering algorithm can be addressed in several ways: Initialization Techniques: Various initialization techniques can be used to mitigate the sensitivity to initial centroids. Instead of randomly assigning initial centroids, more advanced methods can be employed. These methods help in obtaining better initial centroid placements and reduce the likelihood of getting stuck in local optima, * Alternative Distance Metries: Instead of relying solely on the Euclidean distance, loyed to handle different . Choosing an appropriate dist tance metric that aligns with the characteristics ofthe data can help improve clustering results, Advanced Clustering Algorithms: When dealing with complex datasets, it may be Peneficial to explore altemative clustering algorithms that are better suicet for capturing ‘regularly shaped clusters or clusters with different sizes and deneiticn clustering algorithms, can handle clu: eae Model-based Sters with different variances and non-spherical sh: by fitting probabilistic models to the data. ne Dene Mot 20