Questions and Solutions on Linear Regression
Problem 2: Overfitting (20 points)

For each of the supervised learning methods that we have studied, indicate how the method could overfit the training data (consider both your design choices as well as the training) and what you can do to minimize this possibility. There may be more than one mechanism for overfitting; make sure that you identify them all.

Part A: Nearest Neighbors (5 Points)

1. How does it overfit?
- Every point in the dataset (including noise) defines its own decision boundary.
- The distance function can be chosen to do well on the training set but less well on new data.

2. How can you reduce overfitting?
- Use k-NN with a larger k.
- Use cross-validation to choose k and the distance function (see the sketch after Part D).

Part B: Decision Trees (5 Points)

1. How does it overfit?
- By adding new tests to the tree to correctly classify every data point in the training set.

2. How can you reduce overfitting?
- By pruning the resulting tree based on performance on a validation set.

Part C: Neural Nets (5 Points)

1. How does it overfit?
- By having too many units and therefore too many weights, thus enabling it to fit every nuance of the training set.
- By training too long so as to fit the training data better.

2. How can you reduce overfitting?
- Use cross-validation to choose a network that is not too complex.
- Use a validation set to decide when to stop training.

Part D: SVM (Radial Basis and Polynomial Kernels) (5 Points)

1. How does it overfit?
- In RBF, by choosing a value of sigma (the standard deviation of the Gaussian) that is too small.
- In polynomial, by choosing the degree of the polynomial too high.
- By allowing the Lagrange multipliers to get too large.

2. How can you reduce overfitting?
- Use cross-validation to choose the kernel parameters and the maximum value for the multipliers.
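The answers to Parts A and D both point at cross-validation for choosing hyperparameters. As a concrete illustration (not part of the original exam), here is a minimal scikit-learn sketch that selects k and the distance metric for k-NN, and the RBF width and box constraint for an SVM; the synthetic dataset and parameter grids are made-up assumptions.

```python
# Illustrative sketch (not from the exam): cross-validation to pick the
# hyperparameters that control overfitting, per Parts A and D.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic data stands in for the training set.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Part A remedy: choose k and the distance function by 5-fold CV.
knn_search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 9, 15],
                "metric": ["euclidean", "manhattan"]},
    cv=5,
)
knn_search.fit(X, y)
print("best k-NN params:", knn_search.best_params_)

# Part D remedy: choose the RBF width (gamma grows as sigma shrinks)
# and the box constraint C, which caps the Lagrange multipliers.
svm_search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"gamma": [0.01, 0.1, 1.0, 10.0], "C": [0.1, 1.0, 10.0]},
    cv=5,
)
svm_search.fit(X, y)
print("best SVM params:", svm_search.best_params_)
```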
Short Questions ([ ] Points)

The following short questions should be answered with at most two sentences, and/or a picture. For the true/false questions, answer true or false. If you answer true, provide a short justification; if false, explain why or provide a small counterexample.

1. [ points] Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or generative classifier? Why?

Answer: generative. Detecting liars by density estimation requires a model of P(x), and only a generative classifier models P(x).

2. [ points] Your billionaire friend also wants to classify software applications to detect bug-prone applications using features of the source code. This pilot project only has a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or generative classifier? Why?

Answer: generative. With very few training examples, a generative model typically approaches its asymptotic error much faster, so it tends to be more accurate in the small-sample regime.

3. [ points] Finally, your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or generative classifier? Why?

Answer: discriminative. With plentiful training data, a discriminative model typically attains a lower asymptotic error, since it spends no modeling effort on P(x).

4. [ points] Assume that we are using some classifier of fixed complexity. Draw a graph showing two curves: test error vs. the number of training examples and cross-validation error vs. the number of training examples.

[Hand-drawn graph: both curves decrease toward a common asymptote as the number of training examples grows.]

5. [ points] Assume that we are using an SVM classifier with a Gaussian kernel. Draw a graph showing two curves: training error vs. kernel bandwidth and test error vs. kernel bandwidth.

[Hand-drawn graph: training error rises from zero as the bandwidth grows; test error is U-shaped, high at very small bandwidths (overfitting) and at very large ones (underfitting).]

6. [ points] Assume that we are modeling a number of random variables using a Bayesian network with n edges. Draw a graph showing two curves: bias of the estimate of the joint probability vs. n and variance of the estimate of the joint probability vs. n.

[Hand-drawn graph: as n grows, the bias decreases while the variance increases.]

[ points] (a) Both PCA and linear regression can be thought of as algorithms for minimizing a sum of squared errors. Explain which error is being minimized in each algorithm.

Answer: linear regression minimizes the "vertical" error, $\arg\min_w \sum_i (y_i - w^\top x_i)^2$; PCA minimizes the "reconstruction" error, $\arg\min_U \sum_i \lVert x_i - U U^\top x_i \rVert^2$, i.e., the squared perpendicular distance to the subspace.

[ points] A long time ago there was a village amidst hundreds of lakes. Two types of fish lived in the region, but only one type in each lake. These types of fish both looked exactly the same, smelled exactly the same when cooked, and had the exact same delicious taste, except one was poisonous and would kill any villager who ate it. The only other difference between the fish was their effect on the pH (acidity) of the lake they occupy. The pH for lakes occupied by the non-poisonous type of fish was distributed according to a Gaussian with unknown mean ($\mu_{\text{safe}}$) and variance ($\sigma^2_{\text{safe}}$).

4 Bias-Variance Decomposition (12 pts)

1. (6 pts) Suppose you have regression data generated by a polynomial of degree 3. Characterize the bias and variance of the estimates of the following models on the data with respect to the true model by circling the appropriate entry (circled answers shown):

   Model                                  Bias    Variance
   Linear regression                      high    low
   Polynomial regression with degree 3    low     low
   Polynomial regression with degree 10   low     high

2. Let $Y = f(X) + \epsilon$, where $\epsilon$ has mean zero and variance $\sigma^2$. In k-nearest-neighbor (kNN) regression, the prediction of $Y$ at a point $x_0$ is given by the average of the $Y$ values at the $k$ neighbors closest to $x_0$.

(a) (2 pts) Denote the $\ell$-th nearest neighbor to $x_0$ by $x_{(\ell)}$ and its corresponding $Y$ value by $y_{(\ell)}$. Write the prediction $\hat{f}(x_0)$ of the kNN regression for $x_0$ in terms of the $y_{(\ell)}$, $1 \le \ell \le k$.

Answer: $\hat{f}(x_0) = \frac{1}{k} \sum_{\ell=1}^{k} y_{(\ell)}$.

(b) (2 pts) What is the behavior of the bias as $k$ increases? It increases: ever more distant, less representative neighbors enter the average.

(c) (2 pts) What is the behavior of the variance as $k$ increases? It decreases: the variance of an average of $k$ noisy values is $\sigma^2 / k$.
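To make the table in part 1 concrete, here is a small simulation sketch (not from the exam) that refits polynomials of degree 1, 3, and 10 to fresh noisy samples from an assumed cubic and measures the squared bias and variance of the prediction at one test point; the cubic, the noise level, and the test point are all made-up illustrative choices.

```python
# Illustrative sketch (not from the exam): empirical bias and variance of
# polynomial fits of degree 1, 3, and 10 when the true model is cubic.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**3 - 2 * x            # assumed "true" degree-3 polynomial
x = np.linspace(-2, 2, 30)            # fixed design points
x0, sigma = 0.5, 1.0                  # test point and noise std (made up)
n_trials = 500

for degree in (1, 3, 10):
    preds = []
    for _ in range(n_trials):
        y = f(x) + rng.normal(0, sigma, size=x.size)  # fresh training set
        coef = np.polyfit(x, y, degree)               # least-squares fit
        preds.append(np.polyval(coef, x0))
    preds = np.array(preds)
    bias = preds.mean() - f(x0)                       # E[f_hat(x0)] - f(x0)
    print(f"degree {degree:2d}: bias^2 = {bias**2:.3f}, "
          f"variance = {preds.var():.3f}")
```

Running this reproduces the pattern in the table: degree 1 shows large squared bias, degree 3 is low on both counts, and degree 10 trades a small bias for noticeably larger variance.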
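Parts (b) and (c) can likewise be checked numerically. The sketch below (again an illustration, not exam material) averages the k nearest noisy responses around a fixed point for an assumed smooth function: the bias grows in magnitude as farther neighbors enter the average, while the variance tracks $\sigma^2/k$.

```python
# Illustrative sketch (not from the exam): kNN regression at a point x0,
# showing bias rising and variance falling as k grows (parts (b) and (c)).
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x)           # assumed smooth true function
x = np.linspace(0, 1, 200)            # fixed design points
x0, sigma, n_trials = 0.5, 0.3, 400
order = np.argsort(np.abs(x - x0))    # indices of neighbors, nearest first

for k in (1, 5, 25, 100):
    idx = order[:k]                   # the k nearest neighbors of x0
    preds = np.array([
        (f(x[idx]) + rng.normal(0, sigma, size=k)).mean()
        for _ in range(n_trials)
    ])
    bias = preds.mean() - f(x0)
    # The variance of an average of k iid noise terms is sigma^2 / k.
    print(f"k = {k:3d}: bias = {bias:+.3f}, variance = {preds.var():.3f} "
          f"(sigma^2/k = {sigma**2 / k:.3f})")
```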
