From the course: Microsoft Azure AI Essentials: Workloads and Machine Learning on Azure

Understanding regression

- [Instructor] Regression models predict numeric outcomes using data features. Once features are gathered, training the model involves four steps. First, we split the data set into training and validation sets. The training set builds the model, while the validation set checks its performance. Typically, 70 to 80% of the data is used for training and 20 to 30% for validation. We randomly split the data to ensure each set represents the overall data set. Next, we use an algorithm to fit the training set. Options include linear, polynomial, ridge, lasso, quantile, Bayesian, and many more, each with parameters to adjust. Once a model is built, we use it to predict values from the validation set. Finally, we compare the predicted values to the actual labels using metrics to evaluate performance. Keep in mind that training involves multiple iterations, making adjustments until an acceptable validation performance is achieved. Iterations may involve adjusting features, such as removing land elevation from the model in our farming example or adding pest infestation as a factor in predicting crop yield. Changing which algorithm you use may also lead to better results. Finally, modifying the algorithm's parameter settings will also produce different results. In this screenshot, you can see how decision forest regression has many parameters that influence the result of the algorithm. While we won't cover every algorithm in this course, the most commonly used ones are linear regression, which finds a linear connection between the features and labels, and polynomial regression, which finds non-linear relationships between features and labels, drawing a curved best-fit line. Some of the common methods to measure model performance are mean absolute error, MAE, which measures the average error magnitude. In other words, it measures whether the model is consistently off by a small or large amount.
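The four training steps above can be sketched in plain Python. This is a minimal, hypothetical illustration with a hand-rolled simple linear regression and made-up temperature/ice-cream data, not the Azure Machine Learning tooling the course uses:

```python
import random

random.seed(0)  # make the sketch reproducible

# Hypothetical data set: feature = daily temperature (C), label = ice creams sold.
data = [(t, 2.0 * t + 5 + random.uniform(-3, 3)) for t in range(10, 40)]

# Step 1: randomly split the data, ~75% for training, ~25% for validation.
random.shuffle(data)
cut = int(len(data) * 0.75)
train, valid = data[:cut], data[cut:]

# Step 2: fit a simple linear regression (closed-form least squares).
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
    (x - mean_x) ** 2 for x, _ in train)
intercept = mean_y - slope * mean_x

# Step 3: use the model to predict labels for the validation set.
predictions = [slope * x + intercept for x, _ in valid]

# Step 4: compare predictions to actual labels with a metric, here MAE.
actuals = [y for _, y in valid]
mae = sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(valid)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, MAE={mae:.2f}")
```

Because the noise added to the labels is bounded by 3, the recovered slope lands near 2 and the validation MAE stays small, which is the "acceptable validation performance" the iteration loop is aiming for.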
If the label is 25 ice creams sold, but the model predicted 22 or 28, the absolute error is 3. MAE averages these errors across the validation set, reaching a value of 2.33 in our example. Mean squared error, MSE, squares each error, emphasizing large discrepancies. Note that we use the same table as before and square the absolute error for each record. The result is an MSE of 6. Root mean squared error, RMSE, takes the square root of MSE to return the error to its original units. The square root of 6 gives us 2.45, which is the root mean squared error in ice cream units sold against our validation set. Finally, the coefficient of determination, R2, is typically a value between zero and one. The closer to one, the better the model fits the data. The calculation of the coefficient of determination, or R2, is more involved. We take the sum of squared differences between the predicted and actual labels, divide that by the sum of squared differences between the actual label values and the mean of the actual label values, and subtract the result from one.
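All four metrics can be computed by hand in a few lines of Python. The actual and predicted values below are hypothetical stand-ins, not the course's ice cream table, so the printed numbers differ from the 2.33, 6, and 2.45 in the narration:

```python
import math

# Hypothetical validation set: actual vs. predicted ice creams sold.
actual    = [25, 18, 30, 22, 27, 20]
predicted = [22, 20, 28, 25, 26, 19]

errors = [p - a for p, a in zip(predicted, actual)]

# MAE: average magnitude of the errors.
mae = sum(abs(e) for e in errors) / len(errors)

# MSE: average of the squared errors, emphasizing large discrepancies.
mse = sum(e ** 2 for e in errors) / len(errors)

# RMSE: square root of MSE, back in the original units (ice creams sold).
rmse = math.sqrt(mse)

# R2: one minus (sum of squared differences between predicted and actual
# labels) / (sum of squared differences between actual labels and their mean).
mean_actual = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}, R2={r2:.2f}")
```

Note how the first record reproduces the narrated example: the label is 25, the prediction is 22, so the absolute error is 3; MAE then averages those magnitudes across the set, and MSE/RMSE follow from squaring them.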
