CAB430 Practical Week9_2025
Week 9 Practical
We will use a small database called SkiRentalDB. It has one table, rental_data. Each row in the
table provides ski rental information for a particular day: the date (i.e., year, month, day,
weekday), the rental count (i.e., how many rentals there were on that day), whether it snowed, and
whether the day was a holiday. In this practical, we will create machine learning models to predict
the rental count and whether it snowed.
Download the database backup file SkiRentalDB.bak and install it in your SQL Server instance
by restoring the backup file. Refer to the Week 3 practical worksheet for how to restore a
database.
Before creating the machine learning models, we first create a table in the SkiRentalDB database
to store the machine learning models that you are going to build from the SkiRentalDB data.
Start SQL Server, connect to the SQL Server Database Engine, then open a New Query panel.
USE SkiRentalDB;
DROP TABLE IF EXISTS dbo.ski_rental_models;
GO
CREATE TABLE dbo.ski_rental_models (
model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
model VARBINARY(MAX) NOT NULL
);
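The model column is VARBINARY(MAX) because each trained model is stored as a serialized byte string. The sketch below, run outside SQL Server, illustrates the idea under stated assumptions: a plain dict is a hypothetical stand-in for dbo.ski_rental_models, and Python's pickle module does the serialization.

```python
import pickle
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for dbo.ski_rental_models: model_name -> serialized model bytes.
ski_rental_models = {}

# A tiny model fitted on invented numbers (y = 2x), only so there is something to store.
model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])

model_name = "LR_model predicting count"
# model_name is VARCHAR(30) in the table, so longer names would not fit.
assert len(model_name) <= 30

# Serialize: pickle.dumps returns bytes, the Python counterpart of a VARBINARY(MAX) value.
ski_rental_models[model_name] = pickle.dumps(model)

# Retrieve by name and deserialize, as the prediction procedures do later.
restored = pickle.loads(ski_rental_models[model_name])
print(type(ski_rental_models[model_name]).__name__)  # bytes
print(restored.predict([[4]]))
```

The PRIMARY KEY on model_name plays the same role as the dict key here: one stored model per name.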
Make sure the following tasks run against the SkiRentalDB database by adding this statement at
the top of your code:
USE SkiRentalDB
Task 1: Create a machine learning model to predict rental count using a linear regression algorithm
1) Create a stored procedure called generate_rental_count_model to generate a trained
machine learning model using a linear regression algorithm. This model predicts the rental
count. Linear regression is a regression algorithm, i.e., it predicts numerical values rather
than class labels.
The stored procedure has one output parameter, @trained_model, through which it returns
the trained model.
The Python script in the procedure:
o Import the LinearRegression algorithm from sklearn package
o Get training data including all the features and the target labels
o Create a LinearRegression model
o Train the model using the training data and target labels
o Convert the trained model to a binary object, i.e., to serialize the model
After the execution, you can find the stored procedure in the database.
Note: when retrieving the training data from table dbo.rental_data, only the records before
2015 are retrieved:
@input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay",
"Snow", "Holiday" from dbo.rental_data where Year < 2015'
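The training steps listed above can be sketched in plain Python outside SQL Server. This is a sketch under assumptions, not the procedure's actual script: the rows of rental_data are invented, the DataFrame filter stands in for the WHERE Year < 2015 clause, rental_train_data is a hypothetical variable name, and the variable trained_model is assumed to map to the @trained_model output parameter (sp_execute_external_script exposes parameters declared in @params to the script under their names without the @). This sketch keeps the target RentalCount out of the input columns.

```python
import pickle
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented rows standing in for dbo.rental_data (the real table has many more).
rental_data = pd.DataFrame({
    "RentalCount": [445, 40, 456, 38, 120, 300, 210, 95],
    "Year":        [2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015],
    "Month":       [1, 2, 3, 12, 1, 1, 2, 3],
    "Day":         [20, 13, 10, 25, 5, 7, 14, 21],
    "WeekDay":     [1, 2, 3, 4, 5, 6, 7, 1],
    "Snow":        [0, 0, 1, 0, 1, 1, 0, 0],
    "Holiday":     [1, 0, 0, 1, 0, 0, 0, 1],
})

# Mirror the WHERE clause of @input_data_1: train only on records before 2015.
rental_train_data = rental_data[rental_data["Year"] < 2015]

# Input features and target label for this sketch.
features = ["Year", "Month", "Day", "WeekDay", "Snow", "Holiday"]
target = "RentalCount"

# Create and train the linear regression model.
lin_model = LinearRegression()
lin_model.fit(rental_train_data[features], rental_train_data[target])

# Serialize the trained model to bytes; in SQL Server this value would be
# handed back through the @trained_model OUTPUT parameter.
trained_model = pickle.dumps(lin_model)
print(len(rental_train_data), type(trained_model).__name__)  # 5 bytes
```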
2) Run the following code to execute the stored procedure to train the model and to insert
the trained model into the table ski_rental_models:
-- Execute the stored procedure to generate a trained model and insert it into table ski_rental_models
DECLARE @model VARBINARY(MAX);
EXECUTE generate_rental_count_model @model OUTPUT;
INSERT INTO ski_rental_models (model_name, model) VALUES('LR_model predicting count', @model);
Check the table ski_rental_models; you will find a model called 'LR_model predicting count'
in the table.
3) Create a stored procedure called predict_rentalcount to generate rental count predictions.
This stored procedure has one input parameter, the name of the trained model to be used in
this procedure.
The procedure first retrieves the trained model from the table ski_rental_models using
the model name given in the input parameter, then uses a Python script to generate the
predictions.
The Python script in the procedure:
• Import metrics from the sklearn package for model evaluation
• Deserialize the trained model
• Get the test data. Note: the test data are the records from 2015.
• Generate predictions for the test data and store the predictions
• Assign the predictions to the output variable OutputDataSet
• Calculate the error of the predictions (mean squared error)
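GO
-- A stored procedure that predicts rental counts for 2015 using a stored model
CREATE PROCEDURE predict_rentalcount (@model VARCHAR(30))
AS
BEGIN
DECLARE @py_model VARBINARY(MAX) = (SELECT model FROM ski_rental_models WHERE model_name = @model);
EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'
# Import the scikit-learn function to compute error.
from sklearn.metrics import mean_squared_error
import numpy
import pickle
import pandas
# Deserialize the trained model.
lin_model = pickle.loads(py_model)
# Get the test data: the same columns the model was trained on.
columns = rental_score_data.columns.tolist()
# Generate predictions for the test data and store them.
lin_predictions = lin_model.predict(rental_score_data[columns])
rental_score_data["PredictedCount"] = lin_predictions
# Assign the predictions and the true labels to output variable OutputDataSet.
OutputDataSet = rental_score_data[["RentalCount","PredictedCount"]]
#print(OutputDataSet)
# Compute error between the actual values and the test predictions.
lin_mse = mean_squared_error(rental_score_data["RentalCount"], lin_predictions)
print("Mean squared error: %.2f" % lin_mse)
'
, @input_data_1 = N'Select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
, @input_data_1_name = N'rental_score_data'
, @params = N'@py_model varbinary(max)'
, @py_model = @py_model
with result sets (("RentalCount" float, "RentalCount_Predicted" float));
END;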
After the execution, check the database; you will find the stored procedure predict_rentalcount.
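The prediction steps of predict_rentalcount can be sketched standalone as follows. Assumptions: the rows are invented, the model is trained inside the sketch on the six non-target columns so the example is self-contained, and the DataFrame filter plays the role of the WHERE Year = 2015 query. Note that mean_squared_error takes (y_true, y_pred); the value is symmetric in its arguments, but that order is the documented one.

```python
import pickle
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

features = ["Year", "Month", "Day", "WeekDay", "Snow", "Holiday"]

# Invented rows standing in for dbo.rental_data.
rental_data = pd.DataFrame({
    "RentalCount": [445, 40, 456, 38, 120, 300, 210, 95],
    "Year":        [2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015],
    "Month":       [1, 2, 3, 12, 1, 1, 2, 3],
    "Day":         [20, 13, 10, 25, 5, 7, 14, 21],
    "WeekDay":     [1, 2, 3, 4, 5, 6, 7, 1],
    "Snow":        [0, 0, 1, 0, 1, 1, 0, 0],
    "Holiday":     [1, 0, 0, 1, 0, 0, 0, 1],
})

# Stand-in for the serialized model the procedure reads from ski_rental_models.
train = rental_data[rental_data["Year"] < 2015]
py_model = pickle.dumps(LinearRegression().fit(train[features], train["RentalCount"]))

# Deserialize the trained model.
lin_model = pickle.loads(py_model)
# Test data: the 2015 records.
rental_score_data = rental_data[rental_data["Year"] == 2015].copy()
# Generate predictions and store them next to the true values.
lin_predictions = lin_model.predict(rental_score_data[features])
rental_score_data["PredictedCount"] = lin_predictions
OutputDataSet = rental_score_data[["RentalCount", "PredictedCount"]]

# Mean squared error: average of the squared differences between actual and predicted.
lin_mse = mean_squared_error(rental_score_data["RentalCount"], lin_predictions)
print(OutputDataSet)
print("Mean squared error: %.2f" % lin_mse)
```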
In Task 2, the target column (Snow) will not be included as an input column when training the
model.
Task 2: Create a machine learning model to predict snow using a decision tree algorithm
1) Create a stored procedure called generate_rental_snow_model to generate a trained
machine learning model using a decision tree algorithm. This model can be used to predict
whether it snowed.
GO
-- A stored procedure that trains and generates a Python model using rental_data and a decision tree algorithm
CREATE PROCEDURE generate_rental_snow_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
EXECUTE sp_execute_external_script
@language = N'Python'
, @script = N'
After the execution, you can find the stored procedure in the database.
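A standalone sketch of what the training script of generate_rental_snow_model might do is below. It assumes the same pattern as Task 1 (training on records before 2015, pickle serialization), with Snow as the label and the remaining columns as inputs; the data, the split, and the names snow_features and rental_train_data are assumptions for illustration.

```python
import pickle
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Invented rows standing in for dbo.rental_data.
rental_data = pd.DataFrame({
    "RentalCount": [445, 40, 456, 38, 120, 300, 210, 95],
    "Year":        [2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015],
    "Month":       [1, 2, 3, 12, 1, 1, 2, 3],
    "Day":         [20, 13, 10, 25, 5, 7, 14, 21],
    "WeekDay":     [1, 2, 3, 4, 5, 6, 7, 1],
    "Snow":        [0, 0, 1, 0, 1, 1, 0, 0],
    "Holiday":     [1, 0, 0, 1, 0, 0, 0, 1],
})

# Assumed training split, mirroring Task 1: records before 2015.
rental_train_data = rental_data[rental_data["Year"] < 2015]

# Snow is the target, so it is left out of the input columns.
snow_features = ["RentalCount", "Year", "Month", "Day", "WeekDay", "Holiday"]
snow_labels = rental_train_data["Snow"]

# Create and train the decision tree; random_state makes tie-breaking reproducible.
dt_model = DecisionTreeClassifier(random_state=0)
dt_model.fit(rental_train_data[snow_features], snow_labels)

# Serialize for storage in ski_rental_models.
trained_model = pickle.dumps(dt_model)
print(dt_model.classes_)  # [0 1]
```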
2) Run the following code to execute the stored procedure to train the model and to insert the
trained model into the table ski_rental_models under the name 'DT_model for Snow Prediction'.
3) Create a stored procedure called predict_snow to generate snow predictions. Like the
stored procedure predict_rentalcount, this stored procedure has one input parameter, the
name of the trained model to be used in this procedure.
The procedure first retrieves the trained model from the table ski_rental_models using
the model name given in the input parameter, then uses a Python script to generate the
predictions.
The Python script in the procedure:
• Import metrics from the sklearn package for model evaluation
• Deserialize the trained model
The following 4 items are incomplete in the code below (marked ### add your code); complete
them and then execute the code:
• Get the test data. For this model, 'Snow' is not included as an input feature.
• Generate predictions for the test data
• Store the predictions in the rental_score_data variable
• Assign the predictions to the output variable OutputDataSet
For the evaluation, in addition to accuracy, we calculate precision, recall and F1 score.
This item is complete:
• Evaluation of the predictions
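GO
-- A stored procedure that predicts snow for 2015 using a stored model
CREATE PROCEDURE predict_snow (@model VARCHAR(30))
AS
BEGIN
DECLARE @py_model VARBINARY(MAX) = (SELECT model FROM ski_rental_models WHERE model_name = @model);
EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'
# Import metrics from scikit-learn for model evaluation.
from sklearn import metrics
import pickle
# Deserialize the trained model.
snow_model = pickle.loads(py_model)
# Get the test data (all input columns except Snow) and the true labels test_label.
### add your code
# Generate predictions snow_predictions and store them in rental_score_data.
### add your code
# Assign the predictions and the true labels to output variable OutputDataSet
### add your code
# Performance evaluation
print("\n Metrics.Accuracy=", metrics.accuracy_score(test_label, snow_predictions))
print("\n Metrics.precision_score=", metrics.precision_score(test_label, snow_predictions, average = "weighted"))
print("\n Metrics.recall_score=", metrics.recall_score(test_label, snow_predictions, average = "weighted"))
print("\n Metrics.f1 score=", metrics.f1_score(test_label, snow_predictions, average = "weighted"))
'
, @input_data_1 = N'Select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
, @input_data_1_name = N'rental_score_data'
, @params = N'@py_model varbinary(max)'
, @py_model = @py_model
with result sets (("Snow" float, "Snow_Predicted" float));
END;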
After the execution, check the database; you will find the stored procedure predict_snow.
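4) Write an execution statement that runs the stored procedure predict_snow with the model
name 'DT_model for Snow Prediction' to generate snow predictions. The output is a result set
of actual Snow values and Snow_Predicted values, plus the printed accuracy, precision, recall
and F1 score.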
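The evaluation prints weighted precision, recall and F1 score. With average="weighted", scikit-learn computes each metric per class and averages the per-class values weighted by support (the number of true instances of each class). The sketch below checks this on invented labels. It also shows that weighted recall always equals accuracy: each class's recall TP_c/support_c is multiplied by support_c/N, so the sum is the total number of correct predictions divided by N.

```python
from sklearn import metrics

# Invented true and predicted Snow labels for six days (0 = no snow, 1 = snow).
test_label = [0, 0, 0, 0, 1, 1]
snow_predictions = [0, 0, 1, 0, 1, 0]

# Per-class precision, in label order [0, 1]: class 0 is 3/4, class 1 is 1/2.
per_class = metrics.precision_score(test_label, snow_predictions, average=None)
# Support = number of true instances of each class.
support = [test_label.count(c) for c in (0, 1)]
manual_weighted = sum(p * s for p, s in zip(per_class, support)) / len(test_label)

weighted = metrics.precision_score(test_label, snow_predictions, average="weighted")
accuracy = metrics.accuracy_score(test_label, snow_predictions)
weighted_recall = metrics.recall_score(test_label, snow_predictions, average="weighted")

print(per_class)
print(weighted, manual_weighted)
print(accuracy, weighted_recall)
```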
Task 3: Create a stored procedure to predict snow for an external test dataset
In this task, you will write a stored procedure to generate snow predictions for an external
test dataset using the classification model 'DT_model for Snow Prediction' created in
Task 2, which has been stored in table ski_rental_models.
This procedure is very similar to the procedure predict_snow, so you can make a copy of
that procedure and modify the copy to use an external test dataset.
1) Make a copy of the stored procedure predict_snow and rename the procedure in the copy to
predict_snow_testData.
Unlike the procedure predict_snow, this stored procedure has two input parameters. The
first parameter, @model, is the name of the trained model to be used in this procedure; the
second parameter, @test_data, is the external input dataset.
After the revision, execute the code. Check the database; you will find the stored
procedure predict_snow_testData.
If your test dataset contains the 2015 records, you will get the same prediction output as
the procedure predict_snow.
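Why the outputs match can be sketched in Python: a deserialized model is deterministic, so the same stored bytes applied to the same feature values give the same predictions, regardless of where the rows come from. The sketch assumes the external dataset reaches the script as a pandas DataFrame (here parsed from CSV text, a hypothetical source); the helper score_snow is hypothetical and only mirrors the Python steps of the procedure, and the data and stored model are invented stand-ins.

```python
import io
import pickle
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

snow_features = ["RentalCount", "Year", "Month", "Day", "WeekDay", "Holiday"]

# Invented rows standing in for dbo.rental_data.
rental_data = pd.DataFrame({
    "RentalCount": [445, 40, 456, 38, 120, 300, 210, 95],
    "Year":        [2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015],
    "Month":       [1, 2, 3, 12, 1, 1, 2, 3],
    "Day":         [20, 13, 10, 25, 5, 7, 14, 21],
    "WeekDay":     [1, 2, 3, 4, 5, 6, 7, 1],
    "Snow":        [0, 0, 1, 0, 1, 1, 0, 0],
    "Holiday":     [1, 0, 0, 1, 0, 0, 0, 1],
})

# Stand-in for the stored 'DT_model for Snow Prediction' (trained on pre-2015 rows).
train = rental_data[rental_data["Year"] < 2015]
py_model = pickle.dumps(
    DecisionTreeClassifier(random_state=0).fit(train[snow_features], train["Snow"]))


def score_snow(py_model, test_data):
    """Hypothetical helper: the Python steps of predict_snow applied to any test data."""
    snow_model = pickle.loads(py_model)
    result = test_data[["Snow"]].copy()
    result["Snow_Predicted"] = snow_model.predict(test_data[snow_features])
    return result


# The 2015 records as predict_snow would read them from the table.
from_table = rental_data[rental_data["Year"] == 2015]
# The same records supplied as an external dataset, here parsed from CSV text.
external = pd.read_csv(io.StringIO(from_table.to_csv(index=False)))

# Compare values, not indexes: the external copy is re-indexed from 0.
same = (score_snow(py_model, from_table)["Snow_Predicted"].values
        == score_snow(py_model, external)["Snow_Predicted"].values).all()
print(same)
```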