
CAB430 Practical Week9_2025

This document outlines a practical exercise for using SQL Server Machine Learning with Python to create predictive models based on ski rental data. It includes steps for creating machine learning models using linear regression and decision tree algorithms to predict rental counts and snowfall, respectively, along with the necessary SQL procedures and Python scripts. The document also emphasizes the importance of excluding target columns during model training for accurate predictions.

Uploaded by

RIYA AMIT MAHATO
Copyright
© All Rights Reserved

CAB430 Data and Information Integration

Week 9 Practical

SQL Server Machine Learning in Python


In this practical, you will write Python scripts that complete prediction tasks in SQL Server using
a linear regression algorithm and a decision tree algorithm.

We will use a small database called SkiRentalDB. It has one table called rental_data. Each
row in the table records ski rental information for a particular day: the date
(year, month, day, weekday), the rental count (how many rentals were made on that day),
whether there was snow, and whether the day was a holiday. In this practical, we will create
machine learning models to predict the rental count and whether there is snow.

Download the database backup file SkiRentalDB.bak and install it in your SQL Server instance
by restoring the backup file. Refer to the Week 3 practical worksheet for how to restore a
database.

Before creating the machine learning models, we first create a table in the SkiRentalDB database
to store the trained models that you are going to create.

Start SQL Server and connect to SQL Server Database Engine, then open a New Query panel.

Execute the following SQL statement to create a table called dbo.ski_rental_models.

USE SkiRentalDB;
DROP TABLE IF EXISTS dbo.ski_rental_models;
GO
CREATE TABLE dbo.ski_rental_models (
model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
model VARBINARY(MAX) NOT NULL
);

Make sure you use the SkiRentalDB database for the following tasks by adding the following
statement at the top of your code.
USE SkiRentalDB;

Task 1: Create a machine learning model to predict rental count using a
linear regression algorithm
1) Create a stored procedure called generate_rental_count_model to generate a trained
machine learning model using a linear regression algorithm. This model predicts the rental
count. Linear regression is a regression algorithm, that is, a supervised learning algorithm for
predicting numerical values (classification algorithms, by contrast, predict categories).

Execute the following code to create the stored procedure.

DROP PROCEDURE IF EXISTS generate_rental_count_model;
GO

-- A stored procedure that trains and generates a Python model using the
-- rental_data and a linear regression algorithm
CREATE PROCEDURE generate_rental_count_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
    EXECUTE sp_execute_external_script
      @language = N'Python'
    , @script = N'
from sklearn.linear_model import LinearRegression
import pickle

# Get the training data, including all the features
# (note that the target column "RentalCount" is among them; step 4 discusses the consequence)
training_features = rental_train_data[["RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday"]]

# Get the target labels, "RentalCount"
training_target = rental_train_data[["RentalCount"]].values.ravel()

# Initialize the model class
LR_model = LinearRegression()

# Fit the model to the training data
LR_model.fit(training_features, training_target)

# Before saving the model to the DB table, convert it to a binary object
trained_model = pickle.dumps(LR_model)
'
    , @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
    , @input_data_1_name = N'rental_train_data'
    , @params = N'@trained_model varbinary(max) OUTPUT'
    , @trained_model = @trained_model OUTPUT;
END;

The stored procedure has one output parameter, @trained_model, through which it returns the
trained model.
The Python script in the procedure performs the following steps:
o Import the LinearRegression algorithm from the sklearn package
o Get the training data, including all the features and the target labels
o Create a LinearRegression model
o Train the model using the training features and target labels
o Convert the trained model to a binary object, i.e., serialize the model
After the execution, you can find the stored procedure in the database.
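
The serialization step can be tried outside SQL Server. The following is a minimal sketch, assuming a plain dictionary of made-up coefficients as a stand-in for a fitted model; it only illustrates that pickle.dumps produces a bytes object (the kind of value a VARBINARY(MAX) parameter carries) and that pickle.loads restores an equal object.

```python
import pickle

# Hypothetical stand-in for a fitted model: just some made-up coefficients.
# The stored procedure applies the same pickle.dumps call to LR_model.
model = {"slope": 2.0, "intercept": 1.0}

# Serialize: the result is a bytes object, which is what a
# VARBINARY(MAX) parameter or column can hold.
blob = pickle.dumps(model)

# Deserialize: the restored object equals the original.
restored = pickle.loads(blob)

print(type(blob).__name__, restored == model)
```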

Note that when the training data is retrieved from the table dbo.rental_data, only the records
from before 2015 are selected:
@input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'

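The year-based split can be sketched with Python's built-in sqlite3 module. This is only an illustration: the in-memory table stands in for dbo.rental_data, it has fewer columns, and the rows are made-up values. Records before 2015 form the training set, while the 2015 records are held back for testing.

```python
import sqlite3

# In-memory SQLite table standing in for dbo.rental_data.
# The rows are made-up values for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental_data (Year INTEGER, Month INTEGER, RentalCount INTEGER)"
)
conn.executemany(
    "INSERT INTO rental_data VALUES (?, ?, ?)",
    [(2013, 1, 40), (2014, 2, 55), (2014, 12, 70), (2015, 1, 65), (2015, 3, 30)],
)

# Training set: every record before 2015.
train_rows = conn.execute(
    "SELECT Year, Month, RentalCount FROM rental_data WHERE Year < 2015"
).fetchall()

# Test set: only the 2015 records, which the model never sees during training.
test_rows = conn.execute(
    "SELECT Year, Month, RentalCount FROM rental_data WHERE Year = 2015"
).fetchall()

print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Keeping the two sets disjoint means the error measured on the 2015 records reflects data the model has not been fitted to.
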
2) Run the following code to execute the stored procedure, which trains the model, and to insert
the trained model into the table ski_rental_models.

-- Execute the stored procedure to generate a trained model and insert it into table
-- ski_rental_models
DECLARE @model VARBINARY(MAX);
EXECUTE generate_rental_count_model @model OUTPUT;

-- Remove 'LR_model predicting count' from ski_rental_models if it exists
DELETE ski_rental_models WHERE model_name = 'LR_model predicting count';

INSERT INTO ski_rental_models (model_name, model) VALUES('LR_model predicting count', @model);

Check the table ski_rental_models; it should now contain a model called 'LR_model
predicting count'.
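
The delete-then-insert pattern matters because model_name is the primary key: inserting the same name twice would fail. The sketch below illustrates this with sqlite3, where a BLOB column plays the role of VARBINARY(MAX). The helper save_model and the dictionaries used as models are hypothetical and are not part of the practical.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
# BLOB plays the role of VARBINARY(MAX); model_name is the primary key.
conn.execute(
    "CREATE TABLE ski_rental_models (model_name TEXT PRIMARY KEY, model BLOB NOT NULL)"
)


def save_model(name, model_obj):
    """Hypothetical helper: remove any existing row for `name`, then insert."""
    blob = pickle.dumps(model_obj)
    conn.execute("DELETE FROM ski_rental_models WHERE model_name = ?", (name,))
    conn.execute(
        "INSERT INTO ski_rental_models (model_name, model) VALUES (?, ?)",
        (name, blob),
    )


# Saving twice under the same name works because of the DELETE;
# without it the second INSERT would violate the primary key.
save_model("LR_model predicting count", {"version": 1})
save_model("LR_model predicting count", {"version": 2})

row_count = conn.execute("SELECT COUNT(*) FROM ski_rental_models").fetchone()[0]
stored = conn.execute(
    "SELECT model FROM ski_rental_models WHERE model_name = ?",
    ("LR_model predicting count",),
).fetchone()[0]
latest = pickle.loads(stored)

print(row_count, latest)
```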

3) Create a stored procedure called predict_rentalcount to make predictions using a
trained model and a set of testing data.

This stored procedure has one input parameter, which is the name of the trained model to
be used.
The procedure first retrieves the trained model from the table ski_rental_models using
the model name given in the input parameter, then uses a Python script to generate the
predictions.
The Python script in the procedure performs the following steps:
• Import the mean_squared_error metric from the sklearn package for model evaluation
• Deserialize the trained model
• Get the testing data. Note that the test data are the records from the year 2015.
• Generate predictions for the testing data and store the predictions
• Assign the predictions to the output variable OutputDataSet
• Calculate the mean squared error of the predictions (the target is numerical, so an error
  measure is used rather than classification accuracy)
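
The evaluation in this procedure relies on scikit-learn's mean_squared_error. As a minimal sketch of what that number means, the hand-written function below (named mse_by_hand here to avoid confusion with the library function) averages the squared differences between actual and predicted values. The sample values are made up.

```python
def mse_by_hand(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up rental counts and predictions.
actual = [30, 45, 60, 20]
predicted = [28, 50, 57, 20]

# Squared differences are 4, 25, 9 and 0, so the mean is 38 / 4.
mse = mse_by_hand(actual, predicted)
print("Mean squared error: %.2f" % mse)  # Mean squared error: 9.50
```

A value of 0 means every prediction matches the actual value exactly; larger values mean larger errors.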

The following code is to create the stored procedure. Execute it.


DROP PROCEDURE IF EXISTS predict_rentalcount;
GO

-- A stored procedure that makes predictions
CREATE PROCEDURE predict_rentalcount (@model varchar(100))
AS
BEGIN
    -- Specify a variable which is the trained model retrieved from table
    -- ski_rental_models
    DECLARE @py_model varbinary(max) = (select model from ski_rental_models where model_name = @model);

    EXECUTE sp_execute_external_script
      @language = N'Python'
    , @script = N'
# Import the scikit-learn function to compute error.
from sklearn.metrics import mean_squared_error

import numpy
import pickle
import pandas

# Deserialize the trained model
rental_model = pickle.loads(py_model)

# Get the features of the testing data (same columns as used for training)
test_cases = rental_score_data[["RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday"]]

# Generate the predictions for the testing data.
lin_predictions = rental_model.predict(test_cases)
print(lin_predictions)

# Store the predictions into rental_score_data as a new column
rental_score_data["PredictedCount"] = lin_predictions

# Assign the predictions and the true labels to the output variable OutputDataSet
OutputDataSet = rental_score_data[["RentalCount", "PredictedCount"]]
#print(OutputDataSet)

# Compute the error between the actual values and the test predictions.
lin_mse = mean_squared_error(rental_score_data["RentalCount"], lin_predictions)
print("Mean squared error: %.2f" % lin_mse)
'
    , @input_data_1 = N'Select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
    , @input_data_1_name = N'rental_score_data'
    , @params = N'@py_model varbinary(max)'
    , @py_model = @py_model
    with result sets (("RentalCount" float, "RentalCount_Predicted" float));
END;

Check the database; you can find the stored procedure
predict_rentalcount.

4) Execute the stored procedure predict_rentalcount to generate predictions, passing
'LR_model predicting count' as the input parameter value.

EXEC predict_rentalcount 'LR_model predicting count';


The predictions look perfect. However, they are not reliable, because the
target column was included as an input column in the training, which is incorrect for this
task: the model only has to copy RentalCount from its input. Try excluding the target column
from the input data for training and prediction. The prediction results will be significantly worse.
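
The effect of including the target as an input can be reproduced in a few lines. The sketch below is a simplified illustration, not the procedure's code: it uses a hand-written one-feature least-squares fit (the hypothetical helpers fit_line and line_mse) instead of LinearRegression, and made-up data. When the only feature is the target itself, the fitted line is y = x and the error is zero; with a genuine feature, the error is clearly larger.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def line_mse(xs, ys, slope, intercept):
    """Mean squared error of the line's predictions against ys."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: rental counts and the weekday (1-7) they were recorded on.
rental_count = [20, 35, 30, 50, 45, 80, 90]
weekday = [1, 2, 3, 4, 5, 6, 7]

# Leaky model: the "feature" is the target itself.
leaky = fit_line(rental_count, rental_count)
leaky_mse = line_mse(rental_count, rental_count, *leaky)

# Honest model: predict the count from the weekday only.
honest = fit_line(weekday, rental_count)
honest_mse = line_mse(weekday, rental_count, *honest)

print("leaky:", leaky, leaky_mse)
print("honest:", honest, round(honest_mse, 2))
```

The leaky fit looks perfect only because the answer was supplied as an input, which is never the case when predicting a count that is not yet known.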

In Task 2, the target column will not be included as an input column for training the
model.

Task 2: Create a machine learning model to predict snow using a decision tree
algorithm
1) Create a stored procedure called generate_rental_snow_model to generate a trained
machine learning model using a decision tree algorithm. This model can be used to predict
whether there is snow or not.

Like the stored procedure generate_rental_count_model, this procedure has
one output parameter, @trained_model, through which it returns the trained model.
The Python script in the procedure performs the following steps:
o Import the DecisionTreeClassifier algorithm from the sklearn package
o Get the training data, including the features and the target label
  For this model, the feature 'Snow' is not included in the training features.
  The target label is 'Snow'.
The following three items are incomplete in the given code (highlighted in red in the original
handout and marked with "### add your code" in the listing).
Complete the code and execute it to generate the model.
o Create a DecisionTreeClassifier model
o Train the model using the training features and target labels
o Convert the trained model to a binary object, i.e., serialize the model

DROP PROCEDURE IF EXISTS generate_rental_snow_model;
GO

-- A stored procedure that trains and generates a Python model using the rental_data
-- and a decision tree algorithm
CREATE PROCEDURE generate_rental_snow_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
    EXECUTE sp_execute_external_script
      @language = N'Python'
    , @script = N'
from sklearn.tree import DecisionTreeClassifier
import pickle

# Get the training data, the features without "Snow"
training_features = rental_train_data[["RentalCount", "Year", "Month", "Day", "WeekDay", "Holiday"]]

# Get the target labels, "Snow"
training_target = rental_train_data[["Snow"]].values.ravel()

# Initialize the model class.
### add your code

# Fit the model to the training data.
### add your code

# Before saving the model to the DB table, convert it to a binary object
### add your code
'
    , @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
    , @input_data_1_name = N'rental_train_data'
    , @params = N'@trained_model varbinary(max) OUTPUT'
    , @trained_model = @trained_model OUTPUT;
END;

After the execution, you can find the stored procedure in the database.

2) Run the following code to execute the stored procedure to train the model and to insert the
trained model into the table ski_rental_models

DECLARE @model VARBINARY(MAX);
EXECUTE generate_rental_snow_model @model OUTPUT;

-- Remove 'DT_model for Snow Prediction' from ski_rental_models if it exists
DELETE ski_rental_models WHERE model_name = 'DT_model for Snow Prediction';

INSERT INTO ski_rental_models (model_name, model) VALUES('DT_model for Snow Prediction', @model);

Check the table ski_rental_models; it should now contain a model called 'DT_model for Snow
Prediction'.

3) Create a stored procedure called predict_snow to make predictions using a trained model
and a set of testing data.

Like the stored procedure predict_rentalcount, this stored procedure has one
input parameter, which is the name of the trained model to be used.
The procedure first retrieves the trained model from the table ski_rental_models using
the model name given in the input parameter, then uses a Python script to generate the
predictions.
The Python script in the procedure performs the following steps:
• Import metrics from the sklearn package for model evaluation
• Deserialize the trained model

The following four items in the given code below are incomplete (highlighted in red in the
original handout and marked with "### add your code" in the listing).
You need to complete them and then execute the code.
• Get the testing data. For this model, 'Snow' is not included as an input feature.
• Generate predictions for the testing data. The evaluation code expects them in a variable
  named snow_predictions.
• Store the predictions in the rental_score_data variable as a new column
• Assign the predictions to the output variable OutputDataSet

For the evaluation, in addition to accuracy, we will calculate other metrics: precision,
recall, and F1 score. This item is already complete in the given code.
• Evaluation of the predictions
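
To make the evaluation output easier to read, the sketch below computes the same four quantities by hand on made-up labels. It is an illustration of how weighted averaging works (each class's score is weighted by its number of true labels, which is what average = "weighted" requests in scikit-learn), not a replacement for sklearn.metrics. The function name weighted_scores is hypothetical.

```python
def weighted_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1); the last three are averaged
    over classes, weighted by each class's number of true labels."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)  # times this class was predicted
        support = sum(t == label for t in y_true)    # times this class truly occurs
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Made-up labels: 1 = snow, 0 = no snow.
snow_true = [1, 1, 1, 0, 0, 0, 0, 0]
snow_pred = [1, 0, 0, 0, 0, 0, 0, 0]

accuracy, precision, recall, f1 = weighted_scores(snow_true, snow_pred)
print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```

In this example the model misses two of the three snow days, so recall and F1 are lower than precision even though the accuracy looks reasonable.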

Complete the code and execute it


DROP PROCEDURE IF EXISTS predict_snow;
GO

-- A stored procedure that makes predictions
CREATE PROCEDURE predict_snow (@model varchar(100))
AS
BEGIN
    -- Specify a variable which is the trained model retrieved from table
    -- ski_rental_models
    DECLARE @py_model varbinary(max) = (select model from ski_rental_models where model_name = @model);

    EXECUTE sp_execute_external_script
      @language = N'Python'
    , @script = N'
# Import the scikit-learn functions for evaluation.
from sklearn import metrics
import numpy
import pickle
import pandas

# Deserialize the trained model
rental_model = pickle.loads(py_model)

# Get the features of the testing data
### test_cases = add your code

# Generate the predictions for the testing data
# (the evaluation code below expects a variable named snow_predictions).
### add your code

# Store the predictions into rental_score_data as a new column
### add your code

# Assign the predictions and the true labels to output variable OutputDataSet
### add your code

# "Snow" column is the target column, true labels
test_label = numpy.ravel(rental_score_data[["Snow"]])

# Performance evaluation
print("\n Metrics.Accuracy=", metrics.accuracy_score(test_label, snow_predictions))
print("\n Metrics.precision_score=", metrics.precision_score(test_label, snow_predictions, average = "weighted"))
print("\n Metrics.recall_score=", metrics.recall_score(test_label, snow_predictions, average = "weighted"))
print("\n Metrics.f1 score=", metrics.f1_score(test_label, snow_predictions, average = "weighted"))
'
    , @input_data_1 = N'Select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
    , @input_data_1_name = N'rental_score_data'
    , @params = N'@py_model varbinary(max)'
    , @py_model = @py_model
    with result sets (("Snow" float, "Snow_Predicted" float));
END;

After the execution, check the database; you can find the stored procedure predict_snow.

4) Write an execution statement to execute the stored procedure predict_snow to generate
snow predictions. The sample output shown in the original handout is not reproduced here; per
the WITH RESULT SETS clause, the result has two columns, Snow and Snow_Predicted.

Task 3: Create a stored procedure to predict snow for an external test dataset

In this task, you will write a stored procedure to generate snow predictions for an external
test dataset using the classification model 'DT_model for Snow Prediction' created in
Task 2, which has been stored in table ski_rental_models.

This procedure is very similar to the procedure predict_snow. You can make a copy of
the procedure, then modify the copy to use an external test dataset.

1) Make a copy of the stored procedure predict_snow and rename the procedure in the copy to
predict_snow_testData.

Unlike the procedure predict_snow, this stored procedure has two input
parameters. The first parameter, @model, is the name of the trained model to be used in this
procedure; the second parameter, @test_data, is the input external dataset.

CREATE PROCEDURE predict_snow_testData (@model varchar(100), @test_data NVARCHAR(max))

2) Revise the procedure to use the external test dataset for the predictions. You can keep
much of the code unchanged; you only need to change the assignment to the input data
parameter (@input_data_1), because the input testing data now comes from the parameter
@test_data rather than from a fixed SELECT statement.
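
The idea of supplying the test data from outside can be sketched in Python with sqlite3. This sketch assumes that @test_data carries the text of a SELECT statement (the NVARCHAR(max) type suggests this, but the handout does not state it explicitly). The function load_test_rows, the reduced table, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for rental_data with made-up rows.
conn.execute("CREATE TABLE rental_data (Year INTEGER, Holiday INTEGER, Snow INTEGER)")
conn.executemany(
    "INSERT INTO rental_data VALUES (?, ?, ?)",
    [(2014, 0, 1), (2015, 1, 0), (2015, 0, 1)],
)


def load_test_rows(test_data_query):
    """Hypothetical helper: run the caller-supplied SELECT text and return its rows.

    The query is no longer fixed inside the function, so the same code can
    score any dataset the caller describes.
    """
    return conn.execute(test_data_query).fetchall()


# The caller decides which records are scored.
rows_2015 = load_test_rows(
    "SELECT Year, Holiday, Snow FROM rental_data WHERE Year = 2015"
)
rows_2014 = load_test_rows(
    "SELECT Year, Holiday, Snow FROM rental_data WHERE Year = 2014"
)

print(len(rows_2015), len(rows_2014))
```

Because the query text is executed as given, a design like this is suitable only when the caller is trusted.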

After the revision, execute the code. Check the database; you can find the stored
procedure predict_snow_testData.

3) Write an execution statement to execute the stored procedure predict_snow_testData to
generate snow predictions for an input test dataset. Before that, you need to create a test
dataset, and then pass the test dataset to the procedure as an input parameter.

If your test dataset contains the records of 2015, you will get the same prediction output as
the procedure predict_snow produced.
