
A 3

This document describes the process of training a neural network model for classification using mini-batch stochastic gradient descent with momentum, in Octave. It defines a function that takes in the hyperparameters, initializes the model weights, loads the training, validation, and test data, and then performs a loop of gradient descent updates for the specified number of iterations. After each update it records the loss on the full training and validation sets, and it optionally keeps the model with the lowest validation loss (early stopping). Once training is complete, it reports the loss and classification error rate on the training, validation, and test sets and saves the trained model weights.
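For context, the training function defined in the code below would be called from the Octave prompt roughly as follows; the hyperparameter values here are only an illustration, not the assignment's prescribed settings.

% a3(wd_coefficient, n_hid, n_iters, learning_rate, momentum_multiplier, do_early_stopping, mini_batch_size)
% e.g. no weight decay, 10 hidden units, 70 iterations, learning rate 0.005,
% no momentum, no early stopping, mini-batches of 4 cases:
a3(0, 10, 70, 0.005, 0, false, 4);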


function a3(wd_coefficient, n_hid, n_iters, learning_rate, momentum_multiplier, do_early_stopping, mini_batch_size)
  % wd_coefficient        Weight decay coefficient
  % n_hid                 Number of hidden units
  % n_iters               Number of iterations
  % learning_rate
  % momentum_multiplier
  % do_early_stopping
  % mini_batch_size
  warning('error', 'Octave:broadcast');
  if exist('page_output_immediately'), page_output_immediately(1); end
  more off;
  model = initial_model(n_hid); % Initialize the weights
  from_data_file = load('data.mat');
  datas = from_data_file.data;
  n_training_cases = size(datas.training.inputs, 2); % size(datas.training.inputs) = (256, 1000)
  %if n_iters ~= 0, test_gradient(model, datas.training, wd_coefficient); end % test if the code for the gradient is ok

  % optimization
  theta = model_to_theta(model);
  momentum_speed = theta * 0;
  training_data_losses = [];
  validation_data_losses = [];
  if do_early_stopping,
    best_so_far.theta = -1; % this will be overwritten soon
    best_so_far.validation_loss = inf;
    best_so_far.after_n_iters = -1;
  end
  for optimization_iteration_i = 1:n_iters,
    fprintf('%d ', optimization_iteration_i);
    model = theta_to_model(theta);
    % Prepare the batch for learning
    training_batch_start = mod((optimization_iteration_i-1) * mini_batch_size, n_training_cases) + 1;
    training_batch.inputs = datas.training.inputs(:, training_batch_start : training_batch_start + mini_batch_size - 1);
    training_batch.outputs = datas.training.outputs(:, training_batch_start : training_batch_start + mini_batch_size - 1);
    % Compute the gradient and update the weights
    gradient = model_to_theta(d_loss_by_d_model(model, training_batch, wd_coefficient));
    momentum_speed = momentum_speed * momentum_multiplier - gradient;
    theta = theta + momentum_speed * learning_rate;
    model = theta_to_model(theta);
    training_data_losses = [training_data_losses, loss(model, datas.training, wd_coefficient)];
    validation_data_losses = [validation_data_losses, loss(model, datas.validation, wd_coefficient)];
    if do_early_stopping && validation_data_losses(end) < best_so_far.validation_loss,
      best_so_far.theta = theta; % remember the best model seen so far
      best_so_far.validation_loss = validation_data_losses(end);
      best_so_far.after_n_iters = optimization_iteration_i;
    end
    if mod(optimization_iteration_i, round(n_iters/10)) == 0,
      fprintf('\nAfter %d optimization iterations, training data loss is %f, and validation data loss is %f\n', optimization_iteration_i, training_data_losses(end), validation_data_losses(end));
    end
  end
  if n_iters ~= 0, test_gradient(model, datas.training, wd_coefficient); end % check again, this time with more typical parameters
  if do_early_stopping,
    fprintf('Early stopping: validation loss was lowest after %d iterations. We chose the model that we had then.\n', best_so_far.after_n_iters);
    theta = best_so_far.theta;
  end
  % the optimization is finished. Now do some reporting.
  model = theta_to_model(theta);
  clf; % Clear current figure window
  hold on;
  plot(training_data_losses, 'b');
  plot(validation_data_losses, 'r');
  legend('training', 'validation');
  ylabel('loss');
  xlabel('iteration number');
  hold off;
  datas2 = {datas.training, datas.validation, datas.test};
  data_names = {'training', 'validation', 'test'};
  for data_i = 1:3,
    data = datas2{data_i};
    data_name = data_names{data_i};
    fprintf('\nThe loss on the %s data is %f\n', data_name, loss(model, data, wd_coefficient));
    if wd_coefficient ~= 0,
      fprintf('The classification loss (i.e. without weight decay) on the %s data is %f\n', data_name, loss(model, data, 0));
    end
    fprintf('The classification error rate on the %s data is %f\n', data_name, classification_performance(model, data));
  end
  save -mat7-binary model.mat model;
end

% test_gradient is a function used to test our implementation of the
% gradient calculation
function test_gradient(model, data, wd_coefficient)
  base_theta = model_to_theta(model);
  h = 1e-2;
  correctness_threshold = 1e-5;
  analytic_gradient = model_to_theta(d_loss_by_d_model(model, data, wd_coefficient));
  % Test the gradient not for every element of theta, because that's a lot of work. Test for only a few elements.
  for i = 1:100,
    test_index = mod(i * 1299721, size(base_theta, 1)) + 1; % 1299721 is prime and thus ensures a somewhat random-like selection of indices
    analytic_here = analytic_gradient(test_index);
    theta_step = base_theta * 0;
    theta_step(test_index) = h;
    contribution_distances = [-4:-1, 1:4];
    contribution_weights = [1/280, -4/105, 1/5, -4/5, 4/5, -1/5, 4/105, -1/280];
    temp = 0;
    for contribution_index = 1:8,
      temp = temp + loss(theta_to_model(base_theta + theta_step * contribution_distances(contribution_index)), data, wd_coefficient) * contribution_weights(contribution_index);
    end
    fd_here = temp / h;
    diff = abs(analytic_here - fd_here);
    % fprintf('%d %e %e %e %e\n', test_index, base_theta(test_index), diff, fd_here, analytic_here);
    if diff < correctness_threshold, continue; end
    if diff / (abs(analytic_here) + abs(fd_here)) < correctness_threshold, continue; end
    error(sprintf('Theta element #%d, with value %e, has finite difference gradient %e but analytic gradient %e. That looks like an error.\n', test_index, base_theta(test_index), fd_here, analytic_here));
  end
  fprintf('Gradient test passed. That means that the gradient that your code computed is within 0.001%% of the gradient that the finite difference approximation computed, so the gradient calculation procedure is probably correct (not certainly, but probably).\n');
end
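As a side note, the contribution_distances and contribution_weights in test_gradient are the coefficients of the standard eighth-order central finite-difference approximation of a first derivative. In LaTeX notation, with step size h and f denoting the loss as a function of the single perturbed parameter:

f'(\theta) \approx \frac{1}{h}\left( \tfrac{1}{280}f(\theta-4h) - \tfrac{4}{105}f(\theta-3h) + \tfrac{1}{5}f(\theta-2h) - \tfrac{4}{5}f(\theta-h) + \tfrac{4}{5}f(\theta+h) - \tfrac{1}{5}f(\theta+2h) + \tfrac{4}{105}f(\theta+3h) - \tfrac{1}{280}f(\theta+4h) \right)

This is the quantity stored in fd_here before it is compared against analytic_here.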

function ret = logistic(input)
  ret = 1 ./ (1 + exp(-input));
end

function ret = log_sum_exp_over_rows(a)
  % This computes log(sum(exp(a), 1)) in a numerically stable way
  maxs_small = max(a, [], 1);
  maxs_big = repmat(maxs_small, [size(a, 1), 1]);
  ret = log(sum(exp(a - maxs_big), 1)) + maxs_small;
end
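The stability trick in log_sum_exp_over_rows rests on the identity (LaTeX notation), applied column-wise with m taken as the column maximum:

\log \sum_k e^{a_k} = m + \log \sum_k e^{a_k - m}, \qquad m = \max_k a_k

Subtracting the maximum before exponentiating keeps the arguments of exp from overflowing.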

function ret = loss(model, data, wd_coefficient)
  % model.input_to_hid is a matrix of size (n_hid, 256)
  % model.hid_to_class is a matrix of size (10, n_hid)
  % data.inputs is a matrix of size (256, <number of data cases>)
  % data.outputs is a matrix of size (10, <number of data cases>)

  % first, do the forward pass, i.e. calculate a variety of relevant values
  hid_in = model.input_to_hid * data.inputs; % input to the hidden units, i.e. before the logistic. size: (n_hid, <number of data cases>)
  hid_out = logistic(hid_in); % output of the hidden units, i.e. after the logistic. size: (n_hid, <number of data cases>)
  class_in = model.hid_to_class * hid_out; % input to the components of the softmax. size: (10, <number of data cases>)
  class_normalizer = log_sum_exp_over_rows(class_in); % log(sum(exp)) is what we subtract to get normalized log class probabilities. size: (1, <number of data cases>)
  log_class_prob = class_in - repmat(class_normalizer, [size(class_in, 1), 1]); % log of probability of each class. size: (10, <number of data cases>)
  class_out = exp(log_class_prob); % probability of each class. Each column (i.e. each case) sums to 1. size: (10, <number of data cases>)

  classification_loss = -mean(sum(log_class_prob .* data.outputs, 1)); % select the right log class probability using that sum; then take the mean of that cross entropy over all data cases
  wd_loss = sum(model_to_theta(model).^2)/2*wd_coefficient; % very straightforward: E = 1/2 * lambda * theta^2
  ret = classification_loss + wd_loss;
end
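Putting the comments in loss together, the value it returns can be written (LaTeX notation, with N data cases, one-hot targets t, and predicted class probabilities p) as

E = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{10} t_k^{(n)} \log p_k^{(n)} + \frac{\lambda}{2}\,\|\theta\|^2

where \lambda is wd_coefficient and \theta is the parameter vector produced by model_to_theta.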

function ret = d_loss_by_d_model(model, data, wd_coefficient)
  % model.input_to_hid is a matrix of size (n_hid, 256)
  % model.hid_to_class is a matrix of size (10, n_hid)
  % data.inputs is a matrix of size (256, <number of data cases>)
  % data.outputs is a matrix of size (10, <number of data cases>)

  % The returned object is supposed to be exactly like parameter <model>, i.e. it has fields ret.input_to_hid and ret.hid_to_class. However, the contents of those matrices are gradients (d loss by d model parameter), instead of model parameters.
  % This is the only function that you're expected to change. Right now, it just returns a lot of zeros, which is obviously not the correct output. Your job is to change that.
  ret.input_to_hid = model.input_to_hid * 0;
  ret.hid_to_class = model.hid_to_class * 0;

  % first, do the forward pass, i.e. calculate a variety of relevant values
  hid_in = model.input_to_hid * data.inputs; % input to the hidden units, i.e. before the logistic. size: (n_hid, <number of data cases>)
  hid_out = logistic(hid_in); % output of the hidden units, i.e. after the logistic. size: (n_hid, <number of data cases>)
  class_in = model.hid_to_class * hid_out; % input to the components of the softmax. size: (10, <number of data cases>)
  class_normalizer = log_sum_exp_over_rows(class_in); % log(sum(exp)) is what we subtract to get normalized log class probabilities. size: (1, <number of data cases>)
  log_class_prob = class_in - repmat(class_normalizer, [size(class_in, 1), 1]); % log of probability of each class. size: (10, <number of data cases>)
  class_out = exp(log_class_prob); % probability of each class. Each column (i.e. each case) sums to 1. size: (10, <number of data cases>)

  error_deriv = class_out - data.outputs; % error derivative w.r.t. the softmax input z_j. size: (10, <number of data cases>)

  hid_to_output_weights_gradient = [];
  for i = 1:size(error_deriv, 2),
    hid_to_output_weights_gradient(:,:,i) = hid_out(:,i) * error_deriv(:,i)'; % per-case gradient. size: (n_hid, 10, <number of data cases>)
  end
  hid_to_output_weights_gradient = mean(hid_to_output_weights_gradient, 3); % we need the mean over data cases
  ret.hid_to_class = hid_to_output_weights_gradient'; % transpose to fit into the model

  backpropagate_error_deriv = model.hid_to_class' * error_deriv; % error derivative w.r.t. the hidden unit outputs. size: (n_hid, <number of data cases>)

  input_to_hidden_weights_gradient = [];
  for i = 1:size(backpropagate_error_deriv, 2),
    input_to_hidden_weights_gradient(:,:,i) = data.inputs(:,i) * ((1 - hid_out(:,i)) .* hid_out(:,i) .* backpropagate_error_deriv(:,i))'; % per-case gradient. size: (256, n_hid, <number of data cases>)
  end
  input_to_hidden_weights_gradient = mean(input_to_hidden_weights_gradient, 3);
  ret.input_to_hid = input_to_hidden_weights_gradient'; % transpose to fit into the model

  % add the weight decay contribution
  ret.input_to_hid = ret.input_to_hid + model.input_to_hid * wd_coefficient; % ret.input_to_hid size: (n_hid, 256)
  ret.hid_to_class = ret.hid_to_class + model.hid_to_class * wd_coefficient; % ret.hid_to_class size: (10, n_hid)
end
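The per-case loops above compute the correct averaged gradients but build three-dimensional arrays along the way. As a minimal sketch (assuming the same variable names as above; this is an alternative formulation, not the original assignment code), the same quantities can be obtained with plain matrix products:

% N = number of data cases
N = size(data.inputs, 2);
% gradient w.r.t. hid_to_class, size (10, n_hid), including weight decay
ret.hid_to_class = error_deriv * hid_out' / N + model.hid_to_class * wd_coefficient;
% backpropagate through the logistic nonlinearity, size (n_hid, N)
d_hid_in = (model.hid_to_class' * error_deriv) .* hid_out .* (1 - hid_out);
% gradient w.r.t. input_to_hid, size (n_hid, 256), including weight decay
ret.input_to_hid = d_hid_in * data.inputs' / N + model.input_to_hid * wd_coefficient;

Summing outer products over cases is exactly what the matrix products do, so both versions pass the finite-difference check in test_gradient.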

% Theta is a column vector that holds all the weights.
% Model contains two matrices with the weights.
function ret = theta_to_model(theta)
  n_hid = size(theta, 1) / (256 + 10);
  ret.input_to_hid = transpose(reshape(theta(1 : 256*n_hid), 256, n_hid));
  ret.hid_to_class = reshape(theta(256 * n_hid + 1 : size(theta, 1)), n_hid, 10).';
end

function ret = model_to_theta(model)
  input_to_hid_transpose = transpose(model.input_to_hid);
  hid_to_class_transpose = transpose(model.hid_to_class);
  ret = [input_to_hid_transpose(:); hid_to_class_transpose(:)];
end

function ret = initial_model(n_hid)
  n_params = (256 + 10) * n_hid;
  as_row_vector = cos(0:(n_params-1));
  ret = theta_to_model(as_row_vector(:) * 0.1); % We don't use random initialization, for this assignment. This way, everybody will get the same results.
end

function ret = classification_performance(model, data)
  % This returns the fraction of data cases that is incorrectly classified by the model.
  hid_in = model.input_to_hid * data.inputs; % input to the hidden units, i.e. before the logistic. size: <number of hidden units> by <number of data cases>
  hid_out = logistic(hid_in); % output of the hidden units, i.e. after the logistic. size: <number of hidden units> by <number of data cases>
  class_in = model.hid_to_class * hid_out; % input to the components of the softmax. size: <number of classes, i.e. 10> by <number of data cases>
  [dump, choices] = max(class_in); % choices is integer: the chosen class, plus 1.
  [dump, targets] = max(data.outputs); % targets is integer: the target class, plus 1.
  ret = mean(choices ~= targets);
end
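model_to_theta and theta_to_model are inverses of each other, which a quick check at the Octave prompt would confirm (a sketch; the variable names are illustrative):

model = initial_model(10);       % (256 + 10) * 10 = 2660 parameters in total
theta = model_to_theta(model);   % column vector of size (2660, 1)
model2 = theta_to_model(theta);  % unpack back into the two weight matrices
% model2.input_to_hid and model2.hid_to_class match model.input_to_hid and model.hid_to_class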
