Chapter 3
Pierre Paquay
Problem 3.1
First, we generate 2000 examples uniformly over the two semi-circles, which means that we will have approximately
1000 examples for each class.
library(ggplot2)
set.seed(101)
# Double semi-circle generator (one plausible implementation): n points with radius in
# [rad, rad + thk]; the top ring is labeled +1, the bottom ring is shifted right by
# rad + thk / 2 and down by sep and labeled -1.
init_data <- function(n, rad, thk, sep) {
  y <- sample(c(1, -1), n, replace = TRUE)
  r <- runif(n, rad, rad + thk)
  theta <- runif(n, 0, pi)
  D <- data.frame(x1 = r * cos(theta) + (y == -1) * (rad + thk / 2),
                  x2 = y * r * sin(theta) - (y == -1) * sep)
  return(cbind(D, y))
}
rad <- 10
thk <- 8
sep <- 5
D <- init_data(2000, rad, thk, sep)
p <- ggplot(D, aes(x = x1, y = x2, col = as.factor(y + 3))) + geom_point() +
theme(legend.position = "none") +
coord_fixed()
p
[Figure: the generated double semi-circle data, x1 versus x2, colored by class.]
(a) Then, we run the PLA starting from w = 0 until it converges and we plot the data with the final
hypothesis.
h <- function(D, w) {
  scalar_prod <- cbind(1, D$x1, D$x2) %*% w
  return(as.vector(sign(scalar_prod)))
}

iter <- 0
w_PLA <- c(0, 0, 0)
repeat {
  y_pred <- h(D, w_PLA)
  D_mis <- subset(D, y != y_pred)
  if (nrow(D_mis) == 0)
    break
  xt <- D_mis[1, ]
  w_PLA <- w_PLA + c(1, xt$x1, xt$x2) * xt$y
  iter <- iter + 1
}
p + geom_abline(slope = -w_PLA[2] / w_PLA[3], intercept = -w_PLA[1] / w_PLA[3])
[Figure: the data with the final PLA hypothesis.]
(b) Now, we use linear regression for classification to obtain wlin .
X <- as.matrix(cbind(1, D[, c("x1", "x2")]))
y <- D$y
X_cross <- solve(t(X) %*% X) %*% t(X)
w_lin <- as.vector(X_cross %*% y)
p + geom_abline(slope = -w_lin[2] / w_lin[3], intercept = -w_lin[1] / w_lin[3])
[Figure: the data with the linear regression hypothesis.]
As we may see, linear regression can also be used for classification (the values $\text{sign}(w_{lin}^T x)$ will likely make
good classification predictions). The linear regression weights $w_{lin}$ are also an approximate solution for the
perceptron model.
Problem 3.2
In this problem, we consider again the double semi-circle of Problem 3.1 and we vary sep in the range
{0.2, 0.4, · · · , 5}. For each of these values, we generate 2000 examples and run PLA starting with w = 0. Below,
we plot sep versus the number of iterations PLA takes to converge.
set.seed(10)
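The loop that produces the two plots below is not echoed; the following is a minimal sketch of it, assuming PLA is restarted from w = 0 for each value of sep and that the theoretical bound of Problem 1.3 is evaluated with the final (separating) PLA weights playing the role of w*. The names sep_seq, iterations and iterations_max are reused below; R and rho are illustrative.

# Sketch: empirical PLA iteration counts and the theoretical bound R^2 ||w*||^2 / rho^2.
sep_seq <- seq(0.2, 5, by = 0.2)
iterations <- numeric(length(sep_seq))
iterations_max <- numeric(length(sep_seq))
for (i in seq_along(sep_seq)) {
  D <- init_data(2000, rad, thk, sep_seq[i])
  w <- c(0, 0, 0)
  n_iter <- 0
  repeat {
    D_mis <- subset(D, y != h(D, w))
    if (nrow(D_mis) == 0) break
    xt <- D_mis[1, ]
    w <- w + c(1, xt$x1, xt$x2) * xt$y
    n_iter <- n_iter + 1
  }
  iterations[i] <- n_iter
  X <- cbind(1, D$x1, D$x2)
  rho <- min(D$y * as.vector(X %*% w))   # margin of the final separating weights
  R <- max(sqrt(rowSums(X^2)))           # radius of the data
  iterations_max[i] <- R^2 * sum(w^2) / rho^2
}
ggplot(data.frame(sep = sep_seq, iterations = iterations), aes(x = sep, y = iterations)) +
  geom_line(col = "red")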
[Figure: sep versus the number of iterations PLA takes to converge.]
We may see that the number of iterations tends to decrease as sep increases. This trend is confirmed
by the theoretical results (see Problem 1.3); to see this, we plot below sep versus the theoretical maximum
number of iterations.
ggplot(data.frame(sep = sep_seq, iterations = iterations_max), aes(x = sep, y = iterations)) +
geom_line(col = "red") +
ylab("Maximum number of iterations (theoretical)")
[Figure: sep versus the theoretical maximum number of iterations.]
Problem 3.3
Here again, we consider the double semi-circle of Problem 3.1; we set sep = −5 (which makes the data
non-linearly separable) and we generate 2000 examples.
set.seed(101)
sep <- -5
D <- init_data(2000, rad, thk, sep)
p <- ggplot(D, aes(x = x1, y = x2, col = as.factor(y + 3))) + geom_point() +
theme(legend.position = "none") +
coord_fixed()
p
[Figure: the generated data with sep = −5.]
(a) If we run PLA on these examples, it will never stop updating, since the data is not linearly separable and there is always at least one misclassified example.
(b) Now, we run the pocket algorithm for 100000 iterations and we plot Ein versus the iteration number t for
t = 1, · · · , 100.
set.seed(101)
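The pocket run itself is not echoed above; a minimal sketch of it is given below (assuming a random misclassified point is used for the PLA update at each step). The name w_pocket, which keeps the best weights seen so far, and the timing variables are reused later; the rest is illustrative.

# Sketch of the pocket algorithm on the raw inputs.
start_time_pocket <- Sys.time()
w <- c(0, 0, 0)
w_pocket <- w
E_in <- mean(D$y != h(D, w))
E_in_pocket <- E_in
for (iter in 1:100000) {
  D_mis <- subset(D, y != h(D, w))
  if (nrow(D_mis) == 0) break
  xt <- D_mis[sample(nrow(D_mis), 1), ]
  w <- w + c(1, xt$x1, xt$x2) * xt$y
  E_in <- c(E_in, mean(D$y != h(D, w)))
  if (E_in[length(E_in)] < E_in_pocket[length(E_in_pocket)]) w_pocket <- w
  E_in_pocket <- c(E_in_pocket, mean(D$y != h(D, w_pocket)))
}
end_time_pocket <- Sys.time()
ggplot(data.frame(t = 1:100, E_in = E_in_pocket[2:101]), aes(x = t, y = E_in)) +
  geom_line(col = "red")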
[Figure: E_in of the pocket weights versus the iteration number t.]
We clearly see that this Ein is monotonically non-increasing (as opposed to what would happen if we had used
the PLA).
(c) Below, we plot the data and the final hypothesis obtained in (b).
p + geom_abline(slope = -w_pocket[2] / w_pocket[3], intercept = -w_pocket[1] / w_pocket[3])
[Figure: the data with the final pocket hypothesis.]
(d) Here, we use the linear regression algorithm to obtain w and we compare this result with the pocket
algorithm in terms of computation time and quality of the solution.
start_time_lin <- Sys.time()
X <- as.matrix(cbind(1, D[, c("x1", "x2")]))
y <- D$y
X_cross <- solve(t(X) %*% X) %*% t(X)
w_lin <- as.vector(X_cross %*% y)
E_in_lin <- mean(D$y != h(D, w_lin))
end_time_lin <- Sys.time()
p + geom_abline(slope = -w_lin[2] / w_lin[3], intercept = -w_lin[1] / w_lin[3])
[Figure: the data with the linear regression hypothesis.]
For the pocket algorithm (with 100000 iterations), we have a computation time of 1.7378043, and for the
linear regression algorithm, we have a computation time of 0.0074768. The linear regression algorithm is
clearly better than the pocket algorithm in terms of computation time. When we take into account the
quality of the solution, the pocket algorithm has a (final) Ein of 0.086, and the linear regression algorithm
has an Ein of 0.0995. So, regarding the quality of the solution, the pocket algorithm is a little better than the
linear regression algorithm.
(e) Here, we repeat points (b) to (d) with a 3rd order polynomial feature transform.
First, we run the pocket algorithm for 100000 iterations and we plot Ein versus the iteration number t.
set.seed(11)
# 3rd order polynomial feature transform and the corresponding hypothesis (the feature
# order is chosen to match the contour plots below):
# (x1, x2, x1^2, x1 x2, x2^2, x1^3, x1^2 x2, x1 x2^2, x2^3), followed by y.
D_trans <- with(D, data.frame(x1, x2, x1^2, x1 * x2, x2^2,
                              x1^3, x1^2 * x2, x1 * x2^2, x2^3, y))
h_trans <- function(D, w) {
  scalar_prod <- cbind(1, as.matrix(D[, 1:9])) %*% w
  return(as.vector(sign(scalar_prod)))
}
start_time_pocket <- Sys.time()
E_in <- numeric()
E_in_pocket <- numeric()
w <- rep(0, 10)
w_pocket <- w
E_in <- c(E_in, mean(D_trans$y != h_trans(D_trans, w)))
E_in_pocket <- E_in
for (iter in 1:100000) {
  D_mis <- subset(D_trans, y != h_trans(D_trans, w))
  if (nrow(D_mis) == 0)
    break
  xt <- D_mis[sample(nrow(D_mis), 1), ]
  w <- w + c(1, as.numeric(xt[1:9])) * xt$y
  E_in <- c(E_in, mean(D_trans$y != h_trans(D_trans, w)))
  if (E_in[length(E_in)] < E_in_pocket[length(E_in_pocket)]) {
    w_pocket <- w
  }
  E_in_pocket <- c(E_in_pocket, mean(D_trans$y != h_trans(D_trans, w_pocket)))
}
end_time_pocket <- Sys.time()
ggplot(data.frame(t = 1:100000, E_in = E_in_pocket[-1]), aes(x = t, y = E_in)) +
  geom_line(col = "red") +
  coord_cartesian(xlim = c(1, 200))
[Figure: E_in of the pocket weights versus the iteration number t (3rd order polynomial transform).]
Then, we plot the data and the final hypothesis obtained above.
cc <- emdbook::curve3d(1 * w_pocket[1] + x * w_pocket[2] + y * w_pocket[3] + x^2 * w_pocket[4] +
x * y * w_pocket[5] + y^2 * w_pocket[6] + x^3 * w_pocket[7] +
x^2 * y * w_pocket[8] + x * y^2 * w_pocket[9] + y^3 * w_pocket[10],
xlim = c(-20, 35), ylim = c(-15, 20), sys3d = "none")
dimnames(cc$z) <- list(cc$x, cc$y)
mm <- reshape2::melt(cc$z)
p + geom_contour(data = mm, aes(x = Var1, y = Var2, z = value), breaks = 0, colour = "black")
[Figure: the data with the final pocket hypothesis (3rd order polynomial boundary).]
Finally, we use the linear regression algorithm to obtain the weights w.
start_time_lin <- Sys.time()
X <- as.matrix(cbind(1, D_trans[, 1:9]))
y <- D_trans$y
X_cross <- solve(t(X) %*% X) %*% t(X)
w_lin <- as.vector(X_cross %*% y)
E_in_lin <- mean(D_trans$y != h_trans(D_trans, w_lin))
end_time_lin <- Sys.time()
cc <- emdbook::curve3d(1 * w_lin[1] + x * w_lin[2] + y * w_lin[3] + x^2 * w_lin[4] +
x * y * w_lin[5] + y^2 * w_lin[6] + x^3 * w_lin[7] +
x^2 * y * w_lin[8] + x * y^2 * w_lin[9] + y^3 * w_lin[10],
xlim = c(-20, 35), ylim = c(-15, 20), sys3d = "none")
dimnames(cc$z) <- list(cc$x, cc$y)
mm <- reshape2::melt(cc$z)
p + geom_contour(data = mm, aes(x = Var1, y = Var2, z = value), breaks = 0, colour = "black")
[Figure: the data with the linear regression hypothesis (3rd order polynomial boundary).]
For the pocket algorithm (with 100000 iterations), we have a computation time of 2.7625689, and for the
linear regression algorithm, we have a computation time of 0.0071368. The linear regression algorithm is once
again clearly better than the pocket algorithm in terms of computation time. When we take into account the
quality of the solution, the pocket algorithm has a (final) Ein of 0.0445, and the linear regression algorithm
has an Ein of 0.021. In this case, regarding the quality of the solution, the linear regression algorithm is also
better than the pocket algorithm.
Problem 3.4
(a) We may write
$$e_n(w) = \begin{cases} 0 & \text{if } y_n w^T x_n \geq 1 \\ (1 - y_n w^T x_n)^2 & \text{if } y_n w^T x_n < 1 \end{cases};$$
this function is continuous everywhere since
$$\lim_{y_n w^T x_n \to 1^{\pm}} e_n(w) = 0.$$
It is also differentiable everywhere, as
$$\nabla e_n(w) = \begin{cases} 0 & \text{if } y_n w^T x_n \geq 1 \\ -2y_n(1 - y_n w^T x_n)x_n & \text{if } y_n w^T x_n < 1 \end{cases},$$
and
$$\lim_{y_n w^T x_n \to 1^{\pm}} \nabla e_n(w) = 0.$$
(b) Let us consider first the case where $\text{sign}(w^Tx_n) \neq y_n$, which means that $y_n w^T x_n \leq 0 < 1$. In this case,
we have $[\![\text{sign}(w^Tx_n) \neq y_n]\!] = 1$ and $e_n(w) = (1 - y_n w^T x_n)^2 \geq 1$; consequently
$$[\![\text{sign}(w^Tx_n) \neq y_n]\!] \leq e_n(w).$$
Now we consider the second case where $\text{sign}(w^Tx_n) = y_n$, which means that $y_n w^T x_n \geq 0$. In this case, we
have $[\![\text{sign}(w^Tx_n) \neq y_n]\!] = 0$, $e_n(w) = (1 - y_n w^T x_n)^2 \geq 0$ if $0 \leq y_n w^T x_n < 1$, and $e_n(w) = 0$ if $y_n w^T x_n \geq 1$;
consequently
$$[\![\text{sign}(w^Tx_n) \neq y_n]\!] \leq e_n(w).$$
In conclusion, we have that
$$E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N}[\![\text{sign}(w^Tx_n) \neq y_n]\!] \leq \frac{1}{N}\sum_{n=1}^{N}e_n(w).$$
(c) If we apply SGD to our upper bound above, we get the following algorithm.
1. Select an initial w.
2. Repeat until a stopping condition is met:
Select $(x_n, y_n)$ randomly and let $s_n = w^T x_n$. If $y_n s_n = y_n w^T x_n \leq 1$, we have $\nabla e_n(w) = -2y_n(1 - y_n w^T x_n)x_n$,
and we update $w$ as
$$w \leftarrow w - \eta \nabla e_n(w) = w + 2\eta y_n (1 - y_n s_n) x_n.$$
And if $y_n s_n > 1$, we have $\nabla e_n(w) = 0$, and we update $w$ as
$$w \leftarrow w - \eta \nabla e_n(w) = w - \eta \cdot 0 = w.$$
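A minimal sketch of this update rule in code (the step size eta, the number of steps, and the use of a data matrix whose first column is the constant 1 are illustrative choices, not taken from the text):

# SGD on the squared-hinge bound of Problem 3.4; X is N x (d + 1) with a leading column of 1s.
sgd_squared_hinge <- function(X, y, eta = 0.01, n_steps = 10000) {
  w <- rep(0, ncol(X))
  for (step in 1:n_steps) {
    n <- sample(nrow(X), 1)
    s_n <- sum(w * X[n, ])
    if (y[n] * s_n <= 1)
      w <- w + 2 * eta * y[n] * (1 - y[n] * s_n) * X[n, ]
  }
  w
}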
Problem 3.5
(a) We may write
$$e_n(w) = \begin{cases} 0 & \text{if } y_n w^T x_n \geq 1 \\ 1 - y_n w^T x_n & \text{if } y_n w^T x_n < 1 \end{cases};$$
and this function is actually continuous everywhere as we have
$$\lim_{y_n w^T x_n \to 1^{\pm}} e_n(w) = 0.$$
However, this function is not differentiable everywhere; to see this, we note that
$$\nabla e_n(w) = \begin{cases} 0 & \text{if } y_n w^T x_n > 1 \\ -y_n x_n & \text{if } y_n w^T x_n < 1 \end{cases};$$
moreover,
$$\lim_{y_n w^T x_n \to 1^{+}} \nabla e_n(w) = 0 \quad \text{and} \quad \lim_{y_n w^T x_n \to 1^{-}} \nabla e_n(w) = -y_n x_n \neq 0,$$
so the two one-sided limits of the gradient at $y_n w^T x_n = 1$ do not agree.
(b) Let us consider first the case where $\text{sign}(w^Tx_n) \neq y_n$, which means that $y_n w^T x_n \leq 0 < 1$. In this case,
we have $[\![\text{sign}(w^Tx_n) \neq y_n]\!] = 1$ and $e_n(w) = 1 - y_n w^T x_n \geq 1$; consequently
$$[\![\text{sign}(w^Tx_n) \neq y_n]\!] \leq e_n(w).$$
Now we consider the second case where $\text{sign}(w^Tx_n) = y_n$, which means that $y_n w^T x_n \geq 0$. In this case, we
have $[\![\text{sign}(w^Tx_n) \neq y_n]\!] = 0$, $e_n(w) = 1 - y_n w^T x_n \geq 0$ if $0 \leq y_n w^T x_n < 1$, and $e_n(w) = 0$ if $y_n w^T x_n \geq 1$;
consequently
$$[\![\text{sign}(w^Tx_n) \neq y_n]\!] \leq e_n(w).$$
In conclusion, we have that
$$E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N}[\![\text{sign}(w^Tx_n) \neq y_n]\!] \leq \frac{1}{N}\sum_{n=1}^{N}e_n(w).$$
(c) If we apply SGD to our upper bound above, we get the following algorithm.
1. Select an initial w.
2. Repeat until a stopping condition is met:
Select $(x_n, y_n)$ randomly. If $y_n w^T x_n < 1$, we have $\nabla e_n(w) = -y_n x_n$, and we update $w$ as
$$w \leftarrow w - \eta \nabla e_n(w) = w + \eta y_n x_n.$$
And if $y_n w^T x_n > 1$, we have $\nabla e_n(w) = 0$, and we update $w$ as
$$w \leftarrow w - \eta \nabla e_n(w) = w - \eta \cdot 0 = w.$$
Problem 3.6
(a) For linearly separable data, there exists $w^*$ such that $y_n (w^*)^T x_n > 0$ for all
$n = 1, \cdots, N$. Since the data set is finite, we may find $\epsilon > 0$ (for example $\epsilon = \min_n y_n (w^*)^T x_n$) such that
$y_n (w^*)^T x_n \geq \epsilon$ for all $n = 1, \cdots, N$; consequently, if we let $w = w^*/\epsilon$, we get
$$y_n w^T x_n \geq 1$$
for all $n = 1, \cdots, N$.
(b) The task of finding $w$ for separable data may be formulated as the following linear program:
$$\min_w \; c^T w \quad \text{subject to} \quad Aw \leq b,$$
where $c = 0$, $b = (-1, \cdots, -1)^T$, and
$$A = \begin{pmatrix} -y_1 x_1^T \\ \vdots \\ -y_N x_N^T \end{pmatrix} \quad (N \times (d+1)),$$
so that the constraints $Aw \leq b$ are exactly $y_n w^T x_n \geq 1$ for $n = 1, \cdots, N$ (any feasible $w$ will do, hence the trivial objective $c = 0$).
(c) When the data is not separable, the minimization problem may be formulated as a linear program as
follows:
$$\min_{(w, \xi)} \; c^T (w, \xi)^T \quad \text{subject to} \quad A\,(w, \xi)^T \leq b,$$
where $c^T = (0, \cdots, 0, 1, \cdots, 1)$, $b^T = (-1, \cdots, -1, 0, \cdots, 0)$, and
$$A = -\begin{pmatrix} \mathrm{diag}(y)\,X & I_N \\ 0 & I_N \end{pmatrix} \quad (2N \times (d + 1 + N)),$$
where $X$ is the $N \times (d+1)$ data matrix. The first $N$ constraints encode $y_n w^T x_n \geq 1 - \xi_n$ and the last $N$ constraints encode $\xi_n \geq 0$.
(d) The formulation of Problem 3.5 amounts to minimizing $\frac{1}{N}\sum_{n=1}^{N} e_n(w)$ with
$$e_n(w) = \begin{cases} 0 & \text{if } y_n w^T x_n \geq 1 \\ 1 - y_n w^T x_n & \text{if } y_n w^T x_n < 1 \end{cases}.$$
If we take a look at $e_n(w)$, we may note that when $x_n$ is correctly classified by $w$ with a margin of at least
one ($y_n w^T x_n \geq 1$), we get $e_n(w) = 0$, which means that in this case this term does
not contribute to the overall error. However, the deeper $x_n$ is inside the margin of one, the more the term
$e_n(w) = 1 - y_n w^T x_n$ contributes to the overall error. For example, if $x_n$
is correctly classified by $w$ but lies inside the margin of one ($0 < y_n w^T x_n < 1$), then the error term for this point satisfies
$0 < e_n(w) < 1$; and if $x_n$ is not correctly classified by $w$ ($y_n w^T x_n < 0$), then the error term for this point
satisfies $e_n(w) > 1$. In conclusion, the overall error characterizes the amount of violation of the margin, which is
exactly what we seek to minimize in point (c) above.
Problem 3.7
First, we use the linear programming algorithm from Problem 3.6 on the learning task in Problem 3.1 for the
separable case.
library(lpSolve)
set.seed(10)
rad <- 10
thk <- 8
sep <- 5
D <- init_data(2000, rad, thk, sep)
d <- 2
N <- nrow(D)
c_T <- rep(0, d + 1)
b_T <- rep(-1, N)
A <- -diag(D$y) %*% as.matrix(cbind(1, D[, 1:2]))
dir <- rep("<=", N)
linear_prog <- lp("min", c(c_T, -c_T), cbind(A, -A), const.dir = dir, const.rhs = b_T)
w <- linear_prog$solution[1:(d+1)] - linear_prog$solution[(d+2):(2 * (d + 1))]
X <- as.matrix(cbind(1, D[, c("x1", "x2")]))
y <- D$y
X_cross <- solve(t(X) %*% X) %*% t(X)
w_lin <- as.vector(X_cross %*% y)
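The evaluation and the plot below are not echoed; a minimal sketch is given here (the in-sample error of the 3rd order polynomial fit quoted below is obtained in the same way, after applying the transform of Problem 3.3(e) to this data set).

# Sketch of the evaluation behind the figure and the errors quoted below.
E_in_lp <- mean(D$y != h(D, w))        # linear programming solution
E_in_lin <- mean(D$y != h(D, w_lin))   # linear regression solution
ggplot(D, aes(x = x1, y = x2, col = as.factor(y + 3))) + geom_point() +
  theme(legend.position = "none") + coord_fixed() +
  geom_abline(slope = -w[2] / w[3], intercept = -w[1] / w[3])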
[Figure: the separable data with the linear programming separator.]
As we may see, our linear programming algorithm solution perfectly separates the dataset so we have an Ein
of 0; the linear regression approach gives us an Ein of 0, and the 3rd order polynomial feature transform
gives us an Ein of 0 as well.
Now, we use the linear programming algorithm from Problem 3.6 on the learning task in Problem 3.1 for the
non separable case.
set.seed(10)
rad <- 10
thk <- 8
sep <- -5
D <- init_data(2000, rad, thk, sep)
d <- 2
N <- nrow(D)
c_T <- c(rep(0, d + 1), rep(1, N))
b_T <- c(rep(-1, N), rep(0, N))
A1 <- rbind(diag(D$y) %*% as.matrix(cbind(1, D[, 1:2])), matrix(0, nrow = N, ncol = d + 1))
A2 <- rbind(diag(1, nrow = N, ncol = N), diag(1, nrow = N, ncol = N))
A <- -cbind(A1, A2)
dir <- rep("<=", 2 * N)
linear_prog <- lp("min", c(c_T, -c_T), cbind(A, -A), const.dir = dir, const.rhs = b_T)
w <- linear_prog$solution[1:(d + 1 + N)] - linear_prog$solution[(d + 1 + N + 1):(2 * (d + 1 + N))]
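As above, the evaluation and the plot are not echoed; a minimal sketch follows. Only the first d + 1 entries of the solution are the weights, the remaining N entries are the violations.

# Sketch of the evaluation behind the figure and the errors quoted below.
E_in_lp <- mean(D$y != h(D, w[1:(d + 1)]))
ggplot(D, aes(x = x1, y = x2, col = as.factor(y + 3))) + geom_point() +
  theme(legend.position = "none") + coord_fixed() +
  geom_abline(slope = -w[2] / w[3], intercept = -w[1] / w[3])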
[Figure: the non-separable data with the linear programming separator.]
Here, our linear programming algorithm solution gives us an Ein of 0.072; the linear regression approach
gives us an Ein of 0.0855, and the 3rd order polynomial feature transform gives us an Ein of 0.0205 which is
the best of the three approaches.
Problem 3.8
First, we will show that $h^*(x) = \mathbb{E}_{y|x}[y|x]$ minimizes the $E_{out}$ expression. For any hypothesis $h$, we have that
$$E_{out}(h) = \mathbb{E}_{(x,y)}[(h(x) - y)^2] = \mathbb{E}_{(x,y)}[(h(x) - h^*(x))^2] + \mathbb{E}_{(x,y)}[(h^*(x) - y)^2] \geq \mathbb{E}_{(x,y)}[(h^*(x) - y)^2],$$
where the cross term $2\,\mathbb{E}_{(x,y)}[(h(x) - h^*(x))(h^*(x) - y)]$ vanishes because $\mathbb{E}_{y|x}[h^*(x) - y] = 0$; this means that $h^*(x)$ is actually the hypothesis that minimizes $E_{out}$.
Now, it is obvious that we are able to write that $y = h^*(x) + \epsilon(x)$, where $\epsilon(x) = y - h^*(x)$ is a noise variable with zero mean, since $\mathbb{E}_{y|x}[\epsilon(x)] = \mathbb{E}_{y|x}[y|x] - h^*(x) = 0$.
Problem 3.9
We may write that
\begin{align*}
(*) &= \frac{1}{N}\left[(w - (X^TX)^{-1}X^Ty)^T(X^TX)(w - (X^TX)^{-1}X^Ty) + y^T(I - X(X^TX)^{-1}X^T)y\right] \\
&= \frac{1}{N}\left[(w^T - y^TX(X^TX)^{-1})(X^TX)(w - (X^TX)^{-1}X^Ty) + y^Ty - y^TX(X^TX)^{-1}X^Ty\right] \\
&= \frac{1}{N}\left[w^T(X^TX)w - 2w^TX^Ty + y^Ty\right] = E_{in}(w).
\end{align*}
Since the second term of $(*)$ does not depend on $w$ and the first term is non-negative (as $X^TX$ is positive semidefinite), $E_{in}(w)$ is minimized when the first term vanishes, that is for
$$w = (X^TX)^{-1}X^Ty = w_{lin}.$$
Problem 3.10
Let $v \neq 0$ be an eigenvector of $H$ with eigenvalue $\lambda$, so that $Hv = \lambda v$. As $H$ is idempotent, we get
$$\lambda v = Hv = H^2 v = H(\lambda v) = \lambda^2 v,$$
so $\lambda^2 = \lambda$ and the eigenvalues of $H$ are either 0 or 1. Moreover,
$$\text{trace}(H) = \text{trace}(X(X^TX)^{-1}X^T) = \text{trace}((X^TX)^{-1}X^TX) = \text{trace}(I_{d+1}) = d + 1,$$
and since the trace is the sum of the eigenvalues, this means that $d + 1$ eigenvalues are equal to 1. As $H$ is symmetric, it is also diagonalizable, so there exists
an invertible matrix $V$ (whose columns are the eigenvectors of $H$) and a diagonal matrix $D$ (whose elements
are the eigenvalues of $H$) such that
$$V^{-1}HV = \text{diag}(\lambda_1, \cdots, \lambda_N) = D;$$
hence $\text{rank}(H) = \text{rank}(D)$, as $V$ and $V^{-1}$ are of maximum rank. As the rank of $D$ is the number of eigenvalues not equal to 0, we finally
get that
$$\text{rank}(H) = \text{trace}(H) = d + 1.$$
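Although not required by the problem, a quick numerical illustration on the data generated earlier (a hypothetical check, reusing the last data set D, so d + 1 = 3) confirms this:

# Hat matrix on the 2000 x 3 data matrix; its trace equals d + 1 = 3.
X <- as.matrix(cbind(1, D[, c("x1", "x2")]))
H_hat <- X %*% solve(t(X) %*% X) %*% t(X)
sum(diag(H_hat))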
Problem 3.11
(b) By taking the expectation with respect to $x_0$ and $\epsilon_0$, we get the following expression for $E_{out}$. We have
$$E_{out}(g) = \underbrace{\mathbb{E}[\epsilon_0^2]}_{(1)} \;-\; \underbrace{2\,\mathbb{E}[\epsilon_0\, x_0^T(X^TX)^{-1}X^T\epsilon]}_{(2)} \;+\; \underbrace{\mathbb{E}[x_0^T(X^TX)^{-1}X^T\epsilon\,\epsilon^T X(X^TX)^{-1}x_0]}_{(3)},$$
where the expression (1) equals $\sigma^2$ and the expression (2) vanishes since $\epsilon_0$ has zero mean and is independent of $x_0$ and $\epsilon$. Finally, we do the same for the expression (3); first, we note that
$$x_0^T(X^TX)^{-1}X^T\epsilon\,\epsilon^T X(X^TX)^{-1}x_0 = \text{trace}(\epsilon^T X(X^TX)^{-1}x_0 x_0^T(X^TX)^{-1}X^T\epsilon) = \text{trace}(x_0 x_0^T(X^TX)^{-1}X^T\epsilon\,\epsilon^T X(X^TX)^{-1})$$
by the cyclic property of the trace. Then, taking the expectation with respect to $x_0$ (with $\Sigma = \mathbb{E}_{x_0}[x_0 x_0^T]$), we get that the expression (3) equals $\text{trace}(\Sigma(X^TX)^{-1}X^T\epsilon\,\epsilon^T X(X^TX)^{-1})$. In conclusion, we have
$$E_{out}(g) = \sigma^2 + \text{trace}(\Sigma(X^TX)^{-1}X^T\epsilon\,\epsilon^T X(X^TX)^{-1}).$$
Moreover, if $\frac{1}{N}X^TX = \Sigma$, and since $\mathbb{E}_\epsilon[\epsilon\epsilon^T] = \sigma^2 I_N$, we get that
$$\mathbb{E}_\epsilon[E_{out}(g)] = \sigma^2 + \frac{\sigma^2}{N}\text{trace}(\Sigma\Sigma^{-1}) = \sigma^2\left(1 + \frac{d+1}{N}\right).$$
(e) As the matrix $\frac{1}{N}X^TX$ is the maximum likelihood estimator of $\Sigma$, we know that $\frac{1}{N}X^TX \to \Sigma$ in probability.
Moreover, by the continuity of the inverse and of the trace functions, we also have that
$$x_N = \text{trace}\left(\Sigma\left(\frac{1}{N}X^TX\right)^{-1}\right) \to \text{trace}(I_{d+1}) = d + 1$$
in probability. This means that, with high probability, we have $|x_N - (d+1)| \leq \eta$ for any $\eta > 0$ provided $N$
is big enough; in this case, we also have that
$$\mathbb{E}[E_{out}(g)] - \left(\sigma^2 + \frac{\sigma^2}{N}(d+1)\right) = \left(\sigma^2 + \frac{\sigma^2}{N}x_N\right) - \left(\sigma^2 + \frac{\sigma^2}{N}(d+1)\right) \leq \frac{\sigma^2}{N}\eta.$$
Problem 3.12
We have already proven that $H$ is a symmetric and idempotent matrix, which makes it a projection matrix
by definition. We may write that
$$\hat{y} = Hy = X\left[(X^TX)^{-1}X^Ty\right] = Xw_{lin} \in \text{span}(X),$$
where $\text{span}(X)$ is the subspace generated by the columns of $X$. So $\hat{y}$ is the projection of $y$ onto the subspace
generated by the columns of $X$.
Problem 3.13
(a) For the linear regression problem, we need d + 1 weights, and for the classification problem we need d + 2
weights (and also 2N violations in the non-separable case).
(b) To solve this problem, we first consider the case where $d = 1$. The weights $w$ obtained from the linear fit
give us the hypothesis
$$h(x) = w_0 + w_1 x,$$
and the weights $w^{class}$ obtained from the classification problem give us the hypothesis
$$h^{class}(x, y) = \text{sign}(w_0^{class} + w_1^{class}x + w_2^{class}y).$$
On the decision boundary $w_0^{class} + w_1^{class}x + w_2^{class}y = 0$, we may solve for $y$ and obtain $y = -\frac{w_0^{class}}{w_2^{class}} - \frac{w_1^{class}}{w_2^{class}}x$, so the linear fit is recovered by taking $w_0 = -\frac{w_0^{class}}{w_2^{class}}$ and $w_1 = -\frac{w_1^{class}}{w_2^{class}}$.
We will now apply these results to the general case; we may write that
$$w = \left(-\frac{w_0^{class}}{w_{d+1}^{class}}, \cdots, -\frac{w_d^{class}}{w_{d+1}^{class}}\right)^T.$$
(c) We generate a data set $y_n = x_n^2 + \sigma\epsilon_n$ with $N = 50$, where $x_n$ is uniform on $[0, 1]$ and $\epsilon_n$ is zero-mean
Gaussian noise (with $\sigma = 0.1$). Below, we plot the positive and negative points $D_+$ and $D_-$ for $a = (0, 0.1)^T$.
set.seed(10)
N <- 50
sigma <- 0.1
epsilon <- rnorm(N, 0, 1)
xn <- runif(N, 0, 1)
yn <- xn^2 + sigma * epsilon
a <- c(0, 0.1)
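The construction of D+ and D− and the plot below are not echoed; a minimal sketch follows (the column names x and y are illustrative, only the class column and the column positions are relied upon later).

# Each (xn, yn) is duplicated: shifted by +a with class +1 (D_plus) and by -a with class -1 (D_minus).
D_plus <- data.frame(x = xn + a[1], y = yn + a[2], class = 1)
D_minus <- data.frame(x = xn - a[1], y = yn - a[2], class = -1)
ggplot(rbind(D_plus, D_minus), aes(x = x, y = y, col = as.factor(class))) +
  geom_point() + theme(legend.position = "none")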
[Figure: the positive points D+ and the negative points D−.]
D_union <- rbind(D_plus, D_minus)
d <- 2
N <- nrow(D_union)
c_T <- c(rep(0, d + 1), rep(1, N))
b_T <- c(rep(-1, N), rep(0, N))
A1 <- rbind(diag(D_union$class) %*%
as.matrix(cbind(1, D_union[, 1:2])), matrix(0, nrow = N, ncol = d + 1))
A2 <- rbind(diag(1, nrow = N, ncol = N), diag(1, nrow = N, ncol = N))
A <- -cbind(A1, A2)
dir <- rep("<=", 2 * N)
linear_prog <- lp("min", c(c_T, -c_T), cbind(A, -A), const.dir = dir, const.rhs = b_T)
w <- linear_prog$solution[1:(d + 1 + N)] - linear_prog$solution[(d + 1 + N + 1):(2 * (d + 1 + N))]
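The plot below is not echoed; a minimal sketch is given here, reading the decision boundary of the classification solution as the regression fit y = -(w[1] + w[2] x) / w[3].

# Sketch of the plot: the original data with the boundary-derived linear fit.
ggplot(data.frame(x = xn, y = yn), aes(x = x, y = y)) + geom_point() +
  geom_abline(slope = -w[2] / w[3], intercept = -w[1] / w[3])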
[Figure: the data with the resulting linear fit.]
Problem 3.14
We may write that
\begin{align*}
\bar{g}(x) &= \mathbb{E}_D[g^{(D)}(x)] \\
&= \mathbb{E}_D[x^T w_{lin}] \\
&= \mathbb{E}_X[\mathbb{E}_{y|X}[x^T(X^TX)^{-1}X^Ty \mid X]] \\
&= \mathbb{E}_X[x^T(X^TX)^{-1}X^T\underbrace{\mathbb{E}_{y|X}[y \mid X]}_{Xw_f + \mathbb{E}[\epsilon] = Xw_f}] \\
&= \mathbb{E}_X[x^T(X^TX)^{-1}X^TXw_f] \\
&= x^T w_f = f(x).
\end{align*}
Now, we examine the expression $(*)$ more closely; we may write that
$$\text{var}(x) = \mathbb{E}_D[(g^{(D)}(x) - \bar{g}(x))^2] = \mathbb{E}_X[\mathbb{E}_{y|X}[(x^T(X^TX)^{-1}X^T\epsilon)^2 \mid X]] = \mathbb{E}_X[x^T(X^TX)^{-1}X^T\,\mathbb{E}[\epsilon\epsilon^T]\,X(X^TX)^{-1}x],$$
and so we get
$$\text{var}(x) = \sigma^2 x^T \mathbb{E}_X[(X^TX)^{-1}]x.$$
And finally, we have that
$$\text{var} = \mathbb{E}_x[\text{var}(x)] = \sigma^2\,\text{trace}(\Sigma\,\mathbb{E}_X[(X^TX)^{-1}]).$$
As we saw in Problem 3.11 that $(\frac{1}{N}X^TX)^{-1}$ converges in probability to $\Sigma^{-1}$, we may assume that
$$\left(\frac{1}{N}X^TX\right)^{-1} \approx \Sigma^{-1}$$
and so that
$$\text{var} \approx \frac{\sigma^2}{N}\text{trace}(\Sigma\Sigma^{-1}) = \frac{\sigma^2(d+1)}{N}.$$
Problem 3.15
(a) We know that $\text{rank}(X^TX) < d + 1$ as $X^TX$ is not invertible; this means that we can find a $y \neq 0$ such
that $X^TXy = 0$ (as the columns of $X^TX$ are not linearly independent). We may now write that
$$0 = y^T X^T X y = \|Xy\|^2,$$
so $Xy = 0$ with $y \neq 0$, which means that the columns of $X$ are not linearly independent either, i.e. $\text{rank}(X) < d + 1$.
Using the SVD $X = U\Gamma V^T$ and $w_{lin} = V\Gamma^{-1}U^Ty$, we may write that
\begin{align*}
X^TX w_{lin} &= X^TX V\Gamma^{-1}U^Ty \\
&= V\Gamma U^T U\Gamma V^T V\Gamma^{-1}U^Ty \\
&= V\Gamma U^Ty \\
&= X^Ty,
\end{align*}
so $w_{lin}$ is a solution of the normal equations.
As $w$ and $w_{lin}$ are both solutions of the normal equations, we have that
\begin{align*}
X^TX(w - w_{lin}) = X^Ty - X^Ty &= 0 \\
\Rightarrow V\Gamma U^T U\Gamma V^T(w - w_{lin}) &= 0 \\
\Rightarrow V\Gamma^2 V^T(w - w_{lin}) &= 0 \\
\Rightarrow \Gamma^{-2}V^T V\Gamma^2 V^T(w - w_{lin}) &= 0 \\
\Rightarrow V^T(w - w_{lin}) &= 0.
\end{align*}
We may now see that $w_{lin}$ and $\delta = w - w_{lin}$ are orthogonal because
$$w_{lin}^T\delta = w_{lin}^T(w - w_{lin}) = y^TU\Gamma^{-1}\underbrace{V^T(w - w_{lin})}_{=0} = 0.$$
Consequently, $\|w\|^2 = \|w_{lin} + \delta\|^2 = \|w_{lin}\|^2 + \|\delta\|^2 \geq \|w_{lin}\|^2$, so $w_{lin}$ is the solution of the normal equations with minimum norm.
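As a small numerical illustration (not part of the original derivation), we can build a rank-deficient X and check that $w_{lin} = V\Gamma^{+}U^Ty$ satisfies the normal equations; here $\Gamma^{+}$, which zeroes the vanishing singular values, plays the role of $\Gamma^{-1}$ above.

# Toy rank-deficient example: the last column of X duplicates the third one.
set.seed(1)
X <- cbind(1, rnorm(5), rnorm(5))
X <- cbind(X, X[, 3])
y <- rnorm(5)
s <- svd(X)
w_lin <- s$v %*% diag(ifelse(s$d > 1e-10, 1 / s$d, 0)) %*% t(s$u) %*% y
max(abs(t(X) %*% X %*% w_lin - t(X) %*% y))  # numerically zero: w_lin solves the normal equations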
Problem 3.16
We have
$$\text{cost(accept)} = \text{cost(reject)} \Leftrightarrow c_a(1 - g(x)) = c_r\,g(x) \Leftrightarrow g(x) = \frac{c_a}{c_a + c_r};$$
hence the threshold is $\kappa = \frac{c_a}{c_a + c_r}$, and we accept if and only if $g(x) \geq \kappa$.
(c) For the Supermarket, we have $\kappa = 1/11$; this makes sense as, in this case, we want to avoid false rejects,
so with $\kappa$ very small we reject only when $g(x) < \kappa$, which is very close to 0. And for the CIA, we have
$\kappa = 1000/1001$; here we want to avoid false accepts, so with $\kappa$ very close to 1 we accept only when $g(x) \geq \kappa$.
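As a quick sanity check of these two thresholds (with $c_a$ the cost of a false accept and $c_r$ the cost of a false reject, as above):

kappa <- function(ca, cr) ca / (ca + cr)
kappa(1, 10)    # Supermarket: 1 / 11
kappa(1000, 1)  # CIA: 1000 / 1001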
Problem 3.17
(a) With the first-order Taylor expansion around $(0, 0)$, we may write that
\begin{align*}
\hat{E}_1(\Delta u, \Delta v) &= E(0, 0) + \nabla E(0, 0)^T \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} \\
&= 3 + \left.\left(e^u + e^{uv}v + 2u - 3v - 3,\; 2e^{2v} + e^{uv}u - 3u + 8v - 5\right)\right|_{(0,0)} \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} \\
&= a_u \Delta u + a_v \Delta v + a,
\end{align*}
where $a_u = -2$, $a_v = -3$, and $a = 3$.
(b) We may also write that
\begin{align*}
\hat{E}_1(\Delta u, \Delta v) &= E(0, 0) + \nabla E(0, 0)^T \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} \\
&= E(0, 0) + \|\nabla E(0, 0)\| \cdot \|(\Delta u, \Delta v)\| \cos(\angle(\nabla E(0, 0), (\Delta u, \Delta v))) \\
&\geq E(0, 0) - \|\nabla E(0, 0)\| \cdot \|(\Delta u, \Delta v)\|,
\end{align*}
with equality when $(\Delta u, \Delta v)$ points in the direction of $-\nabla E(0, 0)$. Thus the optimal vector $(\Delta u^*, \Delta v^*)^T$ such that $\|(\Delta u, \Delta v)\| = 0.5$ is given by
$$\begin{pmatrix} \Delta u^* \\ \Delta v^* \end{pmatrix} = -\frac{\nabla E(0, 0)}{\|\nabla E(0, 0)\|}\,\|(\Delta u, \Delta v)\| = \frac{1}{2\sqrt{13}}\begin{pmatrix} 2 \\ 3 \end{pmatrix}.$$
(c) With the second-order Taylor expansion, we may write that
$$E(u + \Delta u, v + \Delta v) = E(u, v) + \nabla E(u, v)^T\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} + \frac{1}{2}(\Delta u, \Delta v)\,\nabla^2 E(u, v)\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} + O(\|(\Delta u, \Delta v)\|^3),$$
so that
\begin{align*}
\hat{E}_3(\Delta u, \Delta v) &= E(0, 0) + \nabla E(0, 0)^T\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} + \frac{1}{2}(\Delta u, \Delta v)\,\nabla^2 E(0, 0)\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} \\
&= 3 + (-2, -3)\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} + \frac{1}{2}(\Delta u, \Delta v)\left.\begin{pmatrix} e^u + e^{uv}v^2 + 2 & e^{uv}(uv + 1) - 3 \\ e^{uv}(uv + 1) - 3 & 4e^{2v} + e^{uv}u^2 + 8 \end{pmatrix}\right|_{(0,0)}\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} \\
&= 3 - 2\Delta u - 3\Delta v + \frac{1}{2}(\Delta u, \Delta v)\begin{pmatrix} 3 & -2 \\ -2 & 12 \end{pmatrix}\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} \\
&= b_{uu}(\Delta u)^2 + b_{vv}(\Delta v)^2 + b_{uv}\Delta u\Delta v + b_u\Delta u + b_v\Delta v + b,
\end{align*}
where $b_{uu} = 3/2$, $b_{vv} = 6$, $b_{uv} = -2$, $b_u = -2$, $b_v = -3$, and $b = 3$.
This quadratic approximation is minimized (setting its gradient with respect to $(\Delta u, \Delta v)$ to zero) if
$$\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} = -(\nabla^2 E(0, 0))^{-1}\nabla E(0, 0).$$
However, for the vector above to be a minimum, the matrix $\nabla^2 E(0, 0)$ must be positive semidefinite and
invertible; in our case we have
$$\det(\nabla^2 E(0, 0)) = 32 \neq 0,$$
which means that it is actually invertible, and
$$(u, v)\begin{pmatrix} 3 & -2 \\ -2 & 12 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix} = 2u^2 + 8v^2 + (u - 2v)^2 \geq 0$$
for any $(u, v)$, which means that it is actually positive semidefinite. Thus the optimal vector $(\Delta u^*, \Delta v^*)^T$ is
given by
$$\begin{pmatrix} \Delta u^* \\ \Delta v^* \end{pmatrix} = -(\nabla^2 E(0, 0))^{-1}\nabla E(0, 0) = \frac{1}{32}\begin{pmatrix} 30 \\ 13 \end{pmatrix}.$$
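The gradient, Hessian, and the two optimal steps above can be checked numerically; a small sketch follows, where the formulas are the partial derivatives written out in the expansions above.

E_grad <- function(u, v) {
  c(exp(u) + exp(u * v) * v + 2 * u - 3 * v - 3,
    2 * exp(2 * v) + exp(u * v) * u - 3 * u + 8 * v - 5)
}
E_hess <- function(u, v) {
  matrix(c(exp(u) + exp(u * v) * v^2 + 2, exp(u * v) * (u * v + 1) - 3,
           exp(u * v) * (u * v + 1) - 3, 4 * exp(2 * v) + exp(u * v) * u^2 + 8),
         nrow = 2, byrow = TRUE)
}
E_grad(0, 0)                                       # (-2, -3)
-0.5 * E_grad(0, 0) / sqrt(sum(E_grad(0, 0)^2))    # first-order step: (2, 3) / (2 * sqrt(13))
-solve(E_hess(0, 0), E_grad(0, 0))                 # second-order (Newton) step: (30, 13) / 32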
Problem 3.18
(a) Since $\mathcal{H}_\Phi$ is the perceptron in $\mathcal{Z}$, we know that its VC dimension is equal to $\tilde{d} + 1 = 6$, so $d_{VC}(\mathcal{H}_\Phi) \leq 6$
(the $\leq$ is because some points $z \in \mathcal{Z}$ may not be valid transforms of any $x \in \mathcal{X}$, so some dichotomies may not
be realizable).
(b)
(c) By the same reasoning as in point (a), we get that $d_{VC}(\mathcal{H}_{\Phi_k}) \leq \tilde{d} + 1$. Moreover, as $\tilde{d}$ is the number of
non-constant terms in a polynomial transform of degree $k$ in $d$ variables, it is well known that
$$\tilde{d} = \binom{k + d}{d} - 1.$$
\begin{align*}
\tilde{h}(\tilde{\Phi}_2(x)) &= \tilde{w}_0 + \tilde{w}_1 x_1 + \tilde{w}_2 x_2 + \tilde{w}_3(x_1 + x_2) + \tilde{w}_4(x_1 - x_2) + \tilde{w}_5 x_1^2 + \tilde{w}_6 x_1 x_2 + \tilde{w}_7 x_2 x_1 + \tilde{w}_8 x_2^2 \\
&= \tilde{w}_0 + (\tilde{w}_1 + \tilde{w}_3 + \tilde{w}_4)x_1 + (\tilde{w}_2 + \tilde{w}_3 - \tilde{w}_4)x_2 + \tilde{w}_5 x_1^2 + (\tilde{w}_6 + \tilde{w}_7)x_1 x_2 + \tilde{w}_8 x_2^2 \in \mathcal{H}_{\Phi_2}.
\end{align*}
Problem 3.19
(a) The first problem with this feature transform is that $\mathcal{Z}$ has a very large dimension ($N$), which we know
to be computationally expensive. The second problem is that it maps every point not in the data set to the
origin; it is not unusual for a transformation to be non-injective, but here it is much worse, since all new points share the same image and cannot be distinguished in $\mathcal{Z}$.
(b) This transformation may be a good idea; it is in fact a well-known transformation called a radial basis
function.
(c) This transformation is also another type of radial basis function.