Joe Suzuki

Sparse Estimation with Math and Python
100 Exercises for Building Logic
1st ed. 2021

Joe Suzuki
Graduate School of Engineering Science, Osaka University, Toyonaka,
Osaka, Japan

ISBN 978-981-16-1437-8 e-ISBN 978-981-16-1438-5


https://doi.org/10.1007/978-981-16-1438-5

© The Editor(s) (if applicable) and The Author(s), under exclusive
license to Springer Nature Singapore Pte Ltd. 2021

This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other
physical way, and transmission or information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks,
service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.

The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer
Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04
Gateway East, Singapore 189721, Singapore
Preface
I started considering sparse estimation problems around 2017, when I
moved from the mathematics department to the statistics department at
Osaka University, Japan. I had been studying information theory and
graphical models for over thirty years.
The first book I found was "Statistical Learning with Sparsity" by T.
Hastie, R. Tibshirani, and M. Wainwright. I thought it was a monograph
rather than a textbook and that it would be tough for a non-expert to
read through. However, I downloaded more than fifty papers that were
cited in the book and read them all. In fact, the book does not teach
the material directly but only suggests how to study sparsity. The
contrast between statistics and convex optimization gradually attracted
me as I came to understand the material.
On the other hand, it seems that the core research results on sparsity
came out around 2010–2015. However, I still think there are further
possibilities and extensions. This book contains all the mathematical
derivations and source programs, so graduate students can construct any
procedure from scratch with the help of this book.
Recently, I published the books "Statistical Learning with Math and R"
(SLMR), "Statistical Learning with Math and Python" (SLMP), and
"Sparse Estimation with Math and R" (SEMR). A common idea lies behind
these books (XXMR/XXMP): they not only convey knowledge of statistical
learning and sparse estimation but also help build logic in your brain
by following each step of the derivations and each line of the source
programs. I often meet data scientists engaged in machine learning and
statistical analyses for research collaborations and introduce my
students to them. I recently found out that almost all of them think
that (mathematical) logic, rather than knowledge and experience, is the
most crucial ability for grasping the essence of their jobs. The
knowledge we need changes every day and can be obtained when needed.
However, logic allows us to examine whether each item on the Internet
is correct and to follow any changes; without it, we might even miss
our chances.
What makes SEMP unique?
I have summarized the features of this book as follows.
1. Developing logic

To grasp the essence of the subject, we mathematically formulate and
solve each ML problem and build the corresponding programs. SEMP
instills "logic" in the minds of its readers. The reader will acquire
both the knowledge and the ideas of ML, so that even if new technology
emerges, they will be able to follow the changes smoothly. After
solving the 100 problems, most students would say, "I learned a lot".

2. Not just a story

If programming code is available, you can immediately take action. It
is unfortunate when an ML book does not offer source code. Even if a
package is available, if we cannot see the inner workings of the
programs, all we can do is input data into them. In SEMP, program code
is available for most of the procedures. In cases where the reader does
not understand the math, the code will help them understand what it
means.

3. Not just a how-to book: an academic book written by a university
professor

This book explains how to use packages and provides examples of
executions for those who are not familiar with them. Still, because
only the inputs and outputs are visible, we can see such a procedure
only as a black box. In this sense, the reader will have limited
satisfaction because they will not be able to obtain the essence of the
subject. SEMP intends to show the reader the heart of ML and is more of
a full-fledged academic book.

4. Solve 100 exercises: problems improved with feedback from university
students

The exercises in this book have been used in university lectures and
refined based on feedback from students. The best 100 problems were
selected. Each chapter (except for the exercises themselves) explains
the solutions, and you can solve all of the exercises by reading the
book.
5. Self-contained

All of us have been discouraged by phrases such as "for the details,
please refer to the literature XX." Unless you are an enthusiastic
reader or researcher, nobody will seek out those references. In this
book, we have presented the material in such a way that consulting
external references is not required. Additionally, most proofs are
simple derivations, and the complicated proofs are given in the
appendices at the end of each chapter. SEMP completes all discussions,
including the appendices.

6. Readers' pages: questions, discussion, and program files

The reader can ask any question about the book via
https://bayesnet.org/books.

Acknowledgments The author wishes to thank Tianle Yang, Ryosuke
Shinmura, Tomohiro Kamei, and Daichi Kashiwara for checking the
Japanese manuscript. The author thanks Professors Shu-ichi Kawano,
Hidetoshi Matsui, and Kei Hirose for their helpful advice during the
three years before the current book was published. This English book is
largely based on the Japanese book published by Kyoritsu Shuppan Co.,
Ltd. in 2020. The author would like to thank Kyoritsu Shuppan Co., Ltd.,
in particular its editorial members Mr. Tetsuya Ishii and Ms. Saki Otani.
The author also appreciates Ms. Mio Sugino, Springer, for preparing the
publication and providing advice on the manuscript.
Joe Suzuki
Osaka, Japan
September 2021
Contents
1 Linear Regression
1.1 Linear Regression
1.2 Subderivative
1.3 Lasso
1.4 Ridge
1.5 A Comparison Between Lasso and Ridge
1.6 Elastic Net
1.7 About How to Set the Value of λ
Exercises 1–20
2 Generalized Linear Regression
2.1 Generalization of Lasso in Linear Regression
2.2 Logistic Regression for Binary Values
2.3 Logistic Regression for Multiple Values
2.4 Poisson Regression
2.5 Survival Analysis
Appendix Proof of Proposition
Exercises 21–33
3 Group Lasso
3.1 When One Group Exists
3.2 Proximal Gradient Method
3.3 Group Lasso
3.4 Sparse Group Lasso
3.5 Overlap Lasso
3.6 Group Lasso with Multiple Responses
3.7 Group Lasso via Logistic Regression
3.8 Group Lasso for the Generalized Additive Models
Appendix Proof of Proposition
Exercises 34–46
4 Fused Lasso
4.1 Applications of Fused Lasso
4.2 Solving Fused Lasso via Dynamic Programming
4.3 LARS
4.4 Dual Lasso Problem and Generalized Lasso
4.5 ADMM
Appendix Proof of Proposition
Exercises 47–61
5 Graphical Models
5.1 Graphical Models
5.2 Graphical Lasso
5.3 Estimation of the Graphical Model Based on the Quasi-Likelihood
5.4 Joint Graphical Lasso
Appendix Proof of Propositions
Exercises 62–75
6 Matrix Decomposition
6.1 Singular Value Decomposition
6.2 Eckart-Young's Theorem
6.3 Norm
6.4 Sparse Estimation for Low-Rank Estimations
Appendix Proof of Propositions
Exercises 76–87
7 Multivariate Analysis
7.1 Principal Component Analysis (1): SCoTLASS
7.2 Principal Component Analysis (2): SPCA
7.3 K-Means Clustering
7.4 Convex Clustering
Appendix Proof of Proposition
Exercises 88–100
References
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
J. Suzuki, Sparse Estimation with Math and Python
https://doi.org/10.1007/978-981-16-1438-5_1

1. Linear Regression
Joe Suzuki1
(1) Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka, Japan

Joe Suzuki
Email: [email protected]

In general statistics, we often assume that the number of samples N is greater than the
number of variables p. If this is not the case, it may not be possible to solve for the
best-fitting regression coefficients via the least squares method, or it may be too
computationally costly to compare a total of 2^p models using some information
criterion.
When p is greater than N (also known as the sparse situation), even for linear
regression, it is more common to minimize, instead of the usual squared error, a
modified objective function to which a term is added to prevent the coefficients from
becoming too large (the so-called regularization term). If the regularization term is a
constant λ times the L1-norm (resp. L2-norm) of the coefficient vector, the method is
called Lasso (resp. Ridge). In the case of Lasso, as the value of λ increases, more
coefficients go to 0, and when λ reaches a certain value, all the coefficients
eventually become 0. In that sense, we can say that Lasso also plays a role in model
selection.
In this chapter, we examine the properties of Lasso in comparison to those of Ridge.
After that, we investigate the elastic net, a regularized regression method that combines
the advantages of both Ridge and Lasso. Finally, we consider how to select an
appropriate value of λ.

1.1 Linear Regression

Throughout this chapter, let N ≥ 1 and p ≥ 1 be integers, and let the (i, j) element of
the matrix X ∈ R^{N×p} and the kth element of the vector y ∈ R^N be denoted by
x_{i,j} and y_k, respectively. Using these X, y, we find the intercept β₀ ∈ R and the
slope β = (β₁, …, β_p)ᵀ ∈ R^p that minimize Σ_{i=1}^N (y_i − β₀ − Σ_{j=1}^p x_{i,j} β_j)².
Here, the L2-norm of z ∈ R^N is denoted by ‖z‖.

First, for the sake of simplicity, we assume that the jth column of X (j = 1, …, p)
and y have already been centered. That is, for each j, define
x̄_j := (1/N) Σ_{i=1}^N x_{i,j}, and assume that x̄_j has already been subtracted from
each x_{i,j} so that x̄_j = 0 is satisfied. Similarly, defining ȳ := (1/N) Σ_{i=1}^N y_i,
we assume that ȳ was subtracted in advance from each y_i so that ȳ = 0 holds. Under
this condition, one of the parameters for which we need to solve, namely the intercept
β₀, is always 0. In particular,

β̂₀ = ȳ − Σ_{j=1}^p x̄_j β̂_j = 0

holds. Thus, from now on, without loss of generality, we may assume that the intercept
is zero and use this in our further calculations.
We begin by first observing the following equality:

∇_β (1/2) ‖y − Xβ‖² = −Xᵀ(y − Xβ).    (1.1)

In particular, the jth element of each side can be rewritten as follows:

(∂/∂β_j) (1/2) Σ_{i=1}^N (y_i − Σ_{k=1}^p x_{i,k} β_k)²
  = −Σ_{i=1}^N x_{i,j} (y_i − Σ_{k=1}^p x_{i,k} β_k).

Thus, when we set the right-hand side of (1.1) equal to 0, and if XᵀX is invertible, then
β becomes

β̂ = (XᵀX)⁻¹ Xᵀy.    (1.2)

For the case where p = 1, writing the single column of X as x = (x₁, …, x_N)ᵀ ∈ R^N,
we see that

β̂ = Σ_{i=1}^N x_i y_i / Σ_{i=1}^N x_i².    (1.3)

If we had not performed the data centering, we would still obtain the same slope β̂,
though the intercept would be

β̂₀ = ȳ − Σ_{j=1}^p x̄_j β̂_j.    (1.4)

Here, x̄_j and ȳ are the means before data centering.


We can implement the above in Python as follows:
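The original listing is not reproduced in this extract. As a stand-in, here is a minimal
sketch of the computation just described (the function name least_sq and the toy data
are my own, not necessarily the book's):

```python
import numpy as np

def least_sq(X, y):
    """Least squares with data centering: slope via (1.2), intercept via (1.4)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar = X.mean(axis=0)            # column means of X
    y_bar = y.mean()                  # mean of y
    Xc = X - x_bar                    # centered X
    yc = y - y_bar                    # centered y
    # normal equations on the centered data; assumes Xc^T Xc is invertible
    beta = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
    beta_0 = y_bar - x_bar @ beta     # intercept recovered as in (1.4)
    return beta, beta_0

# usage: y = x1 + x2 holds exactly for this toy data
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.0, 3.0, 7.0, 7.0])
beta, beta_0 = least_sq(X, y)
```

Centering first and then solving the normal equations yields the same fit as including
an explicit intercept column in X.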

In this book, we focus more on the sparse case, i.e., when p is larger than N. In this
case, a problem arises. When N < p, the matrix XᵀX does not have an inverse. In fact,
since

rank(XᵀX) ≤ rank(X) ≤ N < p,

XᵀX is singular. Moreover, when X has two precisely identical columns, rank(XᵀX) < p,
and the inverse matrix does not exist.
On the other hand, if p is rather large, we have p independent variables to choose
from as the predictors of the target variable when carrying out model selection. Thus, the
number of combinations is

2^p

(that is, whether we choose each of the p variables or not). This means we have to
compare a total of 2^p models. Then, to extract the proper model combination by using
an information criterion or cross-validation, the computational resources required will
grow exponentially with the number of variables p.
To deal with this kind of problem, let us consider the following. Let λ ≥ 0 be a
constant. We add to the squared error a term that penalizes β for being too large in size.
Specifically, we define

L := (1/(2N)) ‖y − Xβ‖² + λ ‖β‖₁    (1.5)

or

L := (1/(2N)) ‖y − Xβ‖² + (λ/2) ‖β‖₂².    (1.6)

Our work now is to solve for the minimizer β of one of the above quantities. Here,
for β = (β₁, …, β_p), ‖β‖₁ := Σ_{j=1}^p |β_j| is the L1-norm and
‖β‖₂ := (Σ_{j=1}^p β_j²)^{1/2} is the L2-norm of β. To be more specific, we plug the
mean-centered X and y into either (1.5) or (1.6), then minimize it with respect to the
slope β (this is called Lasso or Ridge, respectively), and finally, we use (1.4) to
compute the intercept β̂₀.

1.2 Subderivative
To address the minimization problem for Lasso, we need a method for optimizing
functions that are not differentiable. When we want to find the points x of the maxima
or minima of a single-variable polynomial, we can differentiate it and find the
solutions to f′(x) = 0. However, what should we do when we encounter functions that
contain an absolute value, such as f(x) = x² − 3x + |x|? To address this, we need to
extend our concept of differentiation to a more general one.


Throughout the following claims, let us assume that f is convex [4, 6]. In general, we
say that f is convex (downward)1 if, for any 0 < α < 1 and x, y ∈ R,

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)

holds. For instance, f(x) = |x| is convex (Fig. 1.1, left) because

|αx + (1 − α)y| ≤ α|x| + (1 − α)|y|

is satisfied. To check this, since both sides are nonnegative, the RHS squared minus the
LHS squared gives 2α(1 − α)(|xy| − xy) ≥ 0. As another example, consider

f(x) = 1 (x ≠ 0), f(x) = 0 (x = 0).    (1.7)

This function satisfies, for x = 0, y = 2, and α = 1/2,

f(αx + (1 − α)y) = f(1) = 1 > 1/2 = αf(x) + (1 − α)f(y).

Therefore, it is not convex (Fig. 1.1, right). If functions f, g are convex, then for any
β, γ ≥ 0, the function βf + γg has to be convex since the following holds:

βf(αx + (1 − α)y) + γg(αx + (1 − α)y)
  ≤ α{βf(x) + γg(x)} + (1 − α){βf(y) + γg(y)}.

Fig. 1.1 Left: f(x) = |x| is convex. However, at the origin, the derivatives from each side differ; thus, it
is not differentiable. Right: We cannot simply judge from its shape, but the function (1.7) is not convex
Next, for any convex function f, fix x₀ ∈ R arbitrarily. For all x ∈ R, we
say that the set of all z ∈ R that satisfy

f(x) ≥ f(x₀) + z(x − x₀)    (1.8)

is the subderivative of f at x₀.
If f is differentiable at x₀, then the subderivative will be a set that contains only 1
element, namely z = f′(x₀).2 We prove this as follows.
First, when the convex function f is differentiable at x₀, it satisfies
f(x) ≥ f(x₀) + f′(x₀)(x − x₀). To see this, since f is convex,

f(x₀ + α(x − x₀)) ≤ αf(x) + (1 − α)f(x₀), 0 < α < 1.

This can be rewritten as

f(x) ≥ f(x₀) + {f(x₀ + α(x − x₀)) − f(x₀)}/α.

In fact, whether x > x₀ or x < x₀, as α → 0 we have that

{f(x₀ + α(x − x₀)) − f(x₀)}/α → f′(x₀)(x − x₀)

holds. Thus, the above inequality is true.
Next, when the convex function f is differentiable at x₀, we can show that z = f′(x₀) is
the one and only value of z that satisfies (1.8). In particular, when x > x₀, for (1.8) to be
satisfied, we need z ≤ {f(x) − f(x₀)}/(x − x₀). Similarly, when x < x₀, for (1.8) to be
satisfied, we need z ≥ {f(x) − f(x₀)}/(x − x₀). Thus, z needs to be larger than or equal
to the derivative on the left and, at the same time, be less than or equal to the derivative
on the right at x₀. Since f is differentiable at x₀, those 2 derivatives are equal; this
completes the proof.
The main interest of this book is specifically the case where f(x) = |x| and x₀ = 0.
Hence, by (1.8), its subderivative is the set of z such that |x| ≥ zx for any x ∈ R. These
values of z lie in the interval greater than or equal to −1 and less than or equal to 1, and

{z ∈ R : |x| ≥ zx for all x ∈ R} = [−1, 1]

is true. Let us confirm this. If |x| ≥ zx holds for any x, then for x = 1 and x = −1,
z ≤ 1 and z ≥ −1, respectively, need to be true. Conversely, if −1 ≤ z ≤ 1, then
zx ≤ |z||x| ≤ |x| is true for any arbitrary x ∈ R.
Example 1 By dividing into the 3 cases x > 0, x = 0, and x < 0, find the values x that
attain the minima of f(x) = x² − 3x + |x| and f(x) = x² + x + 2|x|. Note that for x ≠ 0,
we can find their usual derivatives, but for the term |x| at x = 0, its subderivative is the
interval [−1, 1].
For f(x) = x² − 3x + |x|: when x > 0, f′(x) = 2x − 2 = 0 at x = 1; when x < 0,
f′(x) = 2x − 4 < 0; and at x = 0, the subderivative −3 + [−1, 1] = [−4, −2] does not
contain 0. Therefore, x² − 3x + |x| has a minimum at x = 1 (Fig. 1.2, left).
For f(x) = x² + x + 2|x|: when x > 0, f′(x) = 2x + 3 > 0; when x < 0,
f′(x) = 2x − 1 < 0; and at x = 0, the subderivative 1 + 2[−1, 1] = [−1, 3] contains 0.
Therefore, x² + x + 2|x| has a minimum at x = 0 (Fig. 1.2, right). We use the following
code to draw the figures.
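The plotting code is not reproduced in this extract; the following is a minimal
matplotlib sketch that draws two such functions (here assumed to be
f(x) = x² − 3x + |x| and f(x) = x² + x + 2|x|; the file name is my own choice):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")               # write to a file; no display needed
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 201)
f1 = x**2 - 3*x + np.abs(x)         # minimum at x = 1
f2 = x**2 + x + 2*np.abs(x)         # minimum at x = 0

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.plot(x, f1)
ax1.set_title("x^2 - 3x + |x|")
ax2.plot(x, f2)
ax2.set_title("x^2 + x + 2|x|")
plt.savefig("fig1_2.png")
```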


Fig. 1.2 x² − 3x + |x| (left) has a minimum at x = 1, and x² + x + 2|x| (right) has a minimum at
x = 0. Neither is differentiable at x = 0. Despite not being differentiable there, the point x = 0 is a
minimum for the figure on the right


The subderivative of f(x) = |x| at x = 0 is the interval [−1, 1]. This fact
summarizes this chapter.

1.3 Lasso
As stated in Sect. 1.1, the method that considers the minimization of

L := (1/(2N)) ‖y − Xβ‖² + λ ‖β‖₁

is called Lasso [28].
From the formulation of (1.5) and (1.6), we can tell that Lasso and Ridge are the
same in the sense that they both try to control the size of the regression coefficients β.
However, Lasso also has the property of leaving only the significant coefficients as
nonzero, which is particularly beneficial in variable selection. Let us consider its
mechanism.
Note that in (1.5), the division of the first term by 2 is not essential: we would obtain
an equivalent formulation if we doubled the value of λ. For the sake of simplicity,
first let us assume that

(1/N) Σ_{i=1}^N x_{i,j}² = 1 (j = 1, …, p),
(1/N) Σ_{i=1}^N x_{i,j} x_{i,k} = 0 (j ≠ k)    (1.9)

holds, and let z_j := (1/N) Σ_{i=1}^N x_{i,j} y_i. With this assumption, the calculations
are made much simpler.
Solving for the subderivative of L with respect to β_j gives

0 ∈ −z_j + β_j + λ ∂|β_j|,    (1.10)

which means that

β_j = z_j − λ (z_j > λ), β_j = 0 (−λ ≤ z_j ≤ λ), β_j = z_j + λ (z_j < −λ).

Fig. 1.3 Shape of the function S_λ(x)

Thus, we have that β̂_j = S_λ(z_j). Here, the RHS can be rewritten using the following
function:

S_λ(x) := x − λ (x > λ), 0 (−λ ≤ x ≤ λ), x + λ (x < −λ),    (1.11)

and hence β̂_j becomes S_λ(z_j). We plot the graph of S_λ(x) in Fig. 1.3
using the code provided below:
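The listing is not reproduced in this extract. The operator S_λ of (1.11) itself can be
sketched as follows (the function name soft_th is an assumption; λ = 5 is only for
illustration):

```python
import numpy as np

def soft_th(lam, x):
    """Soft-thresholding operator S_lambda(x) of (1.11)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# the three cases of (1.11): x > lam, |x| <= lam, x < -lam
print(soft_th(5, 7), soft_th(5, 3), soft_th(5, -8))   # 2.0 0.0 -3.0
```

Plotting soft_th(5, x) over a grid of x values reproduces the piecewise-linear shape of
Fig. 1.3.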
Next, let us consider the case where (1.9) is not satisfied. We rewrite (1.10) by

0 ∈ −(1/N) Σ_{i=1}^N x_{i,j} (r_{i,j} − x_{i,j} β_j) + λ ∂|β_j|.

Here, we denote y_i − Σ_{k≠j} x_{i,k} β_k by r_{i,j}, and let β_j be
S_λ((1/N) Σ_{i=1}^N x_{i,j} r_{i,j}) divided by (1/N) Σ_{i=1}^N x_{i,j}². Next, fix
β_k (k ≠ j), and update β_j. We do this repeatedly from j = 1 to j = p until it converges
(coordinate descent). For example, we can implement the algorithm as follows:
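The implementation is omitted from this extract; the following sketch of a
linear_lasso function (the name matches the function called later in Example 2, but
the details below are my own) performs the coordinate descent just described:

```python
import numpy as np

def soft_th(lam, x):
    """Soft-thresholding operator S_lambda(x) of (1.11)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def linear_lasso(X, y, lam, max_iter=1000, tol=1e-8):
    """Lasso via coordinate descent; returns (slope beta, intercept beta_0).

    Assumes no column of X is constant (the centered column would be zero).
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    N, p = X.shape
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc, yc = X - x_bar, y - y_bar            # center the data first
    beta = np.zeros(p)
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j in range(p):
            # partial residual r_{i,j} = y_i - sum_{k != j} x_{i,k} beta_k
            r = yc - Xc @ beta + Xc[:, j] * beta[j]
            z = (Xc[:, j] @ r) / N
            beta[j] = soft_th(lam, z) / ((Xc[:, j] @ Xc[:, j]) / N)
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    beta_0 = y_bar - x_bar @ beta            # intercept via (1.4)
    return beta, beta_0
```

With lam = 0 this reduces to ordinary least squares; with a large enough lam every
coefficient is thresholded to 0.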

Note that here, after we obtain the value of β̂, we use ȳ and x̄_j (1 ≤ j ≤ p) to
calculate the value of the intercept β̂₀ via (1.4). The following centralize function
performs data centering and returns a list of 5 results.
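The centralize listing is likewise omitted from this extract; the following is a sketch of
such a function (the exact return order in the book may differ and is an assumption
here):

```python
import numpy as np

def centralize(X0, y0, standardize=True):
    """Center X and y; optionally scale each column of X to unit standard deviation.

    Returns [X, y, X_bar, X_sd, y_bar] (5 results; the order is an assumption).
    """
    X = np.asarray(X0, float).copy()
    y = np.asarray(y0, float).copy()
    X_bar = X.mean(axis=0)        # column means before centering
    X = X - X_bar
    X_sd = X.std(axis=0)          # column scales ("scale[j]" in the text)
    if standardize:
        X = X / X_sd              # each column divided by its scale
    y_bar = y.mean()
    y = y - y_bar
    return [X, y, X_bar, X_sd, y_bar]
```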
Thus, we may standardize the data first, then perform Lasso, and finally restore the
data to its original scale. The aim of doing this is to treat all the variables on an equal
footing: because the algorithm sets to 0 all β_j whose corresponding |z_j| is less than or
equal to λ, we do not want that decision to depend on the scales of the columns
j = 1, …, p. Each jth column of X is divided by scale[j], and consequently, the
estimated β̂_j will be larger to that extent. We then divide β̂_j by scale[j] as well.

Example 2 Putting the U.S. crime data from
https://web.stanford.edu/~hastie/StatLearnSparsity/data.htm into
the text file crime.txt, we set the crime rate per 1 million residents as the target
variable and then select appropriate explanatory variables from the list below by
performing Lasso.

Column Cov./Res. Definition of variable


1 Response Crime rate per 1 million residents
2 (we currently do not use this)
3 Covariate Annual police funding
4 Covariate % of people 25 years+ with 4 yrs. of high school education
5 Covariate % of 16–19-year-old persons not in high school and not high school graduates
6 Covariate % of 18–24-year-old persons in college
7 Covariate % of people 25 years+ with at least 4 years of college education

We call the function linear_lasso and execute it as described below:


Fig. 1.4 Result of Example 2. In the case of Lasso, we see that as λ increases, the coefficients decrease.
At a certain λ, all the coefficients become 0. The λ at which each coefficient becomes 0 varies

As we can see in Fig. 1.4, as λ increases, the absolute value of each coefficient
decreases. When λ reaches a certain value, all coefficients go to 0. In other words, for
each value of λ, the set of nonzero coefficients differs. The larger λ becomes, the
smaller the set of selected variables.
When performing coordinate descent, it is quite common to begin with a λ large
enough that every coefficient is zero and then make λ smaller gradually. This method is
called a warm start, which utilizes the fact that when we want to calculate the
coefficients for all values of λ, we can improve the computational performance by
setting the initial value of β to the β̂ estimated for the previous λ. For example, we can
write the program as follows:
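The listing is omitted from this extract; a sketch of the warm-start loop (the names
warm_start and lam_seq are mine, not the book's):

```python
import numpy as np

def soft_th(lam, x):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def warm_start(X, y, lam_seq, sweeps=100):
    """For each lambda in the decreasing sequence lam_seq, run coordinate
    descent starting from the solution for the previous (larger) lambda.
    Returns an array whose rth row is the slope for lam_seq[r]."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    N, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta = np.zeros(p)                        # all zero at the largest lambda
    coef = np.zeros((len(lam_seq), p))
    for r, lam in enumerate(lam_seq):
        for _ in range(sweeps):
            for j in range(p):
                res = yc - Xc @ beta + Xc[:, j] * beta[j]
                z = (Xc[:, j] @ res) / N
                beta[j] = soft_th(lam, z) / ((Xc[:, j] @ Xc[:, j]) / N)
        coef[r] = beta                        # warm start: beta carries over
    return coef
```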

Example 3 We use the warm start method to reproduce the coefficient for each λ
in Example 2.

First, we make the value of λ large enough so that all the β̂_j are 0 and
then gradually decrease the size of λ while performing coordinate descent. Here, for
simplicity, we assume that for each j, we have (1/N) Σ_{i=1}^N x_{i,j}² = 1; moreover,
the values of |(1/N) Σ_{i=1}^N x_{i,j} y_i| are all different. In this case, the smallest λ
that makes β̂ = 0 can be calculated by

λ₀ := max_j |(1/N) Σ_{i=1}^N x_{i,j} y_i|.

Particularly, for any λ larger than this value, it will be satisfied that for all j,
β̂_j = 0. Then, we have that r_{i,j} = y_i and

β̂_j = S_λ((1/N) Σ_{i=1}^N x_{i,j} y_i) = 0

hold. Thus, when we decrease the size of λ, the j attaining the maximum will first
satisfy |(1/N) Σ_{i=1}^N x_{i,j} y_i| = λ. Again, since
|(1/N) Σ_{i=1}^N x_{i,k} y_i| < λ for the other k (k ≠ j), if we continue to make λ
smaller, we still have that β̂_k = 0, and only the coefficient β̂_j for that j becomes
nonzero.

Fig. 1.5 The execution result of Example 4. The numbers at the top represent the number of estimated
coefficients that are not zero
The glmnet package is often used [11].
Example 4 (Boston) Using the Boston dataset and setting the variable of the 14th
column as the target variable and the other 13 variables as predictors, we plot a graph
similar to that of the previous one (Fig. 1.5).

Column Variable Definition of variable


1 CRIM Per capita crime rate by town
2 ZN Proportion of residential land zoned for lots over 25,000 sq.ft.
3 INDUS Proportion of nonretail business acres per town
4 CHAS Charles River dummy variable (1 if tract bounds river; 0 otherwise)
5 NOX Nitric oxide concentration (parts per 10 million)
6 RM Average number of rooms per dwelling
7 AGE Proportion of owner-occupied units built prior to 1940
8 DIS Weighted distances to five Boston employment centers
9 RAD Index of accessibility to radial highways
10 TAX Full-value property-tax rate per $10,000
11 PTRATIO Pupil-teacher ratio by town
12 BLACK Proportion of blacks by town
13 LSTAT % lower status of the population
14 MEDV Median value of owner-occupied homes in multiples of $1,000

So far, we have seen how Lasso can be useful in the process of variable selection.
However, we have not explained the reason why we seek to minimize (1.5) for variable
selection. Why do we not instead consider minimizing the following usual information
criterion

(1/(2N)) ‖y − Xβ‖² + λ ‖β‖₀    (1.12)

for λ ≥ 0? Here, ‖β‖₀ represents the number of nonzero elements of the vector β.
Lasso, as well as Ridge, which is to be discussed in the next section, has the
advantage that the function to be minimized is convex. For a globally convex function,
because the minimum and the minimal (locally minimum) points coincide, we can
search for the optimal solution effectively. On the other hand, the minimization of
(1.12) requires a time exponential in the number of variables, p. In particular, since the
function in (1.7) is not convex, ‖β‖₀ cannot be convex either. Because of this, we need
to consider instead the minimization of (1.5). An optimization problem becomes
meaningful only after there exists an effective search algorithm.

1.4 Ridge
In Sect. 1.1, we made the assumption that the matrix XᵀX is invertible, and, based on
this, we showed that the β that minimizes the squared error is given by
β̂ = (XᵀX)⁻¹Xᵀy.
First, when N ≥ p, the possibility that XᵀX is singular is not that high, though we
may have another problem instead: the confidence interval becomes wide when the
determinant is small. To cope with this, we let λ ≥ 0 be a constant and add to the
squared error the squared norm of β times λ. That is, the method considering the
minimization of

L := (1/(2N)) ‖y − Xβ‖² + (λ/2) ‖β‖₂²    (1.6)

is commonly used. This method is called Ridge. Differentiating L with respect to β gives

∂L/∂β = −(1/N) Xᵀ(y − Xβ) + λβ = 0.

If XᵀX + NλI is not singular, we obtain

β̂ = (XᵀX + NλI)⁻¹ Xᵀy.

Here, whenever λ > 0, even for the case N < p, we have that XᵀX + NλI is
nonsingular. In particular, since the matrix XᵀX is positive semidefinite, we have that
its eigenvalues μ₁, …, μ_p are all nonnegative. Therefore, the eigenvalues of
XᵀX + NλI can be calculated by

μ₁ + Nλ, …, μ_p + Nλ,

and thus, all of them are positive.
Again, when all the eigenvalues are positive, their product, the determinant
det(XᵀX + NλI) = Π_{j=1}^p (μ_j + Nλ), is also positive, which is the same as saying
that XᵀX + NλI is nonsingular. Note that this always holds regardless of the sizes of
p, N. When N < p, the rank of XᵀX is less than or equal to N, and hence, the matrix is
singular. Therefore, for this case, the following conditions are equivalent:

λ > 0 ⟺ XᵀX + NλI is nonsingular.

As an example of the Ridge case, we can write the following program:
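The listing is omitted from this extract; a sketch of such a ridge function, assuming
the closed form β̂ = (XᵀX + NλI)⁻¹Xᵀy on centered data (the scaling of the penalty by
N is an assumption about the book's convention; the name ridge matches the function
called in Example 5):

```python
import numpy as np

def ridge(X, y, lam=0.0):
    """Ridge regression: beta = (X^T X + N*lam*I)^{-1} X^T y on centered data."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    N, p = X.shape
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc, yc = X - x_bar, y - y_bar
    # the added N*lam*I makes the matrix nonsingular even when N < p
    beta = np.linalg.solve(Xc.T @ Xc + N * lam * np.eye(p), Xc.T @ yc)
    beta_0 = y_bar - x_bar @ beta     # intercept via (1.4)
    return beta, beta_0
```

With lam = 0 this is ordinary least squares; as lam grows, every coefficient shrinks
toward 0, and the system remains solvable even for p > N.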

Example 5 We use the same U.S. crime data as in Example 2 and perform the
following analysis. To control the size of the coefficient of each predictor, we call the
function ridge and then execute it.
In Fig. 1.6, we plot how each coefficient changes with the value of λ.

Fig. 1.6 The execution result of Example 5: the changes in the coefficients with respect to λ based
on Ridge. As λ becomes larger, each coefficient decreases to 0


1.5 A Comparison Between Lasso and Ridge
Next, let us compare Fig. 1.4 of Lasso to Fig. 1.6 of Ridge. We can see that they are the
same in the sense that as λ becomes larger, the absolute value of each coefficient
approaches 0. However, in the case of Lasso, when λ reaches a certain value, each
coefficient becomes exactly zero, and the λ at which that occurs varies from variable to
variable. Thus, Lasso can be used for variable selection.
So far, we have shown this fact analytically, but it is also good to have a geometric
intuition. Figures similar to those in Fig. 1.7 are widely used when one wants to
compare Lasso and Ridge.
Fig. 1.7 Each ellipse is centered at (β̂₁, β̂₂), representing the contour line connecting all the points that
give the same value of (1.13). The rhombus in the left figure is the L1 regularization constraint
|β₁| + |β₂| ≤ C′, while the circle in the right figure is the L2 regularization constraint β₁² + β₂² ≤ C

Let N ≥ p = 2, so that X ∈ R^{N×2} is composed of the 2 columns x₁, x₂ ∈ R^N. In
the least squares process, we solve for the values β₁, β₂ that minimize
Σ_{i=1}^N (y_i − x_{i,1}β₁ − x_{i,2}β₂)². For now, let us denote them by β̂₁, β̂₂,
respectively. Here, if we let RSS := Σ_{i=1}^N (y_i − x_{i,1}β̂₁ − x_{i,2}β̂₂)², we have

y − Xβ = (y − Xβ̂) + X(β̂ − β).

However, since for any β₁, β₂,

Σ_{i=1}^N x_{i,j}(y_i − x_{i,1}β̂₁ − x_{i,2}β̂₂) = 0, j = 1, 2

holds, we can rewrite the quantity to be minimized, ‖y − Xβ‖², as follows:

‖y − Xβ‖² = RSS + (β − β̂)ᵀ XᵀX (β − β̂),    (1.13)

and, of course, if we let β = β̂ here, we obtain the minimum (= RSS).
Thus, we can view the problems of Lasso and Ridge in the following way: the
minimization of quantities (1.5) and (1.6) is equivalent to finding the values of β₁, β₂
that satisfy the constraints |β₁| + |β₂| ≤ C′ and β₁² + β₂² ≤ C, respectively, and that
also minimize the quantity of (1.13) (here, the case where λ is large is equivalent to the
case where C, C′ are small).
The case of Lasso is the same as in the left panel of Fig. 1.7. The ellipses are centered
at (β̂₁, β̂₂) and represent contours on which the values of (1.13) are the same. We
expand the size of the ellipse (the contour), and once it makes contact with the
rhombus, the corresponding values of (β₁, β₂) are the solution to Lasso. If the rhombus
is small (λ is large), it is more likely to touch only one of the four rhombus vertices. In
this case, one of the values β₁, β₂ will become 0. However, in the Ridge case, as in the
right panel of Fig. 1.7, a circle replaces the Lasso rhombus; hence, it is less likely that
β₁ = 0 or β₂ = 0 will occur.
In this case, if the least squares solution (β̂₁, β̂₂) lies in the green zone of Fig. 1.8,
then we have either β₁ = 0 or β₂ = 0 as our solution. Moreover, when the rhombus is
small (λ is large), even when (β̂₁, β̂₂) remains the same, the green zone will become
larger, which is the reason why Lasso performs well in variable selection.

Fig. 1.8 The green zone represents the area in which the optimal solution would satisfy either β₁ = 0
or β₂ = 0 if the center of the ellipses lies within it


We should not overlook one of the advantages of Ridge: its performance when
dealing with the case of collinearity. That is, it can handle well even the case where the
matrix of explanatory variables contains columns that are highly related. Let us define
the VIF (variance inflation factor) by

$VIF_j := \frac{1}{1 - R^2_{X_j \mid X_{-j}}}$

The larger this value is, the better the jth column
variable is explained by the other variables. Here, $R^2_{X_j \mid X_{-j}}$ denotes the coefficient of
determination (the squared multiple correlation coefficient) in which $x_j$ is the target variable, and the other variables are the
predictors.

Example 6 We compute the VIF for the Boston dataset. It shows that the 9th and 10th
variables (RAD and TAX) have strong collinearity.
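The VIF computation for a design matrix can be sketched as follows; this is my own helper (the book's code for Example 6 is not reproduced here), which regresses each column on the remaining ones by ordinary least squares:

```python
import numpy as np

def vif(X):
    """Variance inflation factor 1 / (1 - R^2_j) for each column of X."""
    N, p = X.shape
    out = np.zeros(p)
    for j in range(p):
        y = X[:, j]
        Z = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(N), Z])          # include an intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least squares fit
        resid = y - Z @ beta
        tss = (y - y.mean()) @ (y - y.mean())
        r2 = 1 - (resid @ resid) / tss                # coefficient of determination
        out[j] = 1 / (1 - r2)
    return out
```

Columns that are nearly linear combinations of the others produce very large VIF values, which is exactly what Example 6 observes for RAD and TAX in the Boston data.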
In usual linear regression, if the VIF is large, the estimated coefficient will be
unstable. Particularly, if two columns are precisely the same, the coefficient is
not uniquely determined. Moreover, for Lasso, if two columns are highly related, then generally one of
them will be estimated as 0 and the other as nonzero. However, in the Ridge case, for
$\lambda > 0$, even when the columns $j, k$ of X are the same, the estimation is solvable, and both
of them will obtain the same value.
In particular, we find the partial derivatives of

$\frac{1}{N}\|y - X\beta\|^2 + \lambda\|\beta\|_2^2$

with respect to $\beta_j$ and $\beta_k$ and make them equal to 0:

$-\frac{2}{N}x_j^T(y - X\beta) + 2\lambda\beta_j = 0, \qquad -\frac{2}{N}x_k^T(y - X\beta) + 2\lambda\beta_k = 0.$

Then, plugging $x_j = x_k$ into each and taking the difference gives $\lambda(\hat\beta_j - \hat\beta_k) = 0$, hence $\hat\beta_j = \hat\beta_k$.
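This conclusion can be checked numerically. The sketch below (my own, not the book's code) solves the Ridge problem in closed form, $\hat\beta = (X^TX + N\lambda I)^{-1}X^Ty$, for a design matrix whose two columns are identical:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form minimizer of (1/N)||y - X beta||^2 + lam * ||beta||^2."""
    N, p = X.shape
    return np.linalg.solve(X.T @ X + N * lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
X = np.hstack([x, x])                      # two identical columns
y = 2 * x[:, 0] + rng.normal(size=100)
beta = ridge(X, y, lam=0.1)
# the two coefficients come out exactly equal, sharing the signal between them
```

By symmetry of $X^TX + N\lambda I$ under swapping the two coordinates, the solution assigns the same value to both coefficients, as the derivation above predicts.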
Example 7 We perform Lasso for the case where some of the variables
have strong correlations. We generate N groups of data in which the related variables are highly correlated.
Then, we apply linear regression analysis with Lasso to $(X, y)$. We plot how
the coefficients change relative to the value of $\lambda$ in Fig. 1.9. Naturally, one might expect
that similar coefficient values should be given to each of the related variables, though it
turns out that Lasso does not behave in this way.
Fig. 1.9 The execution result of Example 7. The Lasso case is different from that of Ridge: related
variables are not given similar estimated coefficients. When we use the glmnet package, its default
horizontal axis is the L1-norm [11]. This is the value $\|\hat\beta\|_1$, which becomes smaller as $\lambda$ increases.
Therefore, the figure here is left-/right-reversed compared to that where $\lambda$ or $\log\lambda$ is set as the
horizontal axis

1.6 Elastic Net


Up until now, we have discussed the pros and cons of Lasso and Ridge. This section
studies a method intended to combine the advantages of the two, i.e., the elastic net.
Specifically, the method considers the problem of finding the $\beta$ that minimizes

$\frac{1}{2N}\|y - X\beta\|^2 + \lambda\left\{\frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1\right\}$   (1.14)

The $\hat\beta_j$ that minimizes (1.14) for the case of Lasso ($\alpha = 1$) is $S_\lambda\left(\frac{1}{N}\sum_{i=1}^N x_{i,j}r_{i,j}\right)\Big/\left(\frac{1}{N}\sum_{i=1}^N x_{i,j}^2\right)$,
while for the case of Ridge ($\alpha = 0$), it is $\left(\frac{1}{N}\sum_{i=1}^N x_{i,j}r_{i,j}\right)\Big/\left(\frac{1}{N}\sum_{i=1}^N x_{i,j}^2 + \lambda\right)$. In general cases, it is

$\hat\beta_j = \frac{S_{\lambda\alpha}\left(\frac{1}{N}\sum_{i=1}^N x_{i,j}r_{i,j}\right)}{\frac{1}{N}\sum_{i=1}^N x_{i,j}^2 + \lambda(1-\alpha)}$   (1.15)

This is called the elastic net. For each $\beta_j$, if we find the subderivative of (1.14)
with respect to $\beta_j$, we obtain

$0 \in -\frac{1}{N}\sum_{i=1}^N x_{i,j}(r_{i,j} - x_{i,j}\beta_j) + \lambda(1-\alpha)\beta_j + \lambda\alpha\,\partial|\beta_j|$

Here, let $r_{i,j} := y_i - \sum_{k\ne j} x_{i,k}\beta_k$, $s_j := \frac{1}{N}\sum_{i=1}^N x_{i,j}r_{i,j}$, and $t_j := \frac{1}{N}\sum_{i=1}^N x_{i,j}^2 + \lambda(1-\alpha)$, so that (1.15) reads $\hat\beta_j = S_{\lambda\alpha}(s_j)/t_j$. We have used the
fact that

$S_\lambda(x) = \mathrm{sign}(x)\max(|x| - \lambda,\, 0)$

where $\mathrm{sign}$ is a function that returns the sign for each element.

Then, we can write the program for the elastic net based on (1.15), as shown in the
following. Here, we added a parameter to the line of #, and the only essential change
lies in the three lines of ##.
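A minimal coordinate-descent sketch based on the update (1.15) might look as follows; this is an illustrative reconstruction, not the book's listing, and `soft_th` names the soft-thresholding function $S_\lambda$:

```python
import numpy as np

def soft_th(lam, x):
    """Soft-thresholding S_lambda(x) = sign(x) * max(|x| - lambda, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0)

def elastic_net(X, y, lam, alpha=1.0, n_iter=100):        # alpha parameter added (#)
    N, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual r_{i,j} = y_i - sum_{k != j} x_{i,k} beta_k
            r = y - X @ beta + X[:, j] * beta[j]
            s = X[:, j] @ r / N
            beta[j] = soft_th(lam * alpha, s) / (           ##
                X[:, j] @ X[:, j] / N + lam * (1 - alpha)   ##
            )                                               ##
    return beta
```

Setting alpha = 1 recovers Lasso and alpha = 0 recovers Ridge, so the same routine can reproduce the contrast discussed in Examples 7 and 8: with two nearly identical columns, alpha = 0 splits the coefficient evenly while alpha = 1 concentrates it on one column.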
Example 8 If we add the additional parameter alpha = 0, 0.25, 0.5, 0.75
to the function elastic_net of Example 7, we obtain the graph of Fig. 1.10. As $\alpha$
approaches 0, we can observe how the coefficients of related variables become closer to
one another. This outcome reveals how Ridge responds to collinearity.

Fig. 1.10 The execution result of Example 8. The closer $\alpha$ is to 0 (the closer the model is to Ridge), the
more it is able to handle collinearity, which is in contrast to the case of $\alpha = 1$ (Lasso), where the
related variable coefficients are not estimated equally

1.7 About How to Set the Value of $\lambda$


To set an appropriate value for $\lambda$, the method of cross-validation (CV) is often used.3
For example, the 10-fold CV for each $\lambda$ divides the data into ten groups, with 9 of
them used to estimate $\beta$ and 1 used as the test data, and then evaluates the model.
Switching the test group, we can perform this evaluation ten times in total and then
calculate the mean of these evaluation values. Then, we choose the $\lambda$ that has the
highest mean evaluation value. If we plug the sample data of the target and the
explanatory variables into the CV procedure, it evaluates each value of $\lambda$ and returns
the $\lambda$ with the highest evaluation value as the output.
Fig. 1.11 The execution result of Example 9, showing which value of $\lambda$ was the best
Example 9 We apply the function cv.glmnet to the U.S. crime dataset from
Examples 2 and 5, obtain the optimal $\lambda$, use it for the usual Lasso, and then obtain the
coefficients $\hat\beta$. For each $\lambda$, the function also provides the least squares value on
the test data and its confidence interval (Fig. 1.11). Each number above the figure
represents the number of nonzero coefficients for that $\lambda$.
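The CV loop described above can be sketched in a few lines. For brevity this hypothetical example (my own, not cv.glmnet) tunes $\lambda$ for Ridge, which has a closed-form fit, but the identical loop applies to Lasso:

```python
import numpy as np

def cv_lambda(X, y, lambdas, k=10):
    """Return the lambda with the smallest mean squared test error over k folds."""
    N, p = X.shape
    idx = np.arange(N)
    folds = np.array_split(idx, k)
    scores = []
    for lam in lambdas:
        errs = []
        for f in folds:
            train = np.setdiff1d(idx, f)
            Xt, yt = X[train], y[train]
            # closed-form Ridge fit on the training part
            beta = np.linalg.solve(Xt.T @ Xt + len(train) * lam * np.eye(p),
                                   Xt.T @ yt)
            errs.append(np.mean((y[f] - X[f] @ beta) ** 2))
        scores.append(np.mean(errs))          # mean evaluation over the 10 folds
    return lambdas[int(np.argmin(scores))]
```

Here "highest evaluation value" in the text corresponds to the smallest mean prediction error, so the loop takes an argmin over the candidate $\lambda$ values.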

For the elastic net, we have to perform a double loop of cross-validation over both $\lambda$ and $\alpha$.
The function cv_glmnet provides us the output cvm, which contains the evaluation
values from the cross-validation.

Example 10 We generate random numbers as our data and try conducting the double
loop cross-validation over $\lambda$ and $\alpha$.
For this problem, we changed the $\alpha$ values and compared the evaluation for the best
$\lambda$ to find that the difference is within 1% across the values of $\alpha$.
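The double loop can be organized as a plain grid search. The sketch below is my own construction: it takes an arbitrary `fit(X, y, lam, alpha)` function (for instance, the elastic net routine of Example 8) and returns the $(\alpha, \lambda)$ pair with the best CV score:

```python
import numpy as np

def cv_double_loop(X, y, fit, alphas, lambdas, k=10):
    """Grid-search (alpha, lambda) by k-fold CV; fit(X, y, lam, alpha) -> beta."""
    N = X.shape[0]
    folds = np.array_split(np.arange(N), k)
    best, best_err = None, np.inf
    for alpha in alphas:                       # outer loop over alpha
        for lam in lambdas:                    # inner loop over lambda
            errs = []
            for f in folds:
                train = np.setdiff1d(np.arange(N), f)
                beta = fit(X[train], y[train], lam, alpha)
                errs.append(np.mean((y[f] - X[f] @ beta) ** 2))
            err = np.mean(errs)
            if err < best_err:
                best, best_err = (alpha, lam), err
    return best, best_err
```

As the text notes, the best scores for different $\alpha$ can be very close, so in practice one often fixes $\alpha$ on a coarse grid and tunes $\lambda$ more finely inside.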
Exercises 1–20

In the following exercises, we estimate the intercept $\beta_0$ and the coefficients $\beta_1, \dots, \beta_p$
from N groups of p explanatory variables and a target variable data
$(x_1, y_1), \dots, (x_N, y_N)$. We subtract from each $x_{i,j}$ the mean $\bar{x}_j$
and from each $y_i$ the mean $\bar{y}$ such that each of
them has mean 0 and then estimate $\beta_1, \dots, \beta_p$ (the estimated value is denoted by $\hat\beta$). Finally, we
let $\hat\beta_0 := \bar{y} - \sum_{j=1}^p \bar{x}_j\hat\beta_j$. Again, we let $\bar{x}_j := \frac{1}{N}\sum_{i=1}^N x_{i,j}$ and $\bar{y} := \frac{1}{N}\sum_{i=1}^N y_i$.
1. Prove that the following 2 equalities hold.

Moreover, when $X^TX$ is invertible, prove that the value of $\beta$ that minimizes

$\|y - X\beta\|^2$

is given by $\hat\beta = (X^TX)^{-1}X^Ty$. Here, for
$z = [z_1, \dots, z_N]^T \in \mathbb{R}^N$, we define $\|z\|^2 := \sum_{i=1}^N z_i^2$. Then, write a program
for a function called linear with Python. This function accepts a matrix $X \in \mathbb{R}^{N \times p}$
and a vector $y \in \mathbb{R}^N$ as inputs and then calculates the estimated intercept $\hat\beta_0$
and the estimated slope $\hat\beta$ as outputs. Please complete blanks (1), (2) below.

Here, the function centralize, which conducts mean-centering on X, y, is used. In the
3rd and 4th lines below, X, y have already been centered.
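One possible form of the functions centralize and linear is sketched below; this is a hedged reconstruction in which the variable names and structure are my guesses, not the book's listing with its blanks:

```python
import numpy as np

def centralize(X, y):
    """Subtract column means from X and the mean from y; return the means too."""
    X_bar = X.mean(axis=0)
    y_bar = y.mean()
    return X - X_bar, y - y_bar, X_bar, y_bar

def linear(X, y):
    """Least squares: returns the estimated intercept beta_0 and slope vector beta."""
    X_c, y_c, X_bar, y_bar = centralize(X, y)
    beta = np.linalg.solve(X_c.T @ X_c, X_c.T @ y_c)  # (X^T X)^{-1} X^T y
    beta_0 = y_bar - X_bar @ beta                     # recover the intercept
    return beta_0, beta
```

On exactly linear data $y = 3 + 2x_1 - x_2$, this recovers $\hat\beta_0 = 3$ and $\hat\beta = (2, -1)$.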

2. For a function $f$ that is convex (downward) and a point $x_0 \in \mathbb{R}$, consider the $z \in \mathbb{R}$ such that

$f(x) \ge f(x_0) + z(x - x_0)$ for all $x \in \mathbb{R}$.

The set of such $z$ (the subderivative) is denoted by $\partial f(x_0)$. Show that the following
function f is convex. In addition, at $x_0 = 0$, find $\partial f(0)$.

3.
(a)

(b)

Hint For any $0 < \alpha < 1$ and $x, y$, if $f(\alpha x + (1 - \alpha)y) \le \alpha f(x) + (1 - \alpha)f(y)$ holds,
then we say that f is convex. For (b), show that .

(a) If the functions g(x), h(x) are convex, show that for any $\beta, \gamma \ge 0$, the function
$\beta g(x) + \gamma h(x)$ is convex.

(b) Show that the function is not convex.

(c) Decide whether the following functions of $\beta$ are convex or not. In addition,
provide the proofs.

(i) $\|\beta\|_0$

(ii) $\|\beta\|_1$

(iii) $\|\beta\|_2$

Here, $\|\beta\|_0$ denotes the number of nonzero elements, $\|\beta\|_1$ denotes the sum of
the absolute values of all elements, and $\|\beta\|_2$ denotes the square root of the sum
of the squares of the elements. Moreover, let $\beta \in \mathbb{R}^p$.
Fig. 60.—Siemens’ Optical Pyrometer, on Stand.
Fig. 61.—Siemens’ Optical Pyrometer, Portable
Form.
The adjustment in this pyrometer is simple, and the condition of
equality sharply defined. Whereas, in matching the colours of two
contiguous fields, separate observers may disagree to an extent
representing 40° C. or more, a divergence of 10° C. is seldom
exceeded when different operators adjust the tip of the filament to
extinction. In a special test to decide this point, the author compared
the observations of five persons, some trained and others untrained,
with the result that all agreed to within 10° at a steady temperature
in the vicinity of 1200° C.; and in this respect the Holborn-Kurlbaum
pyrometer is superior to other forms of optical pyrometer. The
continuous accuracy of the readings depends upon the permanence
of the standard lamp, which is ensured by over-burning for 20 hours,
after which the lamp may be used at its proper voltage for a long
period without further change. As used for occasional readings in the
workshop, such a lamp will last for a year or more without varying in
brightness by an amount representing 10° C. at a temperature of
1800° C. When a new lamp is used, a fresh calibration is necessary;
the makers, however, in such case send out a new temperature scale
with the lamp.

Lovibond’s Pyrometer.—It is possible, by the use of


coloured glasses superposed, to match closely any given colour; and
Lovibond, whose tintometer for this purpose is well known, has
applied this method to temperature measurement. Taking the case
of a block of steel in a furnace, it is possible to arrange combinations
of glasses which, when illuminated by a standard light, will give the
same tint as the steel at any specified temperature. If it be desired
to work the steel at 850° C., for example, glasses are provided
which, when viewed by the light transmitted from a 4-volt glow-
lamp, using a constant current, represent the tint of steel at 840°,
850°, and 860° respectively. The image of the steel is reflected by a
mirror through one hole in a brass plate, which forms the end of a
wooden box, at the opposite end of which an eye-piece is placed. A
second hole in the brass plate receives light from the standard lamp,
after passing through the glasses; and the appearances of the two
lights are then compared. A skilled eye can readily detect a
disagreement in the two fields corresponding to 10° C.; and by
introducing the glasses in turn it can be observed whether the steel
is within 10° C. of the temperature required. This instrument is
cheap and simple, but is obviously only useful in deciding a pre-
arranged temperature, as to take a measurement at an undefined
temperature would involve an unwieldy number of glasses, and
absorb a considerable time. The correct glasses to use for a given
operation are decided under working conditions at temperatures
measured by a standard pyrometer; after which any number of
instruments may be made from glasses of the same colour and
absorptive power as those used in the calibration. Correct matching
is difficult below 700° C.
Mesuré and Nouel’s Pyrometer.—This instrument,
shown in fig. 62, consists of two Nicol prisms, between which is
placed a piece of quartz cut perpendicularly to its axis. Light from
the source, in passing through the first Nicol prism, is all polarised in
the same plane; but on passing through the quartz is polarised in
various planes, according to the wave-length. The colour seen after
passing through the second prism, used as analyser, will depend
upon the angle between this and the first or polarising prism. The
analyser is connected to a rotating disc, divided into angular
degrees; and on viewing the heated source the colour will appear
red if the analyser be turned in one direction, and green if rotated in
the opposite. The intermediate colour is a lemon-yellow; and the
adjustment consists in rotating the analyser until this tint is
obtained. The angular reading is then taken, and the temperature
read off from a table prepared by making observations at known
temperatures. Observers may disagree by as much as 100° C. in
using this pyrometer, owing to differences in eyesight and judgment
of the lemon-yellow tint; but a given operator, who has trained
himself to the use of the instrument, may obtain much closer results
with practice. The chief use of this device is to enable a judgment to
be formed as to whether a furnace is above or below an assigned
temperature, within limits of 25° C. on either side at the best; and
hence it is convenient for a foreman or metallurgist to carry about
for this purpose when other pyrometers are not in use. A great
advantage is that the instrument is always ready for use, and has no
accessories.
Fig. 62.—Mesuré and Nouel’s Pyrometer.
Colour-extinction Pyrometers.—Various attempts
have been made to produce superposed glasses, or cells of coloured
fluids, which will have the effect of extinguishing the colour of a
heated source. As an example, three cells containing various dyes in
solution may be prepared which, when looked through, will
extinguish the colour at 840°, 850°, and 860° C. respectively. If it be
desired to work at 850°, a difference of 10° on either side may be
detected by a trained eye; but to follow a changing temperature a
large number of cells would evidently be necessary. Heathcote’s
extinction pyrometer, in its early form, consisted of an eye-shade in
front of which two pairs of cells containing coloured fluid were
mounted. In bringing a furnace to an assigned temperature,
observation was made from time to time until a faint red image was
perceived through one pair of cells, when the heat supply was
regulated so as to maintain the existing temperature. When viewed
through the second pair of cells, which contained a slightly darker
fluid, no red image was to be seen at the correct temperature. With
training, a workman could control a furnace to a fair degree of
accuracy by this means, but the operation was tedious, and useful
only for the attainment of a single temperature. In a later
instrument, known as the “Pyromike” (fig. 63), Heathcote employs a
single cell with flexible walls, so that by turning the screw-end, the
length of the column of fluid interposed between the eye and the
furnace can be altered. In taking a reading, the furnace is sighted
and the screw turned so as to increase the length of the column of
coloured fluid, until the image is no longer visible. A direct reading of
the temperature is then obtained on a spiral scale marked on the
cylindrical body of the instrument, over which the screwed portion
rotates. This forms a simple and convenient temperature gauge for
workshop use.

Fig. 63.—Heathcote’s Extinction Pyrometer or


“Pyromike.”
Fig. 64.—“Wedge” Pyrometer.
The “Wedge” Pyrometer, designed by Alder and Cochrane (fig.
64), consists of a small telescope through which a prism of darkened
glass may be moved, and which is focused on the heated object. By
turning a head the wedge may be moved so as to interpose a thicker
layer of dark glass between the eye and the furnace, and the same
operation causes a temperature scale to pass in front of a fixed
pointer. When the image of the hot source is just extinguished, the
temperature is read from the mark opposite the fixed point. Training
is needed to enable an observer to judge the exact point of
extinction, when it becomes possible to obtain results within 20° C. in
the region of 1300° C. On the other hand, when used by one
unaccustomed to the instrument, the reading may be wrong by 50°
C. or more. As an aid to the judgment near the extinction point, the
hand may be interposed between the telescope and furnace, when,
if extinction be complete, no alteration should be observed in the
field of view. The simple construction of this pyrometer is an
advantage, no accessories being needed; and when used with the
precautions stated above, readings sufficiently close for many
processes can easily be obtained.
Management of Optical Pyrometers.—Careful
usage is essential with optical pyrometers, which are liable to get out
of adjustment with rough handling; and for this reason a trained
observer should be in charge of such instruments. Skilled attention is
equally requisite in taking readings, as the matching of tints correctly
is an operation which demands a high degree of judgment. Careful
attention must be paid to the standard lights; if flames, regulation to
the standard height is essential; if electric lamps, care must be taken
not to use them for a longer period than necessary, in order to
increase the useful life. Accumulators should be recharged regularly
—say once in two weeks—to keep in good order. Separate parts,
such as absorption glasses, should be kept in a place of safety, as
their destruction may involve a new calibration. It should be kept in
mind that the temperatures indicated by optical pyrometers are
“black” temperatures; that is, they correspond to the readings that
would be given by a black-body of the same degree of brightness. In
consequence, readings should always be taken under black-body
conditions, the precautions in this respect being identical with those
necessary for total-radiation pyrometers, given on page 163. In
some special cases the connection between the apparent and true
temperatures has been worked out for a given type of pyrometer,
but, owing to the different emissive powers of different substances,
no general relation can be given.
Special Uses of Optical Pyrometers.—The
advantageous use of optical pyrometers is restricted to observations
at temperatures beyond the scope of instruments which have the
working part in the furnace; or to cases in which occasional readings
of temperature suffice. To follow a changing temperature continuous
adjustment is necessary, involving labour, and therefore costly.
Amongst workshop uses may be mentioned: (1) ascertaining the
temperature of pottery kilns and glass and steel furnaces; (2) in the
treatment of steels at very high temperatures, to which end the
pyrometer may be set to a given reading, and the process carried
out when the steel is observed to attain such assigned temperature;
(3) to take casual readings when a number of furnaces are in use, or
when a number of sighting-holes are provided, as in large
brickmaking furnaces; and (4) for occasional observations of the
firing end of rotary cement kilns. As an instrument of research in the
laboratory, a good form of optical pyrometer is very useful, as, for
example, in investigating the working temperatures of electric lamps,
and taking observations in electric furnaces. It is a great drawback
that records cannot be taken by optical pyrometers, as much
valuable information can be gathered from an accurate knowledge of
temperature fluctuations in most operations. This disadvantage must
always militate against the general use of these instruments.
CHAPTER VII
CALORIMETRIC PYROMETERS
General Principles.—If a piece of hot metal, of known
weight and specific heat, be dropped into a known weight of water
at a temperature t1, which rises to t2 in consequence, the
temperature of the hot metal, t0, can be obtained by calculation, as
shown by the following example:—
Example.—A piece of metal weighing 100
grams, and of specific heat 0·1, is heated in
a furnace and dropped into 475 grams of
water, contained in a vessel which has a
capacity for heat equal to 25 grams of water.
The temperature of the water rises from 5°
to 25° C. To find the temperature of the
furnace.
The heat lost by the metal is equal to that
gained by the
water and vessel. Equating these,
100 × 0·1 × (t0 - 25) = (475 +
25) × (25 - 5)
from which t0 = 1025° C.
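The heat-balance calculation of the example can be written as a small function; this is a sketch of my own, not part of the original text:

```python
def furnace_temperature(m_metal, s_metal, m_water, water_equiv, t1, t2):
    """Solve m*s*(t0 - t2) = (m_water + water_equiv)*(t2 - t1) for t0 (deg C)."""
    heat_gained = (m_water + water_equiv) * (t2 - t1)   # calories gained by water
    return t2 + heat_gained / (m_metal * s_metal)       # t0 of the hot metal

# The worked example: 100 g of metal, specific heat 0.1, dropped into 475 g of
# water (vessel equivalent 25 g), raising it from 5 to 25 deg C.
t0 = furnace_temperature(100, 0.1, 475, 25, 5, 25)      # -> 1025.0
```

The same function reproduces Example I further on (100 g of nickel, specific heat 0.137, 2000 g of water, vessel equivalent 50 g, rise from 10 to 16.25 deg C), giving about 952° C.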
The above calculation, which applies generally to this method,
depends for its accuracy upon a correct knowledge of the specific
heat of the metal used. This value is far from constant, increasing as
the temperature rises, and the result will only be correct when the
average value over a given range is known.
The metal used in the experiment should not oxidise readily, and
should possess a high melting point. Platinum is most suitable, but
the cost of a piece sufficiently large would considerably exceed that
of a thermo-electric or other outfit. Nickel is next best in these
respects, and is now generally used for the calorimetric method, up
to 1000° C. The specific heat varies to some extent in different
specimens, but can be determined for the ranges involved in
practical use. This may be done by heating a given weight to known
temperatures and plunging into water, the result being obtained as
in the foregoing example, t0 in this case being known and the
specific heat calculated. From a series of such determinations, a
curve may be plotted connecting specific heat and temperature
range, from which intermediate values may be read off.
Fig. 65.—Specific Heat of Nickel over Ranges from
0° C.
Regnault, who first suggested the calorimetric method for high
temperature measurement, attempted to measure the specific heat
of iron over different ranges, with a view to using this metal in the
process. Owing to the absence of reliable means of determining the
experimental temperatures, however, Regnault’s values were
considerably in error. For the range 0 to 1000° C. he gave the
average specific heat of iron as 0·126, a figure much below the
truth. Thus, if a piece of iron be heated to 970° C., as measured by
the thermo-electric method, and dropped into water, the
temperature calculated from an assumed specific heat of 0·126 will
be found to be 1210°, or 240° too high. The values now employed
are obtained by experiments with a thermo-electric pyrometer, so
that temperatures deduced by the calorimetric method agree, within
the limits of manipulative error, with those of the standard scale. The
accompanying curve, fig. 65, shows the average specific heat of
nickel over all ranges between 0° and 1000° C., and from this curve
the correct figure to use in the calculation for any range may be
determined. Thus for a furnace between 800° and 900° C. the
specific heat would be taken as 0·136; and although the choice of
the value to be taken involves a knowledge of the temperature
within 100°, no difficulty arises in practice, as it is easy to judge this
limit by experience at temperatures below 1000° C. In the most
approved forms of calorimetric pyrometers for industrial purposes
the temperature of the hot metal may be read directly from a scale,
prepared in accordance with the value applying to the specific heat
at various ranges.
Copper and iron are still used to a limited extent in these
pyrometers, but lose continuously in weight by oxidation, the scales
of oxide falling off when quenched, necessitating weighing before
each test to ensure accuracy. Nickel oxidises very little below 1000°
C., and as the thin film of oxide which forms does not readily peel
off, the weight may increase slightly. Quartz would probably be more
suitable than metals, not being altered by heating and quenching,
but does not appear to have been tried for this purpose. Another
possible material is nichrom, which resists oxidation below 1000° C.
The weight of the solid should be at least 1/20 of that of the water,
in order to ensure a tangible rise in temperature, and the
thermometer should be capable of detecting 1/20 of a degree C. The
rise in temperature should not be so great as to cause the water to
exceed the atmosphere in temperature by more than 4° or 5° C., as
otherwise radiation losses would have a marked effect. The limits of
accuracy of the method will be shown by reference to examples.
Example I.—A piece of nickel, weighing 100
grams, is placed in a furnace, and after
heating dropped into 2000 grams of water
at 10° C., contained in a vessel of water
equivalent 50 grams. The temperature rises
to 16·25° C. The specific heat of nickel for
the range is 0·137. To find the temperature
of the furnace and the limits of accuracy, the
thermometer being readable to 1/20° C.
Equating heat lost by the nickel to that
gained by the water and vessel:—

100 × 0·137 × (x - 16·25) = 2050


× (16·25 - 10·0)
from which x = 952° C.
If the error in each thermometer reading
amounted to 1⁄40° the maximum difference
in the above calculation is obtained by
introducing the altered values as under:—
100 × 0·137 × (x - 16·225) =
2050 × (16·225 - 10·025)
when x = 944° C.
The maximum error due to a possible
incorrect reading of 1⁄40° is therefore less
than 1 per cent.

Example II.—The loss of heat by radiation in


transferring 100 grams of nickel at 927° C.,
possessing a surface of 30 square
centimetres, and with radiating power 0·7 of
a black body, may be shown by the fourth-
power law to be 50 calories per second (see
page 139). If two seconds were occupied in
the transfer, the error from this cause would
be 1 in 130; and adding this to the
thermometric error, the total is less than 2
per cent.
Practical Forms of Calorimetric Pyrometers.
—When required to estimate the temperature of a muffle furnace or
other laboratory appliance, a sheet-copper vessel of about 1500 c.c.
capacity may be used. This should rest on wooden supports in a
second similar vessel, about 2 inches wider, which acts as a shield
against radiation. A cylinder of nickel about 1½ inches long, and 1¼
inches in diameter, with a hole of ½-inch diameter in the centre, is
suitable for test purposes. This may conveniently be heated in a
nickel crucible; and when transferring to the water the crucible may
be grasped with a pair of tongs, and tilted so as to allow the cylinder
to drop into the water. When used in a tube furnace, a length of thin
nickel wire may be attached to the cylinder to enable withdrawal to
be accomplished rapidly, allowance being made for the weight of the
heated wire. The transfer should be accomplished as speedily as
possible, to avoid radiation errors. The figure to be used to represent
the specific heat of nickel may be obtained from the curve (fig. 65),
when the range to be measured is approximately known. The water
equivalent of the vessel and thermometer should be determined as
follows:—Place in the vessel one-half the quantity of cold water used
in the experiment—say 750 c.c.—and note the temperature (t1) after
stirring with the thermometer. Then add an equal quantity of water
at a temperature (t2) about 10° higher than t1. Mix thoroughly with
the thermometer, and note the temperature of the mixture (t3).
Check results may be obtained by varying the proportions of cold
and warm water, the total quantity always being equal to that used
for quenching the hot nickel. If W1 = the weight of cold water, and
W2 that of the warm, the water equivalent (x) is obtained from the
equation
x = [W2(t2 - t3) - W1(t3 - t1)] / (t3 - t1).
This figure represents the weight of water equal in thermal
capacity to the vessel, and in a pyrometric measurement is added to
the weight of water taken.
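The water-equivalent formula translates directly into code; this sketch (my own, using the text's variable names) makes the check-measurement easy to repeat with varied proportions of cold and warm water:

```python
def water_equivalent(W1, t1, W2, t2, t3):
    """x = (W2*(t2 - t3) - W1*(t3 - t1)) / (t3 - t1), in grams of water."""
    return (W2 * (t2 - t3) - W1 * (t3 - t1)) / (t3 - t1)

# If the vessel absorbed no heat, mixing equal weights would give the exact
# midpoint temperature and x = 0; a mixture settling slightly below the
# midpoint reveals the vessel's thermal capacity.
x = water_equivalent(750, 10, 750, 20, 14.8)   # -> 62.5 g of water
```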
In industrial practice, it is desirable to dispense, if possible, with
the necessity for calculations, so that a reading may be taken by an
unskilled observer. The earliest form of calorimetric pyrometer,
patented by Byström in 1862, consisted of a lagged zinc vessel into
which a piece of platinum was dropped, and a table was provided
from which the temperature of the furnace could be read by noting
the rise in temperature of the water. The modern industrial form,
made by Messrs Siemens, will now be described.

Siemens’ Calorimetric or “Water”


Pyrometer.—Fig. 66 shows this instrument in longitudinal and
transverse section. It consists of a double copper vessel, the inner
containing water, and the outer provided with a handle. The space
between is lagged with felt, to prevent escape of heat from the
water. The thermometer, b, is protected by a perforated brass tube
from damage that might be caused on dropping in the hollow nickel
cylinder, d. Opposite the stem of the thermometer is placed a
sliding-piece c, on which a temperature scale is marked. In using the
instrument, the specified quantity of
water is placed in the inner vessel, and
the pointer on c brought opposite to the
top of the mercury column in the
thermometer. The nickel cylinder, which
has been heated in a crucible or muffle
in the furnace, is then dropped in, and
the vessel shaken to secure an equal
temperature throughout the water.
When the thermometer is stationary,
the mark on c opposite the top of the
mercury gives the temperature of the
furnace, the scale on c having
Fig. 66.—Siemens’ previously been marked from
calculations made for each 50 degrees.
Calorimetric or
The correctness of the reading evidently
“Water” Pyrometer. depends upon the accuracy with which
c has been calibrated, an operation
which involves taking into account the
water equivalent of the vessel and the variation of the specific heat
of nickel at different temperatures. Allowing for the sources of error
attaching to the method, results by this pyrometer cannot be
guaranteed to better than 2 or 3 per cent, at 900° or 1000° C., but
in cases where this degree of inaccuracy is not of importance, the
instrument may be used with advantage. As no calculation is
necessary, the determination may be made in the workshop by any
workman who exercises care in conducting the operation. Copper
and iron cylinders are sometimes supplied instead of nickel, but are
not to be recommended, as they decrease in weight with each test,
and necessitate the use of a multiplying factor to convert the reading
on c into the true temperature.

Special Uses of Calorimetric Pyrometers.—The


great drawback to the calorimetric method is that each observation
necessitates a separate experiment, involving time and labour. The
accuracy, moreover, is not comparable with that obtainable by the
use of a thermo-electric or resistance pyrometer; and practically the
only recommendation is the low initial cost of the outfit. When an
occasional reading of temperature, true to 3 per cent., suffices, the
calorimetric pyrometer may be used; and in special laboratory
determinations the method will frequently be found of value.
Considering the low cost of thermo-electric pyrometers at the
present time, it is probable that the calorimetric method will be
entirely superseded in industrial practice, as the former method
gives a continuous, automatic reading, and is capable of furnishing
records. Many firms have already replaced their “water” pyrometers
by the more accurate and useful appliances now available.
CHAPTER VIII
FUSION PYROMETERS
General Principles.—If a number of solids, possessing
progressive melting points, be placed in a furnace and afterwards
withdrawn, some may be observed to have undergone fusion whilst
others would be unaffected. The temperature of the furnace would
then be known to be higher than that of the melting point of the last
solid melted, and lower than that of the first which remained intact.
Taking, for example, a series of salts, the following might be used:-

Salt                                       Melting Point
                                        Deg. Cent.   Deg. Fahr.
1 molecule common salt +
  1 molecule potassium chloride             650         1202
Common salt                                 800         1472
Anhydrous sodium carbonate                  850         1562
Anhydrous sodium sulphate                   900         1652
Sodium plumbate                            1000         1832
Anhydrous potassium sulphate               1070         1958
Anhydrous magnesium sulphate               1150         2102

If, on inspection, it were found that the sodium sulphate had


melted, whilst the sodium plumbate had survived, the temperature
of the furnace would be known to lie between 900° C. and 1000° C.
If a number of salts or other solids could be found with melting
points ranging between 900° and 1000°, it would be possible to
obtain a reading within narrower limits. The accuracy of the method
in all cases is decided by the interval between the melting points of
successive test materials.
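The bracketing logic of the fusion method can be expressed as a short function; the list below is the salt table above, and the function itself is my own sketch:

```python
# (melting point in deg C, salt), in ascending order, from the table above
SALTS = [
    (650, "NaCl + KCl (equimolar)"),
    (800, "common salt"),
    (850, "anhydrous sodium carbonate"),
    (900, "anhydrous sodium sulphate"),
    (1000, "sodium plumbate"),
    (1070, "anhydrous potassium sulphate"),
    (1150, "anhydrous magnesium sulphate"),
]

def bracket(melted):
    """Given the set of salt names observed to have melted, return the
    (lower, upper) bounds on the furnace temperature implied by the table."""
    lower = max((mp for mp, name in SALTS if name in melted), default=None)
    upper = min((mp for mp, name in SALTS if name not in melted), default=None)
    return lower, upper
```

With the sodium sulphate melted and the sodium plumbate intact, as in the example, this returns the bracket (900, 1000), and the spacing of the melting points fixes the attainable precision.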
Wedgwood, the famous potter, appears to have been the first to
apply this method of determining the condition of a furnace, his test-
pieces consisting of special clay compositions. The effect of the
furnace on these was noted, and the suitability of the temperature
for the work in hand deduced from the observations. Wedgwood in
this manner investigated the variations in temperature at different
levels in his firing-kilns, and was thus enabled to place the various
wares at the positions best suited for their successful firing. Modern
potters still use such test-pieces, as the information gained is not
merely the degree of heat, but the effect of such heat on the articles
undergoing firing. The fusion method, however, is now used to
determine the temperature of all kinds of furnaces, and the chief
modifications will now be described.

Fig. 67.—Seger Pyramids or “Cones.”


Seger Pyramids or “Cones.”—Seger, of Berlin,
published in 1886 an investigation dealing with the production of
silicates of progressive melting points. By varying the composition,
he was able to produce a series of materials with melting points
ranging from 1890° C. to 590° C., the interval between successive
compositions being 20° between 1890° and 950°, and 30° from the
latter temperature to 590°. The highest member of the series has
the composition Al2O3, SiO2; and the lowest member 2SiO2, B2O3.
For convenience in use the materials are made in the form of
triangular pyramids, 5 cms. in height, and each side of the base 1·5
cms. long. Each pyramid is stamped with a distinguishing number,
and altogether 60 are made to cover the range 1890° to 590°. When
conducting a test, several pyramids are selected with melting points
known to be near the temperature of the furnace, as discovered by
previous trials. These are inserted in the furnace standing on a slab
of refractory material, as in fig. 67, and may be watched through a
sight-hole or withdrawn from the furnace for examination after
attaining the existing temperature. If the right pyramids have been
chosen, the appearance presented will be as in fig. 67, in which D is
seen to have collapsed completely, C has bent over, B has been
rounded at the top, whilst A is intact. The temperature of the
furnace is then taken to correspond to the melting point of C, which
is found by reference to a table in which the melting points
corresponding to the different distinguishing numbers are given. The
pyramids are extremely cheap, and only those with melting points
near to the working temperature need be purchased. In cases where
it is desired to increase the heat to a specified point, and then to
allow the furnace to cool, these pyramids fulfil all requirements; an
examination through a sight-hole closed with darkened glass
enabling the furnace attendant to discover when the requisite
temperature has been attained. The procedure is more difficult when
it is desired to maintain a steady temperature, as this involves
frequent renewal of pyramids already melted. These appliances are
sold under the name of Seger “cones,” the latter word being
evidently a misnomer.
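The reading of a set of Seger pyramids can be sketched as a simple lookup: the temperature is taken as the melting point of the one pyramid observed to have bent over. The distinguishing letters and melting points below are hypothetical stand-ins for the maker's published table of 60 compositions.

```python
# Hypothetical fragment of the maker's table:
# distinguishing mark -> melting point in degrees C.
CONE_TABLE = {
    "A": 1050,  # intact in the text's fig. 67
    "B": 1030,  # rounded at the top
    "C": 1010,  # bent over: this one gives the reading
    "D": 990,   # collapsed completely
}

def seger_reading(states):
    """states: dict of mark -> 'collapsed', 'bent', 'rounded', or 'intact'.

    Returns the melting point of the pyramid observed to have bent over,
    which is taken as the furnace temperature.
    """
    bent = [mark for mark, state in states.items() if state == "bent"]
    if len(bent) != 1:
        raise ValueError("expected exactly one pyramid bent over; "
                         "repeat the test with other pyramids")
    return CONE_TABLE[bent[0]]

t = seger_reading({"A": "intact", "B": "rounded", "C": "bent", "D": "collapsed"})
print(f"furnace temperature taken as {t} deg C")
```

This mirrors the observation in fig. 67: D collapsed, C bent over, B rounded, A intact, the reading being the melting point tabulated against C.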

Watkin’s Heat Recorder.—This arrangement consists of a small block of fireclay, having a number of cylindrical holes in its
upper face. Pellets of materials of progressive melting points are
placed in the holes, in which they fit loosely. The block is placed in
the furnace, and afterwards withdrawn and examined, when those
which have completely melted will be seen to have sunk, and to
possess a concave surface; others which have been superficially
fused, will show rounded edges, whilst others will be intact. The
melting point of the highest member of the series which is observed
to have rounded edges is taken as the temperature of the furnace.
The materials used in the manufacture of the pellets are
approximately the same as those employed by Seger, being the
same in number (60), and differing progressively by similar intervals.
It is not evident that the method of observation is superior to the
use of pyramids, although some workers may prefer it, and the
arrangement is merely an alternative plan of using the Seger
compositions. Watkin has also introduced a modification in which
straight bars of clay compositions are supported at the edges, the
temperature being deduced by observing which numbers melt,
droop, or remain intact.
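Watkin's reading rule may likewise be restated as code: the temperature is taken as the melting point of the highest member of the series showing rounded edges. The pellet numbers and melting points below are again hypothetical.

```python
# Hypothetical pellets: (series number, melting point in degrees C),
# numbered in order of ascending melting point as in the Seger series.
PELLETS = [(1, 950), (2, 980), (3, 1010), (4, 1040)]

def watkin_reading(rounded_edges):
    """rounded_edges: set of pellet numbers observed with rounded edges
    (superficially fused). Returns the melting point of the highest such
    member, taken as the furnace temperature."""
    return max(mp for n, mp in PELLETS if n in rounded_edges)

# Pellets 1 and 2 sank with concave surfaces (fully melted), pellet 3
# showed rounded edges, pellet 4 remained intact:
print(watkin_reading({3}))
```

The logic differs from the pyramid reading only in which intermediate state furnishes the figure: rounded edges here, bending over there.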

“Sentinel” Pyrometers.—Under this name, Brearley, of Sheffield, has introduced a number of compositions, chiefly of salts,
which possess definite melting points. These are made in the shape
of cylinders, about 1 inch long and ¾ inch in diameter, which
collapse completely when the melting point is attained. Compositions
have been found which melt at certain temperatures known to give
the best results in the treatment of different kinds of steel, and a
cylinder of correct melting point, placed in the furnace on a small
dish near to the steel, furnishes a simple and correct clue to the
attainment of the desired temperature. The existing condition of a
furnace may be discovered by taking a number of cylinders, having
progressive melting points, and making observations after the
manner described under the heading of Seger pyramids. A few
“Sentinel” cylinders are frequently of use in the workshop or
laboratory for other purposes, such as a rapid check of a given
temperature in confirmation of the reading of an indicating
pyrometer, or in discovering whether a certain temperature has been
exceeded in a given case. “Sentinel” cylinders have been used in
such a manner as to give audible warning of the attainment of a
given temperature by means of a metal rod, which is made to rest
on the cylinder, and which, when the cylinder melts, falls and
completes the circuit of an electric bell. The upper range attainable
by the use of ordinary metallic salts is not so great as in the case of
silicates, but up to 1100° C. metallic sulphates, chlorides, etc., or
mixtures of these, give results quite as good as those obtained with
Seger pyramids.

Stone’s Pyrometer.—This instrument is intended to indicate the correct temperature at which a metal or alloy should be
poured, and consists of a silica tube at the bottom of which is placed
an alloy melting at the temperature at which the material operated
on should be poured. A silica rod rests on this alloy, and is connected
at its upper end to an iron extension, the extremity of which
engages a pointer moving over a scale. When the alloy in the silica
tube melts, the rod falls through the molten mass and moves the
pointer over the scale, thus giving a certain indication that the
desired temperature has been attained. Arrangements exist for
adjusting the pointer to zero at the commencement of a test.
Fusible Metals.—Instead of clays or salts, a number of
metals and alloys are sometimes used. These are placed in the form
of short rods in numbered holes in a piece of firebrick and inserted
in the furnace, and on withdrawal those which have undergone
fusion will be seen to have taken the form of the holes in which they
were placed. The temperature of the furnace is considered to lie
between the melting points of the last of the series to undergo
fusion and the first which remains unchanged. A series of metals of
this description is more costly than clays or salts, but is more rapid
in action, owing to the superior conductivity of metals.

Fusible Pastes.—These consist of salts incorporated with vaseline or other suitable fat, and are used to detect the attainment
of a specified temperature by a piece of metal. If, for example, it
were desired to heat a piece of steel to 800° C. for a given purpose,
a paste containing common salt might be smeared on its surface
before placing in the furnace. On heating, the vaseline burns away,