Coursework 5 - Web
Luis Da Silva (more info: http://luisdasilva.me)
1 Introduction
In this coursework we explore the concept of Gaussian Processes (GPs) and apply them to data on sea ice cover in the northern polar region, to investigate whether there is evidence for global warming. Thus, our main objectives are:
• Discuss whether there is a change in sea ice cover between 1990 and
2004.
2 Gaussian Processes
By Gaussian Process we refer to a stochastic process in which we assume that the vector of targets comes from a multivariate normal distribution; thus we do not choose a parametric function but rather a mean vector (µ) and a covariance matrix (C) [4]. A GP is a generalisation of the normal probability distribution in the sense that, to describe how the data points are connected (or the underlying generative function), we do not need to enumerate the infinitely many possible functions, only their joint properties [1].
Since we have to choose a mean and a covariance function to be able to compute the GP, this choice is called the GP prior, and it reflects our current knowledge (or beliefs) about the data. A (possibly) good choice for the mean function is µ(X_n) = 0, which reflects no previous knowledge of the data. For the covariance function, a popular choice is the Radial Basis Function (RBF): a function whose value depends only on the distance (usually Euclidean) of its argument from a predefined point, usually the origin [6]. We can build the covariance matrix we need by implementing:
\[
c(x_n, x_m) = \alpha e^{-\gamma (x_n - x_m)^2} \tag{1}
\]
Alternatively, setting \alpha = A^2 and \gamma = \frac{1}{2l^2}, we can rewrite equation 1 as:

\[
k_{\mathrm{RBF}}(x_n, x_m) = A^2 e^{-\frac{(x_n - x_m)^2}{2l^2}} \tag{2}
\]
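To make equation 2 concrete, the following minimal sketch (illustrative, not taken from the coursework code; the values of A and l are arbitrary) builds the covariance matrix C with this kernel and draws functions from the zero-mean GP prior:

import numpy as np

def rbf_kernel(xn, xm, A=1.0, l=30.0):
    # Equation 2: k(x_n, x_m) = A^2 * exp(-(x_n - x_m)^2 / (2 * l^2))
    return A**2 * np.exp(-((xn - xm) ** 2) / (2 * l**2))

x = np.linspace(1, 365, 100)             # inputs, e.g. days of the year
mu = np.zeros(len(x))                    # zero-mean prior, mu(X_n) = 0
C = rbf_kernel(x[:, None], x[None, :])   # covariance matrix from the kernel
# A small jitter keeps C numerically positive semi-definite
samples = np.random.multivariate_normal(mu, C + 1e-8 * np.eye(len(x)), size=5)

Each row of samples is one function drawn from the prior, which is exactly the multivariate normal view of a GP described above.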
3 Data
Global warming is everywhere these days, since it appears to be a very harmful process the Earth is going through. The worst part is that 'Most climate scientists agree the main cause of the current global warming trend is human expansion of the "greenhouse effect"' [2]. NASA also claims there are already observable effects of the phenomenon, so let us dig deeper into it.
The website https://www.kaggle.com has data available on sea ice cover in the northern polar region. The dataset we work with consists of the variables Day, Month, Decade, Mean.Extent and Var.Extent, and 48 observations: 2 per month for each of 1990 and 2004.
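The core of the per-decade fitting procedure is sketched below (the full listing appears in section 4): a GaussianProcessRegressor is fitted to each decade's observations, the optimised kernel parameters A and l are read back from the fitted kernel, and sample functions are drawn to estimate the distribution of that decade's ice minimum.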
# Fit model
gp.fit(x, y)
fitted_kernel = gp.kernel_
fitted_params = fitted_kernel.get_params()
A_list[decade] = math.sqrt(fitted_params["k1__constant_value"])
L_list[decade] = fitted_params["k2__length_scale"]

# Get samples
y_samples = gp.sample_y(x[:, np.newaxis], n_samples)
minima[decade] = np.array([min(sample) for sample in np.transpose(y_samples)])

# Plot
date_col_vec = x_dates[:, np.newaxis]
y_mean, y_std = gp.predict(date_col_vec, return_std=True)
pdf = PdfPages('Coursework/{}_samples.pdf'.format(str(decade)))
pdf.savefig(fig)
pdf.close()
Figure 1: GP models per decade (ice extent over the year; curves for 1990 and 2004).
Figure 2: Ice minima distribution (KDE and histogram for 1990 and 2004; x-axis: ice minima, y-axis: density).
# Welch's two-sample t-test on the sampled minima of the two decades
sp.stats.ttest_ind(minima[1990], minima[2004], equal_var=False)

# PDF
pdf = PdfPages('Coursework/dif_minima_dist.pdf')
pdf.savefig(fig)
pdf.close()
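The test uses equal_var=False (Welch's variant) because the two decades' minima need not share a common variance; a small p-value would indicate a significant change in the ice minima between 1990 and 2004.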
Figure 3: Minima difference distribution with 95% confidence intervals (x-axis: ice difference between 1990 and 2004; y-axis: counts).

While the meaning of the parameters A and l might seem fuzzy, we can interpret them as measures of how flexible our function is. A is usually estimated from the sample standard deviation and measures how far we would expect our functions to stray from the mean; the higher this parameter, the more 'explosive' the functions become. On the other hand, l can be thought of as a measure of how strongly the function is constrained: lowering its value leads to a more complex function.
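As a quick illustration (again a sketch, reusing rbf_kernel and x from the example in section 2), drawing prior samples with different settings shows both effects:

# Effect of A and l on prior samples (rbf_kernel, x and np as defined in the
# sketch in section 2)
for A, l in [(1.0, 60.0), (3.0, 60.0), (1.0, 10.0)]:
    C = rbf_kernel(x[:, None], x[None, :], A=A, l=l)
    f = np.random.multivariate_normal(np.zeros(len(x)), C + 1e-8 * np.eye(len(x)))
    print('A={}, l={}: sample std = {:.2f}'.format(A, l, f.std()))

Larger A inflates the spread of the sampled functions, while smaller l produces wigglier, less constrained ones.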
The whole point of fitting these parameters is to maximise the likelihood of observing our data under the resulting function. With that objective in mind, figure 4a gives insight into how the log likelihood changes as we vary the parameters A and l. As this 3D graph is difficult to visualise on a 2D screen or sheet, figure 4b shows how the likelihood changes as we move parameter A given a fixed value of l (chosen to maximise the likelihood); the black dot marks the likelihood maximum. Similarly, figure 4c shows the effect of parameter l on the likelihood given a fixed value of A.
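Such a surface can be computed with scikit-learn's log_marginal_likelihood. A minimal sketch, assuming gp was fitted with a ConstantKernel() * RBF() kernel (so its log-hyperparameter vector is (log A^2, log l)); the grid bounds here are arbitrary:

# Log marginal likelihood over a grid of A and l values
A_grid = np.linspace(0.5, 20, 50)
l_grid = np.linspace(1, 200, 50)
log_lik = np.array([[gp.log_marginal_likelihood(np.log([A**2, l]))
                     for l in l_grid] for A in A_grid])
best = np.unravel_index(np.argmax(log_lik), log_lik.shape)
print('maximum at A = {:.2f}, l = {:.2f}'.format(A_grid[best[0]], l_grid[best[1]]))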
Figure 4: Effect of A and l on the log likelihood (a 3D surface over A and l, plus 1D slices for each parameter with the maximum marked).

4 Full Code

import pandas as pd
import numpy as np
import scipy as sp
import math
import pathlib
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from matplotlib import cm
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.patches as mpatches
from mpl_toolkits.mplot3d import Axes3D
x_dates = np.linspace(1, 365, 365)

# Fit model
gp.fit(x, y)
fitted_kernel = gp.kernel_
fitted_params = fitted_kernel.get_params()
A_list[decade] = math.sqrt(fitted_params["k1__constant_value"])
L_list[decade] = fitted_params["k2__length_scale"]

# Get samples
y_samples = gp.sample_y(x[:, np.newaxis], n_samples)
minima[decade] = np.array([min(sample) for sample in np.transpose(y_samples)])

# Plot
date_col_vec = x_dates[:, np.newaxis]
y_mean, y_std = gp.predict(date_col_vec, return_std=True)
pdf = PdfPages('Coursework/{}_samples.pdf'.format(str(decade)))
pdf.savefig(fig)
pdf.close()
plt.xlabel('Ice minima distribution')
plt.ylabel('Density')
plt.legend()

# PDF
pdf = PdfPages('Coursework/minima_dist.pdf')
pdf.savefig(fig)
pdf.close()

# PDF
pdf = PdfPages('Coursework/dif_minima_dist.pdf')
pdf.savefig(fig)
pdf.close()
np.array(dif) / minima[1990].mean())

# ### Log likelihood
fig = plt.figure(figsize=[8, 5])
References

[1] Carl E. Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. First edition, MIT Press, 2006.

[4] Simon Rogers and Mark Girolami. A First Course in Machine Learning. Second edition, CRC Press, 2017.

[5] Dr. Mark Muldoon. Statistics and Machine Learning 1, lecture 5 slides (dated 29/10/2018).