Data Science 1 2023 - Lecture 02 - Mathematical Preliminaries and Correlation
Data Science 1:
Introduction to Data Science

Topics covered: Probability, Statistics & Significance · Correlation · Variance · Sampling · Data Bias · Skew · Supervised Learning · Classification · Gradient Descent · Errors & Artifacts · Precision · Recall · F-Score
Probability
Probability theory provides a formal framework
for reasoning about the likelihood of events.
The probability p(s) of an outcome s satisfies:
● 0 <= p(s) <= 1
● Σ p(s) = 1, summed over all outcomes s in the sample space
Conditional Probability
The conditional probability P(A|B) is defined:
P(A|B) = P(A ∩ B) / P(B)
Bayes' Theorem
Bayes' theorem is an important tool which
reverses the direction of the conditioning:
P(A|B) = P(B|A) ∙ P(A) / P(B)
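On a small discrete sample space, Bayes' theorem can be checked by direct counting. The sketch below assumes a fair six-sided die purely for illustration (it is not an example from the lecture):

```python
from fractions import Fraction

# Sample space: a fair six-sided die (illustrative assumption).
omega = set(range(1, 7))
A = {2, 4, 6}          # event "roll is even"
B = {4, 5, 6}          # event "roll is greater than 3"

def p(event):
    """Probability of an event under the uniform distribution."""
    return Fraction(len(event & omega), len(omega))

# Direct definition of conditional probability.
p_a_given_b = p(A & B) / p(B)

# The same quantity via Bayes' theorem: P(B|A) * P(A) / P(B).
p_b_given_a = p(B & A) / p(A)
bayes = p_b_given_a * p(A) / p(B)

print(p_a_given_b, bayes)   # both print 2/3
```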
(1/2 ∙ 1/2) / (3/4) = 1/2 ∙ 1/2 ∙ 4/3 = 1/3   (q.e.d.) 😎
Probability/Cumulative Distributions
The cdf is the running sum of the pdf:
C(x) = P(X <= x) = Σ_{y <= x} p(y)
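The running sum is exactly a cumulative sum over the pdf values. A minimal sketch, again assuming a fair die as the example distribution:

```python
# Building a cdf as the running sum of a pdf.
# A fair six-sided die is an illustrative assumption.
pdf = [1 / 6] * 6            # P(X = 1), ..., P(X = 6)

cdf = []
running = 0.0
for p in pdf:
    running += p             # running sum of the pdf
    cdf.append(running)

print(cdf[-1])               # ends at 1.0 (up to floating-point error)
```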
Descriptive Statistics
Descriptive statistics provides ways to capture
the properties of a given data set / sample.
● Central tendency measures describe the
center around which the data is distributed.
● Variation or variability measures describe
the data's spread, i.e. how far the
measurements lie from the center.
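Both kinds of measures are available in Python's standard library; a short sketch on a made-up sample (the data values are an illustrative assumption):

```python
import statistics

# Illustrative sample, not from the lecture.
data = [2, 3, 3, 5, 7, 10]

center = statistics.mean(data)     # central tendency
middle = statistics.median(data)   # robust central tendency
spread = statistics.pstdev(data)   # variability: population std dev

print(center, middle, spread)
```

The median is less sensitive to the outlier-ish 10 than the mean, which is why both central tendency measures are worth reporting.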
Parameterizing Distributions
Regardless of how data is distributed, at least
a (1 − 1/k²) fraction of the points must lie
within k standard deviations of the mean
(Chebyshev's inequality).
Thus at least 75% must lie within two sigma of
the mean.
Even tighter bounds apply for normal
distributions (≈95% of the mass lies within
two sigma).
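The bound can be checked empirically on any distribution; the sketch below assumes a deliberately skewed exponential sample for illustration:

```python
import random
import statistics

random.seed(0)
# A deliberately non-normal, skewed sample (illustrative assumption).
data = [random.expovariate(1.0) for _ in range(10_000)]

mu = statistics.mean(data)
sigma = statistics.pstdev(data)

k = 2
within = sum(1 for x in data if abs(x - mu) < k * sigma) / len(data)

# Chebyshev guarantees at least 1 - 1/k^2 = 75% within k sigma.
print(within >= 1 - 1 / k**2)    # True
```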
Correlation Analysis
Two factors are correlated when the values of x
have some predictive power over the value of y.
The correlation coefficient of X and Y measures
the degree to which Y is a function of X (and
vice versa).
Correlation ranges from -1 (anti-correlated)
through 0 (uncorrelated) to 1 (fully correlated).
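The coefficient can be computed directly from its definition; a minimal sketch (the function name `pearson_r` and the data are my own, for illustration):

```python
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Perfectly linearly related data gives full correlation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(pearson_r(xs, ys))    # ≈ 1.0
```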
Interpreting Correlations: r²
The square of the sample correlation coefficient
r² estimates the fraction of the variance in Y
explained by X in a simple linear regression.
Thus the predictive value of a correlation
shrinks quadratically as |r| decreases.
The correlation between height and weight
is approximately 0.8, meaning it explains
about ⅔ of the variance (0.8² = 0.64).
Generally speaking,
1 − r² = V(residuals) / V(y)
Here r = 0.94, so r² ≈ 0.884: the fit explains
88.4% of V(y).
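This variance decomposition can be verified numerically: fit a least-squares line, then compare 1 − r² with the residual variance fraction. The data below is an illustrative assumption, not from the slides:

```python
import statistics

# Illustrative data with a noisy linear trend (assumed).
xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 2.1, 2.8, 4.3, 4.9, 6.2]

mx, my = statistics.mean(xs), statistics.mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

# Least-squares fit y = alpha + beta * x
beta = sxy / sxx
alpha = my - beta * mx
residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]

r = sxy / (sxx ** 0.5 * syy ** 0.5)

# 1 - r^2 equals the fraction of y's variance left in the residuals.
lhs = 1 - r ** 2
rhs = statistics.pvariance(residuals) / statistics.pvariance(ys)
print(round(lhs, 9) == round(rhs, 9))   # True
```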
Logarithms
The logarithm is the inverse of the exponential
function, i.e.
y = log_b(x)  ⟺  b^y = x
We will use them here for reasons different
than in algorithms courses:
summing the logs of probabilities is more
numerically stable than multiplying the
probabilities themselves:
log(p₁ ∙ p₂ ∙ … ∙ pₙ) = log(p₁) + log(p₂) + … + log(pₙ)
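The stability issue is easy to demonstrate: a long product of small probabilities underflows to zero in floating point, while the sum of their logs stays perfectly representable. The probability values below are an illustrative assumption:

```python
import math

# 1000 small probabilities (illustrative values).
probs = [0.01] * 1000

product = 1.0
for p in probs:
    product *= p          # 10^-2000 underflows float64 to 0.0

log_sum = sum(math.log(p) for p in probs)   # stays representable

print(product)            # 0.0 (underflow)
print(log_sum)            # ≈ -4605.17
```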