Coding Final Study Guide Notes

Lecture 5: Stats & Probability

Population vs Sample
- population: all possible values that could have been collected
- sample: each singular data point actually collected
- random number generator: the population is the range of values that could have been generated; the sample is the values actually generated

Calculate Stats & Discuss their Meaning
- if np.mean & np.median are similar → the distribution is not skewed
- np.std(name, ddof=1): measurements fall roughly +/- one std away from the mean
- range: np.max() - np.min(); if large relative to the mean → possible outliers
- scipy.stats.mode: helpful if the data take discrete values, unhelpful if the data are decimal-valued
- scipy.stats.skew: negative means a tail to the left, positive means a tail to the right
- scipy.stats.kurtosis(name, fisher=False): 3 = normal, <3 = flatter (platykurtic), >3 = more peaked (leptokurtic)
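A minimal sketch of these descriptive-stat calls, using a hypothetical measurement array:

    import numpy as np
    from scipy import stats

    data = np.array([4.8, 5.1, 5.5, 6.0, 6.2, 6.4, 7.1])   # hypothetical measurements

    print(np.mean(data), np.median(data))        # similar values -> distribution not skewed
    print(np.std(data, ddof=1))                  # sample standard deviation
    print(np.max(data) - np.min(data))           # range; large relative to mean -> possible outliers
    print(stats.skew(data))                      # sign gives tail direction
    print(stats.kurtosis(data, fisher=False))    # 3 = normal, <3 flatter, >3 more peaked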
Plotting Histogram w/ Correct Bins (see the sketch below)
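A minimal histogram sketch with hypothetical data; the bin count is an illustrative choice:

    import numpy as np
    import matplotlib.pyplot as plt

    data = np.random.normal(loc=6.5, scale=1.0, size=500)   # hypothetical data

    plt.hist(data, bins=20)    # pick bins so the shape is visible but not noisy
    plt.xlabel("value")
    plt.ylabel("count")
    plt.show()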

Occurrence Probability for Theoretical Distros:
- e.g., the probability that a sample from a normal distro with mean 6.5 will be greater than 5.5 (sketch below)
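A sketch of that probability; the standard deviation is not given in the notes, so 1.0 is a placeholder:

    from scipy import stats

    # P(X > 5.5) for X ~ Normal(mean=6.5, std=1.0); the std is an assumed placeholder
    p = 1 - stats.norm.cdf(5.5, loc=6.5, scale=1.0)   # same as stats.norm.sf(5.5, loc=6.5, scale=1.0)
    print(p)   # ~0.84 with these numbers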

Sampling Distribution, Sample Size & Number of Samples:
- population distribution: the total set of measurements
- sampling distribution of the sample mean: the distribution of means collected from different samples
- number of samples = the number of sets of data → increasing it makes the sampling distro converge toward normal; it has no effect on the mean
- sample size = the number of measurements within each set → increasing it makes the sampling distro narrower & decreases the uncertainty of the mean; SEM = sigma / sqrt(n)
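A sketch illustrating these definitions; the population and sizes are hypothetical:

    import numpy as np

    population = np.random.exponential(scale=2.0, size=100000)   # hypothetical, non-normal population
    n_samples, sample_size = 1000, 30

    sample_means = np.array([np.mean(np.random.choice(population, size=sample_size))
                             for _ in range(n_samples)])

    print(np.std(sample_means, ddof=1))                 # spread of the sample means
    print(np.std(population) / np.sqrt(sample_size))    # SEM = sigma / sqrt(n), comparable value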
Lecture 7: Hypothesis Testing

Central Limit Theorem
- the distribution of the sample mean, as the sample size increases → approaches normal
- small N: the sampling distro resembles the original population distro
- moderate N (~8): the distro smooths and clusters toward the true population mean (bell shape)
- large N (>30): the distro approaches normal
- the distribution of the raw data → approaches the original population distro

Drawing Random Samples

Manipulating a Random Sample
- np.random.rand(N): draws from a uniform distro on the default interval [0, 1]
- 0.5 * np.random.rand(N): multiplying by a decimal makes the interval smaller → [0, 0.5]
- 6.0 + np.random.rand(N): adding a number shifts the interval → [6, 7]
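A small sketch of drawing and rescaling uniform samples (N is arbitrary):

    import numpy as np

    N = 1000
    u       = np.random.rand(N)          # uniform on [0, 1]
    narrow  = 0.5 * np.random.rand(N)    # uniform on [0, 0.5]
    shifted = 6.0 + np.random.rand(N)    # uniform on [6, 7]
    print(u.min(), u.max(), narrow.max(), shifted.min())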
Calculate Bounds for a 99% Confidence Interval (sketch below):
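A sketch of the 99% bounds for a sample mean using the t distribution; the data values are hypothetical:

    import numpy as np
    from scipy import stats

    data = np.array([6.1, 6.4, 6.7, 6.9, 7.0, 7.3, 7.5])   # hypothetical sample

    mean = np.mean(data)
    sem = np.std(data, ddof=1) / np.sqrt(len(data))          # standard error of the mean
    lower, upper = stats.t.interval(0.99, df=len(data) - 1, loc=mean, scale=sem)
    print(lower, upper)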
Performing a Hypothesis Test for 2 Samples: comparing 2 slices within a dataset
"The t-stat x > the critical value y at the 90% significance level. At this significance level, we reject the null hypothesis that the noon mean pH is similar to or lower than the morning mean pH, and adopt the alternative hypothesis that the pH is greater in the afternoon."
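A sketch of the two-sample comparison with scipy; the pH slices are hypothetical:

    import numpy as np
    from scipy import stats

    morning_ph   = np.array([7.9, 8.0, 8.1, 8.0, 7.8])    # hypothetical slice 1
    afternoon_ph = np.array([8.2, 8.3, 8.1, 8.4, 8.3])    # hypothetical slice 2

    t_stat, p_two_sided = stats.ttest_ind(afternoon_ph, morning_ph)
    dof = len(morning_ph) + len(afternoon_ph) - 2
    t_crit = stats.t.ppf(0.90, df=dof)    # critical value at the 90% level

    print(t_stat, t_crit)   # reject the null if t_stat > t_crit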
Subplot the Sample Distr of the Sample Mean at Several Sample Sizes (sketch below):
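A sketch of the subplot figure; the population, sample sizes, and number of draws are hypothetical:

    import numpy as np
    import matplotlib.pyplot as plt

    population = np.random.exponential(scale=2.0, size=100000)   # hypothetical population
    sample_sizes = [2, 8, 30]

    fig, axes = plt.subplots(1, len(sample_sizes), figsize=(12, 3))
    for ax, n in zip(axes, sample_sizes):
        means = [np.mean(np.random.choice(population, size=n)) for _ in range(1000)]
        ax.hist(means, bins=30)
        ax.set_title(f"sample size N = {n}")
    plt.tight_layout()
    plt.show()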
Lecture 6: Time Series Analysis

Fitting Polynomial Functions to Data (see the sketch after the notes below):


- Overfitting: the model is too complex & captures noise → poor generalization to new data
- Underfitting: the model is too simple & fails to capture the true pattern
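A sketch contrasting a simple and an overly complex polynomial fit with np.polyfit; the data are hypothetical:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 10, 30)
    y = 2.0 * x + 1.0 + np.random.normal(scale=2.0, size=x.size)   # hypothetical noisy linear data

    p_simple  = np.polyfit(x, y, deg=1)     # captures the trend
    p_complex = np.polyfit(x, y, deg=12)    # likely chases the noise (overfitting)

    xf = np.linspace(0, 10, 200)
    plt.plot(x, y, "o")
    plt.plot(xf, np.polyval(p_simple, xf), label="degree 1")
    plt.plot(xf, np.polyval(p_complex, xf), label="degree 12")
    plt.legend()
    plt.show()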

Linear Interpolation:
- easy to implement & no extreme oscillations; use on sparse data points

Spline Interpolation:
- same as linear, but add the cubic argument to the 3rd line of code (e.g., kind="cubic" in interp1d)
- use when the data have natural continuous variation & a smooth curve is needed
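A sketch of both interpolation calls; the data points are hypothetical, and the cubic version only changes the kind argument:

    import numpy as np
    from scipy import interpolate as interp

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([0.0, 0.8, 0.9, 0.1, -0.7])          # hypothetical sparse data

    f_linear = interp.interp1d(x, y)                   # linear interpolation
    f_cubic  = interp.interp1d(x, y, kind="cubic")     # cubic spline interpolation

    x_new = np.linspace(0.0, 4.0, 50)
    print(f_linear(x_new)[:5])
    print(f_cubic(x_new)[:5])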

Global Fit & Applied to a Value:
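A sketch of a global fit over all the data, then evaluated at a single value; the data and the query point are hypothetical:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 3.0, 4.8, 7.2, 9.1, 11.0])   # hypothetical, roughly linear data

    coeffs = np.polyfit(x, y, deg=1)    # global fit to all points
    print(np.polyval(coeffs, 2.5))      # fitted function applied to a value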


Extrapolation:
- interp.interp1d(x, y, bounds_error=False, fill_value="extrapolate")
How Polynomial Functions Are Fit to Data (Least-Squares Regression):
1. specify the functional form (polynomial, exponential, constant)
2. guess initial values for the constants in the function
3. define a squared-error residual metric quantifying the mismatch between the observed data & the current function values
4. use an algorithm to adjust the coefficient values to minimize the error metric → finds the least-squares solution that best fits the data (sketch below)
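A sketch of those four steps with scipy.optimize.curve_fit, which minimizes the squared residuals internally; the model and data are hypothetical:

    import numpy as np
    from scipy.optimize import curve_fit

    # 1. specify the functional form
    def model(x, a, b):
        return a * x + b

    x = np.linspace(0, 10, 20)
    y = 3.0 * x + 2.0 + np.random.normal(scale=1.0, size=x.size)   # hypothetical noisy data

    # 2. initial guesses for the constants; 3-4. curve_fit adjusts them to minimize the squared error
    popt, pcov = curve_fit(model, x, y, p0=[1.0, 0.0])
    print(popt)   # least-squares estimates of a and b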
Quality of a Functional Fit:
- improves when the number of data points increases or the noise decreases
- higher-order fits have extreme oscillations between data points, even if the data seem perfectly matched by a higher-order fit → the default is to choose the SIMPLEST fit that matches the data, which is less prone to high-frequency oscillations
Calculate Correlation Coefficient between Datasets (sketch below):
- always measures a linear relationship; >0.7 strong, 0.3-0.7 moderate, <0.3 weak
- 2 independent datasets can still have a strong correlation, indicating they are impacted by a common 3rd variable
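A sketch of the correlation calculation; the arrays are hypothetical:

    import numpy as np
    from scipy import stats

    a = np.array([1.0, 2.1, 2.9, 4.2, 5.1])
    b = np.array([2.0, 4.1, 6.2, 8.1, 9.9])   # hypothetical datasets

    r = np.corrcoef(a, b)[0, 1]          # Pearson correlation coefficient
    r2, p_value = stats.pearsonr(a, b)   # same r, plus a p-value
    print(r, r2, p_value)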
Lecture 8: Multi-Dimensional Data Analysis

Practice Problems:
- open a dataset: ds = xr.open_dataset("path")
- select data along specific coordinate values → .sel()
- best way to select data at a specific lon & lat: ds.temperature.sel(lat=34.05, lon=-118.25, method="nearest")
- area-mean time series: timeseries = temp.mean(dim=('lon','lat'))
- plot a time-averaged spatial heatmap using the temp variable from ds: ds.temperature.mean(dim="time").plot()

Using xarray .plot(), .contour(), etc. (sketch below)
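A sketch of the plotting calls, assuming ds has a temperature variable with time, lat, and lon dimensions as in the practice problems above:

    import xarray as xr

    ds = xr.open_dataset("path")    # placeholder path from the notes

    mean_map = ds.temperature.mean(dim="time")   # 2-D (lat, lon) field
    mean_map.plot()                              # filled heatmap
    mean_map.plot.contour()                      # contour lines instead

    ds.temperature.sel(lat=34.05, lon=-118.25, method="nearest").plot()   # 1-D time series at a point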
Other
- ddof: the population std divides by n (ddof=0); the sample std divides by n-1 (ddof=1)
- matrices are in the format (#rows, #columns)
Calculating Degrees of Freedom
- for a confidence interval → dof = n - 1
- for a 2-sample t-test → dof = n1 + n2 - 2
