Coding Final Study Guide Notes
Overfitting: model too complex & captures noise → poor generalization to new data
Underfitting: model too simple & fails to capture true pattern
Linear Interpolation:
easy to implement & no extreme oscillations, use on sparse data points
Spline Interpolation:
Same as linear, but add kind="cubic" to the interp1d call (3rd code line)
Use when data has natural continuous variation & need smooth curve
Lecture 8: Multi-Dimensional Data Analysis
Extrapolation:
interp.interp1d(x, y, bounds_error=False, fill_value="extrapolate")
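A minimal sketch of all three cases (linear, cubic spline, extrapolation) using scipy's interp1d; the x/y sample points are made up for illustration:

```python
import numpy as np
from scipy.interpolate import interp1d

# Sparse sample points (hypothetical data, roughly sin(x))
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])

# Linear interpolation: simple, no extreme oscillations between points
f_lin = interp1d(x, y)

# Cubic spline: same call, add kind="cubic" for a smooth curve
f_cub = interp1d(x, y, kind="cubic")

# Extrapolation: allow queries outside [x.min(), x.max()]
f_ext = interp1d(x, y, bounds_error=False, fill_value="extrapolate")

print(f_lin(1.5))   # midpoint of (0.8, 0.9) -> 0.85
print(f_cub(1.5))   # smooth-curve value, differs slightly from linear
print(f_ext(5.0))   # beyond the data range: continues the last segment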
How Functions Are Fit to Data Curves (LSR = least-squares regression):
1 specify function form (polynomial, exponential, constant)
2 guess initial values for constants in function
3 define squared error residual metric quantifying mismatch between
observed data & current function values
4 use algorithm to change coefficient values to minimize error metric →
finds least-squares solution best fitting data
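The four steps above can be sketched with scipy's curve_fit; the quadratic model and the synthetic noisy data are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

# Step 1: specify the functional form (a quadratic, chosen for this sketch)
def model(x, a, b, c):
    return a * x**2 + b * x + c

# Synthetic noisy data generated from known coefficients (2, -3, 1)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x**2 - 3.0 * x + 1.0 + rng.normal(0, 2.0, x.size)

# Steps 2-4: start from the initial guess p0, iteratively adjust the
# coefficients to minimize the sum of squared residuals
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0])
a, b, c = popt
print(a, b, c)  # least-squares estimates, close to 2, -3, 1
```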
Quality of Functional Fit:
improves when quantity of data points increases or noise decreases
Higher-order fits have extreme oscillations between data points, even if
the data seems perfectly matched by a higher-order fit → default is to
choose the SIMPLEST fit matching the data → less prone to high-frequency
oscillations
Using xarray: .plot(), .plot.contour(), etc.
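The "prefer the simplest fit" point can be demonstrated with numpy's polyfit; the near-linear sample data is made up for this sketch:

```python
import numpy as np

# Noisy samples of an underlying linear trend (hypothetical data)
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 8)
y = 2.0 * x + rng.normal(0, 0.05, x.size)

# Simple (degree-1) fit vs. a degree-7 fit through all 8 points
p1 = np.polyfit(x, y, 1)
p7 = np.polyfit(x, y, 7)

# The degree-7 fit has essentially zero residual at the data points...
print(np.max(np.abs(np.polyval(p7, x) - y)))
# ...but matching the noise exactly is overfitting: between the sample
# points it can oscillate, while the simple fit tracks the true trend
print(np.max(np.abs(np.polyval(p1, x) - y)))  # nonzero, near noise level
```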
Calculate Correlation Coefficient between Datasets:
measures linear relationships only; |r| > 0.7 strong, 0.3-0.7 moderate, < 0.3 weak
2 datasets with no direct causal link can still be strongly correlated,
indicating both are impacted by a common 3rd variable (a confounder)
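A minimal sketch of a spurious correlation with np.corrcoef; the two series and their shared driver are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two datasets both driven by a common 3rd variable (hypothetical)
t = np.linspace(0, 10, 100)                    # the shared driver
series_a = 3.0 * t + rng.normal(0, 1.0, t.size)
series_b = 0.5 * t + rng.normal(0, 0.5, t.size)

# Pearson correlation coefficient: captures LINEAR association only
r = np.corrcoef(series_a, series_b)[0, 1]
print(r)  # strong (> 0.7) despite no direct causal link
```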
Other
ddof ("delta degrees of freedom"): pop std divides by n → ddof = 0 (NumPy default); sample std divides by n-1 → ddof = 1
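A quick check of the two ddof settings in np.std, using a small made-up dataset:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population std: divide by n     -> ddof=0 (NumPy's default)
pop_std = np.std(data, ddof=0)
# Sample std: divide by n - 1     -> ddof=1 (larger, corrects bias)
samp_std = np.std(data, ddof=1)

print(pop_std)   # 2.0 (sum of squared deviations = 32, n = 8)
print(samp_std)  # sqrt(32 / 7), about 2.138
```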
-matrices in format (#rows, #columns)
Calculating Degrees of Freedom
For confidence interval → dof = n - 1
For 2-sample t-test → dof = n1 + n2 - 2
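The two dof formulas in use, with scipy.stats; the sample sizes are arbitrary examples:

```python
from scipy import stats

# Confidence interval for one sample: dof = n - 1
n = 10
dof_ci = n - 1
t_crit = stats.t.ppf(0.975, dof_ci)  # two-sided 95% critical value

# 2-sample t-test (pooled, equal-variance): dof = n1 + n2 - 2
n1, n2 = 12, 15
dof_tt = n1 + n2 - 2

print(dof_ci, t_crit, dof_tt)  # 9, about 2.262, 25
```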