CaseStudy1 (2)
March 1, 2024
Data
Model (with parameters) that fits our data = A function
A notion of "Goodness" we want to achieve
Note that it is very important to fix, from the beginning, the benchmark for the
performance measure we want to capture
Small Percentiles
Real life example: Child weight
Bivariate analysis
- empirical relationship of two variables
- statistical significance
- depending on the type of the features involved
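As a minimal sketch of a bivariate check for two continuous features, the following computes the sample Pearson correlation and the matching t statistic by hand. The data values and the names `pearson`, `age_months` and `weight_kg` are made up for illustration; other feature types (for example two categorical features) would call for a different measure and test.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative data: age in months and weight in kg.
age_months = [6, 12, 18, 24, 30, 36]
weight_kg = [7.5, 9.6, 10.9, 12.2, 13.3, 14.3]

r = pearson(age_months, weight_kg)
n = len(age_months)

# t statistic for the null hypothesis of no linear relationship
# (to be compared with a t distribution with n - 2 degrees of freedom).
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"r = {r:.3f}, t = {t_stat:.2f}")
```

The correlation describes the empirical relationship; the t statistic is one way to judge whether it is statistically significant.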
Missing Values
Are the missing values completely at random or not?
Delete observations, if a large number of observations is still available afterwards
Note that if a feature has lots of missing values, you might consider deleting the
feature rather than lots of rows
Imputation methods
▶ mean/mode for continuous/categorical variables
▶ predictions from a model fitted on the other features
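A minimal sketch of mean/mode imputation using only the Python standard library. The helper name `impute`, the `kind` argument and the sample columns are assumptions made for this example, and missing entries are assumed to be encoded as `None`.

```python
from statistics import mean, mode


def impute(values, kind):
    """Fill None entries with the mean (continuous) or the mode (categorical)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]


# Made-up illustrative columns with missing entries.
weights = [12.0, None, 14.0, 13.0, None]
colours = ["red", "blue", None, "red"]

print(impute(weights, "continuous"))
print(impute(colours, "categorical"))
```

This kind of fill is only reasonable if the values are plausibly missing at random; otherwise it can bias the results.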
Feature engineering
using domain knowledge to extract new variables from raw data
Aims:
make machine learning algorithms work more effectively and/or
provide deeper insights.
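As a small sketch of the idea, the following derives a body mass index feature from raw weight and height. The choice of BMI, the function name `add_bmi` and the field names `weight_kg` and `height_cm` are assumptions made for this example, not taken from the case study data.

```python
def add_bmi(record):
    """Return a copy of the record with a derived 'bmi' feature (kg / m^2)."""
    height_m = record["height_cm"] / 100
    return {**record, "bmi": record["weight_kg"] / height_m ** 2}


# Made-up illustrative record.
child = {"weight_kg": 20.0, "height_cm": 110.0}
print(add_bmi(child))
```

The derived variable encodes domain knowledge (weight should be judged relative to height) that a model would otherwise have to learn from the raw columns.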
Data trends
Multimodality
Stratified sampling
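A minimal sketch of stratified sampling with the standard library: records are grouped by stratum and the same fraction is drawn from each group, so the sample keeps the group proportions of the population. The function name `stratified_sample`, its parameters and the urban/rural data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by key(record)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    sample = []
    for members in strata.values():
        # Take at least one record per stratum so small groups are not lost.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up population: 80 urban and 20 rural records.
records = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
print(len(sample), sum(1 for r in sample if r[0] == "rural"))
```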
Pitfalls
Is the data really representative? Do the results generalize?
Data Literacy
is becoming more and more important across a wide range of jobs
Asking the right questions is as important as answering them.
Confounding variables
Example
eating ice cream and getting sunburned
There’s a correlation between eating ice cream and getting sunburned, but
neither event actually causes the other. Instead, both events are caused by
something else: sunny weather.
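The ice cream example can be illustrated with a small simulation. This is a sketch under made-up assumptions: the probabilities, the sample size and the names `pearson` and `within` are all chosen for illustration. Sunny weather drives both outcomes, and the two outcomes are generated independently once the weather is fixed.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)
sunny, ice_cream, sunburn = [], [], []
for _ in range(20000):
    s = rng.random() < 0.5  # the confounder: is it a sunny day?
    sunny.append(s)
    # Assumed probabilities: both outcomes depend only on the weather.
    ice_cream.append(1 if rng.random() < (0.8 if s else 0.2) else 0)
    sunburn.append(1 if rng.random() < (0.6 if s else 0.05) else 0)


def within(flag):
    """Correlation of ice cream and sunburn restricted to one weather type."""
    idx = [i for i, s in enumerate(sunny) if s == flag]
    return pearson([ice_cream[i] for i in idx], [sunburn[i] for i in idx])


overall = pearson(ice_cream, sunburn)
print(f"overall: {overall:.2f}")
print(f"sunny days only: {within(True):.2f}")
print(f"non-sunny days only: {within(False):.2f}")
```

Across all days the two variables are clearly correlated, but once the weather is held fixed the correlation should be close to zero, which is the signature of a confounding variable.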