Chapter 4
Introduction to Statistics in Python
Maggie Matsui
Content Developer, DataCamp
Relationships between two variables
x = explanatory/independent variable
y = response/dependent variable
0.751755
msleep['sleep_rem'].corr(msleep['sleep_total'])
0.751755
x̄ = mean of x
σx = standard deviation of x
$$r = \sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \times \sigma_y}$$
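To make the formula for r concrete, here is a minimal sketch that computes the correlation by hand and compares it with pandas' `.corr()`. The data values and the helper name `pearson_r` are made up for illustration. The sketch divides the summed products by n − 1 and uses the sample standard deviation, which is the normalization that makes the result agree with pandas; that factor is an assumption here, since it does not appear in the formula as written.

```python
import pandas as pd

# Small made-up dataset (illustrative values, not taken from msleep)
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1],
})


def pearson_r(x: pd.Series, y: pd.Series) -> float:
    """Sum of products of deviations, scaled by (n - 1) and the sample std devs."""
    n = len(x)
    dx = x - x.mean()  # x_i - x-bar
    dy = y - y.mean()  # y_i - y-bar
    # Series.std() uses ddof=1 by default, hence the (n - 1) factor
    return float((dx * dy).sum() / ((n - 1) * x.std() * y.std()))


r_manual = pearson_r(df["x"], df["y"])
r_pandas = df["x"].corr(df["y"])

print(r_manual, r_pandas)             # the two values should agree
print(pearson_r(df["y"], df["x"]))    # swapping x and y gives the same r
```

Swapping the two columns leaves the result unchanged, which is why `msleep['sleep_rem'].corr(msleep['sleep_total'])` returns the same value in either order.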
Non-linear relationships
r = 0.18
df['x'].corr(df['y'])
0.081094
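A low correlation does not mean that two variables are unrelated, only that the relationship is not linear. The sketch below uses synthetic data (the name `df_quad` and the values are hypothetical) in which y is completely determined by x, yet the correlation is essentially zero.

```python
import numpy as np
import pandas as pd

# Synthetic data: y depends perfectly on x, but through a parabola
x = np.linspace(-3, 3, 61)  # values symmetric around 0
df_quad = pd.DataFrame({"x": x, "y": x ** 2})

# Pearson correlation only measures the linear part of the relationship
r_quad = df_quad["x"].corr(df_quad["y"])
print(abs(r_quad) < 1e-8)  # the linear correlation is numerically zero
```

Plotting the data first is the safest way to spot a pattern like this before relying on r.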
0.3119801
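import matplotlib.pyplot as plt
import seaborn as sns

# Trend line of awake vs. log body weight; assumes msleep has a 'log_bodywt' column
sns.lmplot(x='log_bodywt',
           y='awake',
           data=msleep,
           ci=None)
plt.show()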
msleep['log_bodywt'].corr(msleep['awake'])
0.5687943
Reciprocal transformation (1 / x)
Transformations can also be combined, e.g. sqrt(x) and 1 / y
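As a rough illustration of why a transformation can help, the sketch below builds synthetic data in which y is an exact linear function of log(x), then compares the correlation before and after transforming x. The data, the column names, and the DataFrame name `df_t` are assumptions made for this example; they are not the msleep data.

```python
import numpy as np
import pandas as pd

# Synthetic data: x spans several orders of magnitude, y is linear in log(x)
x = np.logspace(0, 4, 50)  # 1 to 10,000
df_t = pd.DataFrame({"x": x, "y": 2 + 3 * np.log(x)})

# Candidate transformations of x
df_t["log_x"] = np.log(df_t["x"])
df_t["sqrt_x"] = np.sqrt(df_t["x"])
df_t["recip_x"] = 1 / df_t["x"]

r_raw = df_t["x"].corr(df_t["y"])
r_log = df_t["log_x"].corr(df_t["y"])
r_sqrt = df_t["sqrt_x"].corr(df_t["y"])
r_recip = df_t["recip_x"].corr(df_t["y"])

print("raw:", r_raw)
print("log:", r_log)
print("sqrt:", r_sqrt)
print("reciprocal:", r_recip)
```

Here the log transformation gives the strongest linear relationship because the data were constructed that way. With real data, which transformation works best depends on the shape of the relationship, so it is worth plotting each candidate.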
Linear regression
Vocabulary
Experiment aims to answer: What is the effect of the treatment on the response?
Treatment: advertisement
Placebo
Resembles treatment, but has no effect
In clinical trials, a sugar pill ensures that the effect of the drug is actually due to the drug
itself and not the idea of receiving the drug
There are ways to control for confounders to get more reliable conclusions about
association
Overview
Chapter 1: What is statistics?
Chapter 2: Measuring chance
Chapter 3: Normal distribution
Chapter 4: Correlation