
Lecture 02. Statistics Draft

The document covers fundamentals of data analytics including data terminology, descriptive statistics, probability distributions and sampling. It introduces key concepts such as data types, measures of center and variability, and common probability distributions like binomial, Poisson, normal and explains related concepts like expectation, variance and the central limit theorem.

Uploaded by

dantie
Copyright
© All Rights Reserved

Fundamentals of Data Analytics

Lecture 02. Statistics


Instructional Team
Content
➔ Data Terminology & Descriptive Statistics - Measuring
➔ Random Variable & Distributions

Data Terminology
Observation  Name      Age  Income  Job Title      Gender
1            Francis   35   65000   CEO assistant  M
2            Christ    32   54500   IT             F
3            Jaime     28   43000   PM             F
4            Jon       19   21000   Desk Help      F
5            Robbin    25   27000   Procurement    M
6            Jack      30   47800   DB Admin       M
7            Teddy     29   46000   Product        F
8            Marshall  31   60700   Sales          F

Each row is an observation. A dataset with one variable (column) is a univariate dataset, with two variables a bivariate dataset, and with three or more variables a multivariate dataset.
Data Types
Observation  Name      Age  Income  Job Title      Gender
1            Francis   35   65000   CEO assistant  M
2            Christ    32   54500   IT             F
3            Jaime     28   43000   PM             F
4            Jon       19   21000   Desk Help      F
5            Robbin    25   27000   Procurement    M
6            Jack      30   47800   DB Admin       M
7            Teddy     29   46000   Product        F
8            Marshall  31   60700   Sales          F

Variable types in this table: categorical (Name, Job Title, Gender), discrete (Age), continuous (Income).
Data Types

Measure of Center

Measures of Center - Python example
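
A minimal sketch of the usual measures of center (mean, median, mode), assuming the Age, Income and Gender columns of the Data Terminology table as input and using only Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

# Columns taken from the Data Terminology table
ages = [35, 32, 28, 19, 25, 30, 29, 31]
incomes = [65000, 54500, 43000, 21000, 27000, 47800, 46000, 60700]
genders = ["M", "F", "F", "F", "M", "M", "F", "F"]

mean_age = statistics.mean(ages)            # arithmetic mean
median_age = statistics.median(ages)        # middle of the sorted values
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
mode_gender = statistics.mode(genders)      # most frequent category

print("Age:    mean =", mean_age, " median =", median_age)
print("Income: mean =", mean_income, " median =", median_income)
print("Gender: mode =", mode_gender)
```

The mode is the only one of the three that applies to a categorical column such as Gender.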

Measure of Variability

Measure of Variability - Python Example
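
A minimal sketch of common measures of variability (range, variance, standard deviation), assuming the Age column of the Data Terminology table as input and Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

# Age column taken from the Data Terminology table
ages = [35, 32, 28, 19, 25, 30, 29, 31]

age_range = max(ages) - min(ages)          # spread between extremes
pop_var = statistics.pvariance(ages)       # population variance, divides by n
sample_var = statistics.variance(ages)     # sample variance, divides by n - 1
sample_sd = statistics.stdev(ages)         # square root of the sample variance

print("range =", age_range)
print("population variance =", pop_var)
print("sample variance =", sample_var)
print("sample standard deviation =", sample_sd)
```

The sample variance is always slightly larger than the population variance because it divides the same sum of squared deviations by n - 1 instead of n.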

Measure of Skewness

Measure of kurtosis

Moments

Percentiles and Quartiles

Boxplot

Percentile and Boxplot - Python example

Random Variable

Definition
A random variable X is a function or rule (mapping) that assigns a numerical value
X(ω) to each outcome ω in the sample space Ω of a random experiment.
X: Ω → ℝ

Roughly speaking, a random variable is the output of a random experiment.

Example:
- The number shown when rolling a die.
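
The die example can be sketched in code, assuming a fair six-sided die. The names `omega`, `X` and `Y` are illustrative; `Y` is a second, hypothetical random variable added to show that several mappings can be defined on the same sample space.

```python
from fractions import Fraction
import random

# Sample space Ω of one roll of a fair six-sided die
omega = [1, 2, 3, 4, 5, 6]

def X(outcome):
    """Random variable: the number shown on the die."""
    return outcome

def Y(outcome):
    """Another random variable on the same Ω: 1 if the face is even, else 0."""
    return 1 if outcome % 2 == 0 else 0

# Running the experiment once picks an outcome ω; X(ω) and Y(ω) are the observed values
rng = random.Random(0)
w = rng.choice(omega)
print("outcome:", w, " X:", X(w), " Y:", Y(w))

# Probability that Y equals 1, given six equally likely outcomes
p_y1 = Fraction(sum(Y(o) for o in omega), len(omega))
print("P(Y = 1) =", p_y1)
```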

Probability (Mass) Function

Definition
A probability function assigns a probability to each event in the sample space.

Example:
- The probability that a fair coin lands on heads is 50%.
- The probability that the temperature measured by a particular
thermometer (e.g. the one located in your house) at a given time (e.g. 8:00 AM
GMT 12-Jun-2020) is exactly a specific value (e.g. 34°C) is 0.
- Why?
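
One way to see why: temperature is a continuous quantity, so probability is spread over intervals, and the probability of an interval around 34°C shrinks toward 0 as the interval narrows to a single point. The sketch below assumes, purely for illustration, that the temperature follows a normal distribution with mean 30°C and standard deviation 3°C; these parameters and the helper `prob_within` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical model, for illustration only: temperature ~ Normal(30, 3) in °C
temperature = NormalDist(mu=30, sigma=3)

def prob_within(center, half_width):
    """P(center - half_width <= X <= center + half_width)."""
    return temperature.cdf(center + half_width) - temperature.cdf(center - half_width)

# Narrow the interval around 34 °C step by step
widths = [1.0, 0.1, 0.01, 0.001]
probs = [prob_within(34, h) for h in widths]
for h, p in zip(widths, probs):
    print(f"P(34 - {h} <= X <= 34 + {h}) = {p:.6f}")

# An interval of zero width carries zero probability
print("P(X = 34) =", prob_within(34, 0))
```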

PDF and CDF

Expectation and Variance

The mean and variance described earlier are the special case of expectation and
variance under a discrete uniform distribution, where each of the n observed
values has probability 1/n.
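
A minimal sketch of this special case, assuming the Age column of the Data Terminology table as the observed values. Weighting every value by 1/n reproduces the ordinary mean and the population variance (the one that divides by n, not n - 1).

```python
from fractions import Fraction
import statistics

# Age column taken from the Data Terminology table
ages = [35, 32, 28, 19, 25, 30, 29, 31]
n = len(ages)
p = Fraction(1, n)  # discrete uniform: every observation has probability 1/n

# E(X) = sum of p(x) * x, Var(X) = sum of p(x) * (x - E(X))^2
expectation = sum(p * x for x in ages)
variance = sum(p * (x - expectation) ** 2 for x in ages)

print("E(X)   =", float(expectation), " mean      =", statistics.mean(ages))
print("Var(X) =", float(variance), " pvariance =", statistics.pvariance(ages))
```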

Expectation and Variance in Continuous Case

Some properties:
● E (X + Y) = E (X) + E (Y)
● E (aX) = aE(X)
● Var (X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)
● Var (aX) = a² Var(X)
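
These properties can be checked exactly on a small example. The sketch below assumes a hypothetical pair of dependent variables: X is a fair die roll and Y is 1 when X is even, 0 otherwise. The helpers `E`, `var` and `cov` are illustrative and use exact fractions to avoid rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution: X is a fair die roll, Y = 1 if X is even else 0
joint = {(x, 1 if x % 2 == 0 else 0): Fraction(1, 6) for x in range(1, 7)}

def E(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def var(f):
    """Variance of f(X, Y)."""
    m = E(f)
    return E(lambda x, y: (f(x, y) - m) ** 2)

def cov(f, g):
    """Covariance of f(X, Y) and g(X, Y)."""
    mf, mg = E(f), E(g)
    return E(lambda x, y: (f(x, y) - mf) * (g(x, y) - mg))

a = 3

def X(x, y):
    return x

def Y(x, y):
    return y

def X_plus_Y(x, y):
    return x + y

def aX(x, y):
    return a * x

print(E(X_plus_Y) == E(X) + E(Y))                         # linearity of expectation
print(E(aX) == a * E(X))                                  # scaling of expectation
print(var(X_plus_Y) == var(X) + var(Y) + 2 * cov(X, Y))   # variance of a sum
print(var(aX) == a ** 2 * var(X))                         # scaling of variance
```

Because X and Y are dependent here, Cov(X, Y) is not zero, so the covariance term in the variance of the sum cannot be dropped.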

Expectation and Variance - Examples

Uniform Distribution

Uniform Distribution - An example

Binomial Distribution

Binomial Distribution

Binomial Distributions - Shape

Binomial Distributions - Python example

Poisson Distribution

The Poisson distribution describes the number of occurrences in a fixed interval of
time or space, given that the occurrences are independent of each other and happen
at a constant average rate.
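
A minimal sketch of the Poisson probability mass function, P(X = k) = e^(-λ) λ^k / k!, using only the standard `math` module. The rate `lam = 3` (for example, three arrivals per hour on average) is a hypothetical value chosen for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with average rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3  # hypothetical average number of occurrences per interval

# Probabilities of observing 0..9 occurrences in one interval
for k in range(10):
    print(k, round(poisson_pmf(k, lam), 4))

# The probabilities sum to 1 and the mean equals lam (up to truncation at k = 49)
total = sum(poisson_pmf(k, lam) for k in range(50))
mean = sum(k * poisson_pmf(k, lam) for k in range(50))
print("total =", total, " mean =", mean)
```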

Poisson Distribution

Poisson Distribution - Shape

Poisson Distribution - Python example

Hypergeometric Distribution
The hypergeometric distribution is similar to the binomial distribution, except that
sampling is done without replacement, so the success probability changes from draw to draw.
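
The difference can be seen by comparing the two probability mass functions side by side. The sketch below assumes a hypothetical urn of 20 balls, 8 of them red, from which 5 are drawn; the function names are illustrative and only `math.comb` from the standard library is used.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k successes in n draws WITHOUT replacement from N items, K of them successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(k successes in n draws WITH replacement, success probability p each draw)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical urn: 20 balls, 8 red; draw 5 and count the red ones
N, K, n = 20, 8, 5
p = K / N

for k in range(n + 1):
    print(k, round(hypergeom_pmf(k, N, K, n), 4), round(binom_pmf(k, n, p), 4))
```

Both distributions have the same mean, n·K/N, but the hypergeometric one is less spread out because each draw removes an item from the population.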

Hypergeometric Distribution - Python example

Geometric Distribution
The geometric distribution describes the number of independent trials needed to obtain
the first success, where each trial succeeds with the same probability p.
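
A minimal sketch of the geometric probability mass function, P(X = k) = (1 - p)^(k - 1) p for k = 1, 2, ..., assuming as a hypothetical example that "success" means rolling a six with a fair die (p = 1/6).

```python
def geometric_pmf(k, p):
    """P(the first success occurs on trial k), for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

p = 1 / 6  # hypothetical: success means rolling a six with a fair die

# Probability that the first six appears on roll 1, 2, ..., 6
for k in range(1, 7):
    print(k, round(geometric_pmf(k, p), 4))

# The expected number of trials is 1/p (sum truncated at k = 999)
mean_trials = sum(k * geometric_pmf(k, p) for k in range(1, 1000))
print("expected number of rolls =", round(mean_trials, 4))
```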

Normal Distribution

Normal Distribution

Central Limit Theorem

Central Limit Theorem

That’s it …
… for now
