
Lesson 1 EDA

The document discusses data analysis which involves 6 steps: data requirement gathering, data collection, data cleaning, data analysis, data interpretation, and data visualization. It also discusses different measurement scales used in data analysis.

Uploaded by

2022-205418

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. (Wikipedia)

1. Data Requirement Gathering
• decide what to analyze and how to measure it
2. Data Collection
• collect your data based on requirements
3. Data Cleaning
• data should be cleaned and error free
4. Data Analysis
5. Data Interpretation
• express or communicate your data analysis
6. Data Visualization
• results often appear in the form of charts and graphs


MEASUREMENT SCALES

• Nominal Scale
• Ordinal Scale
• Interval Scale
• Ratio Scale
 NOMINAL SCALE

Examples:

2–

A–
 ORDINAL SCALE
Examples:

INTERVAL SCALE
Examples:

RATIO SCALE
Examples:

n = sample size

e = margin of error

The margin of error is the amount of sampling error the researcher is willing to tolerate in the estimate.

n = ?
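The formula these symbols belong to did not survive on this slide. One formula that is often taught with exactly these symbols is Slovin's formula, n = N / (1 + N e^2), where N is the population size. The sketch below assumes that is the intended formula; the function name and the sample figures (N = 1000, e = 0.05) are hypothetical.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size by Slovin's formula (assumed), rounded up to a whole unit."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    # n = N / (1 + N * e^2)
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical figures: a population of 1000 and a 5% margin of error.
print(slovin_sample_size(1000, 0.05))
```

Rounding up is a design choice here: a fractional respondent cannot be sampled, and rounding down would give a slightly larger margin of error than requested.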
SAMPLING METHODS

PROBABILITY SAMPLING METHODS
• Simple Random Sampling
• Cluster Sampling
• Systematic Sampling
• Stratified Random Sampling

NON-PROBABILITY SAMPLING METHODS
• Convenience Sampling
• Purposive Sampling
• Snowball Sampling
• Quota Sampling
m, random picker, etc.
b. Cluster Sampling. The process of randomly selecting intact groups, not individuals, within the defined population; the members of each group share similar characteristics.
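A minimal sketch of cluster sampling as described above: whole groups are drawn at random and every member of a drawn group is kept. The section names, member names, and the `cluster_sample` helper are hypothetical, for illustration only.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters intact groups; keep all members of each."""
    rng = random.Random(seed)
    # Sample group names, not individuals.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical population: four class sections with their students.
sections = {
    "Section A": ["Ana", "Ben", "Carl"],
    "Section B": ["Dina", "Ed"],
    "Section C": ["Fe", "Gio", "Hana"],
    "Section D": ["Ivy", "Jon"],
}

sample = cluster_sample(sections, 2, seed=1)
for name, members in sample.items():
    print(name, members)
```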

e sample?

Therefore, from the population, take the elements tagged as the 4th, 8th, 12th, 16th, …
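The selection rule above (every 4th element) can be sketched with a list slice. The population of 20 tagged elements is hypothetical, and the start position follows the text (the 4th element); many textbooks instead pick the starting element at random between 1 and k.

```python
def systematic_sample(population, k):
    """Return the k-th, 2k-th, 3k-th, ... elements of the population."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Index k - 1 is the k-th element; step k takes every k-th one after it.
    return population[k - 1::k]


# Hypothetical population: elements tagged 1 to 20.
population = list(range(1, 21))
print(systematic_sample(population, 4))  # [4, 8, 12, 16, 20]
```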
1. Direct or Interview – the researcher prepares a set of questions and the respondents answer verbally and directly.
2. Indirect or Questionnaire – the researcher prepares a well-planned set of written questions.

SURVEY

• Determine the goal of your survey
• Identify the sample population
• Choose an interviewing method
• Decide what questions you will ask, in what order, and how to phrase them
• Conduct the interview and collect the information
We want to construct a survey that shows which sports students at your school like to play the most.

(a) List the goal of the survey

• The goal of the survey is to find the answer to the question: “Which sports do students at your school like to play the most?”

(b) What population sample should we interview?

• A sample of the population would include a random sample of the student population in your school. A good strategy would be to randomly select students

(c) How should you administer the survey?

• Face-to-face interviews are a good choice in this case. Interviews will be easy to conduct since the survey consists of only one question, which can be quickly answered and recorded, and asking the question face to face will help eliminate non-response bias.

(d) Create a data collection sheet that you can use to record your results

• To collect the data for this simple survey, you can design a data collection sheet.

(e) Display, analyze and present the data
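A data collection sheet for this one-question survey is essentially a tally per sport. The sketch below shows one way to record and count answers; the responses listed are hypothetical, not survey results from the lesson.

```python
from collections import Counter


def tally(responses):
    """Count each answer and return (answer, count) pairs, most frequent first."""
    return Counter(responses).most_common()


# Hypothetical answers recorded during face-to-face interviews.
responses = [
    "Basketball", "Volleyball", "Basketball",
    "Badminton", "Basketball", "Volleyball",
]

# Print a simple tally sheet: sport, tally marks, count.
for sport, count in tally(responses):
    print(f"{sport:<12}{'|' * count:<6}{count}")
```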
FOCUS GROUP DISCUSSION

FGD is an in-depth field method that brings together a small homogeneous group (usually six to twelve persons) to discuss topics on a study agenda.

General Principles to consider:

• Standardization of questions
• The number of focus groups conducted
• Number of participants in a group
• Level of moderator involvement
3. Registration – documents and records

4. Observation – collects information on the characteristics of the units under study by actual measurement.

5. Experimental Method – makes trials and tests; it is used to describe any process that generates a set of data
• Design of Experiments (DOE) is a tool for developing an experimentation strategy that maximizes learning using a minimum of resources.
1. PLANNING
• carefully plan the course of experimentation before embarking upon the process of testing and data collection.

A team composed of individuals from different disciplines related to the product or process should be used to identify possible factors to investigate and the most appropriate response(s) to measure.

“Carefully planned experiments always lead to increased understanding of the product or process.”
3. OPTIMIZATION
• determine the best setting of these factors to achieve the desired objective.

5. VERIFICATION
• validation of the best settings by conducting a few follow-up experimental runs to confirm that the process functions as desired and all objectives are met.
Presentation of Data

1. Textual – Data gathered are presented in paragraph form

Example:

Of the 150 respondents interviewed, the following complaints were noted: 27 for lack of books in the library, 25 for a dirty cafeteria, 20 for lack of laboratory equipment, and 17 for a poorly maintained university building.
2. Tabular – using a statistical table where data are systematically organized in columns and rows

Parts of a statistical table

• Title
• Heading
• Stubs
• Box head
• Body
• Footnotes
• Source Notes
Example: 20 applicants were given a performance
evaluation appraisal.

The data set is:


Performance Evaluation Appraisal
3. Graphical – data are presented visually using graphs or charts

Types
• Bar Graph
• Pie or Circle Graph
• Line Graph
• Pictograph
• Etc.
Bar Graph: Selected Causes of Death in the Philippines

Pie Chart: Three Leading Causes of Child Mortality Among Filipinos Ages 5-9

Line Graph: Distribution of Enrolment at a Day Care, 1999-2006

Pictogram: Number of Persons Who Have Excessive Depression by Cluster
Frequency Distribution Table

Frequency tells you how often something happened. The frequency of an observation is the number of times the observation occurs in the data.

Example: Suppose you surveyed a number of households to find out how many pets they own:

3, 0, 1, 4, 4, 1, 2, 0, 2, 1
2, 0, 2, 0, 1, 3, 1, 2, 1, 3
Step 1: Construct the table. Write the Categories:
3, 0, 1, 4, 4, 1, 2, 0, 2, 1, 2, 0, 2, 0, 1, 3, 1, 2, 1, 3
Step 2: Tally the numbers (raw data)

0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4
Step 3: Write the data into numerical frequencies
Step 4: Determine the percentage

Percentage Formula:

Percentage = (Frequency of the class ÷ Total number of values) x 100%, where n = 20
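Steps 1 to 4 can be reproduced for the pet survey data with a short script. This is a sketch only; the `frequency_table` helper name is an assumption, and the data are the 20 values listed in the example.

```python
from collections import Counter

# Number of pets per household, as listed in the example (n = 20).
pets = [3, 0, 1, 4, 4, 1, 2, 0, 2, 1,
        2, 0, 2, 0, 1, 3, 1, 2, 1, 3]


def frequency_table(values):
    """Return (category, frequency, percentage) rows, sorted by category."""
    counts = Counter(values)
    n = len(values)
    # Percentage = frequency of the class / total number of values x 100
    return [(c, counts[c], 100 * counts[c] / n) for c in sorted(counts)]


print(f"{'Pets':>4} {'f':>3} {'%':>6}")
for category, f, pct in frequency_table(pets):
    print(f"{category:>4} {f:>3} {pct:>6.1f}")
```

The frequencies should add up to n = 20 and the percentages to 100, which is a quick check on any frequency table.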
GROUPED Frequency Distribution Table

Construction of a Grouped Frequency Distribution Table

Example:

Jake measured the lengths of leaves from a certain tree (to the nearest cm):

9, 16, 13, 7, 8, 4, 18, 10, 17, 18, 9, 12, 5, 9, 9, 16, 1, 8, 17,
1, 10, 5, 9, 11, 15, 6, 14, 9, 1, 12, 5, 16, 4, 16, 8, 15, 14, 17
Construction of a Grouped Frequency Distribution Table

Step 1: Put the numbers in order, then find the smallest and largest values in your data. Calculate the Range.

1, 1, 1, 4, 4, 5, 5, 5, 6, 7, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 11, 12,
12, 13, 14, 14, 15, 15, 16, 16, 16, 16, 17, 17, 17, 18, 18

Range = largest value – smallest value

Example: Range = 18 cm – 1 cm = 17 cm


Step 2: Calculate the approximate number of classes K.

K = 1 + 3.322 log N, where N is the number of values

Example: K = 1 + 3.322 log 38 ≈ 6.25 ≈ 6

Step 3: Determine the class size C.

C = R/K
Example: C = 17/6.25 ≈ 2.72 ≈ 3
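The arithmetic in Steps 2 and 3 can be checked with a few lines of Python. The function names are assumptions for illustration; the logarithm is base 10, as the coefficient 3.322 implies.

```python
import math


def number_of_classes(n):
    """Approximate number of classes: K = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)


def class_size(value_range, k):
    """Class size: C = R / K."""
    return value_range / k


k = number_of_classes(38)  # N = 38 leaf lengths
c = class_size(17, k)      # R = 18 cm - 1 cm = 17 cm

print(f"K = {k:.2f}, rounded to {round(k)}")
print(f"C = {c:.2f}, rounded to {round(c)}")
```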
Step 4: Write down the groups.

Starting at 1 with a class size of 3, the groups begin at 1, 4, 7, 10, 13, 16.

Include the end value of each group (it must be less than the start of the next group): 1–3, 4–6, 7–9, 10–12, 13–15, 16–18.
Step 5: Write the frequency for each group

1,1,1,4,4,5,5,5,6,7,8,8,8,
9,9,9,9,9,9,10,10,11,12,
12,13,14,14,15,15,16,
16,16,16,17,17,17,18,18
Step 6: Write the relative frequencies (rf)

For the class 7–9 (f = 10): 10 ÷ 38 ≈ 0.2632

Step 7: Write the percentage (%f)

For the class 7–9: (10 ÷ 38) x 100% ≈ 26.32%
Step 8: Determine the cumulative frequencies (cf)

cf of the 2nd class = 3 + 6 = 9
cf of the 3rd class = 9 + 10 = 19
Step 10: Class boundaries (Real Limits), less than cumulative frequency, greater than cumulative frequency
Step 11: Present the distribution graphically

 Frequency polygon (x, f) – line graph


 Histogram
 Frequency Polygon Superimposed on a histogram
 Ogive – cumulative frequencies (LCB, CF)
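The grouped table built in Steps 4 to 10 can be sketched as follows. The data are the 38 sorted values from Step 1; the helper name and dictionary keys are assumptions, and class boundaries are taken as 0.5 below and above the class limits because the lengths were measured to the nearest cm.

```python
# Sorted leaf lengths (cm) from Step 1, N = 38.
leaves = [1, 1, 1, 4, 4, 5, 5, 5, 6, 7, 8, 8, 8, 9, 9, 9, 9, 9, 9,
          10, 10, 11, 12, 12, 13, 14, 14, 15, 15,
          16, 16, 16, 16, 17, 17, 17, 18, 18]


def grouped_frequency_table(values, start, width):
    """Build one row per class: limits, boundaries, f, rf, %f, <cf and >cf."""
    n = len(values)
    rows = []
    cumulative = 0
    lower = start
    while lower <= max(values):
        upper = lower + width - 1
        f = sum(lower <= v <= upper for v in values)
        cumulative += f
        rows.append({
            "class": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "f": f,
            "rf": f / n,
            "pct": 100 * f / n,
            "less_cf": cumulative,             # values up to this class
            "greater_cf": n - cumulative + f,  # values from this class on
        })
        lower += width
    return rows


for row in grouped_frequency_table(leaves, start=1, width=3):
    lo, hi = row["class"]
    print(f"{lo:>2}-{hi:<2} f={row['f']:>2} rf={row['rf']:.4f} "
          f"%f={row['pct']:.2f} <cf={row['less_cf']:>2} >cf={row['greater_cf']:>2}")
```

The `class` and `f` columns are what a histogram or frequency polygon would plot, while the upper class boundary paired with `less_cf` gives the points of a less-than ogive.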
