Chapter 4

1) Statistics can be divided into descriptive statistics, which describes data, and inferential statistics, which analyzes inferences to make conclusions about populations. Descriptive statistics involves collecting, tabulating, presenting, and summarizing information, while inferential statistics involves analyzing samples to make inferences about populations. 2) A population consists of all elements being studied, while a sample is a portion of the population selected for study. Variables can be quantitative or qualitative, and quantitative variables can be discrete or continuous depending on whether their values are countable or can assume any value. 3) Common sampling methods include random sampling, systematic sampling, stratified sampling, and cluster sampling. Data collection methods include interviews,

Uploaded by

fcpvhbdpb7

TOPIC 4

DATA ORGANISATION AND DESCRIPTION

4.1 INTRODUCTION

STATISTICS

• A scientific method of collecting, organizing, summarizing, presenting, analyzing, and interpreting data.
• Can be divided into two branches:
  - Descriptive Statistics
  - Inferential Statistics

DESCRIPTIVE STATISTICS: Technique of collecting, tabulating, presenting, and summarizing information to describe data.

INFERENTIAL STATISTICS: Technique of analyzing samples and drawing inferences to make conclusions about the population.

[Figure: Population and Sample]

DEFINITIONS EXPLANATIONS
Population: A population consists of all elements (individuals, items, or objects) whose characteristics are being studied.
Sample: A portion of the population selected for study is referred to as a sample.
Census (bancian): A survey that includes every member of the population.
Parameter: A parameter of a population is some quantity that relates to the population, such as its mean or median.

EXAMPLE 1

Explain whether each of the following constitutes a population or a sample.


a) Pounds of bass caught by all participants in a bass fishing derby
b) Credit card debts of 100 families selected from a city
c) Number of home runs hit by all Major League baseball players in the
2009 season
d) Number of parole violations by all 2147 parolees in a city
e) Amount spent on prescription drugs by 200 senior citizens in a large
city

DATA

DEFINITIONS EXPLANATIONS
Discrete Data: Can be counted.
Examples: number of houses, cars

Continuous Data: Can be obtained by measuring; the accuracy depends on the measuring instrument.
Examples: length, age, height, weight, time

Ungrouped Data: Raw data that have not been organized numerically.
Example:
These are the numbers of rooms in each of 20 houses in a particular town:
5 6 6 4 5 4 6 8 2 4
7 8 3 5 4 2 4 8 8 3
The data can also be organized with the help of a tally chart, as shown below:

Number of rooms in each house    Tally    Number of houses
2                                II       2
3                                II       2
4                                IIII/    5
5                                III      3
6                                III      3
7                                I        1
8                                IIII     4
(IIII/ denotes a crossed bundle of five tally marks.)

Grouped Data: We can summarize the data above by grouping them into classes.
Example:
Number of rooms    Frequency
2-3                4
4-5                8
6-7                4
8-9                4
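
The tallying and grouping steps above can be sketched in Python. This is a minimal illustration rather than part of the original notes: the names `rooms`, `ungrouped`, `grouped`, `lowest`, and `class_width` are chosen here, and the class width of 2 starting at 2 is an assumption read off the grouped table.

```python
from collections import Counter

# Number of rooms in each of the 20 houses (data from the example above).
rooms = [5, 6, 6, 4, 5, 4, 6, 8, 2, 4,
         7, 8, 3, 5, 4, 2, 4, 8, 8, 3]

# Ungrouped frequency table: one entry per distinct value, in ascending order.
ungrouped = dict(sorted(Counter(rooms).items()))

# Grouped frequency table: classes 2-3, 4-5, 6-7, 8-9 (assumed width 2 from 2).
lowest = 2
class_width = 2
grouped = {}
for value, count in ungrouped.items():
    lower = lowest + ((value - lowest) // class_width) * class_width
    label = f"{lower}-{lower + class_width - 1}"
    grouped[label] = grouped.get(label, 0) + count

for label, count in grouped.items():
    print(label, count)
```

Changing `class_width` shows how the choice of classes alters the summary without touching the raw data.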

VARIABLES

Type of Variable
• Quantitative
  - Continuous
  - Discrete
• Qualitative or categorical

DEFINITIONS EXPLANATIONS
Variable: A variable is a characteristic under study that assumes different values for different elements. In contrast to a variable, the value of a constant is fixed.
Quantitative: A variable that can be measured numerically is called a quantitative variable.
Discrete: A variable whose values are countable is called a discrete variable. In other words, a discrete variable can assume only certain values with no intermediate values.
Continuous: A variable that can assume any numerical value over a certain interval or intervals is called a continuous variable.
Qualitative or categorical: A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories is called a qualitative or categorical variable. The data collected on such a variable are called qualitative data.

EXAMPLE 2:

Indicate which of the following variables are quantitative and which are qualitative. Hence,
classify the quantitative variables as discrete or continuous.
a) Number of typographical errors in newspapers
b) Monthly TV cable bills
c) Spring break locations favored by college students
d) Number of cars owned by families
e) Lottery revenues of states
LEVEL OF MEASUREMENT

DEFINITIONS EXPLANATIONS
Nominal: The nominal level of measurement classifies data into mutually exclusive (nonoverlapping) categories in which no order or ranking can be imposed on the data.
Example:
- gender, religion, political party, marital status.

Ordinal: The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
Example:
- From student evaluations, guest speakers might be ranked as superior, average, or poor.
- Floats in a homecoming parade might be ranked as first place, second place, etc.

Interval: The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
Example:
- There is a meaningful difference of 1 point between an IQ of 109 and an IQ of 110.
- Temperature is another example of interval measurement, since there is a meaningful difference of 1°F between each unit, such as 72°F and 73°F.

Ratio: The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
Example:
- height, weight, area, and number of phone calls received

EXAMPLE 3:

What level of measurement would be used to measure each variable?


a) The ages of authors who wrote the hardback versions of the top 25
fiction books sold during a specific week
b) The colors of baseball hats sold in a store for a specific year
c) The highest temperature for each day of a specific month
d) The ratings of bands that played in the homecoming parade at a
college
METHODS OF SAMPLING

SAMPLING METHOD EXPLANATIONS
Random Sampling: Subjects are selected by random numbers.
Systematic Sampling: Subjects are selected by using every kth number after the first subject is randomly selected from 1 through k.
Stratified Sampling: Subjects are selected by dividing up the population into groups (strata), and subjects are randomly selected within groups.
Cluster Sampling: Subjects are selected by using an intact group that is representative of the population.
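
The four sampling methods can be sketched in Python with the standard `random` module. This is an illustrative sketch only: the population of subjects numbered 1 to 100, the three strata, the ten clusters, and all function names are hypothetical and do not come from the notes.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: pick n subjects using random numbers."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k, then every kth."""
    start = rng.randrange(k)  # 0-based index of the first selected subject
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: random selection within each group (stratum)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, rng):
    """Cluster sampling: choose one intact group and keep all its members."""
    name = rng.choice(sorted(clusters))
    return name, list(clusters[name])

rng = random.Random(0)                    # fixed seed so runs are repeatable
population = list(range(1, 101))          # hypothetical subjects numbered 1..100
strata = {"low": population[:30],         # hypothetical strata
          "average": population[30:70],
          "high": population[70:]}
clusters = {f"group {i}": population[i * 10:(i + 1) * 10] for i in range(10)}

random_pick = simple_random_sample(population, 10, rng)
systematic_pick = systematic_sample(population, 10, rng)
stratified_pick = stratified_sample(strata, 3, rng)
cluster_name, cluster_pick = cluster_sample(clusters, rng)
```

In the systematic case the selected subjects are always exactly k apart, which is what distinguishes it from simple random sampling.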

DATA COLLECTION

• The next step after the sample is identified and selected by using the appropriate sampling
technique is to determine the best way to reach the respondents in order to obtain the
required data.
• There are several methods of collecting data and each has its own advantages and
disadvantages.
• A researcher must choose the methods that provide the most information at minimum cost.
• The common methods of data collection are as follows:
a) Face-to-face interview (personal interview)
b) Telephone interview
c) Direct questionnaire (questionnaires are distributed and collected personally)
d) Mail or postal questionnaire (questionnaires are sent and received back through the
post)
e) Direct observation (respondents are observed and data recorded)
f) Other methods (e-mail, video recording)

EXAMPLE 4

State which sampling method was used.

a) Out of 10 hospitals in a municipality, a researcher selects one and collects
records for a 24-hour period on the types of emergencies that were treated
there.
b) A researcher divides a group of students according to gender, major field,
and low, average, and high grade point average. Then she randomly selects
six students from each group to answer questions in a survey.
c) The subscribers to a magazine are numbered. Then a sample of these
people is selected using random numbers.
d) Every 10th bottle of Energized Soda is selected, and the amount of liquid in
the bottle is measured. The purpose is to see if the machines that fill the
bottles are working properly.
4.2 DATA ORGANISATION

QUALITATIVE DATA

Type of data representation

Example:

Twenty-five army inductees were given a blood test to determine their blood type. The data set
is

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

We can represent the data using the following types of data representation:

1) Categorical Frequency Distributions

Data are organized by forming categories of values and indicating the number of data values that fall into each category.

Type of Blood    Tally         Frequency    Percent
A                IIII/         5            20
B                IIII/ II      7            28
O                IIII/ IIII    9            36
AB               IIII          4            16
Total                          25           100
(IIII/ denotes a crossed bundle of five tally marks.)

From the frequency distribution:

i. More people have type O blood than any other type.
ii. The fewest people have type AB blood.
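
As a rough check, the categorical frequency distribution for the blood-type data can be reproduced in Python. This is a sketch for illustration; the variable names `blood_types`, `counts`, and `table` are chosen here and are not part of the notes.

```python
from collections import Counter

# Blood types of the 25 army inductees, read row by row from the data set.
blood_types = ("A B B AB O "
               "O O B AB B "
               "B B O A O "
               "A O O O AB "
               "AB A O B A").split()

counts = Counter(blood_types)
total = sum(counts.values())

# Each row holds (frequency, percent) for one category.
table = {t: (counts[t], 100 * counts[t] / total) for t in ["A", "B", "O", "AB"]}

for blood_type, (frequency, percent) in table.items():
    print(f"{blood_type:<3}{frequency:>3}{percent:>6.0f}")
print(f"Total {total}")
```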

2) PIE CHART

Consists of a circle that is divided into sectors to show the number of objects or the percentage in each group or category. The angle of each sector is proportional to the number or percentage of elements in the category.

From the pie chart:

i. The number of people with type A blood is $\frac{20}{100} \times 25 = 5$.
ii. Type O is the most common blood type, at 36%.
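
Because each sector angle is proportional to the category frequency, the angles for the blood-type pie chart can be worked out as frequency / total × 360°. The short sketch below illustrates that arithmetic; the names `frequencies` and `angles` are illustrative only.

```python
# Frequencies from the categorical frequency distribution of blood types.
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}
total = sum(frequencies.values())

# Sector angle in degrees: the category's share of the full 360-degree circle.
angles = {blood_type: 360 * f / total for blood_type, f in frequencies.items()}

for blood_type, angle in angles.items():
    print(f"{blood_type}: {angle:.1f} degrees")
```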
3) BAR CHART

A bar chart uses the length of vertical columns or horizontal bars to represent quantities
or percentages.

[Bar chart: Type of Blood. Horizontal axis: type of blood; vertical axis: frequency. Bar heights: A = 5, B = 7, O = 9, AB = 4.]

From the bar chart:

i. 9 people have type O blood, more than any other type.
ii. 4 people have type AB blood.

EXAMPLE 5

The Brunswick Research Organization surveyed 50 randomly selected individuals and asked them the primary way they
received the daily news. Their choices were via newspaper (N), television (T), radio (R), or
Internet (I). Construct a categorical frequency distribution for the data and interpret the results.

N N T T T I R R I T
I N R R I N N I T N
I R T T T T N R R I
R R I N T R T I I T
T I N T T I R N R T

Type of way Tally Frequency Percent


N
T
R
I
Total
EXAMPLE 6

The pie chart shows the population of an area. If the number of employees is 1500, find the
number of

a) Widowed

b) Single

EXAMPLE 7

The graphs show that first-year college students spend the most on electronic equipment.

a) Calculate the percentage of student spending that goes on clothing.

b) What is the difference between the highest and the lowest amounts spent on
electronic equipment?
QUANTITATIVE DATA

Type of data representation

The following data are the scores in the Mathematics Account test for the 29 students of Class 2A.

50 50 50 50 53 53 53 54 61 62

64 64 64 68 68 70 79 79 79 79

79 79 80 80 83 83 83 95 95

We can represent the data using the following types of data representation:

1) Stem and leaf plot

This plot separates data entries into leading digits and trailing digits. The guidelines for
constructing stem-and-leaf plots are as follows.
i. Split each score or value into two sets of digits. The first (or leading) set of digits
is the stem, and the second (or trailing) set of digits is the leaf.
ii. List all the possible stem digits from the lowest to the highest.
iii. For each score in the mass of data, write down the leaf numbers on the line
labelled by the appropriate stem number.

Leading digit (Stem) | Trailing digit (Leaf)
5 | 0 0 0 0 3 3 3 4
6 | 1 2 4 4 4 8 8
7 | 0 9 9 9 9 9 9
8 | 0 0 3 3 3
9 | 5 5
Key: 2 | 7 means 27 marks

From stem and leaf plot:

i. mode = 79
ii. median = 68
iii. minimum = 50
iv. maximum = 95
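
The three construction steps can be sketched in Python as follows. This is an illustration under the assumption that every score has two digits, so the stem is the tens digit and the leaf is the units digit; the names `scores`, `stems`, and `lines` are chosen here.

```python
from collections import defaultdict
from statistics import median, multimode

# Scores of the 29 students of Class 2A (data from the notes).
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

# Step i: split each score into a stem (tens digit) and a leaf (units digit).
stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)

# Steps ii and iii: list every stem from lowest to highest with its leaves.
lines = []
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
    lines.append(f"{stem} | {leaves}")
print("\n".join(lines))

print("mode =", multimode(scores), "median =", median(scores))
print("minimum =", min(scores), "maximum =", max(scores))
```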

2) Frequency distribution table

• A frequency table summarizes the data collected by forming intervals of values and
indicating the number of data values that fall into each interval.
• This frequency table with class intervals is known as the frequency distribution of
grouped data.
• The grouping of data is often desirable because it reduces the complexity of the
data and helps to smooth out irregularities in the distribution.
• There are several guidelines that can be followed in constructing a grouped
frequency distribution.
i. Firstly, the class interval should be mutually exclusive. This means that the
class intervals should not overlap and must be clearly defined.
ii. Secondly, it is a good practice to ensure that class intervals are of equal
width except for open-ended classes. If there are no observations in a
particular interval, it should still be included to avoid a misleading
impression of the data.
iii. Thirdly, there should neither be too few classes nor too many classes. The
rule of thumb is, the number of classes should not be less than 5 and should
not be more than 15.

From the previous example, we can construct the frequency distribution table below:

Class limits    Frequency
50-59           8
60-69           7
70-79           7
80-89           5
90-99           2

3) Histogram and frequency polygon

• Histograms look like bar charts but are actually different.
• The area of a rectangle in a histogram is proportional to the frequency in a particular
class.
• All rectangles are drawn side by side, and any empty space between two rectangles
means that the class has zero frequency.
• To sketch a histogram:
i. Mark the class boundaries on the horizontal axis.
ii. Mark the frequency on the vertical axis.
iii. For every class, draw a rectangle whose height equals the frequency of that class.

From the previous example, the resulting histogram is given:

Class limits    Midpoint    Frequency
40-49           44.5        0
50-59           54.5        8
60-69           64.5        7
70-79           74.5        7
80-89           84.5        5
90-99           94.5        2
100-109         104.5       0

[Histogram. Horizontal axis: score (ticks at 49, 59, 69, 79, 89, 99, 109); vertical axis: frequency (0 to 10).]

Frequency polygon

A frequency polygon is obtained by connecting the midpoints (class marks) of the classes at the
tops of the bars in the histogram.

[Figure: Frequency Polygon. Horizontal axis: classes 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, 100-109; vertical axis: frequency (0 to 9).]

4) Ogive

The two types of cumulative frequency curves, or ogives, are:

i. “more than” cumulative frequency curve, where the cumulative frequency is the sum of
the frequencies for classes above that class.
ii. “less than” cumulative frequency curve, where the cumulative frequency is the sum of
the frequencies for classes below that class.
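[Figure: "less than" ogive. Horizontal axis: upper limits (49, 59, 69, 79, 89, 99); vertical axis: cumulative frequency (0% to 100%). Plotted values: 0.00%, 27.59%, 51.72%, 75.86%, 93.10%, 100.00%.]

[Figure: "more than" ogive. Horizontal axis: lower limit (tick labels 59, 69, 79, 89, 99, 109); vertical axis: cumulative frequency (0% to 100%). Plotted values: 100.00%, 72.41%, 48.28%, 24.14%, 6.90%, 0.00%.]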
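
The cumulative percentages plotted in the two ogives can be recomputed from the frequency distribution table. The sketch below is illustrative: it keys the "less than" values to each class's upper limit and the "more than" values to each class's lower limit, which is an assumption about how the original figures were drawn, and the names `classes`, `less_than`, and `more_than` are chosen here.

```python
# (lower limit, upper limit, frequency) for each class of the score data.
classes = [(50, 59, 8), (60, 69, 7), (70, 79, 7), (80, 89, 5), (90, 99, 2)]
total = sum(frequency for _, _, frequency in classes)

# "Less than" ogive: cumulative % of scores up to each class's upper limit.
less_than = []
running = 0
for lower, upper, frequency in classes:
    running += frequency
    less_than.append((upper, round(100 * running / total, 2)))

# "More than" ogive: cumulative % of scores from each class's lower limit upward.
more_than = []
running = total
for lower, upper, frequency in classes:
    more_than.append((lower, round(100 * running / total, 2)))
    running -= frequency

print(less_than)
print(more_than)
```

The "less than" curve rises to 100% while the "more than" curve falls from 100%, so the two curves mirror each other.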

EXAMPLE 8
A listing of calories per 1 ounce of selected salad dressings (not fat-free) is given below.
Construct a stem and leaf plot for the data.

100 130 130 130 110 110 120 130 140 100
140 170 160 130 160 120 150 100 145 145
145 115 120 100 120 160 140 120 180 100
160 120 140 150 190 150 180 160

Stem Leaf

EXAMPLE 9

Using the histogram shown here, answer these questions.


a) How many values are in the class 27.5–30.5?

b) How many values fall between 24.5 and 36.5?

c) How many values are below 33.5?

d) How many values are above 30.5?


EXAMPLE 10

Shown is an ogive depicting the cumulative frequency of the average mathematics SAT scores
by state.

a) How many students have an average score of 549.5?

b) How many students have an average score more than or equal to 522.5 but
less than 603.5?
4.3 MEASURE OF CENTRAL TENDENCY

Type Formula

Mean ($\bar{x}$): The mean of a data set $x_1, x_2, x_3, \dots, x_n$ is written $\bar{x}$ and defined as

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Mode: The mode of a set of data is the value that occurs most frequently.

Median/Quartile/Percentile: The median of a data set is the middle value when the original
data values are arranged in ascending or descending numerical order.

\[ Q_1 = P_{25} = X_{\frac{1}{4}n}, \qquad \text{med} = Q_2 = P_{50} = X_{\frac{1}{2}n}, \qquad Q_3 = P_{75} = X_{\frac{3}{4}n} \]

• If the location is not an integer, take the next location.

Interquartile Range:

\[ IQR = Q_3 - Q_1 \]

Semi Interquartile Range:

\[ SIQR = \frac{1}{2}\left(Q_3 - Q_1\right) \]


Min Lowest value

Max Highest value

Range Maximum - Minimum
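
The measures in the table can be tried out on the 29 test scores used earlier. The sketch below follows the location rule stated in the notes (position k·n/100, moved to the next position when it is not an integer); the helper `percentile` and the variable names are introduced here for illustration, and other textbooks and software use slightly different quartile conventions.

```python
from statistics import mean, multimode

scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

def percentile(data, k):
    """k-th percentile (k an integer from 1 to 100) by the notes' location rule."""
    ordered = sorted(data)
    n = len(ordered)
    # Ceiling of k*n/100 in integer arithmetic; positions are counted from 1.
    location = max(1, -(-k * n // 100))
    return ordered[location - 1]

q1 = percentile(scores, 25)
med = percentile(scores, 50)
q3 = percentile(scores, 75)
iqr = q3 - q1
siqr = iqr / 2
data_range = max(scores) - min(scores)

print("mean =", round(mean(scores), 2), "mode =", multimode(scores))
print("Q1 =", q1, "median =", med, "Q3 =", q3)
print("IQR =", iqr, "SIQR =", siqr, "range =", data_range)
```

Integer arithmetic is used for the location so that a position such as 7.25 is always rounded up to 8 without floating-point surprises.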


SHAPE OF DATA DISTRIBUTION

Symmetry and Skewness for the Data Distribution

• The positions of the mean, median, and mode on the histogram or frequency curve can be
used to determine the general shape of the data distribution.
• The 3 important shapes are:
  - Symmetrical: mean = median = mode
  - Skewed to the right (positively skewed): mode < median < mean
  - Skewed to the left (negatively skewed): mode > median > mean


4.4 MEASURE OF DISPERSION AND SKEWNESS

Variance:

\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1} \]

Standard Deviation:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}} \]

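
The computational formula above, which needs only the sum of the values and the sum of their squares, can be sketched in Python. The function names `sample_variance` and `sample_std_dev` are chosen here for illustration; the standard library's `statistics.variance` and `statistics.stdev` are used only as a cross-check.

```python
import math
import statistics

def sample_variance(data):
    """Sample variance via (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def sample_std_dev(data):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))

# Scores of the 29 students of Class 2A (data from the notes).
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

s2 = sample_variance(scores)
s = sample_std_dev(scores)
print(round(s2, 2), round(s, 2))

# Cross-check against the standard library implementation.
print(round(statistics.variance(scores), 2), round(statistics.stdev(scores), 2))
```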
COEFFICIENT OF VARIATION

The coefficient of variation is the standard deviation divided by the mean of the same
data set, and expressed as a percentage.

Formula:

\[ \text{Coefficient of Variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100\% \]

A larger coefficient of variation means that the data is more dispersed and less
consistent.
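
To illustrate how the coefficient of variation compares consistency, the sketch below uses two small hypothetical data sets with the same mean but different spread. Both data sets and the function name `coefficient_of_variation` are invented for this illustration and do not come from the notes.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean, as a percentage."""
    return stdev(data) / mean(data) * 100

# Hypothetical data sets with the same mean (50) but different spread.
set_a = [48, 50, 52]
set_b = [30, 50, 70]

cv_a = coefficient_of_variation(set_a)
cv_b = coefficient_of_variation(set_b)
print(f"CV of set A: {cv_a:.1f}%")
print(f"CV of set B: {cv_b:.1f}%")
```

Set B has the larger coefficient of variation, so it is the more dispersed and less consistent of the two.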

The Pearson's Coefficient of Skewness

\[ S_k = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} \quad \text{or} \quad S_k = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \]
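
Both forms of Pearson's coefficient can be sketched in Python. The data set below is hypothetical and deliberately right-skewed; the function names are chosen here, and the mode-based form is only computed when the data have a single mode, which is an assumption of this sketch.

```python
from statistics import mean, median, multimode, stdev

def pearson_skewness_median(data):
    """Sk = 3 (mean - median) / standard deviation."""
    return 3 * (mean(data) - median(data)) / stdev(data)

def pearson_skewness_mode(data):
    """Sk = (mean - mode) / standard deviation, for data with a single mode."""
    modes = multimode(data)
    if len(modes) != 1:
        raise ValueError("mode-based coefficient needs a single mode")
    return (mean(data) - modes[0]) / stdev(data)

# Hypothetical right-skewed data: one large value pulls the mean upward.
data = [2, 3, 3, 4, 13]

sk_median = pearson_skewness_median(data)
sk_mode = pearson_skewness_mode(data)
print(round(sk_median, 3), round(sk_mode, 3))
```

Both coefficients come out positive here, which matches a distribution skewed to the right.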

4.5 EXPLORATORY DATA ANALYSIS

Boxplot

• Another graphical representation of data.
• Constructed from the lowest value, the lower quartile ($Q_1$), the median ($Q_2$), the upper quartile ($Q_3$), and the highest value.
• Can be represented horizontally or vertically.

Lower boundary/fence:

\[ Q_1 - 1.5(Q_3 - Q_1) = Q_1 - 1.5\,IQR \]

Upper boundary/fence:

\[ Q_3 + 1.5(Q_3 - Q_1) = Q_3 + 1.5\,IQR \]
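
The five values behind a boxplot, together with the two fences, can be computed as in the sketch below. It reuses the 29 test scores and the quartile location rule from Section 4.3 (position k·n/100, rounded up when not an integer); the helper names are introduced here for illustration. Values lying outside the fences are commonly treated as outliers.

```python
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

def value_at(ordered, k):
    """Value at the k-th percentile location of already sorted data."""
    location = max(1, -(-k * len(ordered) // 100))  # ceiling, counted from 1
    return ordered[location - 1]

def five_number_summary(data):
    """Lowest value, Q1, median (Q2), Q3, and highest value."""
    ordered = sorted(data)
    return (ordered[0], value_at(ordered, 25), value_at(ordered, 50),
            value_at(ordered, 75), ordered[-1])

def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lowest, q1, q2, q3, highest = five_number_summary(scores)
lower_fence, upper_fence = fences(q1, q3)
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(lowest, q1, q2, q3, highest)
print(lower_fence, upper_fence, outliers)
```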


The skewness by Pearson's Coefficient

Skewness                 Skewed to the LEFT    Symmetrical    Skewed to the RIGHT
Pearson's Coefficient    $S_k < -0.1$          $S_k = 0$      $S_k > 0.1$

Interpretation on the shape of the distribution

Skewness               Skewed to the LEFT         Symmetrical                Skewed to the RIGHT
Graphs                 [figure]                   [figure]                   [figure]
Measure of Location    Mean < Median < Mode       Mean = Median = Mode       Mode < Median < Mean
Box-Plot               $Q_2 - Q_1 > Q_3 - Q_2$    $Q_2 - Q_1 = Q_3 - Q_2$    $Q_3 - Q_2 > Q_2 - Q_1$
Central Tendency       Median                     Mean                       Median
EXAMPLE 11
The stem and leaf diagram shows the number of flies caught in an insect trap for 27 days.

Stem | Leaf
0 | 1 1 2
1 | 2 3 5 5 6
2 | 2 2 3 5 8 8
3 | 4 4 4 4 5 7 7 9
4 | 2 6 7 7 8

Key: 1 | 2 means 12

(a) Find

(i) mean, mode and median.

(ii) Q1, Q3 and semi-interquartile range.

(iii) 81st percentile.


(iv) variance and standard deviation.

(b) Illustrate the above data by constructing a box and whisker plot. Hence, describe the
skewness of the distribution.
EXAMPLE 12
The table shows the distribution of grades of students for a certain subject in an examination.

Grade 1 2 3 4 5 6 7 8 9
Number of Students 7 13 9 7 7 2 1 1 1

(a) Find

(i) mean, mode and median.

(ii) first quartile, third quartile and P12.


(iii) standard deviation.

(b) Construct the box and whisker plot. Hence, state the shape of distribution.
EXAMPLE 13
The following is the systolic blood pressure, in mm Hg, of 10 patients in a hospital.

146 135 151 155 158 146 149 124 162 173

(a) Find the mean and mode. Describe the shape of the distribution.

(b) Find the standard deviation of the systolic blood pressure of the 10 patients. Hence, find the
Pearson’s coefficient of skewness. Comment on the distribution.
(c) Find the number of patients whose systolic blood pressures exceed one standard deviation
above or below the mean.

EXAMPLE 14

Calculate the coefficient of variation


(a) for a set of data having mean 14.0 and standard deviation 2.3.

(b) for a set of data having mean 7, and variance 0.6.
