Statistical Analysis With Software Application - Week2
The frequency or the frequency count for a data value is the number of times
the value occurs in the data set.
A categorical frequency distribution is used for data that can be placed in specific categories.
Example:
Twenty-five incoming freshmen were given a blood test to determine their blood type.
The data set is
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.
CLASS   TALLY          FREQUENCY   PERCENT
A       IIIII          5           5/25 = 20%
B       IIIII − II     7           7/25 = 28%
O       IIIII − IIII   9           9/25 = 36%
AB      IIII           4           4/25 = 16%
TOTAL                  25          100%
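The same table can be produced in a few lines of Python. This is a minimal sketch using only the standard library: the blood types are copied row by row from the data set above, and the variable names are illustrative.

```python
from collections import Counter

# Blood types of the 25 freshmen, copied row by row from the data set above
blood_types = """
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
""".split()

counts = Counter(blood_types)   # frequency of each category
n = len(blood_types)            # total number of observations

# Percent = frequency / total * 100
percents = {c: counts[c] * 100 / n for c in ("A", "B", "O", "AB")}

print(f"{'CLASS':<6}{'FREQUENCY':>10}{'PERCENT':>9}")
for category in ("A", "B", "O", "AB"):      # same class order as the table
    print(f"{category:<6}{counts[category]:>10}{percents[category]:>8.0f}%")
print(f"{'TOTAL':<6}{n:>10}{sum(percents.values()):>8.0f}%")
```

Counter does the tallying; the percent column is each frequency divided by the total of 25.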
An ungrouped frequency distribution lists each data value together with its frequency count, that is, the number of times that value occurs.
Example:
The following data represent the number of defective bulbs observed each day over a
25-day period for a manufacturing process. Summarize the information with a frequency
distribution.
CLASS (DEFECTS)   FREQUENCY
6                 4
7                 1
9                 2
10                5
11                6
12                3
14                1
16                1
20                1
21                1
TOTAL             25
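A minimal Python sketch of building an ungrouped distribution follows. The day-by-day defect counts are not listed in the slide, so the 25 observations are rebuilt from the table itself (each value repeated as often as its frequency). The order is therefore arbitrary, which does not affect the counts. The function name is hypothetical.

```python
from collections import Counter

# Rebuild the 25 daily observations from the table (original day order unknown)
table = {6: 4, 7: 1, 9: 2, 10: 5, 11: 6, 12: 3, 14: 1, 16: 1, 20: 1, 21: 1}
defects = [value for value, freq in table.items() for _ in range(freq)]

def ungrouped_distribution(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())

distribution = ungrouped_distribution(defects)
for value, freq in distribution:
    print(value, freq)
print("TOTAL", sum(freq for _, freq in distribution))
```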
Grouped frequency distribution is obtained by constructing classes (or
intervals) for the data, and then listing the corresponding number of values
(frequency count) in each interval.
k     1   2   3   4    5    6    7     8     9
2^k   2   4   8   16   32   64   128   256   512
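The k and 2^k rows are commonly used as the "2^k rule" for choosing the number of classes: take the smallest k for which 2^k is at least n, the number of data values. The sketch below assumes that form of the rule (some texts use a strict inequality instead). The function name is hypothetical. Note that the temperature example below fixes the number of classes at 7 rather than applying the rule.

```python
def classes_by_2k_rule(n: int) -> int:
    """Smallest k such that 2**k >= n (assumed form of the 2^k rule)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

for n in (25, 50, 100):
    print(n, classes_by_2k_rule(n))
```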
NOTE
The class limits should have the same decimal place value as the data, but the class boundaries should have one additional decimal place and end in a 5.
Example:
The data below represent the record high temperatures in degrees Fahrenheit (F) for
each of the 50 cities in the Philippines this April. Construct a grouped frequency distribution for
the data, using 7 classes.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
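◦ Find the highest value and the lowest value: $H = 134$ and $L = 100$.
◦ Find the range: $R = H - L = 134 - 100 = 34$.
◦ Find the class width by dividing the range by the number of classes: $34 / 7 \approx 4.86$, which is rounded up to 5.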
CLASS LIMITS   CLASS BOUNDARIES   TALLY                           FREQUENCY
100 − 104      99.5 − 104.5       II                              2
105 − 109      104.5 − 109.5      IIIII − III                     8
110 − 114      109.5 − 114.5      IIIII − IIIII − IIIII − III     18
115 − 119      114.5 − 119.5      IIIII − IIIII − III             13
120 − 124      119.5 − 124.5      IIIII − II                      7
125 − 129      124.5 − 129.5      I                               1
130 − 134      129.5 − 134.5      I                               1
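The steps of this example (range, class width, limits, boundaries, counts) can be sketched in Python. This assumes whole-number data, as here, so the upper limit of each class is the lower limit plus the width minus 1, and each boundary is a limit moved by 0.5. Names such as `rows` are illustrative only.

```python
import math

# Record high temperatures (°F) for the 50 cities, copied from the data set above
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]
num_classes = 7

high, low = max(temps), min(temps)
data_range = high - low                        # R = H - L = 34
width = math.ceil(data_range / num_classes)    # 34 / 7 = 4.86, rounded up to 5

rows = []
lower = low                                    # first lower class limit = lowest value
for _ in range(num_classes):
    upper = lower + width - 1                  # whole-number data: limits differ by width - 1
    boundaries = (lower - 0.5, upper + 0.5)    # one extra decimal place, ending in 5
    freq = sum(lower <= t <= upper for t in temps)
    rows.append((lower, upper, boundaries, freq))
    lower += width

for lo, hi, (lb, ub), f in rows:
    print(f"{lo}-{hi}  {lb}-{ub}  {f}")
```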
Sometimes it is necessary to use a cumulative frequency distribution. A cumulative
frequency distribution is a distribution that shows the number of data values less than or equal
to a specific value (usually an upper boundary).
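For the temperature example, each cumulative frequency is a running total of the frequency column, paired with the upper class boundary. A minimal sketch, using the frequencies from the grouped table above:

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the grouped temperature table
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

cumulative = list(accumulate(frequencies))   # running total of the frequencies

print("<= 99.5: 0")                          # no value lies below the first lower boundary
for boundary, cf in zip(upper_boundaries, cumulative):
    print(f"<= {boundary}: {cf}")
```

The last cumulative frequency always equals the total number of data values (50 here).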
Go to the Formulas tab.
◦ In the previous slide, we selected column F as the data array and the student marks as the bins array. Type =FREQUENCY(F3:F9,C3:C22) and press CTRL+SHIFT+ENTER. (Excel's syntax is FREQUENCY(data_array, bins_array).)
◦ After pressing CTRL+SHIFT+ENTER, Excel encloses the formula in curly braces, as shown below.
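Excel's FREQUENCY returns one more count than there are bins: each count covers values greater than the previous bin and at most the current bin, and the final extra count covers values above the highest bin. The sketch below mimics that behaviour in Python for numeric data. The function name and the marks and bin limits are hypothetical, not taken from the worksheet.

```python
from bisect import bisect_left

def excel_like_frequency(data, bins):
    """Mimic Excel's FREQUENCY(data_array, bins_array) for numeric data.

    Element i counts values greater than bins[i-1] and at most bins[i];
    the extra final element counts values above the last bin.
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1   # index of the first bin >= value
    return counts

# Hypothetical student marks and bin limits
marks = [45, 67, 89, 72, 55, 90, 38, 61]
bins = [40, 60, 80]
print(excel_like_frequency(marks, bins))
```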
Example
To create a pivot table, go to the Insert menu and select PivotTable.
Drag Sales into Row Labels, then drag Sales into Values as well.
Change the Value Field Settings to Count to get the number of sales.
Right-click a sales value in the row labels and choose Group.
We get the Grouping dialog box below:
Set Starting at to 5000, Ending at to 18000, and By to 1000, then click OK.
We obtain the result below.
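The grouping step can be approximated in Python: each value falls into a bin of width 1000 starting at 5000, and the bins are counted. This is a sketch only. The sales figures are hypothetical, and the labels for values below the start or above the end are an assumption that may not match Excel's own labels exactly.

```python
from collections import Counter

def group_counts(values, start, end, by):
    """Count values in bins of width `by` from `start` (sketch of pivot-table grouping).

    Values below `start` or above `end` are collected under '<start' and '>end'
    (assumed labels; Excel's edge labels may differ).
    """
    counts = Counter()
    for v in values:
        if v < start:
            counts[f"<{start}"] += 1
        elif v > end:
            counts[f">{end}"] += 1
        else:
            lo = start + (v - start) // by * by    # lower edge of the bin holding v
            counts[f"{lo}-{lo + by - 1}"] += 1
    return counts

# Hypothetical sales figures (the worksheet data is not reproduced here)
sales = [5200, 5900, 6100, 7450, 7999, 12000, 17500, 18000, 4500, 19000]
for label, count in group_counts(sales, 5000, 18000, 1000).items():
    print(label, count)
```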
To chart this result, go to the Insert menu and select Column chart.
We obtain the graph below.
Example
Go to the Data menu; Data Analysis appears at the top right. Click Data Analysis, highlighted as shown below.
In the dialog box that appears, choose Histogram and click OK.
The Histogram dialog box appears, as shown below.
Select the Input Range and Bin Range as shown below.
Tick the Labels, Cumulative Percentage, and Chart Output boxes, then click OK.
We obtain this result:
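The table that accompanies the histogram (bin, frequency, cumulative percentage, with a final "More" row for values above the last bin) can be sketched in Python. This assumes the temperature data from the grouped-frequency example and uses the upper class limits as the bin range; both are illustrative choices, not the worksheet used in the slides.

```python
from bisect import bisect_left
from itertools import accumulate

# Record high temperatures (°F) from the grouped-frequency example
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]
bins = [104, 109, 114, 119, 124, 129, 134]   # assumed bin range: upper class limits

freq = [0] * (len(bins) + 1)                 # last slot plays the role of "More"
for t in temps:
    freq[bisect_left(bins, t)] += 1          # value counted in the first bin >= value

cum_pct = [100 * c / len(temps) for c in accumulate(freq)]

for label, f, p in zip(bins + ["More"], freq, cum_pct):
    print(f"{label!s:>5} {f:>3} {p:6.2f}%")
```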
THE MEASURES OF CENTRAL TENDENCY
Data Description
◦ Central Tendency
◦ Variation
◦ Position
Measures of Central Tendency
◦ The mean is the sum of the values, divided by the total number of values.
◦ The median is the midpoint of the data array.
◦ The midrange is defined as the sum of the lowest and highest values in the data set, divided by 2.
◦ The value that occurs most often in a data set is called the mode.
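All four measures can be computed with Python's standard `statistics` module. A minimal sketch, applied (as an illustrative choice) to the temperature data from the grouped-frequency example:

```python
import statistics

# Record high temperatures (°F) from the grouped-frequency example
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

mean = statistics.mean(temps)               # sum of values / number of values
median = statistics.median(temps)           # midpoint of the ranked data
mode = statistics.mode(temps)               # most frequent value
midrange = (min(temps) + max(temps)) / 2    # (lowest + highest) / 2
print(mean, median, mode, midrange)
```

Here the sum is 5705 over 50 values, so the mean is 114.1; the median averages the 25th and 26th ranked values (both 114); 118 occurs six times; and the midrange is (100 + 134) / 2 = 117.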
Mean (Arithmetic Mean)
◦ Mean (Arithmetic Mean) of Data Values
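◦ Sample mean (sample size n):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
◦ Population mean (population size N):
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$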
Mean (Arithmetic Mean)
◦ The Most Common Measure of Central Tendency
◦ Affected by Extreme Values (Outliers)
[Figure: two dot plots; left data set on an axis from 0 to 10 has Mean = 5, right data set with an extreme value on an axis from 0 to 14 has Mean = 6]
Median
◦ Robust Measure of Central Tendency
◦ Not Affected by Extreme Values
[Figure: the same two dot plots; Median = 5 (left) and Median = 5 (right)]
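The contrast between the mean and the median can be reproduced with two small data sets. The values below are hypothetical (the figure's data did not survive extraction); they were chosen only so that the results match the figure labels: replacing the largest value with an extreme one moves the mean from 5 to 6 but leaves the median at 5.

```python
import statistics

# Hypothetical data sets chosen to match the figure labels
original = [3, 4, 5, 6, 7]
with_outlier = [3, 4, 5, 6, 12]   # largest value replaced by an extreme value

for data in (original, with_outlier):
    print(data, "mean:", statistics.mean(data), "median:", statistics.median(data))
```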
Mode
[Figure: two dot plots; the left data set (axis 0 to 14) has Mode = 9, the right data set (axis 0 to 6) has no mode]
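A small Python helper can report the mode, or the absence of one, for a data set. This sketch follows the usual textbook convention that a data set in which no value repeats has no mode. The helper name and the data sets are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                                # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([0, 1, 3, 5, 6, 9, 9, 9, 12]))       # hypothetical data with one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))              # hypothetical data with no mode
```

Returning a list keeps the helper usable when two values tie for the highest frequency.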
THE MEASURES OF CENTRAL TENDENCY USING EXCEL
To compute the mean, use the AVERAGE function: type =AVERAGE(data array) in an empty cell, then press ENTER.
After pressing ENTER, we obtain the result below:
◦ Types of measures of position:
1. Quartiles
2. Percentiles
3. Deciles
Quartiles
Q1, Q2, Q3
divide ranked scores into four equal parts:
(minimum) | 25% | Q1 | 25% | Q2 (median) | 25% | Q3 | 25% | (maximum)
Q1 – Lower Quartile
◦ At most 25% of the data are smaller than Q1.
Q2 – Median
◦ 50% of the data values fall below the median and 50% fall above.
Q3 – Upper Quartile
◦ At most 25% of the data are larger than Q3.
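$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ ranked value}$$
$$Q_2 = \left(\frac{2(n+1)}{4}\right)^{\text{th}} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ ranked value}$$
$$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ ranked value}$$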
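The position formulas can be applied directly in Python. When a position such as (n + 1)/4 is not a whole number, it falls between two ranked values. This sketch assumes linear interpolation between them, which is also what Python's `statistics.quantiles(..., method="exclusive")` does. Other conventions exist and can give slightly different answers on other data. The helper name is hypothetical, and the temperature data is reused for illustration.

```python
import math
import statistics
from fractions import Fraction

# Record high temperatures (°F) from the grouped-frequency example
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

def value_at_position(ranked, pos):
    """Value at 1-based position `pos` of ranked data (assumed linear interpolation)."""
    if pos <= 1:
        return ranked[0]
    if pos >= len(ranked):
        return ranked[-1]
    lower = math.floor(pos)                     # whole-number part of the position
    frac = pos - lower                          # fractional part
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])

ranked = sorted(temps)
n = len(ranked)

# Q_k sits at position k(n + 1)/4; Fraction keeps the positions exact
quartiles = [float(value_at_position(ranked, Fraction(k * (n + 1), 4))) for k in (1, 2, 3)]
print(quartiles)

# Cross-check against the standard library's (n + 1)-based method
check = statistics.quantiles(temps, n=4, method="exclusive")
```

With n = 50 the positions are 12.75, 25.5 and 38.25, giving Q1 = 110, Q2 = 114 and Q3 = 118.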
Interquartile Range
◦ The interquartile range, IQR = Q3 − Q1, contains the middle 50% of the observations in the distribution.
◦ The following figure shows the relationship between the quartiles, the median, and the interquartile range.
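Once Q1 and Q3 are known, the interquartile range is a single subtraction. A minimal sketch using the standard library on the temperature data, assuming the (n + 1) positional rule from the quartile formulas. With many tied values, as here, slightly more than half of the observations can sit inside [Q1, Q3].

```python
import statistics

# Record high temperatures (°F) from the grouped-frequency example
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

# method="exclusive" places Q1, Q2, Q3 at positions (n + 1)/4, 2(n + 1)/4, 3(n + 1)/4
q1, q2, q3 = statistics.quantiles(temps, n=4, method="exclusive")
iqr = q3 - q1   # spread of the middle half of the ranked data
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```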
Deciles
D1, D2, D3, D4, D5, D6, D7, D8, D9
divide ranked data into ten equal parts, each containing 10% of the data; the nine cut points D1 through D9 separate the parts.
Percentiles
◦ Values of the variable that divide a ranked set into 100 subsets.
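Deciles and percentiles can be located the same way as quartiles. The sketch below assumes that the (n + 1) positional rule from the quartile formulas extends to them: D_k at position k(n + 1)/10 and P_k at position k(n + 1)/100, with linear interpolation between ranked values. The helper `fractile` is hypothetical, and the temperature data is reused for illustration.

```python
import math
from fractions import Fraction

# Record high temperatures (°F) from the grouped-frequency example
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

def fractile(data, k, parts):
    """k-th cut point when ranked data is split into `parts` equal parts.

    parts=4 gives quartiles, 10 deciles, 100 percentiles. Assumes the
    position k(n + 1)/parts with linear interpolation between ranked values.
    """
    ranked = sorted(data)
    pos = Fraction(k * (len(ranked) + 1), parts)   # exact position
    if pos <= 1:
        return float(ranked[0])
    if pos >= len(ranked):
        return float(ranked[-1])
    lower = math.floor(pos)
    frac = pos - lower
    return float(ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1]))

d1 = fractile(temps, 1, 10)      # first decile, position 5.1
d9 = fractile(temps, 9, 10)      # ninth decile, position 45.9
p90 = fractile(temps, 90, 100)   # 90th percentile, same position as D9
print(d1, d9, p90)
```

Because D9 and P90 fall at the same position, they always agree; likewise D5, P50 and Q2 all equal the median.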