Lecture-1-Numerical Representation of Data
ON
MATH-208
BY
Kiran Kumar Shrestha
Department of Mathematics
School of Science
Kathmandu University
TO
CIVE - II – II Group
Topics Covered
Data
Data are numeric or literal values that describe some attribute of one or more entities.
In raw form, data carry little meaning and no decision can be based on them. To make a decision
with the help of data, we first need to process them. Processing data yields a meaningful value called
information.
Data Processing
It is the activity of working on data to obtain a meaningful value, called information, so that the
data can be used to make a decision.
Mean
Median
Mode
Mean
Types-
Arithmetic mean
Geometric mean
Harmonic mean
Arithmetic Mean
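#.1 For individual series -
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Or,
$\bar{x} = \frac{\sum x}{n}$
Or, when each value $x$ occurs with frequency $f$,
$\bar{x} = \frac{\sum f x}{\sum f} = \frac{\sum f x}{N}$
where $n$ is the number of observations and $N = \sum f$ is the total frequency.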
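The arithmetic mean can be sketched in Python as follows. This is only an illustrative sketch: the function names `mean_individual` and `mean_frequency` are hypothetical, and the frequency version assumes the usual rule "sum of f times x divided by total frequency". The sample values are the median example data and the wage table used elsewhere in these notes.

```python
def mean_individual(values):
    """Arithmetic mean of an individual series: sum of values / number of values."""
    return sum(values) / len(values)


def mean_frequency(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)


print(mean_individual([23, 24, 43, 44, 45, 53, 67, 82]))  # 47.625

wages = [10000, 20000, 30000, 40000, 50000, 60000]
freqs = [51, 128, 248, 356, 95, 22]
print(round(mean_frequency(wages, freqs), 2))  # 34244.44
```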
Median
The median of a data distribution is the value that divides it into two equal parts (halves), so that 50% of
the data lie above it and 50% lie below it.
The mean is preferred when the actual values are important, and the median is preferred when some attribute of the
values is important. For example, if the actual time is important then the mean is used; however, if timing is
important then the median is preferred.
Measurement of median
#.1 Individual series
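Median = value of the $\left(\frac{n+1}{2}\right)$th item, with the data arranged in ascending order
Example-
23 24 43 44 45 53 67 82
Here n = 8,
Now,
Median = value of the $\left(\frac{8+1}{2}\right)$th item = value of the $(4.5)$th item
= 4th item + 0.5 (5th item - 4th item)
= 44 + 0.5 (45 - 44) = 44.5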
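The positional rule for the median of an individual series can be sketched in Python as follows. This is a minimal sketch: the function name `median` is illustrative, and a fractional position such as the 4.5th item is assumed to be resolved by linear interpolation between the neighbouring items, as is usual in textbook treatments.

```python
def median(values):
    """Median as the value of the ((n + 1) / 2)-th item of the sorted data."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) / 2          # 1-based position of the median item
    lower = int(pos)           # whole part of the position
    frac = pos - lower         # fractional part (0 for odd n, 0.5 for even n)
    if frac == 0:
        return data[lower - 1]
    # Interpolate between the lower-th and (lower + 1)-th items.
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


print(median([23, 24, 43, 44, 45, 53, 67, 82]))  # 44.5
```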
Mode
#.1 For individual series
If no value is repeated, the mode is not defined. The mode is also not defined if two or more values
share the highest number of repetitions.
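The convention above (no mode when nothing repeats, and no mode when there is a tie for the most frequent value) can be sketched in Python as follows. The function name `mode` and the use of `None` for "not defined" are illustrative choices, not part of the lecture.

```python
from collections import Counter


def mode(values):
    """Return the mode, or None when it is not defined under the
    convention in the text (no repeats, or a tie for most frequent)."""
    counts = Counter(values).most_common()   # (value, count) pairs, most frequent first
    if not counts:
        return None
    top_value, top_count = counts[0]
    if top_count == 1:                        # no value is repeated
        return None
    if len(counts) > 1 and counts[1][1] == top_count:
        return None                           # tie for the highest frequency
    return top_value


print(mode([1, 2, 2, 3]))     # 2
print(mode([1, 2, 3]))        # None
print(mode([1, 1, 2, 2, 3]))  # None
```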
Partition Values
Types-
1. Median
2. Quartiles
3. Deciles
4. Percentiles
Quartiles
Quartiles are the three values that divide a given set of data into four equal parts; they are denoted Q1, Q2
and Q3.
Notes:
#.1
#.2 25% of the data lie below Q1, 25% lie above Q3, and 50% lie between Q1 and Q3.
Measurement
#.1 for individual series
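$Q_1$ = value of the $\left(\frac{n+1}{4}\right)$th item
$Q_2$ = value of the $\left\{2\left(\frac{n+1}{4}\right)\right\}$th item
= value of the $\left(\frac{n+1}{2}\right)$th item
$Q_3$ = value of the $\left\{3\left(\frac{n+1}{4}\right)\right\}$th item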
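The positional rule for quartiles of an individual series can be sketched in Python as follows. This is a sketch under assumptions: the helper names `value_at_position` and `quartiles` are hypothetical, the k-th quartile is taken as the value of the k(n + 1)/4-th item, and fractional positions are resolved by linear interpolation. The data are those of the median example.

```python
def value_at_position(sorted_values, pos):
    """Value of the pos-th item (1-based), interpolating linearly
    between neighbours when pos is fractional."""
    n = len(sorted_values)
    pos = min(max(pos, 1), n)     # keep the position inside 1..n
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return sorted_values[lower - 1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + frac * (above - below)


def quartiles(values):
    """Return (Q1, Q2, Q3) using positions k * (n + 1) / 4 for k = 1, 2, 3."""
    data = sorted(values)
    n = len(data)
    return tuple(value_at_position(data, k * (n + 1) / 4) for k in (1, 2, 3))


print(quartiles([23, 24, 43, 44, 45, 53, 67, 82]))  # (28.75, 44.5, 63.5)
```

Note that statistical libraries offer several quartile conventions, so their results may differ slightly from this positional rule.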
Variation of a data distribution is a measure of the heterogeneity (or homogeneity) of the data.
The different absolute measures of variation (those that carry the unit in which the data are measured)
are-
Range
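(i) Range = $L - S$, where $L$ is the largest value and $S$ is the smallest value in the data.
(ii) Range is an absolute measure of variation (since it carries the unit in which the data are expressed). A relative
measure of range (which is free of the unit of measurement), called the coefficient of range, is given by
Coefficient of range = $\frac{L - S}{L + S}$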
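As a small illustration, the range and its relative measure can be computed in Python as follows. The sketch assumes the standard definitions, range = largest - smallest and coefficient of range = (largest - smallest) / (largest + smallest); the function name `range_measures` is hypothetical, and the data are those of the median example.

```python
def range_measures(values):
    """Return (range, coefficient of range) for a list of values."""
    largest, smallest = max(values), min(values)
    data_range = largest - smallest                           # absolute measure, keeps the unit
    coeff_range = (largest - smallest) / (largest + smallest)  # relative measure, unit-free
    return data_range, coeff_range


data_range, coeff_range = range_measures([23, 24, 43, 44, 45, 53, 67, 82])
print(data_range)             # 59
print(round(coeff_range, 4))  # 0.5619
```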
Inter-quartile Range
Quartile Deviation (Q.D.)/ Semi-interquartile range
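The quartile-based measures can be sketched in Python as follows. The sketch assumes the standard definitions: inter-quartile range = Q3 - Q1, quartile deviation = (Q3 - Q1) / 2, and coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1). The function name `quartile_measures` is hypothetical, and the inputs Q1 = 28.75 and Q3 = 63.5 are what the (n + 1)/4 positional rule gives for the median example data.

```python
def quartile_measures(q1, q3):
    """Return (inter-quartile range, quartile deviation, coefficient of Q.D.)."""
    iqr = q3 - q1                      # inter-quartile range
    qd = iqr / 2                       # semi-interquartile range
    coeff_qd = (q3 - q1) / (q3 + q1)   # relative measure, unit-free
    return iqr, qd, coeff_qd


iqr, qd, coeff_qd = quartile_measures(28.75, 63.5)
print(iqr, qd)             # 34.75 17.375
print(round(coeff_qd, 4))  # 0.3767
```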
Mean Deviation
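M.D. = $\frac{\sum |x - \bar{x}|}{n}$ (individual series)
M.D. = $\frac{\sum f\,|x - \bar{x}|}{N}$ (frequency distribution, $N = \sum f$)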
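Mean deviation about the mean can be sketched in Python as follows. This is an illustrative sketch for an individual series only, assuming the usual definition (average of the absolute deviations from the mean); the function name `mean_deviation` is hypothetical and the data are those of the median example.

```python
def mean_deviation(values):
    """Mean deviation about the mean: average of |x - mean|."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(mean_deviation([23, 24, 43, 44, 45, 53, 67, 82]))  # 14.78125
```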
Standard Deviation
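$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$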
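The two usual ways of computing the standard deviation, from deviations about the mean and from the sum of squares, can be checked against each other in Python. This is a sketch assuming the divisor n (not n - 1); the function names `sd_from_deviations` and `sd_from_squares` are hypothetical, and the data are those of the median example.

```python
import math


def sd_from_deviations(values):
    """s.d. as the square root of the mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_from_squares(values):
    """s.d. as sqrt(sum(x^2)/n - (sum(x)/n)^2), which avoids computing deviations."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


data = [23, 24, 43, 44, 45, 53, 67, 82]
print(math.isclose(sd_from_deviations(data), sd_from_squares(data)))  # True
```

The second form is convenient for hand calculation, while the first is usually less sensitive to rounding error when the values are large.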
Example –
Method I –
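Here, mean is
$\bar{x} = \frac{\sum x}{n}$
Now, s.d. is
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Method II –
We have
$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$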
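For a frequency distribution, with $N = \sum f$,
$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} = \sqrt{\frac{\sum f x^2}{N} - \bar{x}^2} = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$
Problem:
Given data
Solution-
Working Table-
$\sigma = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$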
Notes:
#.2 The relative measure of s.d. is called the coefficient of standard deviation and is given by
Coefficient of s.d. = $\frac{\sigma}{\bar{x}}$
#.3 If the coefficient of s.d. is multiplied by 100 to express it as a percentage, it is called the coefficient of
variation (C.V.), so
C.V. = $\frac{\sigma}{\bar{x}} \times 100\%$
#.4 The coefficient of variation is used to compare the variation of two or more sets of data values.
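Comparing two sets of values by their coefficient of variation can be sketched in Python as follows. The function name `coefficient_of_variation` is hypothetical, the s.d. uses the divisor n, and the two series are made-up illustrative numbers, not data from the lecture.

```python
import math


def coefficient_of_variation(values):
    """C.V. = (s.d. / mean) * 100, with the s.d. computed using divisor n."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100


# Hypothetical series with the same mean (20) but different spread.
series_a = [10, 20, 30]
series_b = [18, 20, 22]

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(round(cv_a, 2), round(cv_b, 2))  # 40.82 8.16
print("More uniform:", "B" if cv_b < cv_a else "A")  # More uniform: B
```

The series with the smaller C.V. is the more uniform (less variable) one.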
Problem/ Example
Wage Frequency
10000 51
20000 128
30000 248
40000 356
50000 95
60000 22
Calculate the following values for the weekly wage: (a) mean (b) median (c) quartiles (d) mode (e) range (f)
coefficient of range (g) quartile deviation (h) coefficient of quartile deviation (i) standard deviation (j)
variance (k) coefficient of variation.
Solution-
#(a) Mean: $\bar{x} = \frac{\sum f x}{N} = \frac{30820000}{900} = 34244.44$
#(b)
Wage Frequency Cum. Freq.
10000 51 51
20000 128 179
30000 248 427
40000 356 783
50000 95 878
60000 22 900
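Median = value of the $\left(\frac{N+1}{2}\right)$th item = value of the $\left(\frac{900+1}{2}\right)$th item = value of the 450.5th item = 40000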
#.(c)
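$Q_1$ = value of the $\left(\frac{N+1}{4}\right)$th item = value of the $\left(\frac{900+1}{4}\right)$th item = value of the 225.25th item = 30000
$Q_3$ = value of the $3\left(\frac{N+1}{4}\right)$th item = value of the $3\left(\frac{900+1}{4}\right)$th item = value of the 675.75th item = 40000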
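Locating the median and quartiles from the cumulative frequency column can be sketched in Python as follows. This is a sketch under an assumption: for a discrete frequency table, the value of the p-th item is taken to be the first wage whose cumulative frequency reaches p. The helper name `item_value` is hypothetical.

```python
from itertools import accumulate

wages = [10000, 20000, 30000, 40000, 50000, 60000]
freqs = [51, 128, 248, 356, 95, 22]


def item_value(values, freqs, position):
    """Value of the position-th item: the first value whose
    cumulative frequency is at least the position."""
    for value, cum_freq in zip(values, accumulate(freqs)):
        if cum_freq >= position:
            return value
    return values[-1]


N = sum(freqs)                                        # 900
q1 = item_value(wages, freqs, (N + 1) / 4)            # 225.25th item
median = item_value(wages, freqs, (N + 1) / 2)        # 450.5th item
q3 = item_value(wages, freqs, 3 * (N + 1) / 4)        # 675.75th item
print(q1, median, q3)  # 30000 40000 40000
```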
$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}}$
Wage (x)  Frequency (f)  fx        x-mean     (x-mean)²   f(x-mean)²
10000     51             510000    -24244.4   587792871   29977436417
20000     128            2560000   -14244.4   202904071   25971721077
30000     248            7440000   -4244.44   18015271    4467787187
40000     356            14240000  5755.56    33126471    11793023645
50000     95             4750000   15755.56   248237671   23582578737
60000     22             1320000   25755.56   663348871   14593675160
Total     900            30820000                         110386222222.24
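Here, $N = 900$ and $\sum f (x - \bar{x})^2 = 110386222222.24$, so $\sigma = \sqrt{\frac{110386222222.24}{900}} \approx 11074.81$.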
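The totals of the working table, and the measures that follow from them, can be re-computed in Python as a check. This is a sketch assuming the divisor N for the variance; the variable names are illustrative.

```python
import math

wages = [10000, 20000, 30000, 40000, 50000, 60000]
freqs = [51, 128, 248, 356, 95, 22]

N = sum(freqs)                                            # total frequency
sum_fx = sum(f * x for x, f in zip(wages, freqs))         # column total of fx
mean = sum_fx / N
sum_f_dev2 = sum(f * (x - mean) ** 2 for x, f in zip(wages, freqs))
variance = sum_f_dev2 / N
sd = math.sqrt(variance)
cv = sd / mean * 100                                      # coefficient of variation in percent

print(N, sum_fx)          # 900 30820000
print(round(mean, 2))     # 34244.44
print(round(sd, 2))       # 11074.81
print(round(cv, 2))       # 32.34
```

The individual rows of the table were computed with the mean rounded to two decimals, so row values may differ from an exact calculation in the last few digits; the column total agrees with the exact sum.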
Solution-
Working table-
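Calculation of mean
For each model, $\bar{x} = \frac{\sum x}{n}$
Calculation of s.d.
We have, for each model,
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Calculation of C.V.
For each model, C.V. = $\frac{\sigma}{\bar{x}} \times 100\%$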
Conclusion-
Model B is more uniform, since the set of values with the smaller C.V. is the more uniform one.