Lecture-1-Numerical Representation of Data


Lecture 1

ON
MATH-208

(Probability and Statistics)

BY
Kiran Kumar Shrestha

Department of Mathematics

School of Science

Kathmandu University

TO
CIVE - II – II Group

Topics Covered

• Numerical representation of data
• Measurement of Central Values
• Measurement of Variation

Date: Friday, Nov.19, 2021


MATH 208 (Probability and Statistics)

Chapter I – Data Representation

Section I – Numerical Representation of Data

Section II – Graphical Representation of Data

Numerical Representation of Data

Data
Data can be defined as some numeric or literal value describing some attribute of one or more entities.

Example – The age of Ram is 19.

In raw form, data carry little meaning and no decision can be based on them. To make a decision with
the help of data, we need to process them. Processing data yields a meaningful value called
information.

Data Processing
It is the activity of working on data to obtain some meaningful value, called information, so that the
data can be used to make decisions.

Some examples are –

• Arranging data in order
• Finding the maximum/ minimum value
• Finding the average/ variation, etc.

Measuring Central Values of Data


Types-

• Mean
• Median
• Mode

Mean
Types-
• Arithmetic mean
• Geometric mean
• Harmonic mean

Arithmetic Mean
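#.1 For individual series -

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Or,

$$\bar{x} = \frac{\sum x}{n} = \frac{1}{n}\sum x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

#.2 For discrete frequency distribution

$$\bar{x} = \frac{\sum f x}{N}$$

where $N = \sum f$ is the sum of frequencies.

Or,

$$\bar{x} = \frac{1}{N}\sum f x$$

#.3 For Continuous Frequency Distribution (with classes defined)

$$\bar{x} = \frac{\sum f x}{N}$$

where $x$ is the mid-value of each class and $N = \sum f$.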
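The three cases of the arithmetic mean can be checked numerically with a short Python sketch. The function names below are illustrative, not part of the lecture; the data are the individual series and the marks distribution that appear later in these notes, and the continuous case assumes each class is represented by its mid-value.

```python
def mean_individual(values):
    """Arithmetic mean of an individual series: sum(x) / n."""
    return sum(values) / len(values)


def mean_frequency(xs, fs):
    """Arithmetic mean of a frequency distribution: sum(f*x) / N, with N = sum(f)."""
    total_frequency = sum(fs)
    return sum(f * x for x, f in zip(xs, fs)) / total_frequency


def class_midpoints(classes):
    """Mid-value of each (lower, upper) class interval."""
    return [(lower + upper) / 2 for lower, upper in classes]


# Individual series (values from the s.d. example in this lecture).
print(mean_individual([30, 40, 35, 22, 25, 48, 45]))

# Continuous distribution: marks classes and student counts from this lecture.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [7, 12, 24, 10, 7]
print(mean_frequency(class_midpoints(classes), students))
```

For a discrete frequency distribution, `mean_frequency` is called directly with the observed values in place of the mid-values.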

Median
Median of a data distribution is the value which divides it into two equal parts (halves) so that 50% of
data lie above it and 50% lie below it.

The mean is preferred when the actual values are important, and the median is preferred when some
attribute of the values is important. For example, if the actual time is important then the mean is used;
however, if the timing is important then the median is preferred.

Measurement of median
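#.1 Individual series

After arranging the $n$ values in ascending order,

$$M_d = \text{value of } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item}$$

Example-

23 24 43 44 45 53 67 82

Here n = 8,

Now,

$$M_d = \text{value of } \left(\frac{8+1}{2}\right)^{\text{th}} \text{ item} = \text{value of } (4.5)^{\text{th}} \text{ item}$$

$$M_d = 4^{\text{th}} \text{ item} + 0.5\,(5^{\text{th}} \text{ item} - 4^{\text{th}} \text{ item}) = 44 + 0.5\,(45 - 44) = 44.5$$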
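A minimal Python sketch of the positional rule for an individual series, where the median is the value of the (n+1)/2-th item of the ordered data. The helper `value_at_position` is a hypothetical name, and linear interpolation between neighbouring items for fractional positions is an assumption consistent with the n = 8 example.

```python
def value_at_position(sorted_values, position):
    """Value of the position-th item (1-based) of already sorted data.

    Fractional positions are linearly interpolated between the two
    neighbouring items (an assumption of this sketch).
    """
    if not 1 <= position <= len(sorted_values):
        raise ValueError("position must lie between 1 and n")
    lower_index = int(position)          # whole part, e.g. 4 for 4.5
    fraction = position - lower_index    # fractional part, e.g. 0.5
    lower_value = sorted_values[lower_index - 1]
    if fraction == 0:
        return lower_value
    upper_value = sorted_values[lower_index]
    return lower_value + fraction * (upper_value - lower_value)


def median_individual(values):
    """Median as the value of the (n + 1) / 2-th item of the ordered data."""
    ordered = sorted(values)
    return value_at_position(ordered, (len(ordered) + 1) / 2)


data = [23, 24, 43, 44, 45, 53, 67, 82]
print(median_individual(data))  # 44.5
```

For an odd number of values the position is a whole number, so the middle item is returned unchanged.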

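#.2 For discrete frequency distribution-

$$M_d = \text{value of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item}$$

where $N = \sum f$; the item is located with the help of the cumulative frequencies.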

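#.3 For continuous frequency distribution

$$M_d = L + \frac{\frac{N}{2} - c.f.}{f} \times h$$

where the median class is the class whose cumulative frequency first reaches $N/2$, $L$ is its lower limit, $f$ its frequency, $h$ its width, and $c.f.$ the cumulative frequency of the preceding class.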

Mode
#.1 For individual series

Mode = the value repeated the greatest number of times.

If no value is repeated, the mode is not defined. The mode is also not defined if two or more values are
repeated the same greatest number of times.
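The rule for the mode of an individual series can be sketched in Python as follows. The function name is illustrative, the input is assumed to be non-empty, and the sketch reads "repeated the same number of times" as a tie for the highest count, which is an interpretation of the note above rather than something the lecture states explicitly.

```python
from collections import Counter


def mode_individual(values):
    """Most frequent value, or None when the mode is not defined.

    Not defined when no value is repeated, or when two or more values
    share the highest count (interpretation assumed in this sketch).
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return None  # no value is repeated
    winners = [value for value, count in counts.items() if count == highest]
    if len(winners) > 1:
        return None  # tie for the highest count
    return winners[0]


print(mode_individual([2, 3, 3, 5]))   # 3
print(mode_individual([1, 2, 3]))      # None
print(mode_individual([1, 1, 2, 2]))   # None
```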

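#.2 For discrete frequency distribution

Mode = the value of $x$ having the highest frequency.

#.3 For continuous frequency distribution-

$$M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$

where the modal class is the class with the highest frequency, $L$ is its lower limit, $f_1$ its frequency, $f_0$ and $f_2$ the frequencies of the preceding and succeeding classes, and $h$ the class width.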

Partition Values
Types-

1. Median
2. Quartiles
3. Deciles
4. Percentiles

Quartiles
Quartiles are the three values which divide a given set of data into four equal parts; they are denoted by
Q1, Q2 and Q3.

Notes:

#.1 The second quartile Q2 is the median.

#.2 25% of the data lie below Q1, 25% lie above Q3, and 50% lie between Q1 and Q3.

Measurement
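#.1 for individual series

$$Q_1 = \text{value of } \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item}$$

$$Q_3 = \text{value of } \left\{3\left(\frac{n+1}{4}\right)\right\}^{\text{th}} \text{ item}$$

#.2 for discrete freq. dist.

$$Q_1 = \text{value of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item}$$

$$Q_3 = \text{value of } \left\{3\left(\frac{N+1}{4}\right)\right\}^{\text{th}} \text{ item}$$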
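For an individual series, the i-th quartile is the value of the i(n+1)/4-th item of the ordered data. The Python sketch below applies that rule to the eight values used in the median example. The function names are hypothetical, and linear interpolation for fractional positions is an assumption of the sketch.

```python
def value_at_position(sorted_values, position):
    """Value of the position-th item (1-based), interpolating linearly
    between neighbouring items when the position is fractional."""
    if not 1 <= position <= len(sorted_values):
        raise ValueError("position must lie between 1 and n")
    lower_index = int(position)
    fraction = position - lower_index
    lower_value = sorted_values[lower_index - 1]
    if fraction == 0:
        return lower_value
    upper_value = sorted_values[lower_index]
    return lower_value + fraction * (upper_value - lower_value)


def quartile_individual(values, i):
    """i-th quartile (i = 1, 2, 3) as the value of the i(n + 1)/4-th item."""
    ordered = sorted(values)
    return value_at_position(ordered, i * (len(ordered) + 1) / 4)


data = [23, 24, 43, 44, 45, 53, 67, 82]
q1 = quartile_individual(data, 1)   # 2.25th item: 24 + 0.25 * (43 - 24)
q2 = quartile_individual(data, 2)   # 4.5th item, the median
q3 = quartile_individual(data, 3)   # 6.75th item: 53 + 0.75 * (67 - 53)
print(q1, q2, q3)  # 28.75 44.5 63.5
```

For very small series the required position can fall outside 1..n, in which case the sketch raises an error instead of guessing a value.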

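#.3 continuous freq. dist.

$$Q_i = L + \frac{\frac{iN}{4} - c.f.}{f} \times h, \qquad i = 1, 2, 3$$

where the $i$-th quartile class is the class whose cumulative frequency first reaches $iN/4$, $L$ is its lower limit, $f$ its frequency, $h$ its width, and $c.f.$ the cumulative frequency of the preceding class.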


Measurement of Variation/ Scatteredness/ Dispersion/ Uniformity

Variation of a data distribution can be defined as a measure of heterogeneity (or homogeneity) of data.

Different measures of variation can be divided into two broad classes :

1. Absolute measure of variation


2. Relative measure of variation

Absolute measures of variation carry the unit in which the data are measured; the corresponding relative
measures are unit-free. The measures considered here, listed with their relative counterparts, are-

• Range/ coefficient of range
• Interquartile range/ quartile deviation
• Mean deviation
• Standard deviation/ variance/ coefficient of variation

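Range

$$\text{Range} = L - S$$

where $L$ is the largest value and $S$ is the smallest value.

Note:(i) For continuous freq. distr. with classes defined, $L$ is the upper limit of the highest class and $S$ is the lower limit of the lowest class.

(ii) Range is an absolute measure of variation (since the unit used to express the data is associated with it). A relative
measure of range (with which the unit of measurement is not associated) is given by

$$\text{Coefficient of range} = \frac{L - S}{L + S}$$

Inter-quartile Range

$$\text{I.Q.R.} = Q_3 - Q_1$$

Quartile Deviation (Q.D.)/ Semi-interquartile range

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

A relative measure of Q.D. is given by

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$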

Mean Deviation

For individual series:

$$\text{M.D.} = \frac{1}{n}\sum |x - \bar{x}|$$

For discrete and continuous frequency distribution –

$$\text{M.D.} = \frac{1}{N}\sum f\,|x - \bar{x}|$$
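As a small illustration, the mean deviation about the mean can be computed in Python as below. The function name is illustrative, and treating an individual series as a frequency distribution in which every frequency equals 1 is a convenience of this sketch, not something stated in the lecture.

```python
def mean_deviation(xs, fs=None):
    """Mean deviation about the mean: sum(f * |x - mean|) / N.

    With fs omitted, every value is given frequency 1, which reduces
    the formula to the individual-series case.
    """
    if fs is None:
        fs = [1] * len(xs)
    total_frequency = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / total_frequency
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / total_frequency


# Individual series used in the s.d. example of this lecture (mean = 35).
values = [30, 40, 35, 22, 25, 48, 45]
print(mean_deviation(values))  # 8.0, i.e. (5+5+0+13+10+13+10) / 7
```

For a continuous frequency distribution, the class mid-values are passed as `xs` and the class frequencies as `fs`.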

Standard Deviation

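For Individual series

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$

Example –

Find the s.d. of the following data – 30, 40, 35, 22, 25, 48, 45.

Method I –

Here, the mean is

$$\bar{x} = \frac{\sum x}{n} = \frac{245}{7} = 35$$

Now, the s.d. is

$$\sigma = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} = \sqrt{\frac{(-5)^2 + 5^2 + 0^2 + (-13)^2 + (-10)^2 + 13^2 + 10^2}{7}}$$

$$= \sqrt{\frac{588}{7}} = \sqrt{84} \approx 9.17$$

Method II –

We have

$$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$

$$= \sqrt{\frac{9163}{7} - \left(\frac{245}{7}\right)^2}$$

$$= \sqrt{1309 - (35)^2}$$

$$= \sqrt{84} \approx 9.17$$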
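Both methods can be cross-checked with a short Python sketch. The function names are illustrative, and the divisor is n (not n - 1), matching the identity with the raw sums used in Method II.

```python
import math


def sd_from_deviations(values):
    """Method I: square root of the mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_from_raw_sums(values):
    """Method II: sqrt(sum(x^2)/n - (sum(x)/n)^2)."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


data = [30, 40, 35, 22, 25, 48, 45]
print(round(sd_from_deviations(data), 2))  # 9.17
print(round(sd_from_raw_sums(data), 2))    # 9.17
```

Method II avoids computing each deviation, which is convenient by hand. In floating-point arithmetic Method I is usually the safer choice when the mean is large relative to the spread.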

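For Discrete and Continuous F.D.

$$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} = \sqrt{\frac{1}{N}\sum f (x - \bar{x})^2} = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$$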

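Problem:

Find the s.d. of the given data

Marks             0-10   10-20   20-30   30-40   40-50
No. of Students      7      12      24      10       7

Solution-

Working Table-

Marks    Mid-Value (x)   No. of Students (f)     fx      fx²
0-10           5                  7               35      175
10-20         15                 12              180     2700
20-30         25                 24              600    15000
30-40         35                 10              350    12250
40-50         45                  7              315    14175
Total                         N = 60            1480    44300

Now

$$\sigma = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2} = \sqrt{\frac{44300}{60} - \left(\frac{1480}{60}\right)^2}$$

$$= \sqrt{738.33 - 608.44} = \sqrt{129.89} \approx 11.40$$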
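The working table and the resulting s.d. can be reproduced with the following Python sketch. The function names are illustrative, and each class is assumed to be represented by its mid-value.

```python
import math


def grouped_sums(classes, frequencies):
    """Return (N, sum of f*x, sum of f*x^2) using class mid-values as x."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(mids, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(mids, frequencies))
    return n, sum_fx, sum_fx2


def grouped_sd(classes, frequencies):
    """s.d. of a continuous frequency distribution from the raw sums."""
    n, sum_fx, sum_fx2 = grouped_sums(classes, frequencies)
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [7, 12, 24, 10, 7]

print(grouped_sums(classes, students))          # (60, 1480.0, 44300.0)
print(round(grouped_sd(classes, students), 2))  # 11.4
```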

Notes:

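#.1 The square of the standard deviation is called the variance of the data, i.e.,

$$\text{Variance} = \sigma^2$$

#.2 The relative measure of s.d. is called the coefficient of standard deviation and is given by

$$\text{Coefficient of s.d.} = \frac{\sigma}{\bar{x}}$$

#.3 If the coefficient of s.d. is multiplied by 100 to express it as a percentage, then it is called the coefficient of
variation (C.V.), so

$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100\%$$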

#.4 Coefficient of variation is used to compare variations of two or more sets of data values.

Problem/ Example

#.(A) For individual data values-


Discussed in the previous problem.

#.(B) For discrete frequency distribution-


The following is the frequency distribution of the weekly wages of 900 workers in a construction project.

Wage Frequency
10000 51
20000 128
30000 248
40000 356
50000 95
60000 22
Calculate following values of weekly wage: (a) mean (b) median (c) quartiles (d) mode (e) range (f)
coefficient of range (g) quartile deviation (h) coefficient of quartile deviation (i) standard deviation (j)
variance (k) coefficient of variation.

Solution-

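#.(a) Mean

$$\bar{x} = \frac{\sum f x}{N} = \frac{30820000}{900} \approx 34244.44$$

#.(b) Median

Working table for median

Wage     Frequency   Cum. Freq.
10000        51           51
20000       128          179
30000       248          427
40000       356          783
50000        95          878
60000        22          900

$$M_d = \text{value of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{value of } \left(\frac{900+1}{2}\right)^{\text{th}} \text{ item} = \text{value of } (450.5)^{\text{th}} \text{ item} = 40000$$

since 783 is the first cumulative frequency that reaches 450.5.

#.(c) Quartiles

$$Q_1 = \text{value of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{value of } (225.25)^{\text{th}} \text{ item} = 30000$$

$$Q_3 = \text{value of } \left\{3\left(\frac{N+1}{4}\right)\right\}^{\text{th}} \text{ item} = \text{value of } (675.75)^{\text{th}} \text{ item} = 40000$$

and $Q_2 = M_d = 40000$.

#.(d) Mode = value with the highest frequency (356) = 40000

#.(e) Range = largest value − smallest value = 60000 − 10000 = 50000

#.(f) Coefficient of range = (60000 − 10000)/(60000 + 10000) ≈ 0.714

#.(g) Quartile deviation = (Q3 − Q1)/2 = (40000 − 30000)/2 = 5000

#.(h) Coefficient of quartile deviation = (Q3 − Q1)/(Q3 + Q1) = 10000/70000 ≈ 0.143

#.(i) Standard deviation

$$\sigma = \sqrt{\frac{1}{N}\sum f (x - \bar{x})^2}$$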

Working table (for calculation of s.d.)-

Wage (x)   Frequency (f)   fx          x − mean    (x − mean)²   f(x − mean)²
10000           51           510000    -24244.4    587792871     29977436417
20000          128          2560000    -14244.4    202904071     25971721077
30000          248          7440000    -4244.44     18015271      4467787187
40000          356         14240000     5755.56     33126471     11793023645
50000           95          4750000    15755.56    248237671     23582578737
60000           22          1320000    25755.56    663348871     14593675160
Total          900         30820000                              110386222222.24

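Here, $\sum f (x - \bar{x})^2 = 110386222222.24$ and $N = 900$, so

$$\sigma = \sqrt{\frac{110386222222.24}{900}} \approx 11074.81$$

#.(j) Variance $= \sigma^2 \approx 122651358.02$

#.(k) Coefficient of variation $= \frac{\sigma}{\bar{x}} \times 100\% = \frac{11074.81}{34244.44} \times 100\% \approx 32.34\%$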
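Parts (a) to (k) of this problem can be checked numerically with the Python sketch below. The helper names `item_at` and `summarize` are hypothetical. Two readings are assumed: a positional item in a discrete distribution is the first value whose cumulative frequency reaches the position, and the mode is the value with the single highest frequency.

```python
import math

wages = [10000, 20000, 30000, 40000, 50000, 60000]
freqs = [51, 128, 248, 356, 95, 22]


def item_at(xs, fs, position):
    """Value whose cumulative frequency first reaches `position`."""
    cumulative = 0
    for x, f in zip(xs, fs):
        cumulative += f
        if cumulative >= position:
            return x
    return xs[-1]


def summarize(xs, fs):
    """Central values and measures of variation of a discrete distribution."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    q1 = item_at(xs, fs, (n + 1) / 4)
    q3 = item_at(xs, fs, 3 * (n + 1) / 4)
    variance = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "median": item_at(xs, fs, (n + 1) / 2),
        "q1": q1,
        "q3": q3,
        "mode": xs[fs.index(max(fs))],
        "range": max(xs) - min(xs),
        "coeff_range": (max(xs) - min(xs)) / (max(xs) + min(xs)),
        "qd": (q3 - q1) / 2,
        "coeff_qd": (q3 - q1) / (q3 + q1),
        "sd": sd,
        "variance": variance,
        "cv_percent": sd / mean * 100,
    }


summary = summarize(wages, freqs)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

With the wage data this gives a mean of about 34244.44, median and mode 40000, Q1 = 30000, Q3 = 40000, an s.d. of about 11074.81 and a C.V. of about 32.34%.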

#.(C) Long Answer Problem


The following data represent the lives of two models of refrigerators, A and B.

Life     No. of Refrigerators (Model A)   No. of Refrigerators (Model B)
0-2                   5                                2
2-4                  16                                7
4-6                  13                               17
6-8                   7                               19
8-10                  5                                9
10-12                 4                                1

Which model has greater uniformity (consistency), and which has greater variation (dispersion)?

Solution-

Working table-

Life     Mid-value (x)    fA    fB    fA·x   fB·x   fA·x²   fB·x²
0-2            1           5     2      5      2       5       2
2-4            3          16     7     48     21     144      63
4-6            5          13    17     65     85     325     425
6-8            7           7    19     49    133     343     931
8-10           9           5     9     45     81     405     729
10-12         11           4     1     44     11     484     121
Total                     50    55    256    333    1706    2271

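Calculation of mean

$$\bar{x}_A = \frac{\sum f_A x}{N_A} = \frac{256}{50} = 5.12$$

$$\bar{x}_B = \frac{\sum f_B x}{N_B} = \frac{333}{55} \approx 6.05$$

Calculation of s.d.

We have

$$\sigma_A = \sqrt{\frac{\sum f_A x^2}{N_A} - \bar{x}_A^2} = \sqrt{\frac{1706}{50} - (5.12)^2} = \sqrt{7.9056} \approx 2.81$$

$$\sigma_B = \sqrt{\frac{\sum f_B x^2}{N_B} - \bar{x}_B^2} = \sqrt{\frac{2271}{55} - \left(\frac{333}{55}\right)^2} \approx \sqrt{4.6334} \approx 2.15$$

Calculation of C.V.

$$\text{C.V.}_A = \frac{\sigma_A}{\bar{x}_A} \times 100\% = \frac{2.8117}{5.12} \times 100\% \approx 54.92\%$$

$$\text{C.V.}_B = \frac{\sigma_B}{\bar{x}_B} \times 100\% = \frac{2.1525}{6.0545} \times 100\% \approx 35.55\%$$

Since C.V. of B is less than C.V. of A, Model B shows less relative variation.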

Conclusion-
More uniform – Model B

More dispersed/ varied/ scattered – Model A

More consistent – Model B
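The comparison of the two models by coefficient of variation can be reproduced with the following Python sketch. The function name is illustrative, and each life class is assumed to be represented by its mid-value, as in the working table.

```python
import math

mid_values = [1, 3, 5, 7, 9, 11]        # mid-values of the classes 0-2, ..., 10-12
model_a = [5, 16, 13, 7, 5, 4]          # frequencies for Model A
model_b = [2, 7, 17, 19, 9, 1]          # frequencies for Model B


def coefficient_of_variation(xs, fs):
    """C.V. in percent: (s.d. / mean) * 100, with s.d. from the raw sums."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    sd = math.sqrt(sum(f * x * x for x, f in zip(xs, fs)) / n - mean ** 2)
    return sd / mean * 100


cv_a = coefficient_of_variation(mid_values, model_a)
cv_b = coefficient_of_variation(mid_values, model_b)
more_uniform = "Model A" if cv_a < cv_b else "Model B"

print(f"C.V. of A = {cv_a:.2f}%, C.V. of B = {cv_b:.2f}%")
print("More uniform:", more_uniform)
```

The model with the smaller C.V. is the more uniform (more consistent) one, which is Model B here.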
