Statistics Book


Principles of Statistics

BUS-221
Level 3

Revised Edition
2020

Business Studies Program


Department of Management
Jazan Community College
Jazan University

Contents

UNIT NO.   NAME
1          Fundamentals of Statistics
2          Measures of Central Tendency: Mean, Median & Mode
3          Measures of Dispersion – Range
4          Standard Deviation
5          Time Series Analysis
6          Index Numbers
7          Correlation
8          Probability
9          Hypothesis Testing

Unit-1
Fundamentals of Statistics

What is Statistics?
The word ‘Statistics’ refers to numerical facts about any phenomenon that can be described with quantitative
information.

Scope of Statistics:
It covers consumption, production, investment, industrial and trade policy formulation, education and
health indicators, determination of export and import trends, distribution of national income, fluctuations in
the price level, taxation policy, poverty estimation, growth and development indicators, and foreign policy
determination.

Functions of Statistics:
1. Presentation of facts in a definite form
2. Simplifying complex data
3. Facilitates comparison
4. Helps in prediction of future trends

Limitations of Statistics:
1. It does not deal with isolated numbers, but deals in groups.
2. It deals only with quantitative characteristics
3. It is only a means for further investigation but not an end in itself.
4. Statistics can be misused for vested interests by the investigator. Hence, it should be used with caution.

What are fundamental elements of Statistics?


There are 5 basic elements of an inferential statistical problem:
1. Population: The set of all units (people, objects, or events) that we are interested in studying.

2. Variables: Characteristics or properties of the individual population units. The process of assigning
numerical values to variables is called measurement.
3. Sample: a subset of the population of interest. In statistics, samples are the primary source of information
used in studies.
4. Statistical Inference: An estimate or prediction about a population based on information contained in a
sample.
5. Reliability: A statement about the degree of certainty associated with a statistical inference.

What are the types of Data?


1. Quantitative Data: Data that are measured on a numerical scale. (Age, Height, Weight, etc.)
2. Qualitative Data: Data that cannot be measured on a naturally occurring scale; they can only be
classified into categories (Gender, Ethnicity, Birthplace, etc.).

What is the role of Statistics in Managerial Decision Making?


1. To understand and use statistical terminologies and concepts.
2. Collect survey data (primary data) or use collections of existing data (secondary data).
3. Prepare graphical presentation of data.
4. Determine the meaning and applications of probabilities in Real world (Business, Social, and Political
Environments).
5. Formulate Hypotheses (claims) related to Business and Social Science and perform statistical tests of
those hypotheses.
6. Interpret statistical findings leading to decision making and policy implementation.

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.
1. ‘Statistics’ refers to some numerical facts relating to any phenomena dealing with quantitative
information. ( )
2. Statistics shows education and health indicators. ( )
3. Statistics deals with qualitative characteristics only. ( )
4. Existing data is known as secondary data. ( )
5. Prediction is not a part of statistics. ( )
6. Statistics can be misused. ( )
B. Choose the correct answer from the following:
1. …………………………….. is that data that can be measured on a numerical scale:
(a) Qualitative data (b) Quantitative data
(c) Primary data (d) Secondary data

2. Statistics comprises …………………………………:


(a) consumption (b) production
(c) investment (d) all
C. Fill in the space with word from the following list:
population quantitative data samples complex

1. Statistics simplify …………………… data.


2. In statistics ……………………………… are the primary sources of information used in studies.
3. The set of all units (people, objects, transactions or events) that we are interested in studying is known
as………………………………………………...
4. Age, Height, Weight etc. are also known as……………………………………….
D: Subjective Question(s):
Q.1. Define Statistics.
_______________________________________________________________________________________
_______________________________________________________________________________________

Q.2. What are the elements of statistics?
_______________________________________________________________________________________
_______________________________________________________________________________________

Unit-2
Measures of Central Tendency- Mean, Median & Mode

There are three basic measures of central tendency, also called averages: Mean, Median and
Mode.
What is Mean?

The "mean" is the sum of the values of all observations divided by the number of observations. It is also called
the Arithmetic Mean (AM).

The formula for calculating the mean by the direct method is as follows:


Mean = ∑X ÷ N

Here, ∑X= Sum of all the individual values and N= Total number of items

Example 1: Calculate average marks scored by students of diploma in Statistics test from the following
Data:

13 , 18 , 13 , 14 , 13 , 16 , 14 , 21 , 13

SOLUTION: The mean is the average, so:

(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 135 ÷ 9 = 15

Thus, average marks of students of diploma in Statistics test are 15.
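
The direct method can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is only an illustrative choice, and the data are the marks from Example 1.

```python
def arithmetic_mean(values):
    """Direct method: Mean = sum of all values / number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Marks of the nine students from Example 1.
marks = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(arithmetic_mean(marks))  # total of 135 divided by 9 items
```

The same function applies to any list of numbers, such as monthly expenditures.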

Example 2: Calculate the average monthly expenditure of four supervisors working in ABC Ltd. from
the data given below:

Supervisor   Monthly expenditure (in SR) X
1            2000
2            3000
3            4000
4            4500
Total        ∑X = 13,500

SOLUTION: Mean = ∑X ÷ N = 13500 ÷ 4 = SR 3375.

Thus, SR 3375 is the average monthly expenditure incurred by the supervisors working in ABC Ltd.

What is Median?

Median is the centrally located value of a series such that half of the values of the series are above it and half
below it. Median is considered the best statistical technique for studying the qualitative attribute of an
observation in the data set. To find the median, we arrange the data in ascending or descending order
and then apply the following formula:
Position of Median (Me) = ((N + 1) ÷ 2)th item, where N = Number of items.

Example 1:

Calculate Median marks from the data given below of marks of nine students:

13 , 18 , 13 , 14 , 13 , 16 , 14 , 21 , 13

SOLUTION: The median is the middle value, so rewrite the list in order:

S. No.   Marks
1        13
2        13
3        13
4        13
5        14
6        14
7        16
8        18
9        21

Position of Median = ((N + 1) ÷ 2)th item
= ((9 + 1) ÷ 2)th item
= (10 ÷ 2)th item = 5th item
The marks of 5th student are 14. Hence, median marks are 14.

Example 2: Find the median for the following list of values:

8, 9, 10, 10, 10, 11, 11, 11, 12, 13

SOLUTION: The median is the middle value. In a list of ten values, that will be the (10 + 1) ÷ 2 = 5.5th
value; that is, we need to average the fifth and sixth values to find the median:

(10 + 11) ÷ 2 = 21 ÷ 2 = 10.5. So, the Median is 10.5.
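
Both cases (an odd and an even number of items) can be sketched in Python as follows. The function name `median` is an illustrative choice; the two data sets are those of Examples 1 and 2.

```python
def median(values):
    """Sort the data, then take the ((N + 1) / 2)th item.

    For an even N the position falls between two items,
    so the two middle items are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]            # odd N: single middle item
    return (ordered[middle - 1] + ordered[middle]) / 2  # even N: average of two


marks = [13, 18, 13, 14, 13, 16, 14, 21, 13]       # Example 1 (N = 9)
values = [8, 9, 10, 10, 10, 11, 11, 11, 12, 13]    # Example 2 (N = 10)
print(median(marks), median(values))
```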

What is Mode?

The mode of a set of numbers is the element that appears most frequently in the set. There can be more than
one mode in a set of numbers. A set that has two modes is bimodal and one that has three modes is trimodal.
If no element of a set appears more often than any other element, the set has no mode.
Calculation of mode:
Mode can be determined by two methods:
1. Inspection method
2. Grouping method
INSPECTION METHOD:

Example 1: Find the modal size of T-shirts sold in a shop from the following information given:

34, 26, 30, 34, 28, 32, 32, 34, 33, 31, 33 and 30.

SOLUTION:

T-Shirt size   Frequency
26             1
28             1
30             2
31             1
32             2
33             2
34             3

The mode is the value that occurs more often than any other, so 34 is the mode, as it occurs 3 times.

GROUPING METHOD:

Example 2: Find the mode for the following marks scored by students of diploma in their Statistics test from
the information given: 13, 17, 24, 20, 18.

SOLUTION:

Marks scored Frequency

13 1

17 1

24 1

20 1

18 1

Here no value appears more than once, so the data set has no mode.

Example 3: Find the modal size of shoes purchased by consumers from the information given:
1, 2, 2, 3, 4, 4, 5.

SOLUTION:

Shoe size   Frequency
1           1
2           2
3           1
4           2
5           1

The shoe size numbers 2 and 4 each appear twice. The set has two modal values of shoe size.
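
The frequency tables in the three examples above can be built automatically. The sketch below uses Python's standard `collections.Counter`; the function name `modes` is an illustrative choice, and it follows the rule stated earlier that a set has no mode when no value appears more often than any other.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modal values (empty if there is no mode)."""
    counts = Counter(values)              # frequency table: value -> frequency
    if not counts:
        return []
    highest = max(counts.values())
    # If several distinct values all share the same frequency, none stands out.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


tshirts = [34, 26, 30, 34, 28, 32, 32, 34, 33, 31, 33, 30]  # Example 1
marks = [13, 17, 24, 20, 18]                                # Example 2
shoes = [1, 2, 2, 3, 4, 4, 5]                               # Example 3
print(modes(tshirts), modes(marks), modes(shoes))
```

A result with two values corresponds to a bimodal set.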

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.

1. The sum of values of all observations is called mean. ( )


2. The sum of all the values is not required to calculate the mean. ( )

3. The "median" is the last value in the list of numbers. ( )
4. To find the median, your numbers have to be listed in numerical order. ( )
5. The "mode" is the value that occurs least in the set. ( )
6. If no number is repeated, then there is no mode for the list. ( )
7. Mode can be determined by three methods. ( )

B. Fill in the space with word from the following list:

centrally qualitative bimodal results Averages

1. Mean, median, and mode are three kinds of ………………………….


2. The mean is useful for predicting future …………………………….
3. The "median" is the …………………………. located value of a series.
4. Median is considered as the best statistical technique for studying the ………………….. attribute.
5. A set of data may have two modes, then it is known as …………………………..

C. Choose the correct answer from the following:

1. The mean is calculated with the help of……………………………………

(1) Position (2) Formula


(3) Highest value (4) Lowest value

2. Where you add up all the numbers and then divide by the number of numbers you calculate:

(1) Median (2) Mean


(3) Mode (4) Deviation

3. To find the median, we have to arrange the data in ……………. order then apply the formula:
a. Ascending b. Descending c. Both a & b d. None

4. If no number is repeated, then there is no ………………… for the list:

a. Mode b. Median c. Mean d. Deviation

5. The value of the variable for which the frequency is maximum is called the:

a. Median b. Mean c. Mode d. Deviation

D. Problem solving questions:

1. The Monthly income of 5 persons is given below (all figures in SAR):

1132, 1140, 1144, 1136, 1148

Find out the Arithmetic Mean.

2. The marks obtained by ten students in an examination are as given below. Calculate Arithmetic Mean.

ROLL NUMBERS MARKS


1 43
2 48
3 65
4 57
5 31
6 60
7 37
8 48
9 78
10 59
SUM 526

3. Calculate Median from the following data:

520, 20, 340, 190, 35, 800, 1210, 50, 80

4. Calculate Median from the marks obtained by 20 students:

Roll No.   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Marks     25 28 29 30 32 33 33 35 42 45 46 47 48 51 52 53 54 60 65 72

5. The following data shows the ages of 20 students in a class, find the Mode.

15, 17, 18, 20, 22, 24, 21, 17, 16, 15, 21, 22, 23, 22, 17, 22, 18, 22, 19, 20

6. Find the Mode in the following series:

Size of the shoes: 3, 4, 2, 1, 7, 6, 6, 7, 5, 6, 8, 9, 5,7

Unit-3
Measures of Dispersion - Range

What is Dispersion?

In statistics, dispersion (also called statistical dispersion) is quantifiable variation of measurements of
differing members of a population within the scale on which they are measured.
Dispersion is the extent to which values in a distribution differ from the average of distribution.

What is Range?

The simplest measure of dispersion is the range (R). It is calculated by subtracting the smallest
observation (S) from the largest observation (L) in the data set, i.e., R = L – S.

For instance, in {4, 6, 9, 3, 7} the lowest value is 3 and the highest is 9, so the range is 9 – 3 = 6.

Example 1:

Range: The range of a set of numbers is found by computing the difference between the highest score and
the lowest score. For example, the range of the following set of data:

34 50 61 77 99 26 15 62 44 74 88 is

R = L – S = 99 – 15 = 84

Example 2:

Find the range for the following list of values: 13, 18, 13, 14, 13, 16, 14, 21, 13.

The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
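
The calculation R = L – S can be sketched in Python as below. The function name `value_range` is an illustrative choice; the data sets are the ones used in this unit.

```python
def value_range(values):
    """Range R = L - S: largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


scores = [34, 50, 61, 77, 99, 26, 15, 62, 44, 74, 88]   # Example 1
marks = [13, 18, 13, 14, 13, 16, 14, 21, 13]            # Example 2
print(value_range(scores), value_range(marks))
```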

What is the Role of Range in Statistics?

1. The range is the length of the smallest interval which contains all the data. It is calculated by subtracting
the smallest observation from the greatest observation and provides an indication of statistical dispersion.

2. It is measured in the same units as the Data.

3. It provides the spread of Data in the Sample.

4. It provides the boundary for the Data.

5. The midrange point, i.e. the point halfway between the two extremes, is an indicator of the central
tendency of the data.

Coefficient of Range:

The relative measure of range, called the co-efficient of range, is obtained by applying the following
formula:

Co-efficient of range = (L – S) ÷ (L + S)

EXAMPLE 1:

From the daily wages of 10 employees working in a firm given below calculate the co-efficient of range of
wage distribution among them:

Daily wages (in SR) Number of workers

150 2

200 3

250 4

300 1

Co-efficient of range = (L – S) ÷ (L + S)

= (300 – 150) ÷ (300 + 150)

= 150 ÷ 450 = 0.33

Co-efficient of range of wage distribution among 10 employees of a firm is 0.33.
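
As a sketch of the same calculation in Python: with a frequency table, only the largest and smallest wage values that actually occur matter, not how many workers earn them. The names `coefficient_of_range` and `wage_table` are illustrative choices.

```python
def coefficient_of_range(values):
    """Coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    if largest + smallest == 0:
        raise ValueError("coefficient of range is undefined when L + S = 0")
    return (largest - smallest) / (largest + smallest)


# Daily wage (SR) -> number of workers, from Example 1.
wage_table = {150: 2, 200: 3, 250: 4, 300: 1}
# Keep only wage levels that at least one worker earns.
observed_wages = [wage for wage, workers in wage_table.items() if workers > 0]
print(round(coefficient_of_range(observed_wages), 2))
```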

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.

1. Dispersion is also known as statistical dispersion. ( )

2. Range is not measured in the same units as the data. ( )

3. Range is a poor measure of dispersion. ( )

4. The difference between the middle value and the lowest value is known as range. ( )

5. The mid-range point is the point halfway between the two extreme values. ( )

B. Fill in the space with word from the following list:

data range dispersion boundary

1. The ……………………………………. is just the difference between the largest and smallest values.

2. It is measured in the same units as the ………………………………………..

3. It provides the ……………………………………………. for the Data.

4. It is a poor measure of ……………………………………… except when the sample size is large.

C. Choose the correct answer from the following:

1. The simplest absolute measure of dispersion is ………………………………….

a. Correlation
b. Range
c. Median
d. Deviation

2. The formula for calculating co-efficient of range is:

a. R = (L + S) / (L – S)
b. R = (S + L) / (S – L)
c. R = (L + S) / (L – S)
d. R = (L – S) / (L + S)

D. Solve the following Question:

1. The net profit of a business concern in thousands of SAR is given below:

Year 1996 1997 1998 1999 2000 2001 2002


Profit 100 160 150 220 300 190 200

Find out the Range.

2. From the daily wages of 10 employees working in a firm given below, calculate the co-efficient of
range.

Daily wages (in SR) Number of workers

200 2

250 3

300 4

350 1

Unit-4

Standard Deviation

What is Standard Deviation?

Standard deviation is the most important and widely used measure of dispersion. It is also called Root
Mean Square Deviation. The concept of Standard Deviation was introduced by Karl Pearson in 1893. It is
denoted by the Greek letter sigma ( σ ).

Example 1:
Suppose we wish to find the standard deviation of the data set consisting of the values 3, 7, 7, and 19.

Step 1: find the arithmetic mean (average) of 3, 7, 7, and 19:
(3 + 7 + 7 + 19) ÷ 4 = 36 ÷ 4 = 9

Step 2: find the deviation of each number from the mean:
3 – 9 = –6, 7 – 9 = –2, 7 – 9 = –2, 19 – 9 = 10

Step 3: square each of the deviations, which amplifies large deviations and makes negative values positive:
(–6)² = 36, (–2)² = 4, (–2)² = 4, 10² = 100

Step 4: find the mean of those squared deviations:
(36 + 4 + 4 + 100) ÷ 4 = 144 ÷ 4 = 36

Step 5: take the non-negative square root of the quotient (converting squared units back to regular units):
√36 = 6

So, the standard deviation of the set is 6.

Example 2:

Consider an example consisting of the following values

There are eight data points in total, with a Mean (or average) value of 5:

To calculate the standard deviation, we compute the difference of each data point from the mean, and square
the result:

Next we average these values and take the Square root, which gives the standard deviation:

Therefore, the population above has a standard deviation of 2.
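
The five steps of Example 1 can be sketched in Python as below. The function name `population_std` is an illustrative choice. The second data set is an assumption made purely for illustration: it is one set of eight values with mean 5 and standard deviation 2, and is not necessarily the data set used in Example 2.

```python
import math


def population_std(values):
    """Root mean square deviation of the values from their mean."""
    if not values:
        raise ValueError("standard deviation of an empty data set is undefined")
    n = len(values)
    mean = sum(values) / n                           # Step 1: arithmetic mean
    deviations = [x - mean for x in values]          # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]           # Step 3: squared deviations
    mean_square = sum(squared) / n                   # Step 4: mean of squared deviations
    return math.sqrt(mean_square)                    # Step 5: non-negative square root


example_1 = [3, 7, 7, 19]
# Hypothetical data (assumption): eight values chosen to have mean 5 and SD 2.
assumed_eight_values = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(example_1), population_std(assumed_eight_values))
```

Note that this divides by N (the population form used in the steps above), not by N – 1.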

Uses of standard deviation

The standard deviation is essential for:

• Assessing the degree of dispersion of the values around its mean,

• Assessing the error to which the mean of a sample is subject when estimating the mean of a
population from which the sample was taken.

MERITS AND DEMERITS OF STANDARD DEVIATION:

MERITS:

1. It is based on all observations in a data set.

2. It can be subjected to further algebraic treatment.

3. It is possible to calculate the combined standard deviation of two or more sets of data.

DEMERITS:

1. As compared to other measures of dispersion, calculations of standard deviations are difficult.

2. While calculating standard deviation, more importance is given to extreme values and less to those near
mean.

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.

1. Standard deviation is the most important and widely used measure of studying dispersion. ( )

2. The standard deviation concept was introduced by Karl Pearson in 1823. ( )

4. Standard deviation checks the dispersion of its mean. ( )

5. Standard deviation is not based on all the observations in a data set. ( )

B. Fill in the space with word from the following list:

Karl Pearson root mean square uniformity data dispersion

1. Standard deviation is also known as ……………………………………….. deviation.


2. Standard Deviation is based on all observations in a ……………………. set.
3. The standard deviation concept was introduced by ……………………… in 1893.
4. Standard deviation is the most important and widely used measure of studying ………………………

C. Choose the correct answer from the following:


1. The Standard Deviation concept was introduced by Karl Pearson in ………………:

a. 1993 b. 1893 c. 2003 d. 2013

2. As compared to other measures of dispersion, calculations of standard deviations are …………..:

a. very easy b. easy c. medium d. difficult

D. Solve the following question:

Find the Standard Deviation of the monthly salaries of 10 persons given below.

Persons             A   B   C   D   E   F   G   H   I   J
Salaries (in SR)  120 110 115 122 126 140 125 121 120 131

Unit-5
Time series analysis

What is Time Series?


A time series is a sequence of observations which are ordered in time (or space). If observations are made on
some phenomenon throughout time, it is most sensible to display the data in the order in which they arose,
particularly since successive observations will probably be dependent.
A time series graph is a line graph where time is measured on the horizontal axis (X axis) and the variable
being observed is measured on the vertical axis (Y axis).

There are two kinds of time series data:


1. Continuous, where we have an observation at every instant of time, e.g. lie detectors, electrocardiograms.
2. Discrete, where we have an observation at (usually regularly) spaced intervals.

Examples
1. Economics - weekly share prices, monthly profits
2. Meteorology - daily rainfall, wind speed, temperature
3. Sociology - crime figures (number of arrests, etc), employment figures

What are Components of Time Series Analysis?


1. Trend Component

Trend is a long term movement in a time series. It is the underlying direction (an upward or downward
tendency) and rate of change in a time series, when allowance has been made for the other components.

A simple way of detecting trend in seasonal data is to take averages over a certain period. If these averages
change with time we can say that there is evidence of a trend in the series.

It can be helpful to model trend using straight lines, polynomials etc.
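
The idea of detecting trend by averaging over a period can be sketched in Python. The quarterly figures below are hypothetical (an assumption for illustration only), and `period_averages` is an illustrative name.

```python
def period_averages(series, period_length):
    """Average each consecutive, non-overlapping block of observations."""
    if period_length < 1:
        raise ValueError("period_length must be at least 1")
    return [
        sum(series[start:start + period_length]) / period_length
        for start in range(0, len(series) - period_length + 1, period_length)
    ]


# Hypothetical quarterly sales for three years (four quarters per year).
quarterly_sales = [10, 14, 12, 16,
                   12, 16, 14, 18,
                   14, 18, 16, 20]
yearly_averages = period_averages(quarterly_sales, 4)
print(yearly_averages)
# Averages that rise from year to year are evidence of an upward trend,
# even though the quarters within each year move up and down.
is_upward = all(a < b for a, b in zip(yearly_averages, yearly_averages[1:]))
```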

Figure : Quarterly Gross Domestic Product

2. Cyclical Component

In weekly or monthly data, the cyclical component describes any regular fluctuations.

3. Seasonal Component
In weekly or monthly data, the seasonal component, often referred to as seasonality, is the component of
variation in a time series which is dependent on the time of year. It describes any regular fluctuations with a
period of less than one year. For example, the costs of various types of fruits and vegetables, unemployment
figures and average daily rainfall, all show marked seasonal variation.

Figure: Monthly Retail Sales in Riyadh Retail Department Stores
There is an obvious large seasonal increase during Eid festival shopping. In this example, the magnitude of
the seasonal component increases over time.
4. Irregular Component
The irregular component (sometimes also known as the residual) is what remains after the seasonal and trend
components of a time series have been estimated and removed. It results from short term fluctuations in the
series which are neither systematic nor predictable. In a highly irregular series, these fluctuations can
dominate movements, which will mask the trend and seasonality. The following graph is of a highly
irregular time series:

Figure: Monthly Value of Building Approvals, Australian Capital Territory (ACT)

5. Smoothing

Smoothing techniques are used to reduce irregularities (random fluctuations) in time series data. They
provide a clearer view of the true underlying behavior of the series.

In some time series, seasonal variation is so strong it obscures any trends or cycles which are very important
for the understanding of the process being observed. Smoothing can remove seasonality and make long
term fluctuations in the series stand out more clearly. The most common type of smoothing technique is
moving average smoothing although others do exist. Since the type of seasonality will vary from series to
series, so must the type of smoothing.

6. Exponential Smoothing

Exponential smoothing is a smoothing technique used to reduce irregularities (random fluctuations) in time
series data, thus providing a clearer view of the true underlying behavior of the series. It also provides an
effective means of predicting future values of the time series (forecasting).
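
This unit does not state a formula for exponential smoothing. One common form, simple exponential smoothing, sets the first smoothed value equal to the first observation and then mixes each new observation with the previous smoothed value. The sketch below assumes that form; the smoothing constant `alpha = 0.5` is an arbitrary illustrative choice, and the data are the daily rainfall figures from Example 1 of this unit.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing (assumed form):

    s[0] = x[0]
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be greater than 0 and at most 1")
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


rainfall = [45, 20, 40, 38, 42, 15, 10, 22]   # daily rainfall in mm
smoothed = exponential_smoothing(rainfall, alpha=0.5)
# In this simple form, the forecast for the next period is the last smoothed value.
forecast_next_day = smoothed[-1]
print(smoothed)
```

A smaller `alpha` smooths more heavily; a larger one follows the raw data more closely.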

7. Moving Average Smoothing

A moving average is a form of average which has been adjusted to allow for seasonal or cyclical
components of a time series. Moving average smoothing is a smoothing technique used to make the long
term trends of a time series clearer.

When a variable, like the number of unemployed, or the cost of strawberries, is graphed against time, there
are likely to be considerable seasonal or cyclical components in the variation. These may make it difficult to
see the underlying trend. These components can be eliminated by taking a suitable moving average. By
reducing random fluctuations, moving average smoothing makes long term trends clearer.
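
A simple (unweighted) moving average can be sketched as follows. The window length of 3 is an illustrative assumption, `moving_average` is an illustrative name, and the data are the daily rainfall figures from Example 1 of this unit.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the length of the series")
    return [
        sum(series[start:start + window]) / window
        for start in range(len(series) - window + 1)
    ]


rainfall = [45, 20, 40, 38, 42, 15, 10, 22]   # daily rainfall in mm
three_day_average = moving_average(rainfall, 3)
print([round(value, 2) for value in three_day_average])
```

For seasonal data the window is usually chosen to match the length of the season (for example, 12 for monthly data with a yearly pattern), so that the seasonal ups and downs cancel out within each average.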

8. Extrapolation

Extrapolation is when the value of a variable is estimated at times which have not yet been observed. This
estimate may be reasonably reliable for short times into the future, but for longer times, the estimate is liable
to become less accurate.

Example 1:

Draw a time series graph of the daily amount of rainfall (in millimeters) based on the following recorded
data:

Day                          1  2  3  4  5  6  7  8
Amount of Rainfall (in mm)  45 20 40 38 42 15 10 22

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.
1. A time series is a sequence of observations which are ordered in time. ( )
2. Trend is a short term movement in a time series. ( )
3. A simple way of detecting trend in seasonal data is to take averages over a certain period. ( )
4. Seasonal component describes any regular fluctuations with a period of less than two years. ( )

B. Fill in the space with word from the following list:


Time cyclical line trend Exponential

1. A time series graph is a …………………. graph.


2. …………………… is measured on the horizontal axis (X axis).
3. A simple way of detecting ……………… in seasonal data is to take averages over a certain period.
4. The ……………… component describes any regular fluctuations.
5. …………………………. smoothing is a smoothing technique used to reduce irregularities (random
fluctuations) in time series data.
C. Choose the Correct Answer from the following four words:
1. A time series is a sequence of observations which are ordered in ……………………………..
a. Time b. Place c. Occasion d. Cycle

2. A time series graph is a line graph where the variable being observed is measured on the …………………..

a. X axis b. Z axis c. Y axis d. None

3. Where we have an observation at every instant of time, then the time series is known as
……………………………………………
a. Discrete b. Stable c. Variable d. Continuous

4. ………………………………………….. is a long term movement in a time series.


a. Cyclical variations b. Trend Component c. Seasonal component d. Irregular component

Unit-6
Index Numbers

What are Index Numbers?


An index number is a percentage ratio of prices, quantities or values comparing two time periods or two
points in time. The time period that serves as a basis for the comparison is called the base period and the
period that is compared to the base period is called the given or current period.

Constructing an Index

Index for any time period n = (value in period n ÷ value in base period) × 100

Example (base year is year 1):

Year   Value   Calculation          Index
1      12380   12380/12380 x 100    100
2      12490   12490/12380 x 100    100.89

What are Price Indexes and what are their uses?


The most commonly quoted index is the Retail Price Index, the RPI. The Retail Price Index is a monthly
index indicating the amount spent by a 'typical household'.

The RPI is used as a benchmark against which other indexes are compared to show how the price of a product
changes over time, due to:

• Changing costs of raw materials;

• Variable supply and demand;

• General inflation; etc.

EXAMPLE:

The retail price of rice per kg over a period of four years is given below:
Year 2013 2014 2015 2016
Prices (SR) 24.60 25.35 26.00 26.50

1. Find the Price Index based on 2013 Prices


2. Find the percentage change in price between consecutive years (base year 2013).

SOLUTION:
YEAR   PRICE (SR)   PRICE RELATIVE                 PERCENTAGE CHANGE WITH RESPECT   PERCENTAGE CHANGE BETWEEN
                                                   TO THE BASE YEAR                 CONSECUTIVE YEARS
2013   24.60        100                            -                                -
2014   25.35        25.35/24.60 x 100 = 103.05     103.05 – 100 = 3.05              103.05 – 100 = 3.05
2015   26.00        26.00/24.60 x 100 = 105.69     105.69 – 100 = 5.69              105.69 – 103.05 = 2.64
2016   26.50        26.50/24.60 x 100 = 107.72     107.72 – 100 = 7.72              107.72 – 105.69 = 2.03
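
The price relatives and the two kinds of change can be computed with a short Python sketch. The function and variable names are illustrative choices; results are rounded to two decimal places, and the change between consecutive years is taken, as in the solution above, as the difference between successive index values.

```python
def price_relatives(prices, base_year):
    """Index for year n = (price in year n / price in base year) * 100."""
    base_price = prices[base_year]
    return {year: round(price / base_price * 100, 2) for year, price in prices.items()}


rice_prices = {2013: 24.60, 2014: 25.35, 2015: 26.00, 2016: 26.50}   # SR per kg
index = price_relatives(rice_prices, base_year=2013)

years = sorted(index)
# Change with respect to the base year: index value minus 100.
change_from_base = {year: round(index[year] - 100, 2) for year in years[1:]}
# Change between consecutive years: difference of successive index values.
change_consecutive = {
    year: round(index[year] - index[previous], 2)
    for previous, year in zip(years, years[1:])
}
print(index)
print(change_from_base)
print(change_consecutive)
```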

What is the Frequency of Calculation of Index Numbers?

Indexes can be calculated with any convenient frequency, that is:

1. Yearly

2. Monthly,

3. Daily, etc.

What is the use of Index Numbers?

1. The primary purposes of an index number are to provide a value useful for comparing magnitudes of
aggregates of related variables to each other, and to measure the changes in these magnitudes over time.

2. There are a number of particularly well-known ones, some of which are announced on Public media every
day.

3. Government agencies often report time series data in the form of index numbers. For example, the
consumer price index is an important economic indicator.

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.

1. Index numbers measure the changing value of a variable over time in relation to its value at some fixed
point in time. ( )

2. The base period is given the value of 110. ( )

3. The most commonly quoted index is the Retail Price Index. ( )

4. The consumer price index is an important economic indicator. ( )

5. The base period of an index number is the period against which comparisons are made. ( )

B. Fill in the space with word from the following list:

Index-numbers Time series Barometers Changes Retail-Price

1. The ………………………… Index is a monthly index indicating the amount spent by a 'typical household'.

2. Government agencies often report ……………………………………………. data in the form of index numbers.

3. Index numbers measure the effect of …………………………………………. over a period of time.

4. Many economic and business policies are guided by ……………………………………...

C. Choose the Correct Answer from the following four words:

1. The most commonly quoted index is the …………………………………

a. Retail Price Index b. Wholesale Price Index


c. Simple Index number d. Composite index number

2. Indexes can be calculated with any convenient frequency, that is:

a. Yearly b. Monthly c. Daily d. All the above

D. Solve the following Question:

Calculate Price Relative Index for all the years from the following data based on 2013 Prices:

Year            2013 2014 2015 2016 2017
Price (in SR)    120  140  150  165  175

Unit-7
Correlation

What is Correlation?
Correlation analysis studies the relation between two variables. It is a statistical technique that is used to
analyze the strength and direction of the relationship between two variables.

Types of Correlation:
There are three broad types of correlations –
1. Positive and Negative Correlation
2. Linear and Non-linear Correlation
3. Simple, Partial and Multiple Correlation

1. Positive and Negative correlation


A positive correlation refers to the same direction of change in the values of the variables. In other words, if
the value of X increases or decreases, then the value of Y increases or decreases respectively.
EXAMPLE:
POSITIVE CORRELATION
Price of Commodity X (in SR) Quantity Supplied of Commodity X (in units)
10 6
12 9
14 10
16 12

A negative correlation refers to the opposite direction of change in the values of variables. In other words,
if value of X increases or decreases then the value of Y decreases or increases respectively.
EXAMPLE:
NEGATIVE CORRELATION
Price of Commodity X (in SR) Quantity Supplied of Commodity X (in units)
16 6
14 9
10 10
8 12

2. Linear and Non-linear correlation


A linear correlation depicts a constant change in one of the variable values with respect to a change in the
corresponding values of another variable. Hence, variation in the values of two variables is in constant ratio.
EXAMPLE:
LINEAR CORRELATION
X 10 20 30 40
Y 20 40 60 80

The line joining these points on a graph will be a straight line.


A non-linear correlation (also called curvilinear) depicts an absolute change in one of the variable values
with respect to a change in the corresponding values of another variable. Hence, variation in the values of
two variables is not in constant ratio.
EXAMPLE:
NON-LINEAR CORRELATION
X 9 11 11 12
Y 18 20 30 45

The line joining these points on a graph will be a non-linear (curvilinear) line.

3. Simple, Partial and Multiple Correlation
Simple, Partial and Multiple Correlation is based upon the number of variables involved in the correlation
analysis.

When two variables are involved, then such a correlation is known as Simple Correlation. For example,
the expenditure incurred by Pepsi on advertisement and the resulting increase in sales of Pepsi.
In Partial Correlation, the relationship between two variables is influenced by other factors. For example,
demand for Pepsi is not only affected by the advertisement expenditure incurred by the firm, but it also
depends on the quality, content and brand image of the product in the market.

In Multiple Correlation, the relationship between three or more variables is considered simultaneously
for study. For example, the expenditure pattern of Saudi consumers can be examined with reference to income,
saving pattern, family size, education and family background.

Methods of Correlation Analysis:


1. Scatter Diagram
2. Karl Pearson’s Coefficient of Correlation
3. Spearman’s Rank Correlation
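
This unit names Karl Pearson's coefficient without giving its formula. The usual definition is r = ∑(X – X̄)(Y – Ȳ) ÷ √[∑(X – X̄)² × ∑(Y – Ȳ)²], which lies between –1 and +1; the sign shows the direction of the relationship and the size shows its strength. The sketch below assumes that definition, with `pearson_r` as an illustrative name, and applies it to the positive-correlation and linear-correlation tables from this unit.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation (standard definition)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Positive correlation table: price vs quantity supplied.
price = [10, 12, 14, 16]
quantity = [6, 9, 10, 12]
# Linear correlation table: Y is always twice X.
x = [10, 20, 30, 40]
y = [20, 40, 60, 80]
print(round(pearson_r(price, quantity), 2), round(pearson_r(x, y), 2))
```

A value close to +1 for the first table reflects a strong positive relationship, and the perfectly linear table gives the maximum value.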

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.
1. Correlation analysis deals with the association between two or more variables. ( )
2. There are basically seven broad types of correlation. ( )

3. A positive correlation refers to the same direction of change in the values of variables. ( )

4. A negative correlation refers to the same direction of change in the values of variables. ( )

5. Karl Pearson’s Coefficient of correlation is a method to calculate correlation. ( )

6. If the line joining the points on a graph is a straight line, the correlation is called non-linear. ( )

7. Correlation can be presented graphically. ( )

B. Fill in the blank with a word from the following list:

variables    curvilinear    Scatter    Simple Correlation    Multiple Correlation

1. Correlation analysis studies the relation between two …………………………...


2. ………………………… diagram is one of the methods of correlation analysis.
3. When two variables are involved, then such a correlation is known as ………………………..
4. A non-linear correlation is also called ....................................
5. In…………………………………………., the relationship among three or more variables is
considered simultaneously for study.

C. Choose the correct answer from the following:

1. When an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the
other, the correlation is said to be:
a. Neutral b. Negative c. Positive d. None of above
2. When an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the
other, the correlation is said to be:
a. Neutral b. Negative c. Positive d. None of above

3. The most common method of calculating correlation is ………………………….


a. The Graphic method b. Karl Pearson’s coefficient of Correlation
c. Scatter Diagram d. Correlation table

Unit-8
Probability

What is Probability?
Probability (meaning chance) is a branch of statistics that deals with quantitatively measuring the likelihood that
an event or experiment will have a particular outcome. It is the necessary foundation for statistics.

What are the Outcomes of Probability?

Mathematically, the probability that an event will occur is expressed as a number between 0 and 1.
Notationally, the probability of event A is represented by P(A).

• If P(A) equals zero, event A will almost definitely not occur.
• If P(A) is close to zero, there is only a small chance that event A will occur.
• If P(A) equals 0.5, there is a 50-50 chance that event A will occur.
• If P(A) is close to one, there is a strong chance that event A will occur.
• If P(A) equals one, event A will almost definitely occur.

In a statistical experiment, the sum of probabilities for all possible outcomes is equal to one. This means, for
example, that if an experiment can have three possible outcomes (A, B, and C), then P(A) + P(B) + P(C) = 1.

How to Compute Probability: Equally Likely Outcomes

Sometimes, a statistical experiment can have n possible outcomes, each of which is equally likely. Suppose
a subset of r outcomes are classified as "successful" outcomes.

The probability that the experiment results in a successful outcome (S) is:

P(S) = ( Number of successful outcomes ) / ( Total number of equally likely outcomes ) = r / n

Consider the following experiment. An urn has 10 marbles. Two marbles are red, three are green, and five
are blue. If an experimenter randomly selects 1 marble from the urn, what is the probability that it will be
green?

In this experiment, there are 10 equally likely outcomes, three of which are green marbles. Therefore, the
probability of choosing a green marble is 3/10 or 0.30.
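The r / n rule for equally likely outcomes can be written as a short calculation. This is a minimal sketch using the urn from the example; the dictionary `urn` and the function `probability` are illustrative names, not part of the course material.

```python
from fractions import Fraction

# Contents of the urn from the example: 10 marbles in total.
urn = {"red": 2, "green": 3, "blue": 5}

def probability(colour, contents):
    """P(S) = r / n: successful outcomes divided by all equally likely outcomes."""
    r = contents[colour]
    n = sum(contents.values())
    return Fraction(r, n)

p_green = probability("green", urn)
print(p_green, float(p_green))
```

Using `Fraction` keeps the answer exact (3/10) before converting it to the decimal 0.30. Adding the probabilities of all three colours gives 1, as the rule for the sum of all outcomes requires.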

How to Compute Probability: Law of Large Numbers

One can also think about the probability of an event in terms of its long-run relative frequency. The relative
frequency of an event is the number of times an event occurs, divided by the total number of trials.

P(A) = ( Frequency of Event A ) / ( Number of Trials )

For example, a merchant notices one day that 5 out of 50 visitors to her store make a purchase. The next day,
20 out of 50 visitors make a purchase. The two relative frequencies (5/50 or 0.10 and 20/50 or 0.40) differ.
However, summing results over many visitors, she might find that the proportion of visitors who make a
purchase gets closer and closer to 0.20.

The idea that the relative frequency of an event will converge on the probability of the event, as the number
of trials increases, is called the law of large numbers.
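The law of large numbers can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true purchase probability of 0.20 is taken from the merchant example, each visitor is assumed to decide independently, and the function name `relative_frequency` and the seed value are arbitrary choices.

```python
import random

def relative_frequency(true_p, trials, rng):
    """Simulate `trials` visitors and return the share who make a purchase."""
    purchases = sum(rng.random() < true_p for _ in range(trials))
    return purchases / trials

rng = random.Random(42)  # fixed seed so the run is repeatable
for visitors in (50, 500, 5000, 50000):
    print(visitors, relative_frequency(0.20, visitors, rng))
```

With only 50 visitors the relative frequency can be well away from 0.20, as in the merchant's two days; as the number of visitors grows, the printed values settle near 0.20.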

Problem 1:

A coin is tossed three times. What is the probability that it lands on heads exactly one time?

(A) 0.125
(B) 0.250
(C) 0.333
(D) 0.375
(E) 0.500

Solution:

The correct answer is (D). If you toss a coin three times, there are a total of eight possible outcomes. They
are: HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT. Of the eight possible outcomes, three have exactly
one head. They are: HTT, THT, and TTH. Therefore, the probability that three flips of a coin will
produce exactly one head is 3/8 or 0.375.
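The counting argument in this solution can be reproduced by listing every outcome. The sketch below assumes a fair coin, so that all eight sequences are equally likely; the variable names are illustrative.

```python
from itertools import product

# All sequences of three tosses: ('H','H','H'), ('H','H','T'), ...
outcomes = list(product("HT", repeat=3))

# Keep only the sequences with exactly one head.
exactly_one_head = [o for o in outcomes if o.count("H") == 1]

probability = len(exactly_one_head) / len(outcomes)
print(len(outcomes), len(exactly_one_head), probability)
```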

Problem 2:

A statistician conducts a random experiment several times and comes up with the data shown in the table
below. Based on this sample data, what should be the statistician's estimate of the probability that the
outcome of his next trial of the experiment will be 8?

Outcome Frequency
1 1
2 2
3 5
4 9
5 13
6 4
7 3
8 2

Solution: We learned that the probability of an event is equal to its relative frequency over a very large
number of trials. Although the data above is limited, the statistician can estimate the probability based on his
results. The relative frequency of the outcome 8 is simply its frequency, 2, divided by the total number of trials
of the experiment (39 in this case). Thus, the statistician's estimate of the probability of 8 should be
2/39, or approximately 0.05.
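The same estimate can be computed directly from the frequency table. This is a minimal sketch; the dictionary name `frequencies` is illustrative, and its values are copied from the table in Problem 2.

```python
# Outcome -> number of times it was observed (from the table in Problem 2).
frequencies = {1: 1, 2: 2, 3: 5, 4: 9, 5: 13, 6: 4, 7: 3, 8: 2}

total_trials = sum(frequencies.values())
estimate = frequencies[8] / total_trials  # relative frequency of outcome 8

print(total_trials, round(estimate, 2))
```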

Basic Rules of Probability

Now that we have introduced some of the terms associated with probability, we can consider some basic
rules. First, recall that the probability of an outcome is equal to its long-run relative frequency; also recall that the sum
of all the relative frequencies is unity (the cumulative relative frequency has a maximum value of 1). As a
result, the sum of all the probabilities associated with a particular sample space S for an experiment must
also be unity; in other words, the probability that an experiment yields some outcome from the sample space
is 1 (or 100%). Using the probability notation introduced above, we can write

P(S) = 1

Second, because of how we defined probability using relative frequency, the probability of any event E from
the sample space is between 0 and 1. We can express this rule as

0 ≤ P(E) ≤ 1

Third, if two events A and B are mutually exclusive, then the probability of the union of these events (A ∪ B)
is the sum of the probabilities of each event individually. That is,

P(A ∪ B) = P(A) + P(B)

If the events are not mutually exclusive (that is, they share some elements in common), then the probability
of their union is the sum of the individual probabilities minus the probability of all elements in common (this is
just the intersection of A and B):

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

This formula is actually a more general expression of the preceding formula, because P(A ∩ B) = 0 when A and
B are mutually exclusive. Fourth and finally, the
probability of an event E is equal to unity minus the probability of the event's complement, E^C. This
statement simply combines the facts that E and E^C are mutually exclusive but span the entire sample
space S and that the probability of S is unity. Thus, using the rules above,

P(E) + P(E^C) = P(S) = 1, so P(E) = 1 − P(E^C)
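The addition rule (with and without mutually exclusive events) and the complement rule can be checked on a small sample space. The sketch below assumes one roll of a fair six-sided die, which is not an example from the text; the events A, B and C and the function `P` are chosen only for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # assumed sample space: one roll of a fair die

def P(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event & S), len(S))

A = {1, 2}
B = {5, 6}      # shares no outcome with A (mutually exclusive)
C = {2, 3, 4}   # shares the outcome 2 with A

print(P(S))                                   # probability of the whole sample space
print(P(A | B), P(A) + P(B))                  # addition rule, mutually exclusive events
print(P(A | C), P(A) + P(C) - P(A & C))       # general addition rule
print(P(S - A), 1 - P(A))                     # complement rule
```

In Python, `|` is set union, `&` is set intersection, and `S - A` is the complement of A within S, so each printed pair shows both sides of one rule.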

What are the Uses of Probability?

1. Probability is widely used in the Physical, Biological, and Social Sciences and in Industry and Commerce.

2. It is applied in areas as diverse as Genetics, Quantum Mechanics, and Insurance.

3. It also involves deep and important theoretical problems in pure mathematics and has strong connections
with the theory known as mathematical analysis, which developed out of calculus.

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.

1. The Probability of a given event is an expression of likelihood or chance of occurrence of an event. ( )

2. The probability that an event will occur is expressed as a number between -1 and 1. ( )

3. Probability theory is being applied in the solution of social, economic, and political problems. ( )

4. In a statistical experiment, the sum of probabilities for all possible outcomes is equal to -1. ( )

5. The theory of Probability is not useful for Insurance studies. ( )

B. Fill in the blank with a word from the following list:

uncertainty one relative probability biological

1. In a statistical experiment, the sum of probabilities for all possible outcomes is equal to ………………..

2. The ……………………………………….. of an outcome is represented by a number between zero and one.

3. Probability is widely used in the physical, ………………………….. and social sciences and in Industry
and Commerce.

4. The …………………….. frequency of an event is the number of times an event occurs, divided by the
total number of trials.

C. Choose the Correct Answer from the following four words:

1. Probability is widely used in the …………………………………………

a. Physical Science b. Biological Science c. Social Science d. All the above

2. A probability of 0 indicates certainty that an event will …………………………..

a. Occur b. Not Occur c. May occur d. May not occur

Unit-9
Hypothesis testing

What is Hypothesis testing?

A Statistical hypothesis test is a method of making statistical decisions using experimental data.

The best way to determine whether a statistical hypothesis is true would be to examine the entire population.
Since that is often impractical, researchers typically examine a random sample from the population. If
sample data are consistent with the statistical hypothesis, the hypothesis is accepted; if not, it is rejected.

There are two types of statistical hypotheses.

• Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample
observations result purely from chance.

• Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that
sample observations are influenced by some non-random cause.

For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis
might be that half the flips would result in Heads and half, in Tails. The alternative hypothesis might be that
the number of Heads and Tails would be very different. Symbolically, these hypotheses would be expressed
as

H0: P = 0.5
Ha: P ≠ 0.5

Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we would be
inclined to reject the null hypothesis and accept the alternative hypothesis.

Hypothesis Tests

Statisticians follow a formal process to determine whether to accept or reject a null hypothesis, based on
sample data. This process, called hypothesis testing, consists of four steps.

• State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are
stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.

• Formulate an analysis plan. The analysis plan describes how to use sample data to accept or reject the
null hypothesis. The accept/reject decision often centers on a single test statistic.

• Analyze sample data. Find the value of the test statistic (mean score, proportion, t-score, z-score,
etc.) described in the analysis plan. Complete other computations, as required by the plan.

• Interpret results. Apply the decision rule described in the analysis plan. If the test statistic supports
the null hypothesis, accept the null hypothesis; otherwise, reject the null hypothesis.
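The four steps can be sketched for the coin example (40 Heads in 50 flips). This is an illustration under stated assumptions, not a prescribed procedure: the significance level of 0.05 and the choice of a two-sided z-test for a proportion (a normal approximation) are assumptions made here, and the variable names are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state the hypotheses.  H0: P = 0.5   Ha: P != 0.5
p0 = 0.5

# Step 2: formulate an analysis plan (assumed here): two-sided z-test, alpha = 0.05.
alpha = 0.05

# Step 3: analyze sample data: 40 Heads in 50 flips.
n, heads = 50, 40
p_hat = heads / n                                 # sample proportion of Heads
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)        # test statistic (z-score)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # chance of a result this extreme under H0

# Step 4: interpret results using the decision rule from the plan.
decision = "reject H0" if p_value < alpha else "accept H0"
print(round(z, 2), decision)
```

The z-score is far beyond the usual cut-off of about 1.96 for a 0.05 significance level, so the data do not support the hypothesis of a fair coin.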

Decision Errors

Two types of errors can result from a hypothesis test.

• Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The
probability of committing a Type I error is called the significance level. This probability is also
called alpha, and is often denoted by α.

• Type II error. A Type II error occurs when the researcher accepts a null hypothesis that is false. The
probability of committing a Type II error is called beta, and is often denoted by β. The probability
of not committing a Type II error is called the Power of the test.
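Both kinds of error can be estimated by simulation. The sketch below rests on several assumptions that are not in the text: 50 flips per experiment, a significance level of 0.05, an exact two-sided binomial test of H0: P = 0.5, and a biased coin with P = 0.7 as the case where H0 is false. The function names and the seed are arbitrary.

```python
import random
from math import comb

N_FLIPS = 50
ALPHA = 0.05  # assumed significance level

def two_sided_p_value(heads, n=N_FLIPS):
    """Exact two-sided p-value under H0: P = 0.5 (the distribution is symmetric)."""
    k = min(heads, n - heads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Pre-compute the decision for every possible number of heads.
REJECT = [two_sided_p_value(h) <= ALPHA for h in range(N_FLIPS + 1)]

def rejection_rate(true_p, experiments, rng):
    """Share of simulated experiments in which H0 is rejected."""
    rejections = 0
    for _ in range(experiments):
        heads = sum(rng.random() < true_p for _ in range(N_FLIPS))
        rejections += REJECT[heads]
    return rejections / experiments

rng = random.Random(0)
type_i_rate = rejection_rate(0.5, 10000, rng)   # H0 true: every rejection is a Type I error
power = rejection_rate(0.7, 10000, rng)         # H0 false: rejecting is the correct decision
beta = 1 - power                                # estimated Type II error rate
print(type_i_rate, power, beta)
```

When the coin really is fair, the share of wrong rejections stays at or below the chosen α. When the coin is biased, the share of rejections estimates the power of the test, and the remainder estimates β.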

What is the use of Estimation theory and Hypothesis testing?

1. Estimation theory is helpful in estimating the values of parameters based on measured/empirical data.

2. Hypothesis testing helps in making statistical decisions.

3. It is a very important tool used in Statistics.

EXERCISES
A. Put in the space ( X ) if it is False or ( √ ) if it is True.

1. A Statistical hypothesis test is a method of making statistical decisions using vague data. ( )

2. The best way to determine whether a statistical hypothesis is true would be to examine the entire
population. ( )

3. There are two types of statistical hypotheses that are null hypotheses and bull hypotheses. ( )

4. Hypothesis testing helps in making statistical decisions. ( )

5. A Type I error occurs when the researcher rejects a null hypothesis when it is false. ( )

B. Fill in the blank with a word from the following list:

sample experimental null random alternative

1. A Statistical hypothesis test is a method of making Statistical Decisions using
…………………………………….. data.

2. For hypothesis testing, researchers typically examine a ……………………………….. sample from
the population.

3. The ……………………………………. hypothesis is denoted by H0.


4. Statisticians follow a formal process to determine whether to accept or reject a null hypothesis, based on
…………………………………….. data.

5. There are two types of statistical hypotheses, namely Null hypotheses and
………………………………….. hypotheses.

C. Choose the correct answer from the following:

1. A Statistical hypothesis test is a method of making statistical decisions using
…………………………………………….

a. Incidental Data b. Sensible Data c. Experimental Data d. Magical Data

2. Hypothesis testing begins with ………………………………………….

a. Procedure b. Sample c. Consolidation d. Assumption
