
Hypothesis Testing (S-5)

The document discusses hypothesis testing, including one-tailed and two-tailed tests, critical and acceptance regions, and the concepts of Type I and Type II errors. It also explains the significance level, confidence level, and power of a statistical test, along with relevant formulas for calculating test statistics. Additionally, it covers the Neyman-Pearson Lemma and provides examples of calculating error sizes and power functions for specific hypotheses.

Uploaded by

u2204054
Copyright
© © All Rights Reserved

Hypothesis Testing (S-5)

* One-tailed test: A test is one-tailed when the alternative hypothesis H1 states a direction, such as
H0: The mean income of women stockbrokers is $65,000 per year.
H1: The mean income of women stockbrokers is greater than $65,000 per year.
(A one-tailed alternative may instead state "less than $65,000 per year", but not both directions at once.)
* Two-tailed test: A test is two-tailed when the alternative hypothesis does not specify a direction, that is, it has two tails, such as
H0: The mean income of women stockbrokers is $65,000 per year.
H1: The mean income of women stockbrokers is not $65,000 per year.
* Critical region: The region of values of the test statistic that lead to rejection of H0 is called the critical region.
* Acceptance region: The region of values of the test statistic that lead to acceptance of H0 is called the acceptance region.
[Figure: standard normal curve for a two-tailed test. The critical values separate a central acceptance region (accept H0; Z-values close to zero) from two critical regions (reject H0), one for large negative Z and one for large positive Z.]

* P-value: The procedure compares a probability, called the p-value, with the significance level. If the p-value is smaller than the significance level, then H0 is rejected. If it is larger than the significance level, then H0 is not rejected. The p-value is the lowest level of significance at which the observed value of the test statistic is significant.
* Type-I error: Rejection of the null hypothesis when it is true is called a type-I error.
* Type-II error: Acceptance of the null hypothesis when it is false is called a type-II error.
In testing any statistical hypothesis there are four possible situations that determine whether our decision is correct or in error. These four situations are summarized in the following table.

             H0 is true            H0 is false
Accept H0    Correct decision      Type-II error (β)
Reject H0    Type-I error (α)      Correct decision
* Level of significance (\(\alpha\)): The probability of a type-I error is known as the level of significance of the test. It is also called the size of the critical region.
* Confidence level: The level of significance of a statistical test is the probability of a type-I error, denoted \(\alpha\). Its complement \(1 - \alpha\) is called the confidence level:
\[ 1 - \alpha = \text{confidence level} = 1 - P(\text{Type-I error}) \]
* Power of a statistical test: The probability of a type-II error is
\[ \beta = P(\text{accept } H_0 \mid H_0 \text{ is false}) \]
The complement of this probability is
\[ 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false}) \]
The probability \(1 - \beta\) is known as the power of a statistical test.

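Some Formulas
1. For one mean, when \(\sigma\) is known and the population is normally distributed (sample large or small), the test statistic is
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
2. For one mean, when \(\sigma\) is not known:
\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
3. For two means:
(a) \(\sigma\) is known:
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
(b) \(\sigma\) is unknown:
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]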
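The formulas above can be sketched as small Python helpers. This is only an illustration: the function names are hypothetical, and the numbers in the usage lines are taken from problems P-2 and P-4 of these notes.

```python
import math

def z_one_mean(xbar, mu0, sigma, n):
    """z statistic for one mean when the population s.d. sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

def t_one_mean(xbar, mu0, s, n):
    """t statistic for one mean when sigma is replaced by the sample s.d. s."""
    return (xbar - mu0) / (s / math.sqrt(n))

def z_two_means(xbar1, xbar2, sd1, sd2, n1, n2, diff0=0.0):
    """z statistic for two means; diff0 is the hypothesised mu1 - mu2.

    Pass population s.d.s for case (a) or sample s.d.s for case (b).
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((xbar1 - xbar2) - diff0) / standard_error

# Detergent packets (P-2): xbar = 498.5, mu0 = 500, sigma = 4, n = 100
print(round(z_one_mean(498.5, 500, 4, 100), 2))   # -3.75
# Years of smoking (P-4): xbar = 22.47, mu0 = 20, s = 2.28, n = 20
print(round(t_one_mean(22.47, 20, 2.28, 20), 2))  # 4.84
```

The helpers return only the test statistic; the comparison with a critical value is left to the caller because it depends on whether the test is one-tailed or two-tailed.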

Test of Hypothesis
Simple and composite hypothesis:
If a statistical hypothesis specifies the population completely, it is called a simple hypothesis; otherwise it is called a composite hypothesis.
e.g. Let \(x_1, x_2, \ldots, x_n\) be a random sample of size n from a normal population with mean \(\mu\) and variance \(\sigma^2\). Then the hypothesis
\[ H_0: \mu = \mu_0,\ \sigma^2 = \sigma_0^2 \]
is a simple hypothesis. On the other hand, a hypothesis that does not fix both parameters at single values, for example \(\mu \le \mu_0,\ \sigma^2 = \sigma_0^2\), is a composite hypothesis.
A hypothesis which does not completely specify r parameters of the population is called a composite hypothesis with r degrees of freedom.
* Given the frequency function
\[ f(x, \theta) = \begin{cases} \dfrac{1}{\theta}, & 0 \le x \le \theta \\ 0, & \text{otherwise} \end{cases} \]
and that you are testing the hypothesis \(H_0: \theta = 1.5\) against \(H_1: \theta = 2.5\) by means of a single observed value of x, what would be the sizes of the type-I and type-II errors if you choose the interval \(0.8 \le x\) as the critical region? Also obtain the power function of the test.
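The notes pose this problem without working it. A sketch of the calculation, assuming the density is \(1/\theta\) on \(0 \le x \le \theta\) and the critical region is \(W = \{x : x \ge 0.8\}\), follows the same steps as the worked example for \(\theta = 1\) against \(\theta = 2\) later in these notes:

```latex
\alpha = P(x \ge 0.8 \mid \theta = 1.5)
       = \int_{0.8}^{1.5} \frac{1}{1.5}\,dx
       = \frac{0.7}{1.5} \approx 0.467

\beta  = P(x < 0.8 \mid \theta = 2.5)
       = \int_{0}^{0.8} \frac{1}{2.5}\,dx
       = \frac{0.8}{2.5} = 0.32

% Power as a function of theta (zero when theta < 0.8):
P(x \ge 0.8 \mid \theta) = 1 - \frac{0.8}{\theta}, \qquad \theta \ge 0.8

% At the alternative theta = 2.5:
1 - \beta = 1 - 0.32 = 0.68
```

Evaluating the power function at \(\theta = 1.5\) returns the size \(\alpha \approx 0.467\), which is a quick consistency check.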
* Neyman-Pearson Lemma:
Let \(k > 0\) be a constant and W be a critical region of size \(\alpha\) such that
\[ W = \left\{ x \in S : \frac{f(x, \theta_1)}{f(x, \theta_0)} > k \right\} \]
that is,
\[ W = \left\{ x \in S : \frac{L_1}{L_0} > k \right\} \tag{1} \]
and
\[ \bar{W} = \left\{ x \in S : \frac{L_1}{L_0} \le k \right\} \tag{2} \]
where \(L_0\) and \(L_1\) are the likelihood functions of the sample observations \(x = (x_1, x_2, \ldots, x_n)\) under \(H_0\) and \(H_1\) respectively. Then W is the most powerful critical region for testing the hypothesis \(H_0: \theta = \theta_0\) against the alternative \(H_1: \theta = \theta_1\).

Proof: We are given
\[ P(x \in W \mid H_0) = \int_W L_0\,dx = \alpha \tag{3} \]
The power of the region is
\[ P(x \in W \mid H_1) = \int_W L_1\,dx = 1 - \beta \ \text{(say)} \tag{4} \]
In order to establish the lemma, we have to prove that there exists no other critical region of size less than or equal to \(\alpha\) which is more powerful than W.
Let \(W_1\) be another critical region of size \(\alpha_1 \le \alpha\) and power \(1 - \beta_1\), so that we have
\[ P(x \in W_1 \mid H_0) = \int_{W_1} L_0\,dx = \alpha_1 \tag{5} \]
and
\[ P(x \in W_1 \mid H_1) = \int_{W_1} L_1\,dx = 1 - \beta_1 \tag{6} \]
Now we have to show that \(1 - \beta \ge 1 - \beta_1\).

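[Figure: Venn diagram of two overlapping regions W and W1, where A is the part of W outside W1, C is the common part, and B is the part of W1 outside W.]

Let \(W = A \cup C\) and \(W_1 = B \cup C\) (C may be empty, i.e. W and \(W_1\) may be disjoint).
If \(\alpha_1 \le \alpha\), we have
\[ \int_{W_1} L_0\,dx \le \int_{W} L_0\,dx \]
\[ \Rightarrow \int_{B \cup C} L_0\,dx \le \int_{A \cup C} L_0\,dx \]
\[ \Rightarrow \int_{B} L_0\,dx \le \int_{A} L_0\,dx \]
\[ \Rightarrow \int_{A} L_0\,dx \ge \int_{B} L_0\,dx \tag{7} \]
Since \(A \subset W\), (1) implies
\[ \int_A L_1\,dx \ge k \int_A L_0\,dx \ge k \int_B L_0\,dx \tag{8} \]
using (7).
Also (2) implies
\[ \frac{L_1}{L_0} \le k \ \ \forall\, x \in \bar{W} \ \Rightarrow\ \int_{\bar{W}} L_1\,dx \le k \int_{\bar{W}} L_0\,dx \]
This result also holds for any subset of \(\bar{W}\), say \(\bar{W} \cap W_1 = B\). Hence
\[ \int_B L_1\,dx \le k \int_B L_0\,dx \le \int_A L_1\,dx \quad \text{[using (8)]} \]
Adding \(\int_C L_1\,dx\) to both sides, we get
\[ \int_{W_1} L_1\,dx \le \int_{W} L_1\,dx \ \Rightarrow\ 1 - \beta \ge 1 - \beta_1 \]
Hence the lemma is proved.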

* Given the frequency function
\[ f(x, \theta) = \begin{cases} \dfrac{1}{\theta}, & 0 \le x \le \theta \\ 0, & \text{elsewhere} \end{cases} \]
and that you are testing the null hypothesis \(H_0: \theta = 1\) against \(H_1: \theta = 2\) by means of a single observed value of x, what would be the sizes of the type-I and type-II errors if you choose the interval (i) \(0.5 \le x\) (ii) \(1 \le x \le 1.5\) as the critical region? Also obtain the power function of the test.
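Ans: Here we want to test \(H_0: \theta = 1\) against \(H_1: \theta = 2\).

(i) \(W = \{x : 0.5 \le x\} = \{x : x \ge 0.5\}\) and \(\bar{W} = \{x : x < 0.5\}\).
\[ \alpha = P(x \in W \mid H_0) = P(x \ge 0.5 \mid \theta = 1) = P(0.5 \le x \le 1 \mid \theta = 1) \]
\[ = \int_{0.5}^{1} \left[ f(x, \theta) \right]_{\theta = 1} dx = \int_{0.5}^{1} 1\,dx = 0.5 \]
Similarly
\[ \beta = P(x \in \bar{W} \mid H_1) = P(x < 0.5 \mid \theta = 2) \]
\[ = \int_{0}^{0.5} \left[ f(x, \theta) \right]_{\theta = 2} dx = \int_{0}^{0.5} \frac{1}{2}\,dx = \frac{1}{2} \times 0.5 = 0.25 \]
Thus the sizes of the type-I and type-II errors are respectively \(\alpha = 0.5\) and \(\beta = 0.25\), and the power of the test is \(1 - \beta = 0.75\).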

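(ii) \(W = \{x : 1 \le x \le 1.5\}\)
\[ \alpha = P(x \in W \mid \theta = 1) = \int_{1}^{1.5} \left[ f(x, \theta) \right]_{\theta = 1} dx = 0 \]
since under \(H_0: \theta = 1\), \(f(x, \theta) = 0\) for \(1 < x \le 1.5\).
\[ \beta = P(x \in \bar{W} \mid \theta = 2) = 1 - P(x \in W \mid \theta = 2) \]
\[ = 1 - \int_{1}^{1.5} \left[ f(x, \theta) \right]_{\theta = 2} dx = 1 - \int_{1}^{1.5} \frac{1}{2}\,dx \]
\[ = 1 - \frac{1}{2} \Big[ x \Big]_{1}^{1.5} = 1 - \frac{1}{2}(1.5 - 1) = 1 - \frac{1}{2} \times 0.5 = 1 - 0.25 = 0.75 \]
Hence the power of the test is \(1 - \beta = 1 - 0.75 = 0.25\).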
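The two cases above can be checked numerically. The following is a minimal sketch, assuming the single observation is uniform on \([0, \theta]\) and the critical region is an interval; the function names are hypothetical.

```python
def uniform_cdf(x, theta):
    """P(X <= x) for X uniform on [0, theta]."""
    return min(max(x / theta, 0.0), 1.0)

def prob_in_interval(lo, hi, theta):
    """P(lo <= X <= hi) for X uniform on [0, theta]."""
    return uniform_cdf(hi, theta) - uniform_cdf(lo, theta)

def error_sizes(lo, hi, theta0, theta1):
    """Return (alpha, beta) for the critical region lo <= x <= hi."""
    alpha = prob_in_interval(lo, hi, theta0)       # reject H0 when H0 is true
    beta = 1.0 - prob_in_interval(lo, hi, theta1)  # accept H0 when H1 is true
    return alpha, beta

INF = float("inf")
print(error_sizes(0.5, INF, 1.0, 2.0))  # case (i):  (0.5, 0.25)
print(error_sizes(1.0, 1.5, 1.0, 2.0))  # case (ii): (0.0, 0.75)
```

In each case the power is one minus the second value returned, which gives 0.75 for (i) and 0.25 for (ii).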

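* If \(x \ge 1\) is the critical region for testing \(H_0: \theta = 2\) against the alternative \(H_1: \theta = 1\), on the basis of a single observation from the population
\[ f(x, \theta) = \theta e^{-\theta x}, \quad 0 \le x < \infty \]
obtain the values of the type-I and type-II errors.
Ans: Here \(W = \{x : x \ge 1\}\) and \(\bar{W} = \{x : x < 1\}\).
Also \(H_0: \theta = 2\) and \(H_1: \theta = 1\).
Now
\[ \alpha = P(x \in W \mid H_0) = P(x \ge 1 \mid \theta = 2) = \int_{1}^{\infty} \left[ f(x, \theta) \right]_{\theta = 2} dx \]
\[ = \int_{1}^{\infty} 2 e^{-2x}\,dx = 2 \left[ \frac{e^{-2x}}{-2} \right]_{1}^{\infty} = e^{-2} = \frac{1}{e^2} \]
\[ \beta = P(x \in \bar{W} \mid H_1) = P(x < 1 \mid \theta = 1) = \int_{0}^{1} e^{-x}\,dx = \Big[ -e^{-x} \Big]_{0}^{1} = 1 - e^{-1} = \frac{e - 1}{e} \]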
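As a numerical check of the two integrals above, here is a short sketch that uses the closed-form exponential distribution function \(1 - e^{-\theta x}\). The helper name is hypothetical.

```python
import math

def exp_cdf(x, theta):
    """P(X <= x) for the density theta * exp(-theta * x), x >= 0."""
    return 1.0 - math.exp(-theta * x) if x > 0 else 0.0

alpha = 1.0 - exp_cdf(1.0, 2.0)  # P(x >= 1 | theta = 2) = e^-2
beta = exp_cdf(1.0, 1.0)         # P(x < 1 | theta = 1) = 1 - e^-1

print(round(alpha, 4), round(beta, 4))  # 0.1353 0.6321
```

Both errors cannot be made small at once with a single observation here: the type-II error of about 0.63 is the price of the small type-I error.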
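P-1: Let us consider \(\bar{x} = 50.58\), \(s = 1.65\). Find \(\beta\) for \(\mu = 51\), \(n = 40\).
Ans: Given that \(\bar{x} = 50.58\), \(s = 1.65\), \(\mu = 51\), \(n = 40\).
The probability of a type-II error is
\[ \beta = P(\text{accept } H_0 \mid H_0 \text{ false}) = P(\text{accept } H_0 \mid H_1 \text{ true}) \]
Taking the acceptance region to be \(\bar{x} \le 50.43\) (the cutoff used in this solution; it is consistent with an upper-tailed test of \(H_0: \mu = 50\) at \(\alpha = 0.05\), since \(50 + 1.645 \times 1.65/\sqrt{40} \approx 50.43\)):
\[ \beta = P(\bar{x} \le 50.43 \mid \mu = 51) = P\left( \frac{\bar{x} - 51}{1.65/\sqrt{40}} \le \frac{50.43 - 51}{1.65/\sqrt{40}} \right) = P(z \le -2.18) = 0.0146 \]
Power of the test: \(1 - \beta = 1 - 0.0146 = 0.9854\) for \(H_1: \mu = 51\).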
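The value of \(\beta\) in P-1 can be reproduced with the standard library's normal distribution. This is a sketch only; it assumes, as the solution does, that H0 is accepted when the sample mean is at most 50.43, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

cutoff = 50.43   # accept H0 when the sample mean is at most this value
mu1 = 51         # mean under the alternative H1
s, n = 1.65, 40

standard_error = s / sqrt(n)
z = (cutoff - mu1) / standard_error
beta = NormalDist().cdf(z)   # P(accept H0 | mu = 51)
power = 1 - beta

print(round(z, 2))  # -2.18
```

Without rounding z to two decimals, beta comes out near 0.0145 rather than the table value 0.0146; the difference is only rounding.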
P-2: The average weight and s.d. of the detergent packets of a detergent production company are 500 gm and 4 gm respectively. A government agency receives many consumer complaints that each packet contains less than 500 gm. To check the consumers' complaint at the 5% level of significance, the government agency buys 100 packets of this detergent and finds that their mean weight is 498.5 gm. Should the government agency order the company to put more detergent in its packets?
Ans: Here \(\mu = 500\), \(\sigma = 4\), \(\bar{x} = 498.5\), \(n = 100\).
\(H_0: \mu = 500\)
\(H_1: \mu < 500\)
\(\alpha = 0.05\), so the lower-tail critical value is \(-z_{0.05} = -1.645\).
\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{498.5 - 500}{4 / \sqrt{100}} = -3.75 \]
Since \(Z = -3.75 < -1.645\), H0 is rejected. Therefore we conclude that the average weight of each packet is less than 500 gm. Hence the government agency should order the company to put more detergent in its packets.
P-3: The s.d. of the weight of the 100 gm bread loaves made by a certain bakery is 1 gm. On a certain day the owner claims that production is out of control. To check whether production is under control, employees select a random sample of 25 loaves and find that their mean weight is 99.5 gm. Test the claim of the owner at the 5% and 1% levels of significance. Also compute the p-value.
Ans: Let the mean weight of the bread be \(\mu\). Here \(\sigma = 1\), \(\bar{x} = 99.5\) gm, \(n = 25\).
\(H_0: \mu = 100\)
\(H_1: \mu \ne 100\)
\(\alpha = 0.05\)
Since it is a two-tailed test, \(z_{0.025} = -1.96\) and \(z_{0.975} = 1.96\).
Test statistic:
\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1) \]
\[ Z = \frac{99.5 - 100}{1 / \sqrt{25}} = \frac{-0.5}{1/5} = -2.5 \]
Since \(Z = -2.5 < z_{0.025} = -1.96\), H0 is rejected at the 5% level of significance. Therefore the average weight of the bread differs from 100 gm (the sample mean lies below it).
At the 1% level of significance, \(z_{0.005} = -2.575\) and \(z_{0.995} = 2.575\).
Since \(Z = -2.5 > z_{0.005} = -2.575\), H0 is accepted at the 1% level of significance.
P-value: probability of the test statistic below -2.5 = 0.0062
probability of the test statistic above +2.5 = 0.0062
In a two-tailed test, p-value = 0.0062 + 0.0062 = 0.0124.
Since p-value > 0.01 and p-value < 0.05, H0 is rejected at the 5% level but not at the 1% level.
Thus at the 5% level of significance \(H_0: \mu = 100\) is rejected and \(H_1: \mu \ne 100\) is accepted.
Hence the production is out of control on that day.
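The two-tailed decision and the p-value in P-3 can be sketched with `statistics.NormalDist` from the Python standard library. The function name and return layout are illustrative assumptions, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(xbar, mu0, sigma, n, alpha):
    """Return (z, p_value, reject_H0) for H0: mu = mu0 against H1: mu != mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * NormalDist().cdf(-abs(z))        # area in both tails
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 at 5%, about 2.576 at 1%
    return z, p_value, abs(z) > z_crit

z5, p5, reject5 = two_tailed_z_test(99.5, 100, 1, 25, alpha=0.05)
z1, p1, reject1 = two_tailed_z_test(99.5, 100, 1, 25, alpha=0.01)

print(round(z5, 2), round(p5, 4))  # -2.5 0.0124
print(reject5, reject1)            # True False
```

Comparing the p-value with each level gives the same decisions as comparing Z with the critical values: 0.0124 is below 0.05 but above 0.01.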
P-4: A medical doctor suspected that patients with lung cancer in the age group 40-45 had smoked on average for more than 20 years. A sample of 20 patients gave the following years of smoking.
22.0 21.3 19.6 19.6 21.4 24.0 25.9 19.7 25.5 25.1
22.2 21.5 19.8 22.5 24.5 20.5 19.8 25.0 23.8 25.7
Using a 1% level of significance, is there sufficient evidence to justify the doctor's belief?
Ans: \(H_0: \mu = 20\)
\(H_1: \mu > 20\)
\(\alpha = 0.01\)
At \(\alpha = 0.01\), \(t_{0.01} = 2.539\) with \(n - 1 = 19\) degrees of freedom.
\(\bar{x} = 22.47\), \(n = 20\), \(\mu_0 = 20\), \(s = 2.28\)
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{22.47 - 20}{2.28 / \sqrt{20}} = \frac{2.47}{0.5098} = 4.84 \]
Since \(t = 4.84 > t_{0.01} = 2.539\), H0 is rejected.
Therefore we conclude that lung cancer patients in the age group 40-45 had smoked on average for more than 20 years.
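The summary statistics and the t value in P-4 can be recomputed from the raw data. A minimal sketch using only the standard library follows; the critical value 2.539 is copied from the notes because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

years = [22.0, 21.3, 19.6, 19.6, 21.4, 24.0, 25.9, 19.7, 25.5, 25.1,
         22.2, 21.5, 19.8, 22.5, 24.5, 20.5, 19.8, 25.0, 23.8, 25.7]

mu0 = 20
t_crit = 2.539            # t(0.01) with 19 degrees of freedom, from the notes

n = len(years)
xbar = mean(years)
s = stdev(years)          # sample s.d. with divisor n - 1
t = (xbar - mu0) / (s / sqrt(n))

print(round(xbar, 2), round(s, 2), round(t, 2))  # 22.47 2.28 4.84
print(t > t_crit)                                # True, so H0 is rejected
```

The standard error here is about 0.51, which is the denominator used in the t calculation above.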

Sampling and Sampling Distribution


Population: A population is the aggregate or totality of statistical data forming the subject of an investigation.
Example: (i) The population of books in the national library.
(ii) The population of the heights of people in Bangladesh.
(iii) The population of nationalized banks in Bangladesh.
Sample: A sample is a portion of the population which is examined with a view to estimating the characteristics of the population.
Example: (i) To assess the quality of a bag of rice, we examine only a portion of it.
(ii) To estimate the proportion of defective articles in a large consignment, only a portion is selected and examined.
Sampling: Generalizing the results and findings of a sample to the whole field of inquiry is based on two important principles of statistics.
(i) Law of Statistical Regularity (LSR): LSR states that a moderately large number of items selected at random from a given population exhibits nearly the same composition and characteristics as the population.
(ii) Law of Inertia of Large Numbers (LILN): LILN states that, other things remaining the same, the larger the size of the sample, the more accurate is the result obtained. It is a corollary of the law of statistical regularity.
At first one can classify sampling under two separate heads:
(i) Probability sampling methods.
(ii) Non-probability sampling methods.
The probability sampling methods are classified as:
(i) Random sampling
(ii) Stratified sampling
(iii) Systematic sampling
(iv) Multi-stage sampling
Random Sampling: Random sampling is the process of drawing a sample from a population in such a way that each member of the population has an equal chance of being included in the sample. The sample obtained by the process of random sampling is called a random sample.
According to W. H. Harper, "A random sample is a sample selected in such a way that every item in the population has an equal chance of being included".
Example: Winning tickets in a lottery are drawn at random by taking numbers out of a revolving drum. The drawn tickets form a random sample. The selection of items in a random sample depends entirely on chance.
Random sampling is a scientific process in which the sampling error can be measured. It is economical in that it saves time, money and labour.

* Illustrate how to draw a random sample of 5 bags of rice from a shipload of 10,000 bags.
Ans:
Method: All the rice bags are first serially numbered from 1 to 10,000. Each of these numbers is then written on one of 10,000 small identical cards. The cards are placed inside a box and 5 of them are chosen as in a lottery. The bags bearing the numbers on the drawn cards then make up our random sample from the whole lot.
Construction of a sample of 5 bags using random numbers
Note: Given below is an extract from a table of two-digit random numbers.
43 10 53 74 35 08 90 61 18 37 44 10
50 32 40 43 62 23 50 05 10 03 22 11
51 94 05 17 58 53 78 80 59 01 94 32
13 99 75 53 08 70 94 25 12 58 41 54

The 10,000 bags of rice on the ship are numbered 1, 2, ..., 99, 100, 101, ..., 999, 1000, ..., 9998, 9999, 10000. From a page of random numbers we now select any row and column and take 5 four-digit figures successively as they occur. Thus if we start from the digits in the first row and first column of the above extract and move horizontally, our selected numbers will be 4310, 5374, 3508, 9061, 1837.
Hence the sample of 5 bags of rice consists of the bags numbered 4310, 5374, 3508, 9061 and 1837.
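Reading four-digit numbers off the table can be sketched in a few lines of Python. The rows are the extract given above; the function name is hypothetical, and joining adjacent two-digit pairs is simply the procedure the text describes.

```python
table_rows = [
    "43 10 53 74 35 08 90 61 18 37 44 10",
    "50 32 40 43 62 23 50 05 10 03 22 11",
    "51 94 05 17 58 53 78 80 59 01 94 32",
    "13 99 75 53 08 70 94 25 12 58 41 54",
]

def four_digit_numbers(row, count):
    """Join successive two-digit pairs of a row into four-digit bag numbers."""
    pairs = row.split()
    numbers = []
    for i in range(0, len(pairs) - 1, 2):
        numbers.append(int(pairs[i] + pairs[i + 1]))
        if len(numbers) == count:
            break
    return numbers

# Start at the first row, first column and move horizontally.
print(four_digit_numbers(table_rows[0], 5))  # [4310, 5374, 3508, 9061, 1837]
```

A printed table is used here to mirror the text; with a computer, `random.sample(range(1, 10001), 5)` would draw 5 distinct bag numbers directly.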

Stratified Sampling: Stratified sampling is generally used when the population is heterogeneous. In this case, the population is first sub-divided into several parts called strata according to some relevant characteristic, so that each stratum is more or less homogeneous. Each stratum is called a sub-population. Then a small sample is selected from each stratum at random. All the sub-samples combined together form the stratified sample, which represents the population properly. The process of obtaining and examining a stratified sample with a view to estimating the characteristics of the population is known as stratified sampling.
Illustration: Let us select a stratified sample of 100 families from a city with a view to studying some characteristic of the families. The city area is divided into a number of strata according to the economic condition of the inhabitants, as measured by annual income (say). Thus localities mostly inhabited by people with more or less similar annual income may be included under one stratum. A few families are then chosen at random from each stratum so that the sum total of all the families from all the strata is 100.
Stratified sampling gives better estimates of the population characteristics.
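The illustration above can be sketched in Python. The strata names and sizes below are hypothetical, and proportional allocation (each stratum contributes in proportion to its size) is an assumption; the notes only require that the sub-samples total 100.

```python
import random

def stratified_sample(strata, total, seed=0):
    """Draw a simple random sample from each stratum, sized proportionally."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional share; rounding may need adjusting for awkward sizes.
        k = round(total * len(units) / population_size)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical city of 10,000 families grouped by annual income.
strata = {
    "low income": [f"L{i}" for i in range(5000)],
    "middle income": [f"M{i}" for i in range(3000)],
    "high income": [f"H{i}" for i in range(2000)],
}

sample = stratified_sample(strata, total=100)
print({name: len(families) for name, families in sample.items()})
# {'low income': 50, 'middle income': 30, 'high income': 20}
```

Within each stratum the draw is an ordinary random sample, so every family in a stratum has the same chance of selection.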
Systematic Sampling: In this case, all the units of the population are arranged in some order. If the population size is finite, all the units are first serially listed and arranged in order. Then from the first k items, one unit is selected at random. This unit and every kth unit after it in the serially listed population together constitute a systematic sample. This type of sampling is known as systematic sampling.
Illustration: Suppose it is desired to select a systematic sample of 500 villages from a state having 40,000 villages. A list of all the villages, which is our population, is first prepared and these are serially numbered. In this case the population size is 80 times the sample size. A number between 1 and 80, say 60, is selected at random with the help of random numbers. Then the village with serial number 60 is the first member of our sample. After the first member is drawn, every subsequent 80th village in the list, i.e. those with serial numbers 140, 220, 300, 380, etc., is included in the sample.
In a random sample all the members have to be chosen randomly, whereas in a systematic sample only the first member has to be chosen at random. Thus a systematic sample is much easier to choose than a random sample of the same size.
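The village illustration can be sketched as follows. The function name and its parameters are hypothetical; passing `start=60` reproduces the numbers in the text, while omitting it draws the starting point at random from 1 to k.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the serial numbers of a systematic sample."""
    k = population_size // sample_size       # sampling interval, 80 here
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start among first k units
    return [start + i * k for i in range(sample_size)]

villages = systematic_sample(40000, 500, start=60)
print(villages[:5])                 # [60, 140, 220, 300, 380]
print(len(villages), villages[-1])  # 500 39980
```

Only the starting point involves chance; once it is fixed, the rest of the sample is determined by the interval k.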
