Theory of Attributes

The document discusses the theory of attributes in statistics. It defines attributes as qualitative characteristics that cannot be directly measured, only observed as present or absent. It then provides notations for different classes defined by single or multiple attributes. The frequencies of these classes must be consistent, meaning no ultimate class frequency can be negative. Consistency is tested by checking if calculations based on the frequency relationships produce negative values.

Uploaded by

Kbharadwaj
Copyright
© © All Rights Reserved

B.Sc. (III) Statistics, Paper-I, Unit-IV: THEORY OF ATTRIBUTES

Introduction:

Literally, an attribute means a qualitative characteristic of an
individual that is not related to quantitative measurement. Examples of
attributes are health, honesty, blindness, etc. They cannot be measured
directly; one can only observe the presence or absence of these
attributes.

Notations:

If only one attribute is studied, the population is divided into two
classes according to its presence or absence; such a classification is
termed division by dichotomy.

If a class is divided into more than two classes, the classification is
called manifold classification.

The positive class, which denotes the presence of an attribute, is
generally denoted by Roman capital letters A, B, … etc., and the
negative class, which denotes the absence of the attribute, by the
corresponding Greek letters α, β, … etc. For example, if A represents
the attribute ‘Literacy’ and B represents ‘Criminal’, then α and β
represent ‘Illiteracy’ and ‘Not Criminal’ respectively.

Classes and Class frequencies:

Different attributes, their sub-groups and combinations are called
different classes, and the numbers of observations assigned to them are
called their class frequencies.

If two attributes are studied, the number of classes will be 9,

i.e. (A), (α), (B), (β), (AB), (Aβ), (αB), (αβ) and N.

The chart given below illustrates this clearly.

                     N

          (A)                 (α)

     (AB)     (Aβ)       (αB)     (αβ)


The number of observations or units belonging to a class is known as its
frequency and is denoted by the class symbol within brackets. Thus (A)
stands for the frequency of the class A, and (AB) stands for the number
of objects possessing both the attributes A and B.

The classes denoted by capital letters only are called positive classes
and their frequencies are known as positive class frequencies. E.g. (A),
(AB), (ABC) etc. Conventionally, N, i.e. the universe or population, is
taken as a positive class.

The classes denoted by Greek letters only are called negative classes
and their frequencies are known as negative class frequencies. E.g.
(α), (αβ), (αβγ) etc.

The classes denoted by a combination of capital and Greek letters are
called contrary classes and their frequencies are known as contrary
class frequencies. E.g. (Aβ), (αB), (αβC) etc.

Order of Classes and Class frequencies:

A class represented by n attributes is called a class of order n and its
frequency is called a frequency of order n.

Thus,

(A), (B), (α) etc. are classes of order 1.

(AB), (Aβ), (αB), (αβ) etc. are classes of order 2.

(ABC), (αβC), (AβC) etc. are classes of order 3, and so on.

The universe N, however, is considered a class of order 0.

The classes of the highest order are called ultimate classes and their
frequencies are called ultimate class frequencies. E.g.

In the case of two attributes A and B, the ultimate classes are (AB),
(Aβ), (αB) and (αβ).

In the case of three attributes A, B and C, the classes of order three
are the ultimate classes.
Note:

1. There will be 2^n positive classes and 2^n - 1 negative classes in
   the case of n attributes.
2. There will be 3^n total classes in the case of n attributes.
3. There will be 2^n ultimate classes in the case of n attributes.

Proof:

1. Let the n attributes be A, B, ….., M. As r attributes can be selected
from n attributes in nCr ways, there are nCr positive classes of order r.

Class Order    Number of Positive Classes
0              nC0 = 1 (N)
1              nC1
2              nC2
…              …
r              nCr
…              …
n              nCn
Total          nC0 + nC1 + nC2 + … + nCr + … + nCn = (1+1)^n = 2^n

Similarly, in the case of negative classes, the number of negative
classes of order 0 is 0 (N is taken as a positive class), so the no. of
negative classes = 2^n - 1.

2. Let the n attributes be A, B, ….., M. As r attributes can be selected
from n attributes in nCr ways, and every selection of r attributes has
2^r forms (each attribute either present or absent), there are
2^r x nCr classes of order r.

Class Order    Number of Selections    Total Classes
0              nC0 = 1 (N)             2^0 x nC0
1              nC1                     2^1 x nC1
2              nC2                     2^2 x nC2
…              …                       …
r              nCr                     2^r x nCr
…              …                       …
n              nCn                     2^n x nCn
Total          2^0.nC0 + 2^1.nC1 + … + 2^r.nCr + … + 2^n.nCn = (1+2)^n = 3^n
3. Let the n attributes be A, B, ….., M. As n attributes can be selected
from n attributes in nCn ways, and every selection of n attributes has
2^n forms,

the total no. of ultimate classes = 2^n x nCn = 2^n.
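
The three counts can also be checked by brute force. The sketch below is only an illustration, not part of the original notes: it assumes each attribute can appear in a class symbol in one of three states (present, absent, or not mentioned), and the function name `count_classes` is hypothetical.

```python
from itertools import product

def count_classes(n):
    """Enumerate every class for n attributes and count each kind.

    Each attribute takes one of three states in a class symbol:
    '+' (present, e.g. A), '-' (absent, e.g. alpha), '.' (not mentioned).
    The symbol with every attribute unmentioned is the universe N.
    """
    counts = {"positive": 0, "negative": 0, "ultimate": 0, "total": 0}
    for states in product("+-.", repeat=n):
        counts["total"] += 1
        if "-" not in states:       # capital letters only (N included)
            counts["positive"] += 1
        elif "+" not in states:     # Greek letters only
            counts["negative"] += 1
        if "." not in states:       # every attribute is specified
            counts["ultimate"] += 1
    return counts

for n in (1, 2, 3):
    print(n, count_classes(n))
```

For n = 2 this gives 4 positive, 3 negative, 4 ultimate and 9 total classes, in line with 2^n, 2^n - 1, 2^n and 3^n.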


Relationship between the class frequencies:

The frequency of a lower order class can always be expressed in terms
of the higher order class frequencies,

i.e., N = (A) + (α) = (B) + (β), or

(A) = (AB) + (Aβ)

(α) = (αB) + (αβ)

(B) = (AB) + (αB)

(β) = (Aβ) + (αβ)

If we know the 2^n ultimate class frequencies, we can find all the class
frequencies.
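
As a minimal sketch of these relations for two attributes, the function below rebuilds the lower order frequencies from the four ultimate ones. The function name, the dictionary keys and the input figures are illustrative assumptions, not part of the original notes.

```python
def from_ultimate(ab, a_beta, alpha_b, alpha_beta):
    """Lower order class frequencies from the four ultimate frequencies."""
    return {
        "(A)": ab + a_beta,                    # (A) = (AB) + (A beta)
        "(alpha)": alpha_b + alpha_beta,       # (alpha) = (alpha B) + (alpha beta)
        "(B)": ab + alpha_b,                   # (B) = (AB) + (alpha B)
        "(beta)": a_beta + alpha_beta,         # (beta) = (A beta) + (alpha beta)
        "N": ab + a_beta + alpha_b + alpha_beta,
    }

# Illustrative ultimate frequencies (assumed values)
freqs = from_ultimate(ab=85, a_beta=335, alpha_b=585, alpha_beta=1495)
print(freqs)
```

With these inputs, (A) + (α) and (B) + (β) both equal N, as the first relation requires.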

Note: To calculate class frequencies, the class symbols can be treated
as operators and used like algebraic quantities. Writing

(A) = A.N and (α) = α.N, we have (A) + (α) = A.N + α.N

or N = (A + α).N, or A + α = 1, or α = 1 - A.

e.g. (Aβ) = Aβ.N = A(1 - B).N = A.N - AB.N = (A) - (AB)

Fundamental Set

A set of 2^n class frequencies is called a fundamental set provided
that the class frequencies are algebraically independent, i.e. none of
them can be expressed in terms of some or all of the others.

The set of all 2^n positive class frequencies and the set of all 2^n
ultimate class frequencies are both fundamental sets.

Consistency of the data:

A set of class frequencies is said to be consistent if all of its class
frequencies conform with one another without any contradiction;
otherwise the data are said to be inconsistent.

To find out whether the given data are consistent, we apply a very
simple test: check whether any one or more of the ultimate class
frequencies is negative. If none of the ultimate class frequencies is
negative, we can safely conclude that the given data are consistent
(i.e. the frequencies do not conflict with each other in any way). On
the other hand, if any of the ultimate class frequencies comes out to be
negative, the given data are inconsistent.

Example: Given N = 2500, (A) = 420, (AB) = 85 and (B) = 670.

Find the missing values and comment on the consistency of the data.

Solution: We know that

N = (A) + (α) = (B) + (β) ….(1)

(A) = (AB) + (Aβ) ….(2)

(α) = (αB) + (αβ) ….(3)

(B) = (AB) + (αB) ….(4)

(β) = (Aβ) + (αβ) ….(5)

From (2) 420 = 85 + (Aβ) ⇒ (Aβ) = 420 - 85 = 335

From (4) 670 = 85 + (αB) ⇒ (αB) = 670 - 85 = 585

From (1) 2500 = 420 + (α) ⇒ (α) = 2500 - 420 = 2080

From (1) (β) = 2500 - 670 = 1830

From (3) 2080 = 585 + (αβ) ⇒ (αβ) = 1495

Since all the ultimate class frequencies are non-negative, the data are
consistent.
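
The same test can be sketched in code for two attributes. This is only an illustration under the assumption that N, (A), (B) and (AB) are the given frequencies; the function names are hypothetical.

```python
def ultimate_from_positive(n, a, b, ab):
    """Ultimate class frequencies for two attributes from N, (A), (B), (AB)."""
    return {
        "(AB)": ab,
        "(A beta)": a - ab,
        "(alpha B)": b - ab,
        "(alpha beta)": n - a - b + ab,
    }

def is_consistent(ultimate):
    """Data are consistent when no ultimate class frequency is negative."""
    return all(freq >= 0 for freq in ultimate.values())

# Figures from the example above
ult = ultimate_from_positive(n=2500, a=420, b=670, ab=85)
print(ult, is_consistent(ult))

# A made-up inconsistent set, for contrast only
bad = ultimate_from_positive(n=100, a=70, b=60, ab=20)
print(bad, is_consistent(bad))
```

The second call uses invented figures: 70 + 60 - 20 = 110 units would be needed to hold A or B, which exceeds N = 100, so (αβ) comes out negative.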

Example: Given N = 600, (A) = 250, (αB) = 400 and (Aβ) = 200.

Are the data consistent?

Solution: We know that

N = (A) + (α) = (B) + (β) ….(1)

(A) = (AB) + (Aβ) ….(2)

(α) = (αB) + (αβ) ….(3)

From (2) 250 = (AB) + 200 ⇒ (AB) = 250 - 200 = 50

From (1) 600 = 250 + (α) ⇒ (α) = 600 - 250 = 350

From (3) 350 = 400 + (αβ) ⇒ (αβ) = 350 - 400 = -50

Since one of the ultimate class frequencies is negative [(αβ) = -50],
the data are inconsistent.

Mathematical Condition for Consistency of the data:

The necessary and sufficient condition for the consistency of a set of
independent class frequencies is that none of the ultimate class
frequencies should be negative.

Or

All the ultimate class frequencies must be non-negative.

Conditions of Consistency in terms of positive class frequencies:

One Attribute:

Let the attribute be A; then there will be two ultimate class
frequencies, (A) and (α). Hence the conditions are

(A) ≥ 0, and ….(1)

(α) ≥ 0 ⇒ N - (A) ≥ 0 ⇒ N ≥ (A) or (A) ≤ N ….(2)

Two Attributes:

Let the two attributes be A and B; then there will be four ultimate
class frequencies, (AB), (Aβ), (αB) and (αβ). Hence the conditions are

(AB) ≥ 0 ….(1)

(Aβ) ≥ 0 ….(2)

(αB) ≥ 0 ….(3)

(αβ) ≥ 0 ….(4)

Now expressing these ultimate class frequencies in terms of positive
class frequencies, we have

(AB) ≥ 0 ….(5)

(Aβ) ≥ 0 ⇒ (A) - (AB) ≥ 0 or (AB) ≤ (A) ….(6)

(αB) ≥ 0 ⇒ (B) - (AB) ≥ 0 or (AB) ≤ (B) ….(7)

(αβ) ≥ 0 ⇒ N - (A) - (B) + (AB) ≥ 0 or (AB) ≥ (A) + (B) - N ….(8)

{As (αβ) = αβ.N = (1-A)(1-B).N = 1.N - A.N - B.N + AB.N = N - (A) - (B) + (AB)}

Three Attributes:

Let the three attributes be A, B and C; then there will be eight
ultimate class frequencies, (ABC), (ABγ), (AβC), (αBC), (Aβγ), (αBγ),
(αβC) and (αβγ). Hence the conditions are

(ABC) ≥ 0 ….(1)     (ABγ) ≥ 0 ….(2)

(AβC) ≥ 0 ….(3)     (αBC) ≥ 0 ….(4)

(Aβγ) ≥ 0 ….(5)     (αBγ) ≥ 0 ….(6)

(αβC) ≥ 0 ….(7)     (αβγ) ≥ 0 ….(8)

Now expressing these ultimate class frequencies in terms of positive
class frequencies, we have

(ABC) ≥ 0 ….(9)

(ABγ) ≥ 0 ⇒ (AB) - (ABC) ≥ 0 or (ABC) ≤ (AB) ….(10)

(AβC) ≥ 0 ⇒ (AC) - (ABC) ≥ 0 or (ABC) ≤ (AC) ….(11)

(αBC) ≥ 0 ⇒ (BC) - (ABC) ≥ 0 or (ABC) ≤ (BC) ….(12)

(Aβγ) ≥ 0 ⇒ (A) - (AB) - (AC) + (ABC) ≥ 0

or (ABC) ≥ (AB) + (AC) - (A) ….(13)

(αBγ) ≥ 0 ⇒ (B) - (AB) - (BC) + (ABC) ≥ 0

or (ABC) ≥ (AB) + (BC) - (B) ….(14)

(αβC) ≥ 0 ⇒ (C) - (AC) - (BC) + (ABC) ≥ 0

or (ABC) ≥ (AC) + (BC) - (C) ….(15)

(αβγ) ≥ 0 ⇒ N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC) ≥ 0

or (ABC) ≤ N - (A) - (B) - (C) + (AB) + (BC) + (AC) ….(16)

Now combining (9) and (16)

0 ≤ (ABC) ≤ N - (A) - (B) - (C) + (AB) + (BC) + (AC)

Or (AB) + (BC) + (AC) ≥ (A) + (B) + (C) - N ….(17)

Now combining (10) and (15)

(AC) + (BC) - (C) ≤ (ABC) ≤ (AB)

Or (AC) + (BC) - (AB) ≤ (C) ….(18)

Similarly, combining (11) and (14), and (12) and (13)

(AB) + (BC) - (AC) ≤ (B) ….(19)

(AB) + (AC) - (BC) ≤ (A) ….(20)

Hence, in the case of three attributes, the conditions are

(AB) + (BC) + (AC) ≥ (A) + (B) + (C) - N

(AC) + (BC) - (AB) ≤ (C)

(AB) + (BC) - (AC) ≤ (B)

(AB) + (AC) - (BC) ≤ (A)
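
A short sketch of how conditions (17) to (20) might be checked when (ABC) is not given. The function names and all the figures below are assumptions made up for illustration; they do not come from the original notes.

```python
def ultimate_three(n, a, b, c, ab, ac, bc, abc):
    """The eight ultimate class frequencies in terms of positive ones."""
    return {
        "(ABC)": abc,
        "(AB gamma)": ab - abc,
        "(A beta C)": ac - abc,
        "(alpha BC)": bc - abc,
        "(A beta gamma)": a - ab - ac + abc,
        "(alpha B gamma)": b - ab - bc + abc,
        "(alpha beta C)": c - ac - bc + abc,
        "(alpha beta gamma)": n - a - b - c + ab + ac + bc - abc,
    }

def pairwise_conditions(n, a, b, c, ab, ac, bc):
    """Conditions (17)-(20), which do not involve (ABC)."""
    return {
        17: ab + bc + ac >= a + b + c - n,
        18: ac + bc - ab <= c,
        19: ab + bc - ac <= b,
        20: ab + ac - bc <= a,
    }

# Made-up figures that satisfy every condition
good = ultimate_three(n=100, a=50, b=40, c=30, ab=20, ac=15, bc=10, abc=5)
print(good, sum(good.values()))

# Made-up figures that violate condition (17)
checks = pairwise_conditions(n=1000, a=510, b=490, c=427, ab=189, ac=140, bc=85)
print(checks)
```

In the second set, (AB) + (BC) + (AC) = 414 falls short of (A) + (B) + (C) - N = 427, so no value of (ABC) can make all eight ultimate frequencies non-negative.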

Two Way Contingency Table

A table which represents the classification according to the distinct
classes of two attributes ‘A’ and ‘B’ is called a two way contingency
table.

Suppose the attribute A has m distinct classes denoted by A1, A2, …, Am
and the attribute B has n distinct classes denoted by B1, B2, …, Bn.

Then there are in all m.n classes (cells) in the contingency table. The
frequency of a typical cell corresponding to Ai and Bj is (AiBj).

The contingency table will be as follows:

                                Attribute B
Attribute A    B1        B2        …    Bj        …    Bn        Total

A1             (A1B1)    (A1B2)    …    (A1Bj)    …    (A1Bn)    (A1)
A2             (A2B1)    (A2B2)    …    (A2Bj)    …    (A2Bn)    (A2)
…              …         …              …              …         …
Ai             (AiB1)    (AiB2)    …    (AiBj)    …    (AiBn)    (Ai)
…              …         …              …              …         …
Am             (AmB1)    (AmB2)    …    (AmBj)    …    (AmBn)    (Am)

Total          (B1)      (B2)      …    (Bj)      …    (Bn)      N

Independence of Attributes:

Attributes are said to be independent if the presence or absence of
one attribute does not affect the presence or absence of the other. For
example, the attributes skin colour and intelligence of persons are
independent. Mathematically, two attributes A and B are said to be
independent if the proportion of A’s among B’s and the proportion of
A’s among β’s are the same, i.e.

$$\frac{(AB)}{(B)} = \frac{(A\beta)}{(\beta)}$$

Fundamental Rule: If the attributes A and B are independent, then the
proportion of AB’s in the population is equal to the product of the
proportions of A’s and B’s in the population, i.e.

$$\frac{(AB)}{N} = \frac{(A)}{N} \cdot \frac{(B)}{N}$$

If two attributes A and B are independent, then the value of (AB) must
be equal to (A).(B)/N, i.e.

$$(AB) = \frac{(A) \cdot (B)}{N}$$

This value of (AB) is called the expected value of (AB), and the given
value of (AB) is called the observed value.

Association of Attributes:

Two attributes A and B are said to be associated if they are not
independent but are related to each other in some way or other.

The attributes A and B are said to be positively associated if

$$(AB) > \frac{(A) \cdot (B)}{N}$$

If

$$(AB) < \frac{(A) \cdot (B)}{N}$$

then they are said to be negatively associated.

Example: Show whether A and B are independent, positively associated or
negatively associated.

(AB) = 128, (αB) = 384, (Aβ) = 24 and (αβ) = 72

Solution:

(A) = (AB) + (Aβ) = 128 + 24

(A) = 152

(B) = (AB) + (αB) = 128 + 384

(B) = 512

(α) = (αB) + (αβ) = 384 + 72

(α) = 456

N = (A) + (α) = 152 + 456 = 608

Now,

(A).(B)/N = 152 x 512/608 = 128 = (AB)

Hence A and B are independent.


Example: From the following data, find out the type of association
between A and B.

1) N = 200 (A) = 30 (B) = 100 (AB) = 15

2) N = 400 (A) = 50 (B) = 160 (AB) = 25

3) N = 800 (A) = 160 (B) = 300 (AB) = 50

Solution:

1. Expected frequency of (AB) = (A).(B)/N = (30)(100)/200 = 15

Since the actual frequency is equal to the expected frequency, i.e.
15 = 15, A and B are independent.

2. Expected frequency of (AB) = (A).(B)/N = (50)(160)/400 = 20

Since the actual frequency is greater than the expected frequency, i.e.
25 > 20, A and B are positively associated.

3. Expected frequency of (AB) = (A).(B)/N = (160)(300)/800 = 60

Since the actual frequency is less than the expected frequency, i.e.
50 < 60, A and B are negatively associated.
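
The comparison of observed and expected (AB) can be sketched as below. The function name `association` is hypothetical, and the test compares (AB).N with (A).(B) in whole numbers, which is an implementation choice made here to avoid rounding issues rather than something stated in the notes.

```python
def association(n, a, b, ab):
    """Classify the association of A and B from N, (A), (B) and (AB)."""
    expected = a * b / n          # expected (AB) under independence
    # Comparing ab * n with a * b is the same test without division.
    if ab * n > a * b:
        kind = "positively associated"
    elif ab * n < a * b:
        kind = "negatively associated"
    else:
        kind = "independent"
    return expected, kind

cases = [
    (200, 30, 100, 15),
    (400, 50, 160, 25),   # observed (AB) = 25, the value compared in the solution
    (800, 160, 300, 50),
]
results = [association(*case) for case in cases]
for case, result in zip(cases, results):
    print(case, result)
```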

Yule’s Co-efficient of Association:

To measure the intensity of association, Prof. G. Udny Yule suggested a
formula. It is a relative measure of association between two attributes
A and B.

If (AB), (αB), (Aβ) and (αβ) are the frequencies of the four distinct
combinations of A, B, α and β, then Yule’s co-efficient of association is

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$

Note:

If Q = +1, there is perfect positive association.

If Q = -1, there is perfect negative association.

If Q = 0, there is no association, i.e. A and B are independent.


Example: Investigate the association between darkness of eye colour
in father and son from the following data.

Fathers with dark eyes and sons with dark eyes = 50

Fathers with dark eyes and sons with no dark eyes = 79

Fathers with no dark eyes and sons with dark eyes = 89

Neither son nor father having dark eyes = 782

Solution: Let A denote dark eye colour of the father and B denote dark
eye colour of the son.

            A       α

B           50      89

β           79      782

Yule’s co-efficient of association is

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \frac{50 \times 782 - 79 \times 89}{50 \times 782 + 79 \times 89} = \frac{32069}{46131} \approx 0.695$$

Hence there is a positive association between the eye colour of fathers
and sons.
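
As a quick check of this arithmetic, Yule's coefficient can be computed directly from the four cell frequencies. The function name and argument names below are assumptions chosen for illustration.

```python
def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association from the four ultimate frequencies."""
    same = ab * alpha_beta        # (AB)(alpha beta)
    cross = a_beta * alpha_b      # (A beta)(alpha B)
    return (same - cross) / (same + cross)

# Eye-colour data from the example above
q = yule_q(ab=50, a_beta=79, alpha_b=89, alpha_beta=782)
print(round(q, 3))  # 0.695
```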

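Example:

Can vaccination be regarded as a preventive measure against smallpox on
the basis of the data given below?

Of 1482 persons in a locality exposed to smallpox, 368 in all were
attacked. Of the 1482 persons, 343 had been vaccinated, and of these
only 35 were attacked.

Solution: Let A denote the attribute of vaccination and B denote that
of being attacked.

We are given N = 1482, (B) = 368, (A) = 343 and (AB) = 35, so that

(Aβ) = (A) - (AB) = 343 - 35 = 308

(αB) = (B) - (AB) = 368 - 35 = 333

(αβ) = N - (A) - (αB) = 1482 - 343 - 333 = 806

$$Q = \frac{35 \times 806 - 308 \times 333}{35 \times 806 + 308 \times 333} = \frac{-74354}{130774} \approx -0.57$$

i.e., there is a negative association between attacked and vaccinated.
In other words, there is a positive association between not attacked and
vaccinated. Hence vaccination can be regarded as a preventive
measure for smallpox.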
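
The figures in this example can be worked through as in the sketch below, which first derives the four ultimate frequencies from N, (A), (B) and (AB) and then applies Yule's formula. Both function names are hypothetical.

```python
def ultimate_from_totals(n, a, b, ab):
    """Return (AB), (A beta), (alpha B), (alpha beta) from N, (A), (B), (AB)."""
    a_beta = a - ab
    alpha_b = b - ab
    alpha_beta = n - a - b + ab
    return ab, a_beta, alpha_b, alpha_beta

def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association."""
    same = ab * alpha_beta
    cross = a_beta * alpha_b
    return (same - cross) / (same + cross)

# A = vaccinated, B = attacked
cells = ultimate_from_totals(n=1482, a=343, b=368, ab=35)
q = yule_q(*cells)
print(cells, round(q, 2))  # (35, 308, 333, 806) -0.57
```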

Example: In a co-educational institution, out of 200 students, 150 were
boys. They took an examination and it was found that 120 boys passed and
10 girls failed. Is there any association between gender and success in
the examination?

Solution: Let A denote boys and α denote girls. Let B denote those
who passed the examination and β denote those who failed.

We are given N = 200, (A) = 150, (AB) = 120 and (αβ) = 10.

The other frequencies can be obtained from the following table:

             A       α       Total

B            120     40      160

β            30      10      40

Total        150     50      200

Yule’s co-efficient of association is

$$Q = \frac{120 \times 10 - 30 \times 40}{120 \times 10 + 30 \times 40} = \frac{0}{2400} = 0$$

Therefore, there is no association between gender and success in the
examination.

Coefficient of Colligation

Yule suggested another coefficient, called the coefficient of
colligation Y, given by

$$Y = \frac{1 - \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}{1 + \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}$$

Y lies between -1 and +1.

Q can be expressed in terms of Y as

$$Q = \frac{2Y}{1 + Y^2}$$
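
The relation between Q and Y can be checked numerically. The sketch below assumes the usual square-root form of the coefficient of colligation and uses the eye-colour frequencies (50, 79, 89, 782) as input; the function names are hypothetical.

```python
from math import sqrt, isclose

def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association."""
    same = ab * alpha_beta
    cross = a_beta * alpha_b
    return (same - cross) / (same + cross)

def yule_y(ab, a_beta, alpha_b, alpha_beta):
    """Coefficient of colligation (assumes (AB) and (alpha beta) are non-zero)."""
    root = sqrt((a_beta * alpha_b) / (ab * alpha_beta))
    return (1 - root) / (1 + root)

cells = (50, 79, 89, 782)  # (AB), (A beta), (alpha B), (alpha beta)
q = yule_q(*cells)
y = yule_y(*cells)
print(isclose(q, 2 * y / (1 + y ** 2)))  # True
```

Writing x for the ratio under the square root, 2Y/(1 + Y^2) simplifies to (1 - x)/(1 + x), which is Q, so the check holds for any table with non-zero (AB) and (αβ).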

Chi-Square Statistic (χ²) or Square Contingency

For a given 2x2 contingency table

Attribute    A       α       Total

B            (AB)    (αB)    (B)

β            (Aβ)    (αβ)    (β)

Total        (A)     (α)     N

or, more generally, an m x n contingency table, the χ²-statistic is
given as

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where the O_ij’s and E_ij’s are the observed and expected frequencies
respectively of the (i,j)th cell in an m x n contingency table.

The main steps of the χ² test as a test of independence for a
contingency table are as follows:

1. Null and Alternative Hypotheses

H0: The two attributes are independent

Vs

H1: The two attributes are associated.

2. Calculate the expected frequencies for each cell under the
assumption that the two attributes are independent.

3. Calculate the value of χ² by the formula.

4. Compare the calculated value of χ² with the tabulated value for
(m-1).(n-1) degrees of freedom at α% level of significance.

5. If χ²(cal) > χ²(tab), the null hypothesis of independence is
rejected at α% level of significance; otherwise it is accepted.
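
The steps above can be sketched in code as follows. This is an illustration only: the function name is hypothetical, the 2x2 table reuses the father and son eye-colour frequencies (50, 89, 79, 782), and the 5% level with its tabulated value 3.841 for 1 degree of freedom is an assumed choice of significance level.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for an m x n table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence: row total x column total / N
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Rows: B, beta; columns: A, alpha
observed = [[50, 89], [79, 782]]
chi2, dof = chi_square(observed)

TABULATED_5_PERCENT_1_DOF = 3.841  # standard chi-square table value
reject_h0 = chi2 > TABULATED_5_PERCENT_1_DOF
print(round(chi2, 2), dof, reject_h0)
```

Here the calculated value far exceeds the tabulated one, so the hypothesis of independence would be rejected at the assumed 5% level.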

Conditions for the Application of χ²-test

1. N, the total frequency, must be large (> 50).

2. No theoretical cell frequency should be less than 5.

3. The constraints imposed on the cell frequencies must be linear.
2
Mean square Contingenc y ( 2 )  2 
N

2 2
Pearson’s Coefficient of Contingency (C) C  
2  N 1 2

C lies between 0 and 1. It never attains the value 1.

2
Tschuprow ’s Coefficient (T ) T 
2 2
, W here r and c are no.
( r  1)(c  1)
of rows and columns respectively in contingency table.
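
A minimal sketch of these three measures, assuming the usual definitions φ² = χ²/N, C = √(χ²/(χ² + N)) and T = √(φ²/√((r-1)(c-1))). The function name and the input values (χ² = 20, N = 200, a 3 x 3 table) are made up for illustration.

```python
from math import sqrt

def contingency_coefficients(chi2, n, rows, cols):
    """Return (phi squared, Pearson's C, Tschuprow's T)."""
    phi2 = chi2 / n
    c = sqrt(chi2 / (chi2 + n))
    t = sqrt(phi2 / sqrt((rows - 1) * (cols - 1)))
    return phi2, c, t

# Made-up inputs, not taken from the notes
phi2, c, t = contingency_coefficients(chi2=20.0, n=200, rows=3, cols=3)
print(round(phi2, 4), round(c, 4), round(t, 4))
```

Since χ² + N is always larger than χ² when N > 0, C stays strictly below 1, which matches the remark above.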
