Basic Statistics

Shyam Karmakar

May 5, 2005
Contents
1. Descriptive and Inferential Statistics
2. Scale of Measurement
3. Percentiles and Quartiles
4. Measure of Central Tendency
5. Measure of Variability
6. Grouped Data and the Histogram
7. Relations between the Mean and Standard Deviation
8. Methods of Displaying Data
9. Correlation and Regression
10. Cross Tabulation
11. Probability
Copyright © 2004 marketRx, Inc. All rights reserved.
WHAT IS STATISTICS?

• Statistics is a science that helps us make better decisions in business and economics as well as in other fields.

• Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data, which in turn lead to improved decisions.

• The decisions that we make help us improve the running of a department, a company, the entire economy, etc.

Samples and Populations

• A population consists of the set of all measurements in which the investigator is interested.

• A sample is a subset of the measurements selected from the population.

• A census is a complete enumeration of every item in a population.

Using Statistics (Two Categories)

• Descriptive Statistics
  - Collect
  - Organize
  - Summarize
  - Display
  - Analyze

• Inferential Statistics
  - Predict and forecast values of population parameters
  - Test hypotheses about values of population parameters
  - Make decisions

Types of Data - Two Types

• Qualitative - Categorical or Nominal. Examples are:
  - Color
  - Gender
  - Nationality

• Quantitative - Measurable or Countable. Examples are:
  - Temperatures
  - Salaries
  - Number of points scored on a 100 point exam

Scales of Measurement
• Nominal Scale - groups or classes
  - Gender

• Ordinal Scale - order matters
  - Ranks (top ten videos)

• Interval Scale - difference or distance matters; has an arbitrary zero value
  - Temperatures (°F, °C)

• Ratio Scale - ratio matters; has a natural zero value
  - Salaries

Percentiles and Quartiles

• Given any set of numerical observations, order them according to magnitude.

• The Pth percentile in the ordered set is the value that has P% (P percent) of the data points below it and (100 – P)% above it.

• The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.
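
The position rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `percentile`, the linear interpolation between neighbouring observations, the clamping at the ends, and the small data sets are all assumptions made for the example.

```python
import math

def percentile(values, p):
    """Return the p-th percentile using the (n + 1)p/100 position rule."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p / 100          # 1-based position, may be fractional
    lower = math.floor(position)          # rank of the observation just below
    fraction = position - lower           # how far to move toward the next one
    # Assumption: positions outside the data are clamped to the extremes.
    if lower < 1:
        return data[0]
    if lower >= n:
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

# Hypothetical data: 7 observations, so the 50th percentile sits at position 4.
print(percentile([2, 4, 6, 8, 10, 12, 14], 50))
# Hypothetical data: 4 observations, so the 25th percentile sits at position 1.25.
print(percentile([1, 2, 3, 4], 25))
```

A fractional position such as 1.25 is read as "one quarter of the way from the 1st to the 2nd ordered observation".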

Example 1
A large department store collects data on sales made by each of its salespeople.
The number of sales made on a given day by each of 20 salespeople is shown below,
both in the order recorded and sorted by magnitude.
Sales Sorted Sales
9 6
6 9
12 10
10 12
13 13
15 14
16 14
14 15
14 16
16 16
17 16
16 17
24 17
21 18
22 18
18 19
19 20
18 21
20 22
17 24

Example (Contd.) Percentiles
• Find the 50th percentile of this data set.

• To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5.

• Thus, the percentile is located at the 10.5th position.

• The 10th observation is 16, and the 11th observation is also 16.

• The 50th percentile lies halfway between the 10th and 11th values (which are both 16 in this case) and is thus 16.

Quartiles

• Quartiles are the percentage points that break down the ordered data set into quarters.

• The first quartile, Q1 (25th percentile), is often called the lower quartile. It is the point below which lies 1/4 of the data.

• The second quartile, Q2 (50th percentile), is often called the median or the middle quartile. It is the point below which lies 1/2 of the data.

• The third quartile, Q3 (75th percentile), is often called the upper quartile. It is the point below which lies 3/4 of the data.

Example 2 : Finding Quartiles
Sales  Sorted Sales  Quartile        Position (n+1)P/100     Value
9      6
6      9
12     10
10     12
13     13            First Quartile  (20+1)*25/100 = 5.25    13 + (.25)(1) = 13.25
15     14
16     14
14     15
14     16
16     16            Median          (20+1)*50/100 = 10.5    16 + (.5)(0) = 16
17     16
16     17
24     17
21     18
22     18            Third Quartile  (20+1)*75/100 = 15.75   18 + (.75)(1) = 18.75
18     19
19     20
18     21
20     22
17     24
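
As a cross-check of the table, the same quartiles can be obtained from Python's standard library. This sketch assumes that `statistics.quantiles` with `method="exclusive"` interpolates at the same (n + 1)P/100 position used in the slides; the variable names are illustrative.

```python
import statistics

# Sales made by each of the 20 salespeople (from Example 1).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(sales, n=4, method="exclusive")
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
```

Other software may use a different interpolation rule by default, so quartiles reported elsewhere can differ slightly from the values in the table.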

Summary Measures: Population Parameters Sample Statistics

• Measures of Central Tendency
  - Median
  - Mode
  - Mean

• Measures of Variability
  - Range
  - Interquartile range
  - Variance
  - Standard Deviation

• Other summary measures:
  - Skewness
  - Kurtosis

Measures of Central Tendency or Location

• Median
  - Middle value when sorted in order of magnitude
  - 50th percentile or 2nd quartile

• Mode
  - Most frequently occurring value

• Mean
  - Average

Measures of Location
• Median of a sample is the middle value when the data are arranged in ascending or descending order.
  If the number of data points is even, the median is usually estimated as the midpoint between the two middle values, by adding the two middle values and dividing their sum by 2.

• Mode is the value that occurs most frequently. It represents the highest peak of the distribution.
  The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.

Example – Median
Sales  Sorted Sales
9      6
6      9
12     10
10     12
13     13
15     14
16     14
14     15
14     16
16     16
17     16
16     17
24     17
21     18
22     18
18     19
19     20
18     21
20     22
17     24

Median (50th percentile): position (20+1)50/100 = 10.5, value 16 + (.5)(0) = 16

The median is the middle value of data sorted in order of magnitude. It is the 50th percentile.

Example - Mode

                       .
  .  .  .  .  .  :  .  :  :  :  .  .  .  .  .
---------------------------------------------
  6  9 10 12 13 14 15 16 17 18 19 20 21 22 24

Mode = 16

The mode is the most frequently occurring value. It is the value with the highest frequency.
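
The median and mode of the sales data can be checked with Python's `statistics` module. This is a small sketch; the variable names are illustrative, and `statistics.mode` is used here only because the data have a single most frequent value.

```python
import statistics

# Sales made by each of the 20 salespeople (from Example 1).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# With an even number of observations, median() averages the two middle values.
median = statistics.median(sales)
# mode() returns the most frequently occurring value.
mode = statistics.mode(sales)
print("median =", median, " mode =", mode)
```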

Measures of Location
• Mean, or average value, is the most commonly used measure of central tendency.
  The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.

• The mean, $\bar{X}$, is given by

  $\bar{X} = \sum_{i=1}^{n} X_i / n$

  where
  $X_i$ = observed values of the variable X
  $n$ = number of observations (sample size)

Example – Mean
Sales  Weight  W*S
9      1       9
6      3       18
12     2       24
10     2       20
13     1       13
15     3       45
16     1       16
14     2       28
14     2       28
16     3       48
17     3       51
16     2       32
24     2       48
21     1       21
22     3       66
18     2       36
19     1       19
18     3       54
20     2       40
17     3       51
Total  317  42  667  (sums of the Sales, Weight, and W*S columns)

$\bar{x} = \sum_{i=1}^{n} x_i / n = 317 / 20 = 15.85$

Weighted average or mean is calculated as follows:

$\bar{X} = \sum_{i=1}^{n} w_i X_i / \sum_{i=1}^{n} w_i = 667 / 42 = 15.88$
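
The two averages on this slide can be reproduced with a short Python sketch. The weights below are read from the slide's Weight column (they sum to 42 and give a weighted total of 667, matching the slide's totals); the variable names are illustrative.

```python
# Sales and their weights, row by row, as read from the slide.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
weights = [1, 3, 2, 2, 1, 3, 1, 2, 2, 3,
           3, 2, 2, 1, 3, 2, 1, 3, 2, 3]

# Simple mean: sum of the observations divided by their count.
mean = sum(sales) / len(sales)

# Weighted mean: sum of weight * value divided by the sum of the weights.
weighted_total = sum(w * x for w, x in zip(weights, sales))
weighted_mean = weighted_total / sum(weights)

print(f"mean = {mean:.2f}, weighted mean = {weighted_mean:.2f}")
```

When every weight is equal, the weighted mean reduces to the simple mean.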

Example - Mean and Mode

                       .
  .  .  .  .  .  :  .  :  :  :  .  .  .  .  .
---------------------------------------------
  6  9 10 12 13 14 15 16 17 18 19 20 21 22 24

Mean = 15.85
Median and Mode = 16

Measures of Shape
• Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other.
  It can be thought of as the tendency for one tail of the distribution to be heavier than the other.

• Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution.
  Measured as excess kurtosis (kurtosis minus 3), the kurtosis of a normal distribution is zero. If the kurtosis is positive, then the distribution is more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal distribution.
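
The slides do not give formulas for these two measures. As one possible illustration, the sketch below uses the common moment-based definitions (third and fourth central moments scaled by powers of the second); these formulas, the function names, and the small data sets are assumptions, and statistical packages often apply additional sample-size corrections.

```python
def central_moment(values, k):
    """Average of the k-th powers of the deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: negative = longer left tail, positive = longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores zero."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical, perfectly symmetric data
left_tailed = [1, 7, 8, 9, 10]     # hypothetical data with one low outlier

print(skewness(symmetric), skewness(left_tailed))
print(excess_kurtosis(symmetric))
```

The symmetric data give a skewness of zero, while the low outlier pulls the second data set's skewness below zero (skewed to the left).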

Skewness and Kurtosis
• Skewness: measure of asymmetry of a frequency distribution
  - Skewed to left
  - Symmetric or unskewed
  - Skewed to right

• Kurtosis: measure of flatness or peakedness of a frequency distribution
  - Platykurtic (relatively flat)
  - Mesokurtic (normal)
  - Leptokurtic (relatively peaked)

Skewness
[Figure: three distribution curves labeled Skewed to left, Symmetric, and Skewed to right]

Skewness of a Distribution

[Figure: (a) Symmetric Distribution, with the mean, median, and mode at the same point; (b) Skewed Distribution, with the mean, median, and mode marked at separate points]

Kurtosis
• Platykurtic - flat distribution
• Mesokurtic - not too flat and not too peaked
• Leptokurtic - peaked distribution

Measures of Variability or Dispersion

• Range
  - Difference between maximum and minimum values

• Interquartile Range
  - Difference between third and first quartiles (Q3 - Q1)

• Variance
  - Average* of the squared deviations from the mean

• Standard Deviation
  - Square root of the variance

* Definitions of population variance and sample variance differ slightly.


Measures of Variability
• Range measures the spread of the data.
  It is simply the difference between the largest and smallest values in the sample. Range = Xlargest – Xsmallest.

• Interquartile range is the difference between the third and first quartiles (Q3 - Q1), that is, between the 75th and 25th percentiles.

Example - Range and Interquartile Range
Sales  Sorted Sales  Rank
9      6             1     Minimum
6      9             2
12     10            3
10     12            4
13     13            5
15     14            6
16     14            7
14     15            8
14     16            9
16     16            10
17     16            11
16     17            12
24     17            13
21     18            14
22     18            15
18     19            16
19     20            17
18     21            18
20     22            19
17     24            20    Maximum

Range: Maximum - Minimum = 24 - 6 = 18
First Quartile: Q1 = 13 + (.25)(1) = 13.25
Third Quartile: Q3 = 18 + (.75)(1) = 18.75
Interquartile Range: Q3 - Q1 = 18.75 - 13.25 = 5.5
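
Both measures can be computed directly from the sales data. The sketch below assumes that `statistics.quantiles` with `method="exclusive"` follows the same (n + 1)P/100 position rule as the slides; the variable names are illustrative.

```python
import statistics

# Sales made by each of the 20 salespeople (from Example 1).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# Range: largest value minus smallest value.
data_range = max(sales) - min(sales)

# Interquartile range: third quartile minus first quartile.
q1, _, q3 = statistics.quantiles(sales, n=4, method="exclusive")
iqr = q3 - q1

print("range =", data_range, " IQR =", iqr)
```

The range depends only on the two extreme observations, while the interquartile range describes the spread of the middle half of the data.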

Measures of Variability
• Variance is the mean squared deviation from the mean. For a sample:

  $\mathrm{Var}_x = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)$

  The variance can never be negative.

• Standard deviation is the square root of the variance.

  $s_x = \sqrt{\mathrm{Var}_x}$

• Coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, and is a unitless measure of relative variability.

  $CV = (s_x / \bar{X}) \times 100\%$

Variance and Standard Deviation
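Population Variance

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2 / N}{N}$

$\sigma = \sqrt{\sigma^2}$

Sample Variance

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$

$s = \sqrt{s^2}$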

Calculation of Sample Variance
x      x - x-bar   (x - x-bar)^2   x^2
6      -9.85       97.0225         36
9      -6.85       46.9225         81
10     -5.85       34.2225         100
12     -3.85       14.8225         144
13     -2.85       8.1225          169
14     -1.85       3.4225          196
14     -1.85       3.4225          196
15     -0.85       0.7225          225
16     0.15        0.0225          256
16     0.15        0.0225          256
16     0.15        0.0225          256
17     1.15        1.3225          289
17     1.15        1.3225          289
18     2.15        4.6225          324
18     2.15        4.6225          324
19     3.15        9.9225          361
20     4.15        17.2225         400
21     5.15        26.5225         441
22     6.15        37.8225         484
24     8.15        66.4225         576
Total  317  0  378.5500  5403  (sums of the four columns)

Using the definition:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{378.55}{20 - 1} = \frac{378.55}{19} = 19.923684$

Using the computational form:

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1} = \frac{5403 - 317^2 / 20}{20 - 1} = \frac{5403 - 100489 / 20}{19} = \frac{5403 - 5024.45}{19} = \frac{378.55}{19} = 19.923684$

$s = \sqrt{s^2} = \sqrt{19.923684} = 4.46$
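
The calculation can be checked in Python. This sketch computes the sample variance both from the definition and from the computational form, then compares the result with `statistics.variance`, which also divides by n - 1; the variable names are illustrative.

```python
import math
import statistics

# Sales made by each of the 20 salespeople (from Example 1).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
n = len(sales)
mean = sum(sales) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = sum((x - mean) ** 2 for x in sales)
variance_definition = squared_deviations / (n - 1)

# Computational form: (sum of x^2 - (sum of x)^2 / n) divided by n - 1.
variance_shortcut = (sum(x * x for x in sales) - sum(sales) ** 2 / n) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance_definition)

print(f"variance = {variance_definition:.6f}, standard deviation = {std_dev:.2f}")
print(f"library check = {statistics.variance(sales):.6f}")
```

The two forms are algebraically identical; the computational form avoids computing each deviation but can lose precision when the values are large relative to their spread.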

Relations between the Mean and Standard Deviation

• Chebyshev's Theorem
  - Applies to any distribution, regardless of shape
  - Places lower limits on the percentages of observations within a given number of standard deviations from the mean

• Empirical Rule
  - Applies only to roughly bell-shaped and symmetric distributions
  - Specifies approximate percentages of observations within a given number of standard deviations from the mean

Chebyshev’s Theorem
 1 
1 - 
 At least  of
the elements of any distribution lie within k
k2  
standard deviations of the mean

1 1 3
1 - 2 = 1 - = = 75%
2 4 4 2
Standard
At 1 1 8 Lie
1 - 2 = 1 - = = 89% 3 deviations
least 3 9 9 within of the mean
1 1 15
1- 2 = 1 - = = 94% 4
4 16 16
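
The bound is easy to tabulate. The sketch below uses exact fractions so the results match the slide's 3/4, 8/9, and 15/16; the function name `chebyshev_lower_bound` is illustrative.

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    return 1 - Fraction(1, k * k)

for k in (2, 3, 4):
    bound = chebyshev_lower_bound(k)
    print(f"k = {k}: at least {bound} = {float(bound):.0%}")
```

The bound is informative only for k greater than 1; at k = 1 it reduces to "at least 0%".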

Empirical Rule
• For roughly bell-shaped and symmetric distributions, approximately:
  - 68% lie within 1 standard deviation of the mean
  - 95% lie within 2 standard deviations of the mean
  - Nearly all (about 99.7%) lie within 3 standard deviations of the mean
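
As an informal check, the sketch below counts how much of the 20-salesperson sales data from Example 1 falls within 1, 2, and 3 sample standard deviations of the mean. The helper `share_within` is an illustrative name, and a sample of 20 values is only expected to follow the rule loosely.

```python
import statistics

# Sales made by each of the 20 salespeople (from Example 1).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mean = statistics.mean(sales)
std_dev = statistics.stdev(sales)   # sample standard deviation (divides by n - 1)

def share_within(values, center, spread, k):
    """Proportion of values no more than k * spread away from center."""
    inside = sum(1 for x in values if abs(x - center) <= k * spread)
    return inside / len(values)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(sales, mean, std_dev, k):.0%}")
```

For this data set the shares come out close to the rule's 68%, 95%, and nearly all.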

Group Data and the Histogram
• Dividing data into groups or classes or intervals

• Groups should be:
  - Mutually exclusive: not overlapping, so every observation is assigned to only one group
  - Exhaustive: every observation is assigned to a group
  - Equal-width (if possible): the first or last group may be open-ended

• A histogram is a chart made of bars of different heights.
  - Widths and locations of bars correspond to widths and locations of data groupings
  - Heights of bars correspond to frequencies or relative frequencies of data groupings

Frequency Histogram

[Figure: frequency histogram; x-axis: Familiarity (2 to 7); y-axis: Frequency (0 to 8)]

Histogram Example
[Figures: Frequency Histogram and Relative Frequency Histogram]

Frequency Distribution
• Table with two columns listing:
  - Each and every group or class or interval of values
  - Associated frequency of each group
    • Number of observations assigned to each group
    • Sum of frequencies is number of observations
      – N for population
      – n for sample

• Class midpoint is the middle value of a group or class or interval

• Relative frequency is the proportion of total observations in each class
  - Sum of relative frequencies = 1

Example 3: Frequency Distribution

Spending Class ($)      Frequency (number of customers)   Relative Frequency
x                       f(x)                              f(x)/n

0 to less than 100      30                                0.163
100 to less than 200    38                                0.207
200 to less than 300    50                                0.272
300 to less than 400    31                                0.168
400 to less than 500    22                                0.120
500 to less than 600    13                                0.070

Total                   184                               1.000

• Example of relative frequency: 30/184 = 0.163
• Sum of relative frequencies = 1

Cumulative Frequency Distribution

Spending Class ($)      Cumulative Frequency   Cumulative Relative Frequency
x                       F(x)                   F(x)/n

0 to less than 100      30                     0.163
100 to less than 200    68                     0.370
200 to less than 300    118                    0.641
300 to less than 400    149                    0.810
400 to less than 500    171                    0.929
500 to less than 600    184                    1.000

The cumulative frequency of each group is the sum of the frequencies of that and all preceding groups.
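
The relative and cumulative columns can be derived from the class frequencies of Example 3 with a few lines of Python. This is a sketch; the variable names are illustrative, and the class labels are kept as plain strings.

```python
from itertools import accumulate

# Spending classes and their frequencies (from Example 3).
classes = ["0 to less than 100", "100 to less than 200", "200 to less than 300",
           "300 to less than 400", "400 to less than 500", "500 to less than 600"]
frequencies = [30, 38, 50, 31, 22, 13]

n = sum(frequencies)                               # total number of observations
relative = [f / n for f in frequencies]            # f(x)/n
cumulative = list(accumulate(frequencies))         # running totals F(x)
cumulative_relative = [c / n for c in cumulative]  # F(x)/n

for label, f, c, cr in zip(classes, frequencies, cumulative, cumulative_relative):
    print(f"{label:<22} {f:>4} {c:>5} {cr:>7.3f}")
```

The last cumulative frequency always equals n, and the last cumulative relative frequency always equals 1.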

Frequency Distribution of Familiarity with the Internet

Value label      Value  Frequency (N)  Percentage  Valid percentage  Cumulative percentage

Not so familiar  1      0              0.0         0.0               0.0
                 2      2              6.7         6.9               6.9
                 3      6              20.0        20.7              27.6
                 4      6              20.0        20.7              48.3
                 5      3              10.0        10.3              58.6
                 6      8              26.7        27.6              86.2
Very familiar    7      4              13.3        13.8              100.0
Missing          9      1              3.3

TOTAL                   30             100.0       100.0

Methods of Displaying Data
• Pie Charts
  - Categories represented as percentages of total

• Bar Graphs
  - Heights of rectangles represent group frequencies

• Frequency Polygons
  - Height of line represents frequency

• Ogives
  - Height of line represents cumulative frequency

• Time Plots
  - Represent values over time

Pie Chart

Figure 1-10: Twentysomethings split on job satisfaction

[Pie chart with five categories: "Don't like my job but it is on my career path" (6.0%), "Job is OK, but it is not on my career path", "Enjoy job, but it is not on my career path", "My job just pays the bills", and "Happy with career". The other four slices are labeled 33.0%, 23.0%, 19.0%, and 19.0%.]

Bar Chart (vertical)

Figure 1-11: SHIFTING GEARS

Quarterly net income for General Motors (in billions)

[Vertical bar chart: y-axis from 0.0 to 1.5; x-axis quarters from 1Q 2003 through 1Q 2004]

Bar Chart (horizontal)

PRESCRIBING BEHAVIOR (% OF PATIENTS)

Medications Currently on Medication


Topical steroids 99%
Ointments 95%
Creams 95%
Foams 92%
Shampoos 90%
Lotions 66%
Other Topicals 97%
Dovonex 97%
Tazorac 81%
Other 26%
Phototherapies 93%
Sunlight 67%
Home light 57%
UVB, broadband 54%
PUVA 49%
Suntanning beds 43%
UVB, narrow band 33%
UVA 21%
Systemic medications 99%
Biologics (Enbrel, Etc.) 95%
Soriatane (acitretin) 88%
Methotrexate (Trexall) 82%
Cyclosporine (Neoral) 44%
Other systemics 8%
Other Medications 4%

Cluster Bars (vertical)

Actual vs. Preferred Length of Drug Discussion

[Clustered bar chart: y-axis Mean minutes (0.0 to 6.0); x-axis drugs Remicade, Enbrel, Humira, Kineret, Amevive, Raptiva, Asacol, Pentasa; two series, Actual Minutes and Preferred Minutes; bar labels range from 1.0 to 5.6 minutes]

Stacked Bar Chart

Share of Patients within the Drug Prescribed

[Stacked bar chart: y-axis Percent of patients (0% to 100%); x-axis Drug (Videx EC, Zerit_d4T, Sustiva); segments Asymptomatic, Symptomatic no AIDS, AIDS; segment labels in extraction order: 28%, 41%, 38%, 14%, 12%, 22%, 58%, 50%, 38%]

Gap Analysis

Crohn's Post Allocation / Post Humira Approval

[Bar chart: y-axis % Change in Share (-30% to 30%); x-axis Remicade, Humira, Antegren, Other; bar labels in extraction order: 13%, 11%, 1%, -24%]

Frequency Polygon and Ogive

[Two charts: Relative Frequency Polygon (y-axis 0.0 to 0.3; x-axis Sales, 0 to 50) and Ogive, a cumulative frequency or relative frequency graph (y-axis 0.0 to 1.0; x-axis Sales, 0 to 50)]

Time Plot
Monthly Steel Production

[Time plot: y-axis Millions of Tons (5.5 to 8.5); x-axis Month, covering two full years (J through D) and a third year from J through O]

Scatter Plots

• Scatter plots are used to identify and report any underlying relationships among pairs of data sets.
• The plot consists of a scatter of points, each point representing an observation.

Scatter Plots

• Scatter plot with trend line.
• This type of relationship is known as a positive correlation.

Correlation

Correlation: Correlation is a measure of the relation between two or more variables.

Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a perfect negative correlation while a value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation.

Correlation and Regression
• The most widely used type of correlation coefficient is Pearson r, also called linear or product-moment correlation.

• In simple language, one can say that the correlation coefficient determines the extent to which values of two variables are "proportional" to each other.

• The value of the correlation (i.e., correlation coefficient) does not depend on the specific measurement units used; for example, the correlation between height and weight will be identical regardless of whether inches and pounds, or centimeters and kilograms, are used as measurement units.

• Proportional means linearly related; that is, the correlation is high if the relationship can be approximated by a straight line (sloped upwards or downwards). This line is called the regression line or least squares line, because it is determined such that the sum of the squared distances of all the data points from the line is the lowest possible.

• Pearson correlation assumes that the two variables are measured on at least interval scales. The Pearson product moment correlation coefficient is calculated as follows:

  $r_{12} = \frac{\sum_i (Y_{i1} - \bar{Y}_1)(Y_{i2} - \bar{Y}_2)}{\left[\sum_i (Y_{i1} - \bar{Y}_1)^2 \, \sum_i (Y_{i2} - \bar{Y}_2)^2\right]^{1/2}}$
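
The formula translates directly into code. The sketch below also illustrates the point about measurement units: converting heights and weights to other units leaves r unchanged. The function name `pearson_r` and all height and weight values are hypothetical, made up for the example.

```python
import math

def pearson_r(y1, y2):
    """Pearson product-moment correlation between two equal-length sequences."""
    mean1 = sum(y1) / len(y1)
    mean2 = sum(y2) / len(y2)
    # Numerator: sum of products of the paired deviations from each mean.
    cross = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2))
    # Denominator: square root of the product of the two sums of squared deviations.
    ss1 = sum((a - mean1) ** 2 for a in y1)
    ss2 = sum((b - mean2) ** 2 for b in y2)
    return cross / math.sqrt(ss1 * ss2)

# Hypothetical heights (inches) and weights (pounds) for six people.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 140, 155, 165, 180]

# The same measurements expressed in centimeters and kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

print(pearson_r(heights_in, weights_lb))
print(pearson_r(heights_cm, weights_kg))
```

Rescaling multiplies the numerator and the denominator by the same factor, which is why the two printed values agree up to floating-point rounding.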

Index Calculation
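• An index is calculated with respect to a base figure.

Example:
Year  Price  Index (base 1992 = 100)
1992  225    225*100/225 = 100
1996  240    240*100/225 = 106.67
2000  250    250*100/225 = 111.11
2004  255    255*100/225 = 113.33

The 2000 price can also be indexed against 1996 as the base: 250*100/240 = 104.2, or equivalently from the two 1992-based indices, 111.11*100/106.67 = 104.2.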
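
The index calculation is a single ratio, sketched below with the prices from the example. The function name `index_number` and the dictionary layout are illustrative choices.

```python
def index_number(value, base):
    """Express value as an index relative to base, where base = 100."""
    return value * 100 / base

# Prices by year (from the example).
prices = {1992: 225, 1996: 240, 2000: 250, 2004: 255}

# Index every year against the 1992 price.
index_1992 = {year: index_number(price, prices[1992]) for year, price in prices.items()}

# Index 2000 against 1996, directly and by chaining the 1992-based indices.
direct = index_number(prices[2000], prices[1996])
chained = index_number(index_1992[2000], index_1992[1996])

for year, value in index_1992.items():
    print(year, round(value, 2))
print(round(direct, 2), round(chained, 2))
```

Chaining works because the 1992 base cancels out of the ratio, so any two indices on a common base can be re-based against each other.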

Cross-Tabulation
• While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously.

• Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values.

Gender and Internet Usage

                 Gender
Internet Usage   Male   Female   Row Total

Light (1)        5      10       15
Heavy (2)        10     5        15

Column Total     15     15

Two Variables Cross-Tabulation
• Since two variables have been cross-classified, percentages could be computed either columnwise, based on column totals, or rowwise, based on row totals.

• The general rule is to compute the percentages in the direction of the independent variable, across the dependent variable.

Internet Usage by Gender

Gender

Internet Usage Male Female

Light 33.3% 66.7%

Heavy 66.7% 33.3%

Column total 100% 100%

Gender by Internet Usage

Internet Usage

Gender Light Heavy Total

Male 33.3% 66.7% 100.0%

Female 66.7% 33.3% 100.0%
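
The percentages in these tables can be derived from the counts in the Gender and Internet Usage table. The sketch below computes percentages both within each gender (based on column totals) and within each usage level (based on row totals); the nested-dictionary layout and function names are illustrative.

```python
# Counts from the Gender and Internet Usage table: counts[usage][gender].
counts = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}

def column_percentages(table):
    """Percentages based on column totals (here, within each gender)."""
    columns = next(iter(table.values())).keys()
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {r: {c: 100 * row[c] / totals[c] for c in columns}
            for r, row in table.items()}

def row_percentages(table):
    """Percentages based on row totals (here, within each usage level)."""
    return {r: {c: 100 * v / sum(row.values()) for c, v in row.items()}
            for r, row in table.items()}

by_gender = column_percentages(counts)
by_usage = row_percentages(counts)

print(f"Light users among males: {by_gender['Light']['Male']:.1f}%")
print(f"Males among light users: {by_usage['Light']['Male']:.1f}%")
```

In this example the counts are symmetric, so both directions give 33.3% and 66.7%; with other data the two directions generally differ, which is why the choice of direction matters.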

Introduction of a Third Variable in Cross-Tabulation
Original Two Variables

• Some association between the two variables. Introduce a third variable; the possible outcomes are:
  - Refined association between the two variables
  - No association between the two variables
  - No change in the initial pattern

• No association between the two variables. Introduce a third variable; the possible outcome is:
  - Some association between the two variables

Purchase of Fashion Clothing by Marital Status

Purchase of            Current Marital Status
Fashion Clothing       Married   Unmarried

High                   31%       52%
Low                    69%       48%
Column                 100%      100%
Number of respondents  700       300

Purchase of Fashion Clothing by Marital Status

Purchase of         Sex
Fashion Clothing    Male                     Female
                    Married   Not Married    Married   Not Married

High                35%       40%            25%       60%
Low                 65%       60%            75%       40%
Column totals       100%      100%           100%      100%
Number of cases     400       120            300       180
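
The three-variable table is consistent with the earlier two-variable table: pooling males and females within each marital status recovers the 31% and 52% figures. The sketch below shows that arithmetic; the data layout and function name are illustrative.

```python
# (sex, marital status) -> (number of cases, share with high fashion purchases)
cells = {
    ("Male", "Married"): (400, 0.35),
    ("Male", "Not Married"): (120, 0.40),
    ("Female", "Married"): (300, 0.25),
    ("Female", "Not Married"): (180, 0.60),
}

def pooled_high_share(status):
    """Share of high purchasers for one marital status, pooling both sexes."""
    total_cases = sum(n for (_, s), (n, _) in cells.items() if s == status)
    high_cases = sum(n * share for (_, s), (n, share) in cells.items() if s == status)
    return high_cases / total_cases

for status in ("Married", "Not Married"):
    print(f"{status}: {pooled_high_share(status):.0%} high")
```

The pooled figures hide the fact that the association between marital status and purchasing is much stronger for females (25% versus 60%) than for males (35% versus 40%).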

Probability
• The probability of some event class occurring can be defined as the ratio of the number of elementary events in that class to the total number of possible outcomes in the sample space. Thus, the probability of event class A is:

  $P(A) = \frac{\text{number of occurrences of } A}{\text{number of possible outcomes}}$

Note that probabilities can range from 0 to 1.0. If in your calculations you arrive at a probability greater than 1.0, you did something wrong.
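
The ratio definition can be illustrated with a fair six-sided die. This sketch assumes every outcome in the sample space is equally likely, which the ratio definition requires; the die example and the function name `probability` are hypothetical.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Ratio of outcomes in the event to all outcomes (assumes equally likely outcomes)."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}       # sample space for one roll of a fair die
even = {2, 4, 6}               # event: the roll is even

p_even = probability(even, die)
print(p_even)
```

Because the favorable outcomes are always a subset of the sample space, a ratio computed this way cannot fall below 0 or exceed 1.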
