
Descriptive Statistics

 Descriptive statistics summarize some characteristic of a sample.
• Measures of central tendency: Mean, Median & Mode
• Measures of dispersion: describe the spread of scores around the mean, e.g. Range, Quartiles, Variance, Standard Deviation
• Measures of skewness: how far the distribution of the data departs from the normal curve
 Frequency Distribution and Cross Tabulation: the frequency with which observations are assigned to each category or point on a measurement scale.
 The most basic form of descriptive statistics
 May be expressed as a percentage of the total sample found in each category
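As an illustration, the sketch below computes these descriptive measures in Python on a small, hypothetical set of scores (pandas is assumed to be available; the numbers are not from the slides):

```python
# A minimal sketch of descriptive statistics on a hypothetical sample of scores.
import pandas as pd

scores = pd.Series([3, 6, 4, 3, 5, 2, 2, 1, 3, 3])  # hypothetical sample

print("Mean:", scores.mean())                   # measures of central tendency
print("Median:", scores.median())
print("Mode:", scores.mode().tolist())
print("Range:", scores.max() - scores.min())    # measures of dispersion
print("Quartiles:\n", scores.quantile([0.25, 0.5, 0.75]))
print("Variance:", scores.var())                # sample variance (n - 1 denominator)
print("Std deviation:", scores.std())
print("Skewness:", scores.skew())               # departure from the normal curve

# Frequency distribution: counts and percentage of the sample in each category
freq = scores.value_counts().sort_index()
print(pd.DataFrame({"count": freq, "percent": 100 * freq / len(scores)}))
```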
Review of Hypotheses
Testing Procedures
 Analyse the difference between sample statistic and
hypothesized population parameter
 Steps involved:
1. Formulation of hypotheses
2. Selection of statistical test to be used: based on research questions
formulated, number of samples, scale used
3. Selection of significance level
4. Doing computations
1. Calculation of the standard error of the sample statistic and its standardization
2. Determination of the critical value
3. Comparison of the value of the sample statistic with the critical value to identify its position in the acceptance or rejection region
5. Making decisions – business research conclusions
What is Hypothesis Testing?

Hypothesis testing is a procedure, based on sample evidence and probability theory, used to determine whether a hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be rejected.
One-tail vs. Two-tail Test

Testing for a Population Mean with a
Known Population Standard Deviation- Example

Step 1: State the null hypothesis and the alternate hypothesis.


H0: μ = 200
H1: μ ≠ 200
(note: keyword in the problem “has changed”)

Step 2: Select the level of significance.


α = 0.01 as stated in the problem

Step 3: Select the test statistic.


Use Z-distribution since σ is known

Testing for a Population Mean with a
Known Population Standard Deviation- Example

Step 4: Formulate the decision rule.


Reject H0 if |Z| > Z_α/2

Z = (X̄ − μ) / (σ / √n) = (203.5 − 200) / (16 / √50) ≈ 1.55

1.55 is not greater than 2.58 (= Z_0.01/2), so Z does not fall in the rejection region.

Step 5: Make a decision and interpret the result.


Because 1.55 does not fall in the rejection region, H0 is not rejected. We conclude that the population mean is not different from 200. So we would report to the vice president of manufacturing that the sample evidence does not show that the production rate at the Fredonia Plant has changed from 200 per week.
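A minimal sketch of this two-tailed Z test in Python (scipy assumed available; the figures x̄ = 203.5, μ0 = 200, σ = 16, n = 50, α = 0.01 are taken from the worked example above):

```python
# Two-tailed Z test for a population mean with known σ.
from math import sqrt
from scipy.stats import norm

x_bar, mu0, sigma, n, alpha = 203.5, 200, 16, 50, 0.01

z = (x_bar - mu0) / (sigma / sqrt(n))    # test statistic, ≈ 1.55
z_crit = norm.ppf(1 - alpha / 2)         # critical value, ≈ 2.58

print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}")
if abs(z) > z_crit:
    print("Reject H0: the mean has changed from 200.")
else:
    print("Do not reject H0: no evidence the mean differs from 200.")
```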
Testing for a Population Mean with a Known Population
Standard Deviation- Another Example

Suppose in the previous problem the vice president


wants to know whether there has been an increase in
the number of units assembled. To put it another way,
can we conclude, because of the improved production
methods, that the mean number of desks assembled in
the last 50 weeks was more than 200?
Recall: σ = 16, n = 50, α = .01

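A hedged sketch of this one-tailed version of the test, assuming the same sample figures as in the previous example (x̄ = 203.5, σ = 16, n = 50, α = 0.01); only the alternate hypothesis (H1: μ > 200) and the critical value change:

```python
# One-tailed Z test: has the mean number of units assembled increased above 200?
from math import sqrt
from scipy.stats import norm

x_bar, mu0, sigma, n, alpha = 203.5, 200, 16, 50, 0.01

z = (x_bar - mu0) / (sigma / sqrt(n))   # same statistic as before, ≈ 1.55
z_crit = norm.ppf(1 - alpha)            # one-tailed critical value, ≈ 2.33

print(f"Z = {z:.2f}, one-tailed critical value = {z_crit:.2f}")
if z > z_crit:
    print("Reject H0: production has increased above 200 per week.")
else:
    print("Do not reject H0: no evidence of an increase.")
```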
Parametric tests
When the assumption of normality holds true for the population
One Sample Tests of Hypothesis
Types of Statistical Tests & their Characteristics

| Hypothesis testing | Number of samples | Measurement scale | Test | Requirement |
| Hypotheses about frequency distributions | One | Nominal | Chi-square | |
| | Two or more | Nominal | Chi-square | |
| Hypotheses about means | One (large sample) | Interval or ratio | Z test | n ≥ 30, σ known |
| | One (small sample) | Interval or ratio | t-test | n < 30, σ not known |
| | Two (large samples) | Interval or ratio | Z test | n ≥ 30, σ known |
| | Two (small samples) | Interval or ratio | t-test | n < 30, σ not known |
| | More than two samples | Interval or ratio | One-way ANOVA | |
Types of Statistical Tests & their Characteristics

| Hypothesis testing | Number of samples | Measurement scale | Test | Requirement |
| Hypotheses about proportions | One (large sample) | Interval or ratio | Z test | n ≥ 30, σ known |
| | One (small sample) | Interval or ratio | t-test | n < 30, σ not known |
| | Two (large samples) | Interval or ratio | Z test | n ≥ 30, σ known |
| | Two (small samples) | Nominal | t-test | n < 30, σ not known |
| Hypotheses about variance | Two or more samples | Interval or ratio | F test (or ANOVA test) | |
Non parametric tests
When the assumption of normality does not hold true for the population, or when it is not possible to make any assumption about the population distribution
What is T-Test?
 Sometimes we don’t just look at or describe one group of data. Instead, we want to look at two groups of data and compare them. We want to see if the two groups are different. T-tests are often used to compare the means from two different groups of data.
 A t-test can help you find out if the means are significantly different from one another or if they are relatively the same. If the means are significantly different, you can say that the variable being manipulated, your Independent Variable (IV), had an effect on the variable being measured, your Dependent Variable (DV).
Independent Sample T-Test Vs Paired
Sample T -Test

Independent Sample t-tests are used to compare groups of


participants that are not related in any way. The groups are
independent from one another. So, participants in one group
have no relationship to participants in the second group.
Independent Vs Paired Sample T -
Test
Paired sample t-tests are used to compare groups that are related in some way. There are many ways that
participants in two groups can be related. One way is that
participants in the first group are the same as participants
in the second group. This is sometimes called a repeated
measures design.
A second way is that participants in the first group are
genetically related to participants in the second group.
For example, a pair of twins could be divided up so one
twin participated with the first group and the other twin
participated with the second group.
Independent Vs Paired Sample T -
Test

A third way is if participants in one group are matched


with participants in a second group by some attribute.
For example, if a participant in the first group rates
high on depression, researchers might try to find a
participant in the second group that also rates high on
depression.
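For illustration, a minimal paired (repeated-measures) t-test sketch in Python with hypothetical before/after scores for the same five participants (scipy assumed available; the numbers are not from the slides):

```python
# Paired-samples t-test: the same participants measured twice.
from scipy.stats import ttest_rel

before = [3, 6, 4, 3, 5]   # hypothetical scores, first measurement
after = [2, 5, 3, 3, 4]    # same participants, second measurement

t_stat, p_value = ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value (e.g. < .05) would suggest the two related sets of scores differ.
```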
Example

Suppose you want to study the effect of sugar (IV) on memory

for words (DV). You have two groups (also called conditions) in
your experiment, sugar and no sugar. Each participant only
participates in one condition of the experiment. Participants in
the first condition are not related in any way to participants in
the second condition. Because the participants in each condition
are not related in any way, we will use the Independent Samples
T-Test.
Data

Condition 1: Sugar
 Participant 1 = 3 words
 Participant 2 = 6 words
 Participant 3 = 4 words
 Participant 4 = 3 words
 Participant 5 = 5 words
Data

Condition 2: No Sugar
Participant 1 = 2 words
Participant 2 = 2 words
Participant 3 = 1 word
Participant 4 = 3 words
Participant 5 = 3 words
What we want to know?

In this experiment, you want to know if there is a


significant difference between the data collected from each
condition, sugar and no sugar. You want to know if sugar
really does have an effect on memory for words. Does
word memory significantly increase or decrease when
people eat sugar? Is there no difference in word memory
for sugar and no sugar conditions?
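A minimal sketch of this independent samples t-test in Python, using the word-recall data listed above (scipy assumed available):

```python
# Independent-samples t-test: sugar vs. no sugar word-recall scores.
from scipy.stats import ttest_ind

sugar = [3, 6, 4, 3, 5]      # Condition 1: Sugar
no_sugar = [2, 2, 1, 3, 3]   # Condition 2: No Sugar

t_stat, p_value = ttest_ind(sugar, no_sugar)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# If p < .05, conclude the sugar and no-sugar means differ significantly.
```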
Analysis of Variance (ANOVA)
What does ANOVA stand for?

ANalysis Of Variance. With ANOVA, we analyze and

compare the variability of scores between conditions and


within conditions. This helps us find out if the IV had a
significant effect on the DV.
1-Way ANOVA?

Sometimes, we want to look at more than two groups of data


and compare them. We want to see if more than two groups of
data are different. While we could use t-tests to compare the means from two different groups of data, we need a different kind of test when comparing three or more groups.

We can use a 1-Way ANOVA test to compare three or more


groups or conditions in an experiment. A 1-Way ANOVA can
help you find out if the means for each group / condition are
significantly different from one another or if they are
relatively the same.
1-Way ANOVA?

If the means are significantly different, you can say


that the variable being manipulated, your Independent
Variable (IV), had an effect on the variable
being measured, your Dependent Variable (DV).
Why is it called 1-way?

Because we use this test to analyze data from


experiments that have only one IV. If we were
analyzing data from experiments with more than one
IV, we would need to use a different test.
Example

Suppose you want to study the effect of sugar (IV) on memory

for words (DV). You have three groups (also called conditions)
in your experiment, sugar, a little sugar and no sugar. Each
participant only participates in one condition of the experiment.
Participants in the first condition are not related in any way to
participants in the second condition or third condition. Because
the participants in each condition are not related in any way, we
will use the 1-Way Between Subjects ANOVA.
Data

Condition 1: Sugar
 Participant 1 = 3 words
 Participant 2 = 6 words
 Participant 3 = 4 words
 Participant 4 = 3 words
 Participant 5 = 5 words
Data

Condition 2: A little Sugar


Participant 1 = 3 words
Participant 2 = 5 words
Participant 3 = 3 words
Participant 4 = 3 words
Participant 5 = 4 words
Data

Condition 3: No Sugar
Participant 1 = 2 words
Participant 2 = 2 words
Participant 3 = 1 word
Participant 4 = 3 words
Participant 5 = 3 words
What we want to know?

In this experiment, you want to know if there is a


significant difference between the data collected from each
condition, sugar, a little sugar and no sugar. You want to
know if sugar really does have an effect on memory for
words. Does word memory significantly increase or
decrease when people eat sugar? Is there no difference in
word memory for sugar and no sugar conditions?
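A minimal sketch of this 1-Way Between Subjects ANOVA in Python, using the three conditions listed above (scipy assumed available):

```python
# One-way ANOVA across three word-recall conditions.
from scipy.stats import f_oneway

sugar = [3, 6, 4, 3, 5]          # Condition 1: Sugar
little_sugar = [3, 5, 3, 3, 4]   # Condition 2: A little sugar
no_sugar = [2, 2, 1, 3, 3]       # Condition 3: No sugar

f_stat, p_value = f_oneway(sugar, little_sugar, no_sugar)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# If p < .05, at least one condition mean differs; post hoc tests would be
# needed to say which pairs of conditions differ.
```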
Associative & predictive analysis
Bivariate & MultiVariate – Correlation & Regression
Correlation
 Attempts to determine the ‘degree of relationship’ between
two variables
 If correlation is observed between variables that cannot possibly be related, it is called spurious or nonsense correlation
 Cannot tell us the cause and effect relationship, only
establishes covariation, correlation may be explained by any
one or combination of following reasons:
 Due to pure chance, especially in a small sample
 Both correlated variables may be influenced by one or more other
variables
 Both variables may be mutually influencing each other so that neither
can be designated as cause and other as effect
Correlation
 Types
 Positive or Negative
 Simple, Partial or Multiple
 Linear and non-linear
 Methods of studying:
 Scatter Diagram Method
 Graphic Method
 Coefficient of Correlation: Pearson’s / Spearman’s
 Concurrent Deviation Method
 Method of Least Squares
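As an illustration of the coefficient-of-correlation methods, a minimal Python sketch with hypothetical data (scipy assumed available):

```python
# Pearson's and Spearman's coefficients of correlation on hypothetical data.
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5, 6]      # hypothetical variable X
y = [2, 1, 4, 3, 7, 8]      # hypothetical variable Y

r, p_r = pearsonr(x, y)         # degree of linear relationship
rho, p_rho = spearmanr(x, y)    # rank-based relationship

print(f"Pearson r = {r:.2f} (p = {p_r:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
# Even a high coefficient only establishes covariation, not cause and effect.
```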
Regression
 Regression analysis attempts to establish the ‘nature of
relationship’ between variables – i.e., to study the
functional relationship between the variables and thereby
provide a mechanism for prediction or forecasting
 Measure of average relationship between two or more
variables in terms of original units of data
 Dependent Variable (Y): The variable of interest that we try
to predict
 Independent/ Explanatory Variable (X): The variable used
to predict the dependent variable
 Terms do not imply a necessary cause and effect relationship
Regression
 Uses of Regression Analysis
 Provides estimates of values of dependent variable from values of
independent variables – estimation through Regression Line
 Obtain a measure of error (standard error of estimate) involved in
using regression line as a basis for estimation
 The coefficient of correlation may also be calculated. The square of the correlation coefficient (r²):
 measures the degree of association between the two variables
 assesses the proportion of variance in the dependent variable that has been accounted for by the regression line
 The greater the value of r², the better the fit and predictive value of the regression line
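A minimal sketch of simple linear regression and r² in Python with hypothetical data (scipy assumed available; the values are illustrative only):

```python
# Simple linear regression: estimate the regression line, r², and a prediction.
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6]       # independent / explanatory variable
y = [2, 1, 4, 3, 7, 8]       # dependent variable

result = linregress(x, y)
print(f"Regression line: Y = {result.intercept:.2f} + {result.slope:.2f} X")
print(f"r² = {result.rvalue ** 2:.2f}")          # proportion of variance explained
print(f"Standard error of the slope = {result.stderr:.2f}")

y_hat = result.intercept + result.slope * 7      # prediction for a new X value
print(f"Predicted Y at X = 7: {y_hat:.2f}")
```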
Difference between
Correlation & Regression Analysis
| Correlation | Regression |
| Measure of the degree of covariability | Measure of the ‘nature’ of the relationship – used for prediction of one value from another |
| No way to determine cause and effect; the coefficient (r) is symmetric: r_xy = r_yx | Cause and effect studied by taking one variable as dependent and the other(s) as independent |
| Nonsense correlation may occur | Nonsense regression cannot occur |
| Independent of change of scale and origin | Independent of change in origin but not in scale |
Multivariate Techniques
Multi-Dimensional Scaling
Data Reduction (Factor Analysis)
Cluster Analysis
Discriminant Analysis for Classification
and Prediction
Discriminant Analysis: Application
Areas
The major application area for this technique is
where we want to be able to distinguish between
two or three sets of objects or people, based on the
knowledge of some of their characteristics.

Ex: Selection process for a job, admission


process of an educational programme in a college,
or dividing a group of people into potential buyers or non-buyers. It is used by credit rating agencies to rate individuals, classifying them into good lending risks or bad lending risks.
What is Discriminant Analysis?

 Discriminant analysis is used to predict group


membership.
 This technique is used to classify individuals/objects
into one of the alternative groups on the basis of a set
of predictor variables.
 The dependent variable in discriminant analysis is
categorical whereas the independent or predictor
variables are either interval or ratio scale in nature.
 When there are two groups (categories) of dependent
variable, we have two-group discriminant analysis
and when there are more than two groups, it is a case
of multiple discriminant analysis.
Objectives of Discriminant
Analysis
The objectives of discriminant analysis are the following:

 To find a linear combination of variables that discriminate between


categories of dependent variable in the best possible manner.
 To find out which independent variables are relatively better in
discriminating between groups.
 To determine the statistical significance of the discriminant function
and whether any statistical difference exists among groups in terms of
predictor variables.
 To develop a procedure for assigning new objects, firms or individuals, whose profiles but not group identities are known, to one of the two groups.
 To evaluate the accuracy of classification, i.e., the percentage of
customers that it is able to classify correctly.
Discriminant Analysis: Method
It is very similar to the multiple regression technique. The equation in a two-variable discriminant analysis is:
Y = a + K1X1 + K2X2

This is called the Discriminant Function, where
Y = DV, which is a categorical variable (unlike regression, where it is a continuous variable)
X1, X2 = IVs
K1, K2 = coefficients of the IVs
a = constant

Y is actually a classification into two or more groups and is therefore a ‘grouping’ variable.
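A minimal sketch of a two-group discriminant analysis in Python using scikit-learn's LinearDiscriminantAnalysis; the two predictors (a credit score and an income figure) and the group labels are hypothetical, chosen to echo the lending-risk example above:

```python
# Two-group discriminant analysis with hypothetical applicant data.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = [[620, 3.1], [580, 2.6], [710, 3.8], [540, 2.2],   # credit score, income (hypothetical units)
     [690, 3.5], [560, 2.4], [730, 3.9], [600, 2.8]]
y = [1, 0, 1, 0, 1, 0, 1, 0]                            # 1 = good lending risk, 0 = bad

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

print("Coefficients (K1, K2):", lda.coef_)      # weights of the discriminant function
print("Constant (a):", lda.intercept_)
print("Predicted group for a new applicant:", lda.predict([[650, 3.0]]))
print("Classification accuracy on the sample:", lda.score(X, y))
```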
Uses of Discriminant Analysis

Some of the uses of Discriminant Analysis are:

Scale construction: Discriminant analysis is used to


identify the variables/statements that are
discriminating and on which people with diverse views
will respond differently.

Perceptual mapping: The technique is also used


extensively to create attribute-based spatial maps of
the respondent’s mental positioning of brands.
Uses of Discriminant Analysis
 Segment discrimination: To understand what are the key
variables on which two or more groups differ from each
other, this technique is extremely useful. Questions to
which one may seek answers are as follows:
 What are the demographic variables on which potentially successful
salesmen and potentially unsuccessful salesmen differ?
 What are the variables on which users/non-users of a product can be
differentiated?
 What are the economic and psychographic variables on which price-sensitive and non-price-sensitive customers can be differentiated?
 What are the variables on which buyers of a local/national brand of a product can be differentiated?
Definitions of Key Terms used in
Discriminant Analysis

 Eigenvalue - The basic principle in the estimation of a discriminant


function is that the variance between the groups relative to the variance
within the group should be maximized. The ratio of between group
variance to within group variance is called Eigenvalue.
 Wilks’ Lambda – It is given by the ratio of the within-group sum of squares to the total sum of squares. Wilks’ lambda takes a value between 0 and 1, and the lower the value, the higher the significance of the discriminant function. A statistically significant function enhances confidence that the differentiation between the groups exists.
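As a toy illustration of these two definitions, the sketch below computes the between-group / within-group ratio (the eigenvalue) and Wilks' lambda for a single variable measured in two hypothetical groups (numpy assumed; a real discriminant analysis works with the multivariate equivalents of these sums of squares):

```python
# Toy, single-variable illustration of eigenvalue and Wilks' lambda.
import numpy as np

group1 = np.array([4.0, 5.0, 6.0, 5.5])   # hypothetical scores, group 1
group2 = np.array([2.0, 3.0, 2.5, 3.5])   # hypothetical scores, group 2
all_scores = np.concatenate([group1, group2])
grand_mean = all_scores.mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (group1, group2))
ss_within = sum(((g - g.mean()) ** 2).sum() for g in (group1, group2))
ss_total = ((all_scores - grand_mean) ** 2).sum()

print("Eigenvalue (between / within):", ss_between / ss_within)
print("Wilks' lambda (within / total):", ss_within / ss_total)
# A lambda close to 0 (large eigenvalue) indicates strong separation between the groups.
```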
FACTOR ANALYSIS
Application Areas
 It is a useful method of reducing data complexity by reducing number
of variables being studied.
 For example, consider a marketing decision maker wondering what exactly makes a consumer buy his product. The possible purchasing criteria could range from just one or two to fifteen or twenty, leaving the manager shooting in the dark to figure out what really drives buyer behaviour.
 Factor Analysis is a good way of identifying latent or underlying factors from an array of seemingly important variables.
 Factor Analysis is a set of techniques which, by analysing correlations between variables, reduces their number to a few factors that explain much of the original data more economically.
Introduction to Factor Analysis

 Factor analysis is a multivariate statistical technique in which there is


no distinction between dependent and independent variables.
 In factor analysis, all variables under investigation are analysed together to extract the underlying factors.
 Factor analysis is a data reduction method.

 It is a very useful method for reducing a large number of variables, and the resulting data complexity, to a few manageable factors.
 These factors explain most part of the variations of the original set of
data.
 A factor is a linear combination of variables.
Uses of Factor Analysis
 Marketing studies: The technique has extensive use in the field of marketing and can be successfully used for new product development, product acceptance research, development of advertising copy, pricing studies and branding studies.

For example we can use it to:

• identify the attributes of brands that influence consumers’ choice;

• get an insight into the media habits of various consumers;

• identify the characteristics of price-sensitive customers.


Conditions for a Factor Analysis
Exercise
The following conditions must be ensured before executing the
technique:
Factor analysis exercise requires metric data. This means the data
should be either interval or ratio scale in nature.
The variables for factor analysis are identified through exploratory research, which may be conducted by reviewing the literature on the subject and research already carried out in the area, through informal interviews of knowledgeable persons, through qualitative methods such as focus group discussions held with a small sample of the respondent population, through analysis of case studies, and through the judgment of the researcher.
As the responses to different statements are obtained through
different scales, all the responses need to be standardized. The
standardization helps in comparison of different responses from such
scales.
Conditions for a Factor Analysis
Exercise

The number of sample respondents should be at least four to five times the number of variables (number of statements).
The basic principle behind the application of factor
analysis is that the initial set of variables should be
highly correlated. If the correlation coefficients
between all the variables are small, factor analysis may
not be an appropriate technique.
Factor Analysis
It is a general term denoting a class of procedures
primarily used for data reduction and
summarization.
In research, there may be a large number of
variables, most of which are correlated and which
must be reduced to a manageable level.
Relationships among sets of many interrelated
variables are examined and represented in terms of
a few underlying factors.
Use of Factor
Analysis
Factor Analysis is used in following circumstances:
To identify underlying dimensions, or factors, that explain
the correlations among a set of variables.
 A set of lifestyle statements may be used to measure the
psychographic profile of consumers. These statements may then be
factor analyzed to identify the underlying psychographic factors.
To identify a new, smaller set of uncorrelated variables to
replace the original set of correlated variables in subsequent
multivariate analysis.
Applications of
Factor Analysis
It can be used in market segmentation for identifying
the underlying variables on which to group the
customers.
In product research, it can be employed to determine
the brand attributes that influence consumer choice.
In advertising studies, it can be used to understand the
media consumption habits of the target market.
In pricing studies, it can be used to identify the
characteristics of price-sensitive consumers.
Key terms used in Factor Analysis

 Factor Scores – The composite scores estimated for each respondent on the extracted factors.
 Factor Loading – The correlation coefficient between the
factor score and the variables included in the study is called
factor loading.
 Factor Matrix (Component Matrix) – It contains the
factor loadings of all the variables on all the extracted
factors.
 Eigen Value – The percentage of variance explained by
each factor can be computed using eigen value.
Steps in a Factor Analysis Exercise
Step 1
 Factor Extraction Process: Objective is to identify how many factors
would be extracted from the data. The Principal Component Analysis
(PCA) is used for that.
 There is also a rule of thumb based on the eigenvalue.

 The higher the Eigen Value of a factor, the higher the amount of
variance explained by the factor.
 Before extraction, it is assumed that each of the original variables has
an Eigen Value=1
Steps in a Factor Analysis Exercise
Step 2

 Rotation of factors:

 The second step in the factor analysis exercise is the rotation of initial factor
solutions. This is because the initial factors are very difficult to interpret.
Therefore, the initial solution is rotated so as to yield a solution that can be
interpreted easily.
 To interpret and name the factors

 This is done by the process of identifying which factors are associated with which
of the original variables. Rotated Factor Matrix is used for this purpose
 Values close to 1 represent high loadings and those close to 0 represent low
loadings
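A minimal sketch of these two steps (extraction via PCA with the eigenvalue > 1 rule of thumb, then varimax rotation) in Python using scikit-learn; the response matrix is randomly generated stand-in data, and FactorAnalysis(rotation="varimax") assumes scikit-learn 0.24 or later:

```python
# Factor extraction and rotation on hypothetical standardized survey responses.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))              # hypothetical: 100 respondents, 6 statements
X_std = StandardScaler().fit_transform(X)  # standardize responses from different scales

# Step 1 - extraction: eigenvalues from principal components; keep factors with
# eigenvalue > 1 (the rule of thumb mentioned above).
eigenvalues = PCA().fit(X_std).explained_variance_
n_factors = int((eigenvalues > 1).sum())
print("Eigenvalues:", eigenvalues.round(2), "-> factors retained:", n_factors)

# Step 2 - rotation: varimax-rotated loadings are easier to interpret and name.
fa = FactorAnalysis(n_components=max(n_factors, 1), rotation="varimax")
scores = fa.fit_transform(X_std)           # factor scores for each respondent
print("Rotated factor loadings (factors x variables):\n", fa.components_.round(2))
```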
Cluster Analysis
 It is a class of techniques used to classify objects or cases
into relatively homogeneous groups called clusters. Objects
in each cluster tend to be similar to each other and
dissimilar to objects in other clusters.
 Cluster analysis is also called classification analysis or
numerical taxonomy.
 Clustering procedures can either assign each object to one
and only one cluster or there can be situations in which the
boundaries of the clusters are not clear-cut and the
classification of consumers is not obvious as many of them
could be grouped into one cluster or another.
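For illustration, a minimal cluster analysis sketch in Python using k-means from scikit-learn on hypothetical customer ratings (two benefit-importance scores per customer):

```python
# K-means clustering of hypothetical customers by two benefit-importance ratings.
from sklearn.cluster import KMeans

X = [[8, 2], [7, 3], [9, 1],    # customers who value benefit A
     [2, 8], [3, 9], [1, 7],    # customers who value benefit B
     [5, 5], [6, 4]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignment of each customer:", kmeans.labels_)
print("Cluster centres:", kmeans.cluster_centers_)
# Each cluster groups customers that are relatively homogeneous; note that k-means
# assigns every object to exactly one cluster (a 'hard' clustering procedure).
```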
Applications of
Cluster Analysis
 Segmenting the market: Consumers may be clustered on the
basis of benefits sought for the purchase of a product. Each
cluster would consist of consumers who are relatively
homogeneous in terms of the benefits they seek. This approach
is called benefit segmentation.
 Understanding buyer behavior: it can be used to identify
homogeneous groups of buyers and then the buying behavior
of each group may be examined separately.
Applications of
Cluster Analysis
 Identifying new product opportunities: by clustering brands
and products, competitive sets within the market can be
determined. Brands in the same cluster compete more fiercely
with each other than with brands in other cluster.
 Selecting test markets: by grouping cities into homogeneous
clusters, it is possible to select comparable cities to test various
marketing strategies.
 Reducing Data: it may be used as a general data reduction tool
to develop clusters or subgroups of data that are more
manageable than individual observations.
