Res Meth Ch09 Analysing Data Quantitatively

Chapter 9 of 'Research Methods for Business Students' focuses on quantitative data analysis, covering key aspects such as data preparation, types of data, and appropriate statistical techniques. It emphasizes the importance of using statistical software for data coding, exploration, and presentation through various graphical methods. The chapter also discusses descriptive statistics, correlation, regression, and the necessary conditions to establish causal relationships between variables.


Research Methods for Business Students

Chapter 9 Analysing data quantitatively


Learning Objectives
By the end of this chapter you should be able to:

9.1 identify the main issues that you need to consider when preparing data for quantitative analysis and when analysing these data;

9.2 recognise different types of data and understand the implications of data type for subsequent analyses;

9.3 code data and create a data matrix using statistical analysis software;

9.4 select the most appropriate tables and graphs to explore and illustrate different aspects of your data;

9.5 select the most appropriate statistics to describe individual variables and to examine relationships between variables and trends in your data;

9.6 interpret the tables, graphs and statistics that you use correctly.
Introduction
• Quantitative data refer to all primary and secondary data that are numerical or quantifiable, and can range from simple counts, such as the frequency of occurrences of an advertising slogan, to more complex data such as test scores, prices or rental costs.
• Quantitative analysis techniques range from creating simple tables or graphs that show the frequency of occurrence and using statistics such as indices to enable comparisons, through establishing statistical relationships between variables, to complex statistical modelling.
• Before we begin to analyse data quantitatively we therefore need to ensure that our data are already quantified, or that they are quantifiable and can be transformed into quantitative data which can be recorded as numbers and analysed quantitatively.
Introduction
• Within quantitative analysis, calculations and diagram drawing
are usually undertaken using analysis software ranging from
spreadsheets such as Excel™ to more advanced data management
and statistical analysis software such as SAS™, Stata™, IBM
SPSS Statistics™.
• You might also use more specialized survey design and analysis
online software such as Qualtrics Research core ™ and
SurveyMonkey™, statistical shareware such as the R Project for
Statistical Computing, or content analysis and text mining
software such as WordStat™.
Preparing data for quantitative analysis
• When preparing data for quantitative analysis you need
to be clear about the:
– definition and selection of cases;
– data type or types (scale of measurement);
– numerical codes used to classify data to ensure they
will enable your research questions to be answered.
Exploring and presenting data
• Once your data have been entered and checked for errors, you are ready to start
your analysis.
• The Exploratory Data Analysis (EDA) approach is useful in these initial stages.
– This approach emphasizes the use of graphs to explore and understand
your data.
• Although within data analysis the term graph has a specific meaning: ‘. . . A
visual display that illustrates one or more relationships among numbers’, it is
often used interchangeably with the term ‘chart’.
• Consequently, while some authors (and data analysis software) use the term bar
graphs, others use the term bar charts.
• Even more confusingly, what are referred to as ‘pie charts’ are actually graphs!
Bar graph

Copyright © 2019, 2016, 2012 Pearson Education, Inc. All Rights Reserved
Table 9.1
Data presentation by data type: A summary

Source: © Mark Saunders, Philip Lewis and Adrian Thornhill 2018


Bar graph (data reordered)

Source: Adapted from Eurostat (2017) © European Communities 2017, reproduced with permission

Word cloud

For text data the relative proportions of key words and phrases can be shown using a word cloud; numerous free word cloud generators, such as Wordle™, are available online.
In a word cloud the frequency of occurrence of a particular word or phrase is represented by the font size of the word, or occasionally by its colour.

Histogram

* A diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.

Pictogram

* A graphic symbol that conveys its meaning through its pictorial resemblance to a physical object.

Source: Adapted from Harley-Davidson Inc. (2017)

Line graph

• Trends can only be presented for variables containing numerical (and occasionally
ranked) longitudinal data.
• The most suitable diagram for exploring a trend is a line graph, in which the data values for each time period are joined with a line to represent the trend.

Frequency polygons showing distributions
of values

Annotated frequency polygon showing a
normal distribution

Contingency table: Number of insurance
claims by gender, 2018

• Let's draw some different forms of bar graph for this table (bar graphs make it easier to understand and compare the cases).
Multiple bar graph

Percentage component bar graph

Scatter graph

Describing data using statistics
Descriptive statistics by data type: a summary

Source: © Mark Saunders, Philip Lewis and Adrian Thornhill 2018
Table 9.2 (1 of 5)
Statistics to examine relationships, differences and trends by data type: A summary

• To test normality of distribution (numerical data): Kolmogorov-Smirnov test, Shapiro-Wilk test
• To test whether two variables are independent: Chi-square (categorical data may need grouping; numerical variables grouped into discrete classes)
• To test whether two variables are associated (categorical data): Cramer's V and Phi (both variables must be dichotomous)

Source: © Mark Saunders, Philip Lewis and Adrian Thornhill 2018


Table 9.2 (2 of 5)
Statistics to examine relationships, differences and trends by data type: A summary

• To test whether two groups (categories) are different: for ranked data, the Kolmogorov-Smirnov test (data may need grouping) or the Mann-Whitney U test; for numerical data, the independent t-test or paired t-test (often used to test for changes over time), or the Mann-Whitney U test (where data are skewed or the sample is small)
• To test whether three or more groups (categories) are different (numerical data): analysis of variance (ANOVA)

Source: © Mark Saunders, Philip Lewis and Adrian Thornhill 2018
Table 9.2 (3 of 5)
Statistics to examine relationships, differences and trends by data type: A summary

• To assess the strength of the relationship between two variables: for ranked data, Spearman's rank correlation coefficient or Kendall's rank order correlation coefficient; for numerical data, Pearson's product moment correlation coefficient (PMCC)
• To assess the strength of a relationship between one dependent and one independent variable (numerical data): coefficient of determination

Source: © Mark Saunders, Philip Lewis and Adrian Thornhill 2018
Table 9.2 (4 of 5)
Statistics to examine relationships, differences and trends by data type: A summary

• To assess the strength of a relationship between one dependent and two or more independent variables (numerical data): coefficient of multiple determination
• To predict the value of a dependent variable from one or more independent variables (numerical data): regression equation
• To explore relative change over time (numerical data): index numbers

Source: © Mark Saunders, Philip Lewis and Adrian Thornhill 2018


Table 9.2 (5 of 5)
Statistics to examine relationships, differences and trends by data type: A summary

• To compare relative changes over time (numerical data): index numbers
• To determine the trend over time of a series of data (numerical data): time series, moving averages or regression equation (regression analysis)

Source: © Mark Saunders, Philip Lewis and Adrian Thornhill 2018

CENTRAL TENDENCY
Central tendency is about calculating the middle, or centre,
of your data.

– Mean (Average)
– Median
– Mode

CENTRAL TENDENCY

The mean, or average, can be calculated by adding up each score—or data point, or
value—and then dividing the sum by the number of scores.

• Very susceptible to outliers

• Outliers: data points that differ significantly from the rest.



CENTRAL TENDENCY

The median is a measure of the middle where 50% of your data is higher and 50% is
lower than that number. To calculate it, you order all of your data points from highest to
lowest and then find the one in the middle. If you have an even number of data points,
you take the average of the 2 in the middle.
The median is not susceptible to outliers.

CENTRAL TENDENCY

The mode is the most frequently occurring value in the data. If there's no value that repeats, then there is no mode. To calculate the mode, count the number of times each value appears in the data. The one that appears most frequently is the mode.
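All three measures can be computed with Python's standard-library statistics module. A minimal sketch, using the data set from the worked-example slide below:

```python
import statistics

# Data set from the worked-example slide.
scores = [3, 17, 3, 44, 21, 3, 8, 32, 75]

mean = statistics.mean(scores)      # sum of all scores / number of scores
median = statistics.median(scores)  # middle value once the data are sorted
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

Note how the outlier 75 pulls the mean well above the median, illustrating the mean's susceptibility to outliers.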

CENTRAL TENDENCY

3, 17, 3, 44, 21, 3, 8, 32, 75



CENTRAL TENDENCY

99, 92, 92, 100, 90, 92, 91, 93, 93, 35



CENTRAL TENDENCY
Set 1: 45, 45, 45, 45, 45
Set 2: 40, 40, 45, 50, 50
Set 3: 20, 20, 45, 70, 70

VARIATION
• Range
• Standard deviation
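Both measures can be sketched in standard-library Python, using the three sets from the previous slide: identical means, very different spreads. (Whether you use the population or the sample standard deviation depends on whether your data are a whole population or a sample; the sketch treats each set as a population.)

```python
import statistics

# The three sets from the previous slide: same mean (45), different spreads.
sets = {
    "Set 1": [45, 45, 45, 45, 45],
    "Set 2": [40, 40, 45, 50, 50],
    "Set 3": [20, 20, 45, 70, 70],
}

for name, data in sets.items():
    rng = max(data) - min(data)    # range: largest value minus smallest
    sd = statistics.pstdev(data)   # population standard deviation
    print(name, statistics.mean(data), rng, round(sd, 2))
```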

Normal distribution

Standardized normal distribution

The standardized normal distribution is a theoretical distribution where the mean is 0 and the standard deviation is 1.

Standardized normal distribution

 A z-score expresses the distance of a single case from the mean of the variable. It does so in units of the standard deviation. A z-score of 1.5 means that on that variable, the case is 1.5 standard deviations above the mean. A z-score of −0.6 means that the case is 0.6 standard deviations below the mean.

 To calculate a z-score, all you need is your score or value on the variable for this data point, the mean for the variable, and the standard deviation. The formula to find an individual z-score is the score minus the mean, divided by the standard deviation: z = (score − mean) / standard deviation.
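The formula is a one-liner in code. The mean and standard deviation below are hypothetical numbers for illustration:

```python
def z_score(value, mean, sd):
    """z = (score - mean) / standard deviation."""
    return (value - mean) / sd

# Hypothetical variable with mean 70 and standard deviation 10:
print(z_score(85, 70, 10))   # 1.5  -> 1.5 standard deviations above the mean
print(z_score(64, 70, 10))   # -0.6 -> 0.6 standard deviations below the mean
```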

Please learn:

• Z-tests
• t-tests
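As a starting point, the one-sample t-statistic can be computed by hand from its formula, t = (sample mean − hypothesised mean) / (s / √n). The sample data below are invented for illustration; in practice a package function such as scipy.stats.ttest_1samp also returns the p-value for you:

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesised mean) / (s / sqrt(n))."""
    n = len(sample)
    s = statistics.stdev(sample)   # sample standard deviation
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))

# Hypothetical test scores; is the mean different from 90?
t = one_sample_t([92, 95, 89, 94, 91, 93], 90)
print(round(t, 3))
```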

CORRELATION - REGRESSION

Correlation is necessary but not sufficient to establish causality:

You need to find a correlation as one step on the path to show a causal relationship between 2 variables, but just finding that correlation is not enough to say that change in one variable causes change in another.
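Pearson's product moment correlation coefficient can be sketched in a few lines of standard-library Python. The advertising/sales figures are invented for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)

advertising = [1, 2, 3, 4, 5]   # hypothetical spend
sales = [3, 5, 7, 9, 11]        # rises perfectly in step with spend
print(round(pearson_r(advertising, sales), 6))   # r close to 1: strong positive correlation
```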

CORRELATION - REGRESSION

There are 4 requirements to establish a causal relationship between 2 variables.

1. Establish a correlation between the 2 variables. This means that as you see change in the presence or value of one variable, you also see a change in the presence or value of a second variable. But just because you establish correlation does not mean that you have causation.

2. Establish a theoretical relationship between the 2 variables. One way to know if your correlation is a simple coincidence or not is to have a strong explanation for why the 2 variables might be related. Does it make sense that one variable causes the other?

3. Establish temporal order between the 2 variables. This is a formal way of saying that if you are going to claim that some variable x causes change in some variable y, you have to be able to show that x occurred prior in time to y. This is simple in theory but can be difficult to do in practice. If you don't have the information to establish temporal order, establishing causation beyond any doubt will be difficult. But just because you establish that one variable occurs prior in time to another, it does not mean that the first one causes the change in the second one.

4. Eliminate alternative spurious explanations for the relationship between the 2 variables. This means that you need to make sure there is no third variable that is actually driving the correlation between x and y. It is possible that the correlation between x and y is actually caused by some third variable, for example z. When you have a spurious cause, what is really going on is that z is changing, and z's change is causing the change in both x and y, rather than x causing the change in y.

CORRELATION coefficients

REGRESSION

Regression allows you to apply statistical controls to your data to test for the independent effect of each x on your y.

It doesn't completely eliminate the risk of alternative explanations; if a variable isn't included in your data set, you can't necessarily control for it.

But regression can tell you the extent to which your independent variables capture change in the dependent variable, and this means that, more than any of the other techniques you've learned, this one is the best able to help you say something not only about differences and correlation, but about causation.

If your project asks questions about whether one factor or variable causes another, regression might be your ticket to an answer.

REGRESSION

As with all statistical tests, (linear) regression rests on certain assumptions. Before using regression, you will want to make sure that your data meet these assumptions.

• Linear data. You expect to find a relationship that is either positive, where increases in x lead to corresponding increases in y, or negative, where increases in x lead to decreases in y. If you instead expect to find a curvilinear or other kind of relationship, the standard regression analysis is not going to work very well.

• Normal distributions of all variables. There are tests you can run for this, but one thing to watch out for is whether your data have outliers that might skew an otherwise normal distribution. You can check this by creating a quick graph of each variable or by standardizing your variables into z-scores.

• Homoscedasticity. This means that the variance of the errors is stable regardless of the value of x. You don't want the errors to be higher for some values of x than others. You can check this by graphing the errors after running a regression and seeing if they are fairly evenly distributed along the line. There are also tests you can run to check for this, but a visual inspection will give you your first hints if you have a problem here.

• No multicollinearity. Regression assumes that your independent variables are not highly correlated with each other. Otherwise, the model gets confused over their independent causal power.
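For the multicollinearity assumption, a quick first check is the pairwise correlation between your independent variables. The data and the 0.8 cut-off below are illustrative rules of thumb, not from the chapter:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)

# Two hypothetical independent variables that move almost in lockstep:
x1 = [10, 20, 30, 40, 50]
x2 = [11, 19, 31, 41, 49]

r = pearson_r(x1, x2)
if abs(r) > 0.8:   # a common rule-of-thumb cut-off for "highly correlated"
    print(f"warning: r = {r:.3f}; these predictors may be collinear")
```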

REGRESSION

Regression
y = α + βx


REGRESSION

y = α + βx

Regression is not something you want to calculate by hand; let your favorite statistics package do it for you.

When you let computers do it, you are going to get a bunch of useful information as outputs, including r-squared, betas, standard errors, and p-values.
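Those packages implement variants of ordinary least squares. For the single-predictor case, a minimal sketch of what they compute, using invented, perfectly linear data so the answers are easy to check:

```python
def ols(xs, ys):
    """Ordinary least squares fit for y = alpha + beta * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    alpha = my - beta * mx
    # r-squared: share of the variation in y captured by the model
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return alpha, beta, r_squared

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]          # exactly y = 1 + 2x
alpha, beta, r2 = ols(xs, ys)
print(alpha, beta, r2)         # 1.0 2.0 1.0
```

A real package would additionally report standard errors and p-values for each coefficient, which this sketch omits.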

REGRESSION

y = α + βx

R-squared tells you how much of the variation in y is captured by variation in the other variables. It's read as a percentage. R-squared tells you something about the overall strength of your model: the combination of x's you've used to try to explain the variation in y.

Beta coefficients: You also get valuable information about each individual variable when you run a regression. Most important are your betas, the coefficients that tell you the exact predicted impact of x on y: for every 1 unit increase in x, the beta is the corresponding increase in y (if the variables have been standardized, the units are standard deviations). The size of the beta tells you the size of the impact of that x on y. A small beta means that x has only a small influence on y. A beta of zero would support the null hypothesis that there is no relationship between x and y. Larger betas mean that x has a larger effect on y, and that is ultimately what you are testing for.

Standard errors: Your results will also give you a standard error of the estimate and standard errors for each coefficient. The standard error of the estimate is a measure of how much, on average, each actual data point differs from the predicted point on the regression line. In essence, this is a standard deviation of the error. The standard error for each coefficient is the standard deviation of that particular coefficient. You generally want your betas to be large relative to their standard errors, a sign of the power of that independent variable in affecting the dependent variable.

P-value; significance: Your output will also typically give you a t-statistic and its associated p-value for each variable; this is a measure of the statistical significance of your variable. Remember, no matter how big your beta is, if the variable is not statistically significant at the 0.05 level or better, it should not be reported as a genuine finding. You should also get a p-value for the entire model; you'll want to make sure that it, too, is at the 0.05 level or better.
END
