Unit 3 Research Methods

Research Methodology

Uploaded by

sruthyanand.lv
Copyright
© © All Rights Reserved

Research Methodology and IPR: UNIT III


DATA ANALYSIS AND REPORTING
Overview of Multivariate analysis, Hypotheses testing and Measures of Association.
Presenting Insights and findings using written reports and oral presentation.

Introduction:
There are three categories of analysis to be aware of:
• Univariate analysis, which looks at just one variable
• Bivariate analysis, which analyses two variables
• Multivariate analysis, which looks at more than two variables

1. Univariate data –
This type of data consists of only one variable. The analysis of Univariate data
is thus the simplest form of analysis since the information deals with only one
quantity that changes. It does not deal with causes or relationships and the
main purpose of the analysis is to describe the data and find patterns that
exist within it. The example of a Univariate data can be height.
2. Bivariate data –
This type of data involves two different variables. The analysis of this type of
data deals with causes and relationships, and is done to find out the relationship
between the two variables. An example of bivariate data is temperature and ice
cream sales during the summer season.
Suppose temperature and ice cream sales are the two variables of a bivariate
data set (figure 2). The table shows that the two are positively related: as the
temperature increases, the sales also increase. Thus bivariate data analysis
involves comparisons, relationships, causes and explanations. The variables are
often plotted on the X and Y axes of a graph for better understanding of the data,
with one variable treated as independent and the other as dependent.

3. Multivariate data
When the data involves three or more variables, it is categorized as
multivariate. For example, suppose an advertiser wants to compare the popularity
of four advertisements on a website. Their click rates could be measured for both
men and women, and the relationships between the variables can then be examined.

It is similar to bivariate data but involves more than two variables, and may
contain more than one dependent variable. The way to analyse this data depends
on the goals to be achieved. Some of the techniques are regression analysis, path
analysis, factor analysis and multivariate analysis of variance (MANOVA).

Overview of Multivariate analysis


Multivariate means involving multiple variables that together result in one
outcome. Most real-world problems are multivariate. For example, we cannot predict
the weather of any year based on the season alone; multiple factors such as
pollution, humidity and precipitation are involved.

Multivariate analysis (MVA) is a statistical procedure for the analysis of data
involving more than one type of measurement or observation. It may also mean
solving problems where more than one dependent variable is analysed
simultaneously with other variables.

1. What is multivariate analysis?


In data analytics, we look at different variables (or factors) and how they might impact
certain situations or outcomes. For example, in marketing, you might look at how the
variable “money spent on advertising” impacts the variable “number of sales.” In the
healthcare sector, you might want to explore whether there’s a correlation between
“weekly hours of exercise” and “cholesterol level.” This helps us to understand why
certain outcomes occur, which in turn allows us to make informed predictions and
decisions for the future.

Advantages and Disadvantages of Multivariate Analysis


Advantages
• The main advantage of multivariate analysis is that, because it considers more
than one independent variable influencing the variability of the dependent
variables, the conclusions drawn tend to be more accurate.
• The conclusions are more realistic and nearer to the real-life situation.

Disadvantages
• The main disadvantage of MVA is that it requires rather complex
computations to arrive at a satisfactory conclusion.
• Many observations for a large number of variables need to be collected and
tabulated; it is a rather time-consuming process.

Classification Chart of Multivariate Techniques


Selection of the appropriate multivariate technique depends upon the following questions:
a) Are the variables divided into independent and dependent classification?
b) If yes, how many variables are treated as dependents in a single analysis?
c) How are the variables, both dependent and independent, measured?

Multivariate analysis techniques can be classified into two broad categories. The
classification depends upon the question: can the variables involved be divided
into dependent and independent variables?
If the answer is yes: We have Dependence methods.
If the answer is no: We have Interdependence methods.

Dependence technique: Dependence techniques are types of multivariate analysis
techniques that are used when one or more of the variables can be identified as
dependent variables and the remaining variables can be identified as independent.

Multiple Regression
Multiple Regression Analysis – Multiple regression is an extension of simple linear
regression. It is used when we want to predict the value of a variable based on the
values of two or more other variables. The variable we want to predict is called the
dependent variable (or sometimes, the outcome, target, or criterion variable). The
variables used to make the prediction are called the independent variables.
Multiple regression uses several “x” variables for each observation of the dependent
variable, so the first observation has the form ((x1)1, (x2)1, (x3)1, Y1), where the
trailing 1 indexes the observation.
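
The idea of predicting one variable from several others can be sketched in a few lines of Python. This is only an illustration using ordinary least squares solved through the normal equations; the data values and the helper names (solve_linear_system, fit_multiple_regression) are assumptions made for the example and are not part of the course material.

```python
# Minimal sketch of multiple regression by ordinary least squares (pure Python).
# The data and the helper names below are hypothetical, chosen for illustration.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so rows can be reduced together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple_regression(xs, ys):
    """Return [b0, b1, ..., bp] minimising the squared error of y = b0 + sum(bj * xj)."""
    design = [[1.0] + list(row) for row in xs]  # the leading 1 gives the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical observations (x1, x2), with y generated from y = 1 + 2*x1 + 3*x2.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefficients = fit_multiple_regression(xs, ys)
print([round(b, 6) for b in coefficients])
```

Because the example data follow the generating equation exactly, the fitted coefficients recover the intercept and the two slopes. With real data the fit would only approximate the relationship.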

Conjoint analysis
‘Conjoint analysis’ is a survey-based statistical technique used in market research
that helps determine how people value the different attributes (features, functions,
benefits) that make up an individual product or service. The objective of conjoint
analysis is to determine the choices or decisions of the end-user, which drive the
policy/product/service. Today it is used in many fields including marketing, product
management, operations research, etc.

It is used frequently in testing consumer response to new products, in the acceptance
of advertisements and in service design. Conjoint analysis techniques may also be
referred to as multi-attribute compositional modelling, discrete choice modelling, or
stated preference research, and are part of a broader set of trade-off analysis tools
used for systematic analysis of decisions.

There are multiple conjoint techniques; two of them are CBC (choice-based conjoint)
and ACBC (adaptive CBC).

Multiple Discriminant Analysis

The objective of discriminant analysis is to determine the group membership of samples
from a set of predictors by finding linear combinations of the variables which
maximize the differences between the groups being studied, in order to establish a
model that sorts objects into their appropriate populations with minimal error.

Discriminant analysis derives an equation as a linear combination of the
independent variables that will discriminate best between the groups in the
dependent variable. This linear combination is known as the discriminant function.

The weights assigned to each independent variable are corrected for the
interrelationships among all the variables. The weights are referred to as
discriminant coefficients.

The discriminant equation:


F = β0 + β1X1 + β2X2 + … + βpXp + ε
where F is a latent variable formed by the linear combination of the independent
variables, X1, X2, …, Xp are the p independent variables, ε is the error term and
β0, β1, β2, …, βp are the discriminant coefficients.
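
As an illustration of how discriminant coefficients might be obtained, the sketch below applies Fisher's rule to two groups measured on two predictors. Fisher's rule is one common choice and is assumed here; the notes do not say which estimation method is intended. The data and the function names are hypothetical.

```python
# Sketch of a two-group linear discriminant function with two predictors.
# Fisher's rule is used as one common way to obtain the coefficients.
# All data values and names are hypothetical.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mean):
    """2x2 sums of squared deviations and cross-products around the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_discriminant(group_a, group_b):
    """Return (b0, b1, b2) so that F = b0 + b1*X1 + b2*X2 is positive on group A's side."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter_matrix(group_a, ma), scatter_matrix(group_b, mb)
    # The pooled within-group scatter corrects the weights for correlation among predictors.
    w = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    inv = [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    b1 = inv[0][0] * diff[0] + inv[0][1] * diff[1]
    b2 = inv[1][0] * diff[0] + inv[1][1] * diff[1]
    # Place the cut-off midway between the two group means.
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    b0 = -(b1 * mid[0] + b2 * mid[1])
    return b0, b1, b2

group_a = [(5.0, 3.0), (6.0, 3.5), (5.5, 2.5), (6.5, 3.0)]
group_b = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (3.5, 1.0)]
b0, b1, b2 = fisher_discriminant(group_a, group_b)

def score(x):
    """Discriminant score F for one observation (x1, x2)."""
    return b0 + b1 * x[0] + b2 * x[1]

print([score(x) > 0 for x in group_a], [score(x) > 0 for x in group_b])
```

A new object would be assigned to group A when its score is positive and to group B otherwise. The midpoint cut-off assumes the two groups are equally likely a priori.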

A linear probability model


A linear probability model (LPM) is a regression model where the outcome variable
is binary, and one or more explanatory variables are used to predict the outcome.
Explanatory variables can themselves be binary or continuous. If the
classification involves a binary dependent variable and the independent variables
include non-metric ones, it is better to apply linear probability models.

Binary outcomes are everywhere: whether a person died or not, broke a hip, has
hypertension or diabetes, etc.

We typically want to understand what the probability of the binary outcome is given
explanatory variables.

We could actually use our linear model to do so; it’s very simple to understand why.

If Y is an indicator or dummy variable, then E[Y |X] is the proportion of 1s given X,
which we interpret as the probability that Y = 1 given X.

We can then interpret the parameters as the change in the probability of Y when X
changes by one unit, or for a small change in X. For example, if we model
death = β0 + β1·age + ε, we could interpret β1 as the change in the probability of
death for an additional year of age.
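
The interpretation of β1 can be made concrete with a small sketch that fits a linear probability model by ordinary least squares on a binary outcome. The ages and outcomes below are invented for illustration, and fit_simple_ols is a hypothetical helper name.

```python
# Sketch of a linear probability model: ordinary least squares with a 0/1 outcome.
# The data are hypothetical; "died" is 1 if the person died and 0 otherwise.

def fit_simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

ages = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
died = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_simple_ols(ages, died)

# b1 is read as the change in the probability of death per additional year of age.
print(f"change in P(death) per year of age: {b1:.4f}")

# A known limitation: fitted "probabilities" are not confined to the range 0..1.
fitted_at_40 = b0 + b1 * 40
print(f"fitted value at age 40: {fitted_at_40:.3f}")
```

The fitted value at the youngest age falls below zero in this example. This is the usual reason analysts often prefer logistic or probit models when predicted probabilities themselves matter.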

Multivariate Analysis of Variance and Covariance


Multivariate analysis of variance (MANOVA) is an extension of a common analysis of
variance (ANOVA). In ANOVA, differences among various group means on a single-
response variable are studied. In MANOVA, the number of response variables is
increased to two or more. The hypothesis concerns a comparison of vectors of group
means. A MANOVA has one or more factors (each with two or more levels) and two
or more dependent variables. The calculations are extensions of the general linear
model approach used for ANOVA.

Canonical Correlation Analysis


Canonical correlation analysis is the study of the linear relations between two sets of
variables. It is the multivariate extension of correlation analysis.
CCA is used for two typical purposes:-
• Data Reduction
• Data Interpretation
You could compute all correlations between the variables in the first set (p) and the
variables in the second set (q); however, interpretation is difficult when pq is large.
Canonical correlation analysis allows us to summarize the relationships into a smaller
number of statistics while preserving the main facets of the relationships. In a way,
the motivation for canonical correlation is very similar to that of principal component
analysis.

Structural Equation Modelling


Structural equation modelling is a multivariate statistical analysis technique that is
used to analyse structural relationships. It is an extremely broad and flexible
framework for data analysis, perhaps better thought of as a family of related
methods rather than as a single technique.

In a single analysis, SEM can assess the assumed causation among a set of dependent
and independent constructs (validation of the structural model) and the loadings of
observed items (measurements) on their expected latent variables (constructs), i.e.
validation of the measurement model. The combined analysis of the measurement
and the structural model enables the measurement errors of the observed variables
to be analysed as an integral part of the model, and combines factor analysis in one
operation with the hypothesis testing.

Interdependence Technique
Interdependence techniques are used when the variables cannot be
classified as either dependent or independent.

They aim to unravel relationships between variables and/or subjects without explicitly
assuming specific distributions for the variables. The idea is to describe the patterns
in the data without making (very) strong assumptions about the variables.

Factor Analysis
Factor analysis is a way to condense the data in many variables into just a few
variables. For this reason, it is also sometimes called “dimension reduction”. It
groups together variables that are highly correlated. Factor analysis includes
techniques such as principal component analysis and common factor analysis.

This type of technique is used as a pre-processing step to transform the data before
using other models. When the data has too many variables, the performance of
multivariate techniques is not at the optimum level, as patterns are more difficult to
find. By using factor analysis, the patterns become less diluted and easier to analyse.
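
A rough sketch of the dimension-reduction idea follows. It finds the first principal component of a small data set by power iteration on the covariance matrix, which is one simple way to compute it and is an assumption of this example. The data and function names are hypothetical, and common factor analysis would proceed differently.

```python
import math

# Sketch: condensing three correlated variables into one principal component score.
# The data and function names are hypothetical.

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centred data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centred

def first_principal_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

rows = [(2.0, 4.1, 1.0), (3.0, 6.2, 0.5), (4.0, 7.9, 1.2),
        (5.0, 10.1, 0.8), (6.0, 12.0, 1.1)]
cov, centred = covariance_matrix(rows)
direction = first_principal_component(cov)

# Each observation is reduced from three numbers to a single component score.
scores = [sum(c[j] * direction[j] for j in range(3)) for c in centred]
print([round(s, 3) for s in scores])
```

The first two variables move together in this data, so most of their shared variation ends up in the single score, which can then be passed to another model in place of the original columns.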

Cluster analysis
Cluster analysis is a class of techniques that are used to classify objects or cases into
relative groups called clusters. In cluster analysis, there is no prior information
about the group or cluster membership for any of the objects.
• While doing cluster analysis, we first partition the set of data into groups
based on data similarity and then assign the labels to the groups.
• The main advantage of clustering over classification is that it is adaptable
to changes and helps single out useful features that distinguish different
groups.

Cluster analysis is used in outlier detection applications such as the detection of credit
card fraud. As a data mining function, cluster analysis serves as a tool to gain insight
into the distribution of data and to observe the characteristics of each cluster.
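
The partition-then-label procedure described above can be sketched with k-means, one widely used clustering method. It is chosen here only for illustration, as the notes do not prescribe an algorithm. The points, the starting centres and the function names are hypothetical.

```python
# Sketch of k-means clustering: partition points by similarity, then label the groups.
# Data, starting centres and function names are hypothetical.

def nearest(point, centres):
    """Index of the centre closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centres]
    return distances.index(min(distances))

def k_means(points, centres, iterations=10):
    """Alternate between assigning points to centres and recomputing the centres."""
    centres = list(centres)
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # A centre with no members keeps its previous position.
        centres = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centres)
        ]
    labels = [nearest(p, centres) for p in points]
    return centres, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, labels = k_means(points, centres=[points[0], points[3]])
print(labels)
print(centres)
```

No group membership is supplied in advance; the labels emerge from the distances alone, which is the defining feature of cluster analysis noted above.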

Multidimensional Scaling
Multidimensional scaling (MDS) is a technique that creates a map displaying the
relative positions of several objects, given only a table of the distances between
them. The map may consist of one, two, three, or even more dimensions. The
program calculates either the metric or the non-metric solution. The table of
distances is known as the proximity matrix. It arises either directly from
experiments or indirectly as a correlation matrix.

Correspondence analysis
Correspondence analysis is a method for visualizing the rows and columns of a table
of non-negative data as points in a map, with a specific spatial interpretation. Data
are usually counts in a cross-tabulation, although the method has been extended
to many other types of data using appropriate data transformations. For
cross-tabulations, the method can be considered to explain the association between the
rows and columns of the table as measured by the Pearson chi-square statistic. The
method has several similarities to principal component analysis, in that it situates
the rows or the columns in a high-dimensional space and then finds a best-fitting
subspace, usually a plane, in which to approximate the points.

A correspondence table is any rectangular two-way array of non-negative quantities
that indicates the strength of association between the row entry and the column
entry of the table. The most common example of a correspondence table is a
contingency table, in which row and column entries refer to the categories of two
categorical variables, and the quantities in the cells of the table are frequencies.

The Objective of multivariate analysis


(1) Data reduction or structural simplification: The data are simplified as far
as possible without sacrificing valuable information. This makes interpretation
easier.
(2) Sorting and grouping: When we have multiple variables, groups of “similar”
objects or variables are created, based upon measured characteristics.
(3) Investigation of dependence among variables: The nature of the relationships
among variables is of interest. Are all the variables mutually independent or are one
or more variables dependent on the others?
(4) Prediction: Relationships between variables must be determined for the
purpose of predicting the values of one or more variables based on observations of
the other variables.
(5) Hypothesis construction and testing. Specific statistical hypotheses,
formulated in terms of the parameters of multivariate populations, are tested. This
may be done to validate assumptions or to reinforce prior convictions.

Dependence methods
Dependence methods are used when one or some of the variables are dependent on
others. Dependence looks at cause and effect; in other words, can the values of two or
more independent variables be used to explain, describe, or predict the value of
another, dependent variable? To give a simple example, the dependent variable of
“weight” might be predicted by independent variables such as “height” and “age.”

In machine learning, dependence techniques are used to build predictive models. The
analyst enters input data into the model, specifying which variables are independent
and which ones are dependent—in other words, which variables they want the model to
predict, and which variables they want the model to use to make those predictions.

Interdependence methods
Interdependence methods are used to understand the structural makeup and
underlying patterns within a dataset. In this case, no variables are dependent on others,
so you’re not looking for causal relationships. Rather, interdependence methods seek to
give meaning to a set of variables or to group them together in meaningful ways.

Hypotheses testing and Measures of Association:

What Is Hypothesis Testing?


Hypothesis testing is an act in statistics whereby an analyst tests an assumption
regarding a population parameter. The methodology employed by the analyst depends
on the nature of the data used and the reason for the analysis.

Hypothesis testing is used to assess the plausibility of a hypothesis by using sample
data. Such data may come from a larger population, or from a data-generating process.
The word "population" will be used for both of these cases in the following
descriptions.

How Hypothesis Testing Works


In hypothesis testing, an analyst tests a statistical sample, with the goal of providing
evidence on the plausibility of the null hypothesis.

Statistical analysts test a hypothesis by measuring and examining a random sample of
the population being analysed. All analysts use a random population sample to test two
different hypotheses: the null hypothesis and the alternative hypothesis.

The null hypothesis is usually a hypothesis of equality between population parameters;
e.g., a null hypothesis may state that the population mean return is equal to zero. The
alternative hypothesis is effectively the opposite of the null hypothesis (e.g., the
population mean return is not equal to zero). Thus, they are mutually exclusive, and
only one can be true. However, one of the two hypotheses will always be true.

4 Steps of Hypothesis Testing


All hypotheses are tested using a four-step process:
1. The first step is for the analyst to state the two hypotheses so that only one can
be right.
2. The next step is to formulate an analysis plan, which outlines how the data will
be evaluated.
3. The third step is to carry out the plan and physically analyse the sample data.
4. The fourth and final step is to analyse the results and either reject the null
hypothesis, or state that the null hypothesis is plausible, given the data.

Real-World Example of Hypothesis Testing


If, for example, a person wants to test that a penny has exactly a 50% chance of landing
on heads, the null hypothesis would be that 50% is correct, and the alternative
hypothesis would be that 50% is not correct.

Mathematically, the null hypothesis would be represented as H0: p = 0.5. The
alternative hypothesis would be denoted as "Ha" and be identical to the null
hypothesis, except with the equal sign struck through (Ha: p ≠ 0.5), meaning that the
probability does not equal 50%.

A random sample of 100 coin flips is taken, and the null hypothesis is then tested. If it is
found that the 100 coin flips were distributed as 40 heads and 60 tails, the analyst
would likely conclude that a penny does not have a 50% chance of landing on heads, and
would reject the null hypothesis in favour of the alternative hypothesis.

If, on the other hand, there were 48 heads and 52 tails, then it is plausible that the coin
could be fair and still produce such a result. In cases such as this where the null
hypothesis is "accepted," the analyst states that the difference between the expected
results (50 heads and 50 tails) and the observed results (48 heads and 52 tails) is
"explainable by chance alone."

Hypothesis Testing (Alternate Content)

Hypothesis testing is the use of statistics to assess how consistent the observed data
are with a given hypothesis. The usual process of hypothesis testing consists of four steps.

1. Formulate the null hypothesis H0 (commonly, that the observations are the
result of pure chance) and the alternative hypothesis Ha (commonly, that the
observations show a real effect combined with a component of chance variation).
2. Identify a test statistic that can be used to assess the truth of the null hypothesis.
3. Compute the P-value, which is the probability that a test statistic at least as extreme
as the one observed would be obtained assuming that the null hypothesis were true.
The smaller the P-value, the stronger the evidence against the null hypothesis.
4. Compare the P-value to an acceptable significance value alpha (sometimes called
an alpha value). If P <= alpha, the observed effect is statistically significant, the null
hypothesis is rejected, and the alternative hypothesis is supported.
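
The four steps can be traced on the penny example from earlier in this section. The sketch below uses an exact two-sided binomial test, which is one reasonable choice of test for coin flips and is an assumption of this example; the function name is hypothetical and alpha = 0.05 is the conventional but arbitrary threshold.

```python
from math import comb

# Sketch of the four steps for the penny example (H0: p = 0.5, Ha: p != 0.5).
# The test statistic is the number of heads in 100 flips.

def two_sided_binomial_p_value(heads, flips, p_null=0.5):
    """Exact P-value: total probability of outcomes no more likely than the observed one."""
    pmf = [comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)
           for k in range(flips + 1)]
    observed = pmf[heads]
    # The small tolerance guards against rounding when comparing equal probabilities.
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))

alpha = 0.05
for heads in (40, 48):
    p_value = two_sided_binomial_p_value(heads, 100)
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    print(f"{heads} heads: P = {p_value:.4f} -> {decision}")
```

With 48 heads the P-value is large, so the result is easily explained by chance. With 40 heads the exact P-value lies close to the 5% threshold, so the decision is sensitive to the choice of alpha and of test (a normal approximation gives a slightly smaller value). This is a reminder that the significance level should be fixed in the analysis plan before the data are examined.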

Measures of Association:

A measure of association, in statistics, is any of various factors or coefficients used
to quantify a relationship between two or more variables. Measures of association are
used in various fields of research but are especially common in the areas
of epidemiology and psychology, where they frequently are used to quantify
relationships between exposures and diseases or behaviours.

A measure of association may be determined by any of several different analyses,
including correlation analysis and regression analysis.

(Although the terms correlation and association are often used
interchangeably, correlation in a stricter sense refers to linear correlation,
and association refers to any relationship between variables.) The method used to
determine the strength of an association depends on the characteristics of the data for
each variable. Data may be measured on an interval/ratio scale, an ordinal/rank scale,
or a nominal/categorical scale. These three characteristics can be thought of as
continuous, integer, and qualitative categories, respectively.

Methods of analysis
Pearson’s correlation coefficient
A typical example for quantifying the association between two variables measured on
an interval/ratio scale is the analysis of relationship between a person’s height and
weight. Each of these two characteristic variables is measured on a continuous scale.

The appropriate measure of association for this situation is Pearson’s correlation
coefficient, ρ (rho), which measures the strength of the linear relationship between two
variables on a continuous scale. The coefficient ρ takes on the values of −1 through +1.
Values of −1 or +1 indicate a perfect linear relationship between the two variables,
whereas a value of 0 indicates no linear relationship. (Negative values simply indicate
the direction of the association, whereby as one variable increases, the other decreases.)

Correlation coefficients that differ from 0 but are not −1 or +1 indicate a linear
relationship, although not a perfect linear relationship. In practice, ρ (the population
correlation coefficient) is estimated by r, which is the correlation coefficient derived
from sample data.
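
The sample coefficient r can be computed directly from its definition, as the following sketch shows. The height and weight figures are invented for illustration and pearson_r is a hypothetical function name.

```python
import math

# Sketch: the sample Pearson correlation coefficient r for height and weight.
# The measurements are hypothetical.

def pearson_r(x, y):
    """Covariation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

heights_cm = [150, 160, 165, 172, 180, 188]
weights_kg = [52, 58, 63, 70, 77, 85]
r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

A value close to +1 here reflects that taller people in the made-up sample are consistently heavier. Reversing the order of one list would flip the sign without changing the magnitude.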

Spearman rank-order correlation coefficient


The Spearman rank-order correlation coefficient (Spearman rho) is designed to
measure the strength of a monotonic (in a constant direction) association between two
variables measured on an ordinal or ranked scale. Data that result from ranking and
data collected on a scale that is not truly interval in nature (e.g., data obtained
from Likert-scale administration) are subject to Spearman correlation analysis. In
addition, any interval data may be transformed to ranks and analysed with the
Spearman rho, although this results in a loss of information. Nonetheless, this approach
may be used, for example, if one variable of interest is measured on an interval scale
and the other is measured on an ordinal scale. Similar to Pearson’s correlation
coefficient, Spearman rho may be tested for its significance. A similar measure of
strength of association is the Kendall tau, which also may be applied to measure the
strength of a monotonic association between two variables measured on an ordinal or
rank scale.

As an example of when Spearman rho would be appropriate, consider the case where
there are seven substantial health threats to a community. Health officials wish to
determine a hierarchy of threats in order to most efficiently deploy their resources.
They ask two credible epidemiologists to rank the seven threats from 1 to 7, where 1 is
the most significant threat. The Spearman rho or Kendall tau may be calculated to
measure the degree of association between the epidemiologists’ rankings, thereby
indicating the collective strength of a potential action plan. If there is a significant
association between the two sets of ranks, health officials may feel more confident in
their strategy than if a significant association is not evident.
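
The epidemiologist example can be worked through numerically. The sketch below uses the shortcut formula rho = 1 − 6·Σd² / (n(n² − 1)), which is valid only when there are no tied ranks; the two rankings are invented for illustration and spearman_rho is a hypothetical function name.

```python
# Sketch: Spearman rho for two experts who each rank seven health threats.
# The rankings are hypothetical. The shortcut formula assumes no tied ranks.

def spearman_rho(ranks_a, ranks_b):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

expert_1 = [1, 2, 3, 4, 5, 6, 7]
expert_2 = [2, 1, 4, 3, 5, 7, 6]
rho = spearman_rho(expert_1, expert_2)
print(f"Spearman rho = {rho:.4f}")
```

The two experts swap a few neighbouring threats but broadly agree, so rho is close to +1. Whether that agreement is statistically significant would still need a separate test.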

Chi-square test
The chi-square test for association (contingency) is a standard measure for association
between two categorical variables. The chi-square test, unlike Pearson’s correlation
coefficient or Spearman rho, is a measure of the significance of the association rather
than a measure of the strength of the association.

A simple and generic example follows. If scientists were studying the relationship
between gender and political party, then they could count people from a random sample
belonging to the various combinations: female-Democrat, female-Republican, male-
Democrat, and male-Republican. The scientists could then perform a chi-square test to
determine whether there was a significant disproportionate membership among those
groups, indicating an association between gender and political party.
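
The gender and party example can be sketched as follows. The counts in the 2 × 2 table are invented for illustration and the function names are hypothetical. The P-value helper applies only to tables with one degree of freedom, where the chi-square statistic is the square of a standard normal variable.

```python
import math

# Sketch: chi-square test of association for a 2 x 2 contingency table.
# Rows are gender, columns are political party. The counts are hypothetical.

def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_one_df(chi2):
    """P-value for 1 degree of freedom only: P(|Z| > sqrt(chi2)) for standard normal Z."""
    return math.erfc(math.sqrt(chi2 / 2))

counts = [[30, 20],   # female: Democrat, Republican
          [20, 30]]   # male:   Democrat, Republican
chi2, df = chi_square_statistic(counts)
p_value = p_value_one_df(chi2)
print(f"chi-square = {chi2:.2f}, df = {df}, P = {p_value:.4f}")
```

The statistic indicates whether the association is significant but not how strong it is, which matches the distinction drawn above between the chi-square test and the correlation coefficients.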

Presenting Insights and findings using written reports and oral presentation:

Oral presentation:
Two words that are capable of striking fear into the hearts of even the most confident
student. But should they? Though not all of us can ever hope to reach the heady heights
of oratory genius achieved by the likes of Barack Obama or Martin Luther King Jr, there
are steps we can take to help us to present our point of view strongly.

Step 1: Research
Find out as much as you can about your chosen topic. The key skills for presenting
argument in the VCE English Study Design clearly state that you need to ‘conduct
research to support the development of arguments on particular issues and
acknowledge sources accurately and appropriately where relevant’. You are expected to
research your chosen topic so that you have a deep and nuanced understanding of the
issues and arguments. Read from multiple sources that present various points of view,
and take notes on the arguments used.

Step 2: Plan your overall approach


Great speeches very rarely just happen; they are carefully crafted pieces of writing. Use
your knowledge of argument and persuasive language as a basis for the development of
your oral presentation. Remember that you are required to provide a written statement
of intention to accompany your presentation. This statement of intention must outline
the decisions you have made in the planning process, and explain how these
demonstrate understanding of argument and persuasive language.

So, before you start writing, take the time to think carefully about the following aspects
of your presentation.

Your contention
Where do you stand on the issue? Why? Express this in a clear and direct sentence.
Avoid statements such as ‘Greyhound racing is bad’. This is a vague and general opinion,
not a contention. A contention on this issue would be something like ‘The cruel and
abusive practice of greyhound racing should be banned immediately’.

Your context and audience


Who are you addressing? By that, I don’t mean your teacher or your classmates. Rather,
who is your imagined audience for the speech? This is important to keep in mind, as it
will inform the language choices you make. Furthermore, consider in what context you
would be addressing your audience. Is your speech designed to be delivered on the
steps of parliament at a rally or to a group of students at a graduation dinner? Decide
this before you start writing. And don’t be afraid to adopt a persona – this will allow you
broader scope in selecting a particular context and audience.
Once you have decided on your contention and on your context and audience, it is time
to consider some of the finer details of your presentation.

Your purpose
What do you want your imagined audience to think, feel or do? Do you wish to inform or
educate them? To create alarm? To effect change? Your purpose should be closely
related to your contention.

Your tone
What feelings are you seeking to communicate and to evoke in the audience? What
mood are you trying to generate? Will you be using humour to relax your audience? Will
you be hostile? Sympathetic? Will your tone change at any point and, if so, why?
All of the above are important factors to consider, as they will affect your language
choices and the persuasive language techniques you employ.

Step 3: Plan your arguments


Now you need to decide on your supporting arguments. For each argument, ask
yourself:
What persuasive language techniques will I use?
What evidence will I present?

Try to vary your chosen techniques, and remember Aristotle’s principles of rhetoric –
logos (appeal to logic and reason), ethos (character of the speaker)
and pathos (emotional influence of the speaker). A strong argument will address all
three elements in varying degrees.

Step 4: Write the introduction


Good speeches start strongly. You need to grab the audience’s attention and make your
point of view clear from the outset. The way you begin should be consistent with your
audience and purpose. Strategies that you might consider are listed below.

Anecdote – this is a great way to highlight a personal connection to the issue or to strike
a sympathetic tone.

Statistics – if your purpose is to shock your audience or to promote change, this is a
great way to ‘hit them hard’ right from the outset.

Inclusive language – if you want to create a shared sense of purpose, make it clear to
your audience that they are part of this issue, and that how they feel matters.

Once you have your audience’s attention, introduce yourself (or your persona), clarify
the issue, state your contention and signpost your main arguments.

Step 5: Write the body


This is where all your planning from step 3 pays off!

For each body paragraph, ensure that you create strong topic sentences that clearly
highlight your main arguments, and then develop each argument using your carefully
selected language and evidence.

There are a few things that you should keep in mind as you write:
Cohesion is king! Keep your line of argument consistent and use connectives
throughout.

Analyse the evidence! Don’t just present a raft of statistics or evidence and expect them
to make the argument for you. Analyse their importance in relation to the debate.
Include some rebuttal! An issue has two sides – you need to rebut some or all arguments
from the opposing point of view.

Step 6: Write the conclusion


Aim to finish strongly. Reiterate your contention and then tell the audience what they
should think, feel or do. (This should directly relate to the purpose you decided on in the
planning stage).

To ensure that you finish on a powerful note, consider using an appeal, a rhetorical
question, or a call to action.

Step 7: Proofread and practise


Read your speech to friends or family and get their feedback. Did the line of argument
make sense to them? Did you persuade them? Did any parts of your speech lose their
attention? Take note of these responses and edit your speech as required.
