
Quantitative Data

Analysis
By: Dr. Mulugeta W and Mr. Dagne Amdetsion
Portion 3
Key points

 Data must be analysed to produce information

 Computer software analysis is normally used for this process

 Data should be carefully prepared for analysis

 Researchers need to know how to select and use different charting and statistical
techniques
Main Concern

 Preparing, inputting and checking data

 Choosing the most appropriate statistics to describe the data

 Choosing the most appropriate statistics to examine data relationships and trends
Defining the Data Type
What is data analysis?

 A complex process that involves moving back and forth


 between concrete bits of data and abstract concepts
 between inductive and deductive reasoning
 between description and interpretation
 Simply put: Data analysis is the process of making meaning from the data
Analysis process needs to do four
things
 Describe the data clearly.
 Identify what is typical and atypical among the data.
 Bring to light differences, relationships, and other patterns existing in the data; and
 Answer research questions or test hypotheses.
Effective Data Analysis

 Effective data analysis involves


 keeping your eye on the main game
 managing your data
 engaging in the actual process of quantitative and / or qualitative analysis
 presenting your data
 drawing meaningful and logical conclusions
Data analysis and interpretation

 Think about analysis EARLY


 Start with a plan
 Code, enter, clean
 Analyze
 Interpret
 Reflect
 What did we learn?
 What conclusions can we draw?
 What are our recommendations?
 What are the limitations of our analysis?
Quantitative data analysis

 Quantitative data analysis is the process of analyzing numerical data using
statistical methods to identify patterns and trends in research.
 Quantitative data is objective in nature, relying on measurable variables and
standardized tools for data collection.
 Types of Quantitative Data:
 Continuous data: values such as height, weight, or temperature, which can
take on any value within a range.
 Discrete data: counted values such as number of children, which take only
whole numbers.
 Ordinal data: data with a ranking order, such as level of agreement on a
survey.
 Nominal data: categories with no inherent order, such as gender or color.
Quantitative Data Analysis Method
Data Preparation

 Step 1: Data Validation


 The purpose of data validation is to find out, as far as possible, whether the
data collection was done as per the pre-set standards and without any
bias. It is a four-step process, which includes…
 Fraud, to infer whether each respondent was actually interviewed or not.
 Screening, to make sure that respondents were chosen as per the research
criteria.
 Procedure, to check whether the data collection procedure was duly
followed.
 Completeness, to ensure that the interviewer asked the respondent all the
questions, rather than just a few required ones.
To do this,

 researchers would need to pick a random sample of completed surveys


and validate the collected data. (Note that this can be time-consuming for
surveys with lots of responses.) For example, imagine a survey with 200
respondents split into 2 cities. The researcher can pick a sample of 20
random respondents from each city. After this, the researcher can reach
out to them through email or phone and check their responses to a certain
set of questions.
Step 2: Data Editing

 Typically, large data sets include errors. For example, respondents may fill
fields incorrectly or skip them accidentally. To make sure that there are no
such errors, the researcher should conduct basic data checks, check for
outliers, and edit the raw research data to identify and clear out any data
points that may hamper the accuracy of the results.
 For example, an error could be fields that were left empty by respondents.
While editing the data, it is important to make sure to remove or fill all the
empty fields.
Step 3: Data Coding

 This is one of the most important steps in data preparation. It refers to


grouping and assigning values to responses from the survey.
 For example, if a researcher has interviewed 1,000 people and now wants
to find the average age of the respondents, the researcher will create age
buckets and categorize the age of each respondent using these codes. (For
example, respondents aged 13-15 would have their age coded as 0, 16-18 as 1,
19-21 as 2, etc.)
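As an illustration of this coding step, here is a minimal Python sketch of age bucketing; the bucket boundaries and the code_age helper are hypothetical, chosen to mirror the example above.

```python
# Hypothetical sketch of the age-coding step described above,
# using plain Python (bucket boundaries are illustrative).
ages = [14, 17, 21, 15, 19, 33, 16]  # made-up respondent ages

def code_age(age):
    """Map an age to a bucket code: 13-15 -> 0, 16-18 -> 1, 19-21 -> 2, ..."""
    if age < 13:
        return None          # outside the defined buckets
    return (age - 13) // 3   # each bucket spans three years

codes = [code_age(a) for a in ages]
print(codes)  # [0, 1, 2, 0, 2, 6, 1]
```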
Stages of Quantitative Data Analysis:

 Data Collection:
 Research design (experimental, correlational, descriptive)
 Sampling methods (random, stratified, convenience)
 Data collection instruments (surveys, questionnaires, standardized tests)
 Data Cleaning and Preparation:
 Missing data handling (imputation techniques)
 Outlier identification and management
 Data transformation (e.g., log transformation)
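As a hedged illustration of the cleaning steps just listed, the following Python sketch shows simple mean imputation and a log transformation, assuming pandas and NumPy are available; the income column and all values are invented.

```python
# A minimal sketch of two cleaning steps: mean imputation and log transformation.
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [32000, 45000, np.nan, 51000, 38000, np.nan]})

# Impute missing values with the column mean (one simple imputation technique).
df["income"] = df["income"].fillna(df["income"].mean())

# Log-transform to reduce skew in the distribution.
df["log_income"] = np.log(df["income"])
print(df)
```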
Analysing quantitative data

 Researchers who have completed quantitative data-gathering techniques
will typically have considerable numerical data that require analysis.
 As data are gathered, they are typically disorganised and made up of
separate bits of information.
 When numbers look this way, the meaning is not clear.
How the collected data can fit within
each of these scales
 If one object is different from another, you have a nominal scale.
 If one object is bigger or better or more of anything than another, you have
an ordinal scale.
 If one object is so many units (degrees, inches) more than another, you have
an interval scale.
 If one object is so many times as big or bright or tall or heavy as another,
you have a ratio scale.
 Statistical analysis of these numbers is a way to focus and manage data.
 Mertler and Charles (2005) identify a number of different purposes for using
statistics. Specifically, statistics can:
 Summarise data and reveal what is typical and atypical within a group;
 Identify the relative standing of individuals within an identified cohort;
Cont’d

 Show relationship between and among variables;


 Show similarities and differences among groups;
 Identify error that is inherent in a sample selection;
 Test for significance of findings; and
 Support the researchers in making other inferences about the population.
 Statistics help to condense a vast body of data into an amount of
information that the mind can more easily understand.
 Statistics identify patterns and relationships within the data that may
otherwise go unnoticed.
Generating statistics

 For many, the thought of generating statistics can be quite scary


 The process of engaging with formulas and completing complex
calculations can be daunting.
 However, modern software packages such as SPSS (Statistical Package for
the Social Sciences) automate these calculations for the researcher.
Descriptive Statistics

 Typically descriptive statistics (also known as descriptive analysis) is the first


level of analysis. It helps researchers summarize the data and find patterns.
A few commonly used descriptive statistics are:
 Mean: numerical average of a set of values.
 Median: midpoint of a set of numerical values.
 Mode: most common value among a set of values.
 Percentage: used to express how a value or group of respondents within
the data relates to a larger group of respondents.
 Frequency: the number of times a value is found.
 Range: the difference between the highest and lowest values in a set of values.
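For illustration, here is a minimal Python sketch that computes each of these descriptive statistics with the standard library; the score values are made up.

```python
# A minimal sketch computing the descriptive statistics listed above.
import statistics
from collections import Counter

scores = [4, 7, 7, 8, 5, 9, 7, 6]

print("mean:  ", statistics.mean(scores))    # numerical average
print("median:", statistics.median(scores))  # midpoint value
print("mode:  ", statistics.mode(scores))    # most common value
print("range: ", max(scores) - min(scores))  # highest minus lowest

freq = Counter(scores)                        # frequency of each value
print("frequency of 7:", freq[7])

pct = 100 * sum(s >= 7 for s in scores) / len(scores)
print(f"percentage scoring 7 or more: {pct:.0f}%")
```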
Descriptive statistics
 Descriptive statistics provide absolute numbers. However, they do not
explain the rationale or reasoning behind those numbers. Before applying
descriptive statistics, it’s important to think about which one is best suited
for your research question and what you want to show. For example, a
percentage is a good way to show the gender distribution of respondents.
 Descriptive statistics are most helpful when the research is limited to the
sample and does not need to be generalized to a larger population. For
example, if you are comparing the percentage of children vaccinated in
two different villages, then descriptive statistics is enough.
 Since descriptive analysis is mostly used for analyzing a single variable, it is
often called univariate analysis.
Descriptive statistics

 Calculated in order to report on and describe what happened during the


period of research.
 There are three basic categories of descriptive statistics, all of which are
frequently used by teacher-researchers. These categories are:
 Measures of central tendency (mean, median, mode)
 Measures of dispersion, or variability (range, standard deviation, variance)
 Measures of relationship (correlation coefficients)
Measures of central tendency

 Statistical procedures that indicate, with a single score, what is typical or


standard about a group of individuals. These indices are commonly used
when trying to describe the collective level of performance, attitude, or
opinion of a group of study participants. There are three measures of
central tendency: the mean, the median, and the mode.
Measures of dispersion
 Indicate what is different within a group of scores.
 They also indicate how much spread or diversity exists within a group of scores.
 The two primary measures of dispersion are:
 Range (highest score minus lowest score, H-L)
 Standard deviation, formally defined as the average distance of scores
from the mean.
Measures of relationship

 The third type of descriptive statistics measures relationships between
variables. There are numerous types of correlation coefficients, the name
given to the various measures of the direction and degree of relationship
between two variables. A correlation coefficient is calculated when
analysing data from studies using a correlational design.
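As a sketch of how such a coefficient might be computed, assuming SciPy is available, the following computes a Pearson correlation on invented paired data.

```python
# A minimal sketch of computing a correlation coefficient with SciPy.
from scipy.stats import pearsonr

hours_studied = [2, 4, 5, 7, 8, 10]
exam_score    = [55, 60, 62, 70, 75, 83]

r, p_value = pearsonr(hours_studied, exam_score)
print(f"r = {r:.2f}, p = {p_value:.4f}")  # r near +1 -> strong positive relationship
```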
Inferential Statistics
 Inferential statistics in research focuses on
 explaining how researchers use data from a sample to make inferences or
generalizations about a larger population,
 primarily through techniques like hypothesis testing, confidence intervals, and
regression analysis,
 allowing them to draw conclusions about relationships between variables
beyond just the observed data set.
 Inferential statistics can thus be defined as methods for drawing conclusions
about a population based on data collected from a representative sample.
Common inferential statistical tests
 T-test: Comparing the means of two independent groups
 ANOVA (Analysis of Variance): Comparing the means of more than two
groups
 Chi-square test: Analyzing categorical data to assess differences between
groups
 Regression analysis: Examining the relationship between variables to
predict outcomes
Inferential and Descriptive

Descriptive Statistics:
 Organize
 Summarize
 Simplify
 Presentation of data
 Describing data

Inferential Statistics:
 Generalize from samples to populations
 Hypothesis testing
 Relationships among variables
 Make predictions
Fundamentals of Hypothesis Testing

 Hypothesis testing is a technique for interpreting and drawing inferences


about a population based on sample data. It aids in determining which
sample data best support mutually exclusive population claims.
 Null Hypothesis (H0) - The Null Hypothesis is the assumption that the event
will not occur. A null hypothesis has no bearing on the study's outcome
unless it is rejected.
 H0 is the symbol for it, and it is pronounced H-naught.
 Alternate Hypothesis(H1 or Ha) - The Alternate Hypothesis is the logical
opposite of the null hypothesis. The acceptance of the alternative
hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.
Measures of Variability

1. Range: the difference between the highest and lowest scores (H - L)

2. Variance (uses all data points): the average squared deviation of each
score from the mean, s²

3. Standard Deviation: SD = √s² (the square root of the variance)

4. Standard Error of the Mean: SEM = SD / √n
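A minimal Python sketch of these measures, using the standard library on made-up scores:

```python
# Computing variance, standard deviation, and standard error of the mean.
import math
import statistics

scores = [12, 15, 11, 18, 14]

variance = statistics.pvariance(scores)  # average squared deviation from the mean
sd = math.sqrt(variance)                 # standard deviation = sqrt(variance)
sem = sd / math.sqrt(len(scores))        # standard error of the mean = SD / sqrt(n)

print(f"variance = {variance:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```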


Hypothesis
A statement about what findings are expected

null hypothesis
"the two groups will not differ“

alternative hypothesis
"group A will do better than group B"
"group A and B will not perform the same"
 Based on the results, the assumed null hypothesis is accepted or rejected. If
the null hypothesis is rejected, it indicates that data readings are strong
and are probably not due to chance.
 Null hypothesis rejected: Differences are statistically significant
 Null hypothesis accepted: Differences are not statistically significant
Possible Outcomes in
Hypothesis Testing (Decision)

            | Null is True     | Null is False
Accept      | Correct Decision | Type II Error
Reject      | Type I Error     | Correct Decision
T-Test

 A t-test is an inferential statistic used in hypothesis testing to determine if
there is a statistically significant difference between the means of two
groups and how they are related.
 T-tests are used when the data sets follow a normal distribution and have
unknown variances, like the data set recorded from flipping a coin 100
times.
 The t-test uses the t-statistic, the t-distribution values, and the degrees of
freedom to determine statistical significance.
KEY TAKEAWAYS

 A t-test is an inferential statistic used to determine if there is a
statistically significant difference between the means of two variables.
 The t-test is a test used for hypothesis testing in statistics.
 Calculating a t-test requires the difference between the mean values from
each data set, the standard deviation of each group, and the number of
data values.
 T-tests can be dependent or independent.
Four assumptions are made while using a t-
test:

 The data collected must follow a continuous or ordinal scale, such as the
scores for an IQ test.
 The data is collected from a randomly selected portion of the total
population
 The data will result in a normal distribution of a bell-shaped curve.
 Equal or homogeneous variance exists when the standard deviations of the
groups are equal.
Using the T-Test

 Calculating a t-test requires three fundamental data values:


 The difference between the mean values from each data set, also known
as the mean difference
 The standard deviation of each group
 The number of data values of each group
Example of When a T-Test Would Be Used
 Imagine that a drug manufacturer tests a new medicine. Following
standard procedure, the drug is given to one group of patients, and a
placebo is given to another group called the control group. The placebo is
a substance with no therapeutic value and serves as a benchmark to
measure how the other group, administered the actual drug, responds.
 After the drug trial, the members of the control group reported an increase
in average life expectancy of three years. Members of the group that was
prescribed the new drug reported an increase in average life expectancy
of four years.
 Initial observation indicates that the drug is working. However, it is also
possible that the observation may be due to chance. A t-test can be used
to determine if the results are significant and applicable to the entire
population, or whether they are random and not due to the drug
intervention.
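A hedged sketch of how this comparison might be run in Python, assuming SciPy is available; the per-patient life-expectancy gains below are invented for illustration.

```python
# A minimal sketch of an independent two-sample t-test for the drug trial above.
from scipy.stats import ttest_ind

control = [2.8, 3.1, 2.9, 3.2, 3.0, 2.9]   # placebo group gains (years)
treated = [3.9, 4.2, 4.0, 3.8, 4.1, 4.0]   # drug group gains (years)

t_stat, p_value = ttest_ind(control, treated)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the difference is statistically significant.")
else:
    print("Fail to reject H0: the difference may be due to chance.")
```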
ANOVA

 ANOVA, which stands for Analysis of Variance, is a statistical test used to


analyze the difference between the means of more than two groups.
 A one-way ANOVA uses one independent variable, while a two-way
ANOVA uses two independent variables.
 One-way ANOVA example: As a crop researcher, you want to test the
effect of three different fertilizer mixtures on crop yield. You can use a one-
way ANOVA to find out if there is a difference in crop yields between the
three groups.
When to use a one-way ANOVA
 Use a one-way ANOVA when you have collected data about
one categorical independent variable and one quantitative dependent
variable. The independent variable should have at least three levels (i.e. at
least three different groups or categories).
 ANOVA tells you if the dependent variable changes according to the level
of the independent variable. For example:
 Your independent variable is social media use, and you assign groups
to low, medium, and high levels of social media use to find out if there is a
difference in hours of sleep per night.
 Your independent variable is brand of soda, and you collect data
on Coke, Pepsi, Sprite, and Fanta to find out if there is a difference in
the price per 100ml.
 Your independent variable is type of fertilizer, and you treat crop fields with
mixtures 1, 2 and 3 to find out if there is a difference in crop yield.
 The null hypothesis (H0) of ANOVA is that there is no difference among
group means. The alternative hypothesis (Ha) is that at least one group
differs significantly from the overall mean of the dependent variable.
 If you only want to compare two groups, use a t test instead.
How does an ANOVA test work?

 ANOVA determines whether the groups created by the levels of the


independent variable are statistically different by calculating whether the
means of the treatment levels are different from the overall mean of the
dependent variable.
 If any of the group means is significantly different from the overall mean, then
the null hypothesis is rejected.
 ANOVA uses the F test for statistical significance. This allows for comparison of
multiple means at once, because the error is calculated for the whole set of
comparisons rather than for each individual two-way comparison (which would
happen with a t test).
 The F test compares the variance in each group mean from the overall group
variance. If the variance within groups is smaller than the variance between
groups, the F test will find a higher F value, and therefore a higher likelihood that
the difference observed is real and not due to chance.
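Returning to the fertilizer example, here is a minimal Python sketch of a one-way ANOVA, assuming SciPy is available; the yield figures are invented for illustration.

```python
# A minimal sketch of a one-way ANOVA comparing three fertilizer mixtures.
from scipy.stats import f_oneway

mixture_1 = [176, 180, 178, 182, 177]  # crop yields per field (invented)
mixture_2 = [185, 189, 186, 190, 188]
mixture_3 = [178, 181, 179, 183, 180]

f_stat, p_value = f_oneway(mixture_1, mixture_2, mixture_3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests at least one mixture's mean yield differs.
```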
Assumptions of ANOVA
 The assumptions of the ANOVA test are the same as the general
assumptions for any parametric test:
 Independence of observations: the data were collected using statistically
valid sampling methods, and there are no hidden relationships among
observations. If your data fail to meet this assumption because you have
a confounding variable that you need to control for statistically, use an
ANOVA with blocking variables.
 Normally-distributed response variable: The values of the dependent
variable follow a normal distribution.
 Homogeneity of variance: The variation within each group being
compared is similar for every group. If the variances are different among
the groups, then ANOVA probably isn’t the right fit for the data.
Chi-square test:
 The Chi-Square test is a statistical procedure for determining the difference
between observed and expected data. It helps to determine whether a
difference between two categorical variables is due to chance or to a
relationship between them.
 A chi-square test or comparable nonparametric test is required to test a
hypothesis regarding the distribution of a categorical variable. Categorical
variables, which indicate categories such as animals or countries, can be
nominal or ordinal. They cannot have a normal distribution since they only
have a few particular values.
Chi-Square Test Formula:

χ²c = Σ (Oi − Ei)² / Ei

Where:

c = Degrees of freedom

Oi = Observed value(s)

Ei = Expected value(s)
 The degrees of freedom in a statistical calculation represent the number of
variables that can vary. The degrees of freedom can be calculated to
ensure that chi-square tests are statistically valid. These tests are frequently
used to compare observed data with data expected to be obtained if a
particular hypothesis were true.
 The Observed values are those you gather yourselves.
 The expected values are the anticipated frequencies, based on the null
hypothesis.
Types of Chi-Square Tests

 There are two main types of Chi-Square tests:


 Independence
 Goodness-of-Fit
Independence
 The Chi-Square Test of Independence is an inferential statistical test which
examines whether two sets of variables are likely to be related to each
other or not. This test is used when we have counts of values for two
nominal or categorical variables and is considered a non-parametric test. A
relatively large sample size and independence of observations are the
required criteria for conducting this test.
 Example:
 In a movie theatre, suppose we made a list of movie genres. Let us consider
this as the first variable. The second variable is whether or not the people
who came to watch those genres of movies bought snacks at the
theatre. Here the null hypothesis is that the genre of the film and whether
people bought snacks or not are unrelated. If this is true, the movie
genres don’t impact snack sales.
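A minimal Python sketch of this test on the movie-theatre example, assuming SciPy is available; the contingency counts are invented for illustration.

```python
# A minimal sketch of a Chi-Square Test of Independence.
from scipy.stats import chi2_contingency

# Rows: genres (action, comedy, drama); columns: bought snacks (yes, no).
observed = [
    [50, 30],
    [40, 40],
    [20, 60],
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
# A small p-value suggests genre and snack buying are related.
```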
Goodness-Of-Fit

 In statistical hypothesis testing, the Chi-Square Goodness-of-Fit test


determines whether a variable is likely to come from a given distribution or
not. We must have a set of data values and the idea of the distribution of
this data. We can use this test when we have value counts for categorical
variables. This test demonstrates a way of deciding if the data values have
a “good enough” fit to our idea, or whether they are a representative sample
of the entire population.
 Example:
 Suppose we have bags of balls with five different colours in each bag. The
given condition is that the bag should contain an equal number of balls of
each colour. The idea we would like to test here is that the proportions of
the five colours of balls in each bag are indeed equal.
Chi-Square Test Examples

 1. Chi-Square Test for Independence


 Example: A researcher wants to determine if there is an association between
gender (male/female) and preference for a new product (like/dislike). The test
can assess whether preferences are independent of gender.
 2. Chi-Square Test for Goodness of Fit
 Example: A dice manufacturer wants to test if a six-sided die is fair. They roll the
die 60 times and expect each face to appear 10 times. The test checks if the
observed frequencies match the expected frequencies.
 3. Chi-Square Test for Homogeneity
 Example: A fast-food chain wants to see if the preference for a particular menu
item is consistent across different cities. The test can compare the distribution of
preferences in multiple cities to see if they are homogeneous.
Cont’d
 4. Chi-Square Test for a Contingency Table
 Example: A study investigates whether smoking status (smoker/non-smoker) is
related to the presence of lung disease (yes/no). The test can evaluate the
relationship between smoking and lung disease in the sample.
 5. Chi-Square Test for Population Proportions
 Example: A political analyst wants to see if voter preference (candidate A vs.
candidate B) is the same across different age groups. The test can determine if
the proportions of voters preferring each candidate are equal across the age
groups.
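Returning to the fair-die example (test 2 above), here is a minimal Python sketch of the goodness-of-fit test, assuming SciPy is available; the observed roll counts are invented for illustration.

```python
# A minimal sketch of a Chi-Square Goodness-of-Fit test for a six-sided die.
from scipy.stats import chisquare

observed = [8, 12, 9, 11, 10, 10]   # counts for faces 1-6 over 60 rolls
expected = [10] * 6                 # a fair die implies 10 per face

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A large p-value means we cannot reject the hypothesis that the die is fair.
```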
Regression analysis
 Regression analysis is a statistical method used to examine the relationship
between a dependent variable and one or more independent variables,
allowing researchers to understand how changes in the independent
variables influence the dependent variable and potentially predict future
outcomes based on this relationship; essentially, it helps determine which
factors have the most significant impact on a particular outcome of
interest.
 Regression analysis allows for investigating the relationship between
variables. Usually, the variables are labelled as dependent or
independent. An independent variable is an input, driver or factor that has
an impact on a dependent variable (which can also be called an
outcome). For example, if we were to say age affects academic
performance of students, what will be the independent and dependent
variables here? Well here age is an independent variable, and it has the
potential to impact on outcome/dependent variable—in this case,
academic performance. Similarly, in the nurse educator's example, critical
thinking is a dependent variable and age, experience and training are
independent variables.
Purposes of regression analysis

 Regression analysis has four primary purposes: description, estimation,


prediction and control.
 By description, regression can explain the relationship between dependent
and independent variables.
 Estimation means that by using the observed values of independent
variables, the value of dependent variable can be estimated.
 Regression analysis can be useful for predicting the outcomes and changes
in dependent variables based on the relationships of dependent and
independent variables.
 Finally, regression enables controlling for the effect of one or more
independent variables while investigating the relationship of one
independent variable with the dependent variable.
Types of Regression Analysis

 Simple Linear Regression: Analyzes the relationship between one


independent variable and one dependent variable.
 Multiple Linear Regression: Examines the relationship between multiple
independent variables and one dependent variable.
 Logistic Regression: Used when the dependent variable is binary (yes/no).

 Regression analysis includes several variations, such as linear, multiple linear,


and nonlinear. The most common models are simple linear and multiple
linear. Nonlinear regression analysis is commonly used for more
complicated data sets in which the dependent and independent variables
show a nonlinear relationship.
Regression Analysis – Linear Model Assumptions

 Linear regression analysis is based on six fundamental assumptions:


 The dependent and independent variables show a linear relationship (the
model is linear in the slope and intercept parameters).
 The independent variable is not random.
 The mean of the residual (error) is zero.
 The variance of the residual (error) is constant across all observations.
 The value of the residual (error) is not correlated across all observations.
 The residual (error) values follow the normal distribution.
Regression Analysis – Simple Linear Regression

 Simple linear regression is a model that assesses the relationship between a


dependent variable and an independent variable. The simple linear model
is expressed using the following equation:

Y = a + bX + ϵ

Where:

 Y – Dependent variable
 X – Independent (explanatory) variable
 a – Intercept
 b – Slope
 ϵ – Residual (error)
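A minimal Python sketch of fitting this model, assuming SciPy is available; the (X, Y) pairs are invented for illustration.

```python
# A minimal sketch of simple linear regression: Y = a + bX.
from scipy.stats import linregress

X = [1, 2, 3, 4, 5, 6]
Y = [2.1, 4.3, 6.2, 8.4, 10.1, 12.2]

result = linregress(X, Y)
print(f"intercept a = {result.intercept:.2f}")
print(f"slope     b = {result.slope:.2f}")
print(f"r-squared   = {result.rvalue**2:.3f}")
```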
Regression Analysis – Multiple Linear Regression

 Multiple linear regression analysis is essentially similar to the simple linear


model, with the exception that multiple independent variables are used in
the model. The mathematical representation of multiple linear regression is:

Y = a + bX1 + cX2 + dX3 + ϵ

Where:

 Y – Dependent variable
 X1, X2, X3 – Independent (explanatory) variables
 a – Intercept
 b, c, d – Slopes
 ϵ – Residual (error)
Multiple Linear Regression
 Multiple linear regression follows the same conditions as the simple linear
model. However, since there are several independent variables in multiple
linear analysis, there is another mandatory condition for the model:
 Non-collinearity: Independent variables should show a minimum
correlation with each other. If the independent variables are highly
correlated with each other, it will be difficult to assess the true relationships
between the dependent and independent variables.
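To make the collinearity concern concrete, here is a minimal Python sketch that fits a multiple linear model by least squares and inspects pairwise correlations among predictors, assuming NumPy is available; all values are invented, loosely following the nurse-educator example (age, experience, training).

```python
# A minimal sketch of multiple linear regression with a quick collinearity check.
import numpy as np

# Columns: age, experience (years), training (courses); eight observations.
X = np.array([
    [25, 2, 1], [30, 5, 2], [35, 8, 2], [40, 12, 3],
    [28, 4, 1], [45, 15, 4], [33, 7, 2], [50, 20, 5],
], dtype=float)
y = np.array([60, 65, 70, 78, 63, 85, 69, 90], dtype=float)  # critical-thinking score

# Non-collinearity check: pairwise correlations between predictors.
# Off-diagonal values near 1 signal trouble (these invented predictors
# are strongly correlated, so the check would flag them).
print(np.corrcoef(X, rowvar=False))

# Fit Y = a + b*X1 + c*X2 + d*X3 by least squares.
design = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
print("a, b, c, d =", np.round(coeffs, 3))
```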
