UNIT 10
Unit objectives
1. Define the statistical test
2. Explain types of statistical test
10.1 Definition
A test used to determine the statistical significance of an observation. A statistical
test provides a mechanism for making quantitative decisions about a process or
processes.
10.2 Types of statistical test
Standard/z-score and the Student’s t-test
Standard/z-score
In statistics, a standard score indicates how many standard deviations an observation or
datum is above or below the mean. It is a dimensionless quantity derived by subtracting
the population mean from an individual raw score and then dividing the difference by the
population standard deviation or by subtracting the population parameter mean from the
sample statistics and dividing by the standard error. This conversion process is called
standardizing or normalizing.
The standard deviation is the unit of measurement of the z-score. It allows comparison of
observations from different normal distributions, which is done frequently in research.
Standard scores are also called z-values, z-scores, normal scores, and standardized
variables. The use of "Z" is because the normal distribution is also known as the "Z
distribution". They are most frequently used to compare a sample to a standard normal
deviate (the standard normal distribution, with μ = 0 and σ = 1).
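The standardizing step described above can be sketched directly; the population mean and standard deviation below are made-up values, purely for illustration:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations the raw score x lies from the mean mu."""
    return (x - mu) / sigma

# Illustrative (made-up) population: mean 100, standard deviation 15
z = z_score(x=130, mu=100, sigma=15)
print(z)  # 2.0: the observation lies 2 standard deviations above the mean
```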
Formula
The standard score is z = (x − μ) / σ, where x is the raw score, μ is the population
mean and σ is the population standard deviation.
4. Compare the value of the test statistic to values from the known probability distribution
5. Interpret the p-value and results and decide to reject or fail to reject H0.
1. One sample t-test
In testing the null hypothesis that the population mean is equal to a specified value μ0,
the test statistic is
t = (x̄ − μ0) / (s/√n)
where x̄ is the sample mean, s is the sample standard deviation and n is the sample size.
The degrees of freedom used in this test is n − 1.
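As a sketch of this calculation using the Python standard library (the blood-pressure readings and the hypothesized mean μ0 = 120 are hypothetical values chosen for illustration):

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample standard deviation
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical systolic BP readings tested against mu0 = 120
t, df = one_sample_t([118, 125, 130, 122, 128], mu0=120)
```

The resulting t is then compared with the t-distribution on df = n − 1 degrees of freedom.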
2. Paired sample t test
This is usually used to compare means in two paired groups. For example:
The variable may be measured on each individual in two circumstances, e.g. before an
intervention (baseline information) and after the intervention, or a patient may have two
data sets, e.g. during treatment and while taking a placebo.
Means in the two data sets can be compared. This is done by the following procedure:
1. Define the null and alternative hypothesis
H0: the mean in data set 1 is equal to the mean in data set 2
H1: the mean in data set 1 is not equal to the mean in data set 2
2. Collect relevant data from a sample of individuals
3. Calculate the value of the test statistic specific to the null hypothesis
4. Compare the value of the test statistic to values from the known probability distribution
in this case the t-distribution or the z-score distribution.
5. Interpret the p-value and results and decide to reject or fail to reject H0.
Note:
The non-parametric test for this is the Wilcoxon signed-rank test. This is done based on
the median values and not the mean.
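The paired procedure above reduces to a one-sample t-test on the within-pair differences. A minimal sketch with made-up before/after measurements:

```python
import math
import statistics

def paired_t(before, after):
    """Paired t-test: one-sample t on the within-pair differences vs zero."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical measurements before and after an intervention
t, df = paired_t(before=[80, 75, 90, 85], after=[72, 70, 84, 80])
```

A negative t here indicates the measurements fell after the intervention.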
3. Unpaired sample t-test
In the population, the variable being measured is normally distributed in each group, and
the variances of the two groups are the same.
We consider the difference in the means of the two groups. Under the null hypothesis
that the population means in the two groups are the same, this difference will be equal
to zero. Therefore we use a test statistic based on the difference in the two sample
means and on the value of the difference in the two group means under the null
hypothesis. The test statistic, often referred to as t, follows the t-distribution.
Note: The non-parametric test for this is the Wilcoxon rank-sum test. This is done based
on the median values and not the mean.
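Under the equal-variance assumption stated above, the unpaired test statistic uses a pooled variance estimate. A minimal sketch with hypothetical data:

```python
import math
import statistics

def unpaired_t(x, y):
    """Two-sample t with a pooled variance (assumes equal group variances)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2

# Hypothetical outcome values in two independent groups
t, df = unpaired_t([5, 7, 6, 8], [3, 4, 5, 4])
```

The resulting t is compared with the t-distribution on nx + ny − 2 degrees of freedom.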
ANOVA (analysis of variance)
– We may have samples from a number of independent groups. We can determine a single
numerical variable in each of the groups, e.g. the average platelet count in women of
different ethnic groups.
– Although we could perform tests to compare the averages in each pair of groups, it is
better to have a statistical tool that can compare the groups all at once. This is done by
ANOVA
– The one way analysis of variance separates the total variability in the data into that
which can be attributed to the differences between the individuals from the different
groups (the between group variation) and to the random variation between the
individuals within each group (within-group variation).
– Under the null hypothesis that the group means are the same, the between group
variance will be similar to the within group variance.
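The partition of variability described above can be sketched numerically; the platelet counts below are invented purely for illustration:

```python
import statistics

def one_way_anova_F(groups):
    """F = between-group mean square / within-group mean square."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    grand = statistics.mean([x for g in groups for x in g])
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical platelet counts (x10^9/L) in three ethnic groups
F = one_way_anova_F([[200, 220, 210], [250, 260, 255], [230, 240, 235]])
print(round(F, 1))  # a large F: between-group variance exceeds within-group
```

Under H0 the F ratio is close to 1; a large F suggests the group means differ.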
Key points to note in ANOVA
– ANOVA is used to see an association between a continuous outcome variable (such as
mean HAZ score) and a categorical determining variable (such as iodized salt
consumption).
– The ANOVA is a Statistics option under the Means function that allows for testing the
difference between the mean outcome scores for the two or more categories of the
determining variable.
– According to SPSS 8.0, Analysis of Variance, or ANOVA, is a method of testing the
null hypothesis that several group means are equal in the population, by comparing
the sample variance estimated from the group means to that estimated within the
groups.
– One-way ANOVA: According to SPSS 8.0, the One-Way ANOVA procedure produces a
one-way analysis of variance for a quantitative dependent variable by a single factor
(independent) variable. Analysis of variance is used to test the hypothesis that several
means are equal. This technique is an extension of the two-sample t test.
Note:
The non-parametric test for this is the Kruskal-Wallis test. This is done based on the
median values and not the mean.
Statistical Tests used in Data analysis
Deciding which statistical test to use depends on:
– The design of the study
– The type of the variable
– The distribution of the data (normal, binomial or not normal)
When performing analysis to answer a research question, it is important to first identify
the types of variables that will be used, choosing an outcome variable and one or
more potential "independent" or determining variables. Once this is done, you must
decide how you would like to use these in a statistical test to see if a relationship exists.
The table below gives an idea of how to choose the appropriate test to use for statistical
analysis depending on the variables you have chosen:
                              OUTCOME VARIABLE
PREDICTOR VARIABLE(S)   Categorical               Continuous
Categorical             Chi Square, Log linear,   t-test, ANOVA (Analysis of
                        Logistic regression       Variance), Linear regression
Continuous              Logistic regression       Linear regression, Pearson
                                                  correlation
Mixture of Categorical  Logistic regression       Linear regression, Analysis
and Continuous                                    of Covariance
Chi-Square
When a patient is treated with a particular drug it may be relatively easy to assess
whether or not there is a response. Many variables fall into a limited number of
categories such as dead/alive, drinks alcohol/does not drink alcohol, smokes/does not
smoke, etc. It is therefore important that we have a means of analyzing a table in which
we have divided individuals into two classes with respect to two different attributes.
For example if we wish to test the effectiveness of a new vaccine against cholera in an
area where the disease is common, we might take a group of 450 people and inoculate
200 of them chosen randomly with a new vaccine. The whole group of 450 is then
followed up over a sufficient period of time to determine how many people are attacked.
Suppose 15 people among the inoculated and 100 among the non-inoculated are
attacked. The results can be presented in the form of a 2x2 table showing the association
between attack and exposure (inoculation):

                   Attacked   Not attacked   Total
Inoculated            15          185         200
Not inoculated       100          150         250
Total                115          335         450
In the figure, the last row and column are called marginal totals and the total 450, the
grand total.
Chi-Square tests
The results in the above table look as though the new vaccine offers protection to a
considerable extent. This kind of data may be used to make a decision on whether or not
to vaccinate a population of a town in order to restrict the seriousness of an outbreak of
disease. Initiating such an intervention may require lots of money. It’s therefore
important to make a right decision. This can be done through hypothesis testing.
We need to start by stating the hypothesis that the vaccine is useless (H0). We then seek
a summary measure which will give some indication of how far the observed data depart
from the H0.
Overall, 115 people were attacked out of the 450 under study (25.6%). If the null
hypothesis were true, we would expect 25.6% of individuals to be attacked in both the
inoculated and the uninoculated groups. That is, among the inoculated group of 200, we
would expect the following number of people to be attacked:
115/450 x 200 = 51.1
With fixed marginal totals, the other expected results are obtained by subtraction as
shown below:

                   Attacked   Not attacked   Total
Inoculated           51.1        148.9        200
Not inoculated       63.9        186.1        250
Total               115          335          450
We should then construct a summary statistic which measures the departure of the
observed from the expected results. In this case, we use the χ2
χ2 = ∑(O-E)2/E
Where O is the observed value in every cell
E is the expected value in every cell
The value is then compared with the value in a chi-square table. A 2x2 chi-square
table has 1 degree of freedom; hence with the p-value set at 0.05, the critical
chi-square value is 3.84. So if your calculated chi-square value is bigger than this,
the probability value is less than 0.05. The result is therefore statistically
significant, hence reject your H0.
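The vaccine example above can be checked numerically. This sketch recomputes the expected count in each cell from the marginal totals (row total x column total / grand total) and sums (O − E)²/E over the four cells:

```python
# Observed counts from the vaccine 2x2 table, with its marginal totals
observed = {("Inoculated", "Attacked"): 15, ("Inoculated", "Not attacked"): 185,
            ("Not inoculated", "Attacked"): 100, ("Not inoculated", "Not attacked"): 150}
row_total = {"Inoculated": 200, "Not inoculated": 250}
col_total = {"Attacked": 115, "Not attacked": 335}
N = 450

# chi-square = sum over cells of (observed - expected)^2 / expected
chi2 = sum((obs - row_total[r] * col_total[c] / N) ** 2
           / (row_total[r] * col_total[c] / N)
           for (r, c), obs in observed.items())
print(round(chi2, 2))  # 61.69, far above the 3.84 critical value: reject H0
```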
Alternative formula for χ2
If we were to denote the entries in the 2x2 table as follows:

            Column
          1     2    Total
Row 1     a     b     r1
Row 2     c     d     r2
Total     s1    s2    N

Then
χ2 = N(ad − bc)2 / (s1 s2 r1 r2)
Example
Suppose 53 children are selected for a study of the effectiveness of an anti-malarial
drug. 22 of them, selected at random are given the drug and the remaining 31 a dummy
tablet. At the end of two weeks the following results were recorded.
                    Attacked   Not attacked   Total
Anti-malaria drug       2           20          22
Placebo                11           20          31
Total                  13           40          53
1. We can read the P-value against the calculated chi-square value from the
chi-square distribution, or
2. We can compare against the critical value of 3.84: if the calculated chi-square
value does not exceed this critical value, then we say that there is no association
between the variables in question (fail to reject the H0), and if it is greater than
3.84, then there is an association, hence reject the H0.
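As a check, the shortcut formula can be applied to the anti-malarial table above:

```python
# Cell entries a, b, c, d from the anti-malarial 2x2 table
a, b, c, d = 2, 20, 11, 20
r1, r2 = a + b, c + d   # row totals 22 and 31
s1, s2 = a + c, b + d   # column totals 13 and 40
N = r1 + r2             # grand total 53

chi2 = N * (a * d - b * c) ** 2 / (s1 * s2 * r1 * r2)
print(round(chi2, 2))  # 4.84 > 3.84, so reject H0: there is an association
```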
Correlation
This is used to study the possible linear (straight line) association between 2 quantitative
variables. This tells us how strongly the 2 variables are associated.
A simple and effective way of examining the relationship between quantitative variables
is to use a scatter plot, so called because it shows a scatter of points, one for each
individual. For example, in a study of hypertension in 37 women, clinicians were
interested in the relationship between age and systolic blood pressure (SBP).
We can demonstrate how each point is plotted: to find the values of x and y for the
arrowed point, we draw a horizontal and a vertical line from the point to the x and y
axes. In practice, we know the values of x and y and then use them to plot the point.
Example 2
When we specifically want to examine whether hemoglobin level changes with age, it is
standard to plot:
– The explanatory variable usually the exposure on the x-axis (horizontal axis)
– The response variable usually the outcome variable (dependent) on the y-axis (vertical
axis)
We can then look at the scatter plot and find out whether there is an association
between two quantitative variables. To measure the degree of association, we calculate
the correlation co-efficient (r).
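The correlation coefficient r mentioned above can be computed as a sketch; the age and SBP values below are invented, not the actual study data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two quantitative variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented age (x) and systolic blood pressure (y) values
r = pearson_r([30, 40, 50, 60], [120, 126, 135, 140])
print(round(r, 3))  # close to +1: a strong positive linear association
```

r lies between −1 and +1; values near ±1 indicate a strong linear association.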
UNIT 12
Unit objectives
1. Define terms used in sampling methods
2. Explain types of sampling and their limitation
12.1 Definition of terms
Sampling is the process of selecting a representative group from the population under
study.
The target population is the total group of individuals from which the sample might be
drawn.
A sample is the group of people who take part in the investigation. The people who take
part are referred to as “participants”.
Generalizability refers to the extent to which we can apply the findings of our research
to the target population we are interested in.
12.2 Types of Sampling
Two general approaches of sampling are used in social science research. With probability
sampling, all elements (e.g., persons, households) in the population have some
opportunity of being included in the sample, and the mathematical probability that any
one of them will be selected can be calculated. With non-probability sampling, in
contrast, population elements are selected on the basis of their availability (e.g.,
because they volunteered) or because of the researcher's personal judgment that they
are representative.
Probability Sampling includes: Simple Random sampling, Systematic sampling,
Stratified random sampling, Multistage sampling, Multiphase sampling and Cluster
sampling
Non-Probability Sampling includes: Convenience sampling, Purposive sampling and
Quota sampling.
Simple Random Sampling: It's applicable when the population is small, homogeneous and
readily available. All subsets of the frame are given an equal probability, and each
element of the frame thus has an equal probability of selection. It provides for the
greatest number of possible samples. It is done by assigning a number to each unit in the
sampling frame; a table of random numbers or a lottery system is then used to determine
which units are to be selected.
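The random-number/lottery step can be sketched with Python's standard library (the frame size and sample size below are illustrative):

```python
import random

# Illustrative sampling frame: units numbered 1..450
frame = list(range(1, 451))
random.seed(1)                      # fixed seed so the draw is repeatable
sample = random.sample(frame, 30)   # every unit has equal selection probability
print(sorted(sample)[:5])           # the first few selected unit numbers
```

`random.sample` draws without replacement, so every selected unit is distinct.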
Advantages
– Estimates are easy to calculate.
– Simple random sampling is always an EPS (equal probability of selection) design, but
not all EPS designs are simple random sampling.
Disadvantages
– If the sampling frame is large, this method is impracticable.
– Minority subgroups of interest in population may not be present in sample in sufficient
numbers for study.
Systematic Sampling
It relies on arranging the target population according to some ordering scheme and then
selecting elements at regular intervals through that ordered list. The sampling technique
involves a random start and then proceeds with the selection of every kth element from
then onwards. In this case, k = (population size/sample size). It is important that the
starting point is not automatically the first in the list, but is instead randomly chosen
from within the first to the kth element in the list and a simple example would be to
select every 10th name from the telephone directory (an 'every
10th' sample, also referred to as 'sampling with a skip of 10').
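The "random start, then every kth element" procedure can be sketched as follows (the directory of names is invented):

```python
import random

def systematic_sample(frame, sample_size):
    """Random start within the first k elements, then every kth element."""
    k = len(frame) // sample_size     # sampling interval k = N / n
    start = random.randrange(k)       # start is NOT automatically the first
    return frame[start::k][:sample_size]

random.seed(2)
directory = [f"name{i}" for i in range(1000)]   # invented 'telephone directory'
picked = systematic_sample(directory, 100)      # a 'skip of 10' sample
```

With a frame of 1000 and a sample of 100, k = 10, so every 10th name is selected.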
Advantages
– Sample easy to select
– Suitable sampling frame can be identified easily
– Sample evenly spread over entire reference population
Disadvantages
– Sample may be biased if hidden periodicity in population coincides with that of selection.
– Difficult to assess precision of estimate from one survey.
Stratified Sampling
It's a technique used where the population embraces a number of distinct categories; the
frame can be organized into separate "strata." Each stratum is then sampled as an
independent sub-population, out of which individual elements can be randomly selected.
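Proportionate stratified selection, using the same sampling fraction in every stratum, can be sketched as follows (the strata and their sizes are invented):

```python
import random

def stratified_sample(strata, fraction):
    """Draw the same sampling fraction independently from each stratum."""
    return {name: random.sample(units, round(len(units) * fraction))
            for name, units in strata.items()}

random.seed(3)
strata = {"urban": list(range(600)), "rural": list(range(400))}  # invented strata
sample = stratified_sample(strata, fraction=0.1)  # 60 urban + 40 rural units
```

Each stratum is sampled independently, so different fractions could also be used per stratum to over-represent minority subgroups.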
Advantages
– Every unit in a stratum has same chance of being selected.
– Using same sampling fraction for all strata ensures proportionate representation in the
sample.
– Adequate representation of minority subgroups of interest can be ensured by
stratification and varying sampling fraction between strata as required.
– Each stratum is treated as an independent population hence different sampling
approaches can be applied to different strata.
Disadvantages
– Sampling frame of entire population has to be prepared separately for each stratum
– When examining multiple criteria, stratifying variables may be related to some, but not
to others, further complicating the design, and potentially reducing the utility of the
strata.
– In some cases (such as designs with a large number of strata, or those with a specified
minimum sample size per group), stratified sampling can potentially require a larger
sample than would other methods.
Stratification is sometimes introduced after the sampling phase in a process called
'post-stratification'. This approach is typically implemented due to a lack of prior knowledge of
an appropriate stratifying variable or when the researcher lacks the necessary
information to create a stratifying variable during the sampling phase. Although the
method is susceptible to the pitfalls of post hoc approaches, it can provide several
benefits in the right situation. Implementation usually follows a simple random sample.
In addition to allowing for stratification on an ancillary variable, post stratification can be
used to implement weighting, which can improve the precision of a sample's estimates.
Choice-based sampling is one of the stratified sampling strategies. In this, data are
stratified on the target and a sample is taken from each stratum so that the rare target
class will be more represented in the sample. The model is then built on this biased
sample. The effects of the input variables on the target are often estimated with more
precision with the choice-based sample, even when a smaller overall sample size is taken
compared to a random sample. The results usually must be adjusted to correct for the
oversampling.
Cluster Sampling
Cluster sampling is an example of 'two-stage sampling': in the first stage a sample
of areas is chosen, and in the second stage a sample of respondents within those areas
is selected. The population is divided into clusters of homogeneous units, usually based
on geographical contiguity. Sampling units are groups rather than individuals; a sample
of such clusters is selected, and all units from the selected clusters are studied.
Advantages
– It cuts down on the cost of preparing a sampling frame.
– This can reduce travel and other administrative costs.
Disadvantages
– Sampling error is higher than for a simple random sample of the same size.
– Often used to evaluate vaccination coverage in EPI.
Difference between Strata and Clusters
– Although strata and clusters are both non-overlapping subsets of the population, they
differ in several ways.
– All strata are represented in the sample; but only subsets of clusters are in the sample.
– With stratified sampling, the best survey results occur when elements within strata are
internally homogeneous. However, with cluster sampling, the best results occur when
elements within clusters are internally heterogeneous
Multistage Sampling
It involves a complex form of cluster sampling in which two or more levels of units
are embedded one in the other. This technique is essentially the process of taking
random samples of preceding random samples. It is not as effective as true random
sampling, but it probably solves more of the problems inherent in random sampling.
It's an effective strategy because it banks on multiple randomizations, and it is used
frequently when a complete list of all members of the population does not exist or is
inappropriate.
Advantages
A survey by such a procedure is less costly, less laborious and more purposeful.
Quota Sampling
In this technique of sampling, the population is first segmented into mutually exclusive
sub-groups, just as in stratified sampling, and then judgment is used to select subjects
or units from each segment based on a specified proportion.
For example, an interviewer may be told to sample 200 females and 300 males between
the age of 45 and 60. It is this second step which makes the technique one of non-
probability sampling.
In quota sampling the selection of the sample is non-random. For example, interviewers
might be tempted to interview those who look most helpful. The problem is that these
samples may be biased because not everyone gets a chance of selection. This non-random
element is its greatest weakness, and quota versus probability sampling has been a matter
of controversy for many years.
Convenience Sampling
Also called grab or opportunity sampling or accidental or haphazard sampling.
It's a type of non-probability sampling which involves the sample being drawn from that
part of the population which is close to hand, i.e. readily available and convenient.
The researcher using such a sample cannot scientifically make generalizations about the
total population from this sample because it would not be representative enough. This
type of sampling is most useful for pilot testing.