
RESEARCH DESIGN

AND
BASICS OF STATISTICS
Prepared & Presented by

MANISH P. JAIN
Research Scholar(DS13CE006)

Credit Seminar- I

Guided by
Dr. S. S. Arkatkar
Assistant Professor
Civil Engg Dept, SVNIT, Surat

• RESEARCH FORMULATION AND DESIGN
• INTRODUCTION TO STATISTICS
• SAMPLING TECHNIQUES
• SCALING TECHNIQUES
• DATA ANALYSIS - ORGANIZING, COMPARING AND SUMMARIZING
• PROBABILITY AND PROBABILITY DISTRIBUTION
• HYPOTHESIS TESTING
• CORRELATION AND REGRESSION
• SOFTWARE FOR STATISTICAL ANALYSIS


RESEARCH FORMULATION AND DESIGN
RESEARCH

Research is a systematic, controlled, empirical, and critical
investigation of natural phenomena, guided by theory and hypotheses
about the presumed relations among such phenomena.
Or
It is a careful, systematic, patient study and investigation
in some field of knowledge.
RESEARCH METHODS/CLASSIFICATION
Type –I

1. Exploratory Research
- which structures and identifies new problems
2. Constructive Research
- which develops solutions to a problem
3. Empirical Research
- which tests the feasibility of a solution using
empirical evidence.
Type – II

1. Qualitative Research
- understanding of human behavior and the
reasons that govern such behavior

2. Quantitative Research
- systematic empirical investigation of
quantitative properties and phenomena and
their relationships
RESEARCH OBJECTIVES
The objectives of a research project must indicate what is to be
achieved by the study.

They should specify what will be done in the study, where, and for
what purpose.

The results are compared to the objectives. If the objectives have
not been spelled out clearly, the project cannot be evaluated.
Components/Framework of Research Design

• Problem
• Statement
• Purposes
• Benefits
• Theory/literature
• Assumptions
• Background
• Variables
• Measurement
• Methodology
• Sampling
• Data Analysis
• Conclusions
• Interpretations
• Recommendations
STATISTICS
Statistics is a subject consisting of scientific
methods for collecting and analyzing data and
drawing inferences from them.

Statistical analysis refers to techniques used to do the following
for a given problem, based on sample datasets collected from
populations:

1. Describe
2. Explore
3. Understand
4. Prove
5. Predict

• To mathematically describe/depict our findings
• To draw conclusions from our results
• To test hypotheses
• To test for relationships among variables
Statistics: Two basic types

Descriptive statistics (for descriptive objectives/research questions):
1. Can be applied to any measurements (quantitative or qualitative).
2. Offers a summary/overview/description of data. Does not explain
   or interpret.

Inferential statistics (for comparative objectives/hypotheses):
1. Allows for comparisons across variables.
2. Hypothesis testing.

Descriptive Statistics                 Inferential Statistics
• Number                               • Correlation
• Frequency count                      • Regression
• Percentage                           • Level of significance
• Deciles and quartiles                • Various tests on hypothesis
• Measures of central tendency         • Probabilistic
  (mean, median, mode)
• Variability
• Variance and standard deviation
• Graphs
• Normal curve
SAMPLING TECHNIQUES
Sampling
• The process of selecting a number of individuals for a study in such a
  way that the individuals represent the larger group from which they
  were selected.

The purpose of sampling
• To gather data about the population in order to make an inference that
  can be generalized to the population.
Stages in the Selection of a Sample

1. Define the target population
2. Select a sampling frame
3. Determine if a probability or nonprobability sampling method will be chosen
4. Plan procedure for selecting sampling units
5. Determine sample size
6. Select actual sampling units
7. Conduct fieldwork
Basic Sampling Classifications

• Probability samples: every unit in the population has a chance
  (probability) of being selected in the sample, and this probability
  can be accurately determined.

• Non-probability samples: some elements of the population have no
  chance of selection, or the probability of selection cannot be
  accurately determined.
SAMPLING

Random (probability) sampling:
• Simple random sample
• Systematic random sample
• Stratified random sample
• Multistage sample
• Multiphase sample
• Cluster sample

Non-probability sampling:
• Convenience sample
• Purposive sample
• Quota sample
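Three of the probability sampling methods named above can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 numbered vehicles, the two strata, the 10% sampling fraction and the function names are all assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every k-th unit, k = len(population) // n."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample

# Hypothetical sampling frame: 100 vehicles numbered 1..100.
vehicles = list(range(1, 101))
strata = {"cars": vehicles[:60], "two_wheelers": vehicles[60:]}

srs = simple_random_sample(vehicles, 10)
sys_s = systematic_sample(vehicles, 10)
strat = stratified_sample(strata, 0.1)
print(srs, sys_s, strat, sep="\n")
```

The fixed seed only makes the sketch repeatable; in fieldwork the random start and draws would not be fixed in advance.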
SCALING TECHNIQUES
Scale
A scale is basically a continuous spectrum or series of
categories and has been defined as any series of items that are
arranged progressively according to value or magnitude, into
which an item can be placed according to its quantification.

Four popular scales are:

- Nominal scales
- Ordinal scales
- Interval scales
- Ratio scales
Primary scales of measurement

Scale      Example                                 Values for three runners
Nominal    Numbers assigned to runners             4, 81, 9
Ordinal    Rank order of winners                   Third place, Second place, First place
Interval   Performance rating on a 0 to 10 scale   8.2, 9.1, 9.6
Ratio      Time to finish in seconds               15.2, 14.1, 13.4
Type of Scale   Numerical Operation                  Descriptive Statistics
Nominal         Counting                             Frequency in each category, percentage in
                                                     each category, mode
Ordinal         Rank ordering                        Median, range, percentile ranking
Interval        Arithmetic operations on intervals   Count/frequencies, mean, median, mode,
                between numbers                      standard deviation, variance
Ratio           Arithmetic operations on actual      Geometric mean, coefficient of variation
                quantities

Note: the ratio scale is the most powerful, with the most meaningful answers.
DATA ANALYSIS - ORGANIZING, COMPARING AND SUMMARIZING
Various methods for Describing, Exploring and Comparing Data
1. Frequency distribution
2. Central tendency (mean, median, mode)
3. Percentile values (deciles, quartiles, etc.)
4. Graphical representation by pie chart, frequency polygon,
   histogram, stem-and-leaf plot, etc.
5. Measure of variation by variance
6. Standard deviation
7. Coefficient of variation
Class   Speed limit (kmph)   Frequency   Cumulative frequency
1       30-40                5           5
2       40-50                6           11
3       50-60                12          23
4       60-70                10          33
5       70-80                8           41
6       80-90                7           48
7       90-100               3           51
Total                        51

Average value (mean) = 63.43 kmph, Median = 62.5 kmph, Mode = 57.5 kmph
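The mean, median and mode quoted for the speed table can be checked with the usual grouped-data formulas (class midpoints for the mean, linear interpolation inside the median class and inside the modal class). The slides do not state which formulas were used, so treating these as the intended ones is an assumption, and the function names are chosen here for illustration. A minimal sketch:

```python
# Class limits (kmph) and frequencies taken from the speed table.
classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [5, 6, 12, 10, 8, 7, 3]

def grouped_mean(classes, freqs):
    """Mean using class midpoints weighted by frequency."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)

def grouped_median(classes, freqs):
    """Median by linear interpolation inside the class containing N/2."""
    half = sum(freqs) / 2
    cum = 0
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= half:
            return lo + (half - cum) / f * (hi - lo)
        cum += f

def grouped_mode(classes, freqs):
    """Mode by interpolation inside the class with the highest frequency."""
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    lo, hi = classes[i]
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

mean = grouped_mean(classes, freqs)
median = grouped_median(classes, freqs)
mode = grouped_mode(classes, freqs)
print(round(mean, 2), median, mode)  # 63.43 62.5 57.5
```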
[Figure: pie chart of the speed frequency distribution. 30-40 kmph: 5 (10%); 40-50: 6 (12%); 50-60: 12 (23%); 60-70: 10 (19%); 70-80: 8 (16%); 80-90: 7 (14%); 90-100: 3 (6%)]
PROBABILITY BASICS
What is the chance that a given event will occur?

e.g., what is the chance that, out of all 4-wheelers, 20% are big
cars?
Basic properties of probability
• The probabilities of all possible outcomes always sum to 1, and the
  probability of any event always lies between 0 and 1.

• If A' is the complementary event of A, then P(A') = 1 - P(A).

• If A and B are not mutually exclusive events, then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Note: if A and B are mutually exclusive events, then $P(A \cap B) = 0$
and hence $P(A \cup B) = P(A) + P(B)$.
Conditional Probability
• The probability of an event given that another event has occurred is
  called a conditional probability.

• The conditional probability of X given Y is denoted by P(X|Y).

• A conditional probability is computed as follows:

$$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}$$
Bayes' Theorem
In conditional probability, we consider the probability of an event
when we have information about the occurrence of an earlier
event. Bayes' theorem determines the probability of an earlier event
based on information about the occurrence of a later event.

This theorem gives the relationship between P(A|B) and P(B|A):

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
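A small numeric sketch may help connect the conditional probability formula and Bayes' theorem. The events and all probabilities below are hypothetical values invented for the example (A = "the vehicle is a big car", B = "the vehicle exceeds the speed limit"), and P(B) is obtained with the law of total probability, which the slides do not cover explicitly.

```python
def conditional(p_x_and_y, p_y):
    """P(X|Y) = P(X and Y) / P(Y), defined only when P(Y) > 0."""
    if p_y <= 0:
        raise ValueError("P(Y) must be positive")
    return p_x_and_y / p_y

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical inputs (assumptions for illustration only).
p_a = 0.20              # P(A): share of big cars among 4-wheelers
p_b_given_a = 0.30      # P(B|A): big cars that exceed the limit
p_b_given_not_a = 0.10  # P(B|A'): other vehicles that exceed the limit

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A')P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(round(p_b, 2), round(p_a_given_b, 4))
```

With these made-up numbers, the probability that a speeding vehicle is a big car (about 0.43) is noticeably higher than the prior share of big cars (0.20), which is the kind of update Bayes' theorem describes.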
Random Variables
Random variable: a quantity resulting from an experiment that, by chance, can assume
different values.

Types of Random Variables
• Discrete random variable: can assume only certain clearly separated values. It is
  usually the result of counting something.
  - The number of students in a class.
  - The number of children in a family.
  - The number of cars entering a street in an hour.

• Continuous random variable: can assume an infinite number of values within a
  given range. It is usually the result of some type of measurement.
  - The time taken by a vehicle to reach its destination.
  - The length of time of a particular phone call.
PROBABILITY DISTRIBUTIONS
A probability distribution is a listing of all outcomes of an
experiment and the probability associated with each outcome.

Experiment: Toss a coin three times. Observe the number of heads.
The possible results are: zero heads, one head, two heads, and
three heads.
What is the probability distribution for the number of heads?
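One way to answer the coin-toss question is to list all eight equally likely sequences and count the heads in each. The sketch below assumes a fair coin (the slide does not say so explicitly) and uses exact fractions so that no rounding is involved.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely sequences of three tosses (fair coin assumed).
outcomes = list(product("HT", repeat=3))

# Count how many sequences give 0, 1, 2 and 3 heads.
counts = Counter(seq.count("H") for seq in outcomes)

# Probability distribution of the number of heads.
distribution = {heads: Fraction(n, len(outcomes)) for heads, n in sorted(counts.items())}

for heads, prob in distribution.items():
    print(heads, prob)
```

Under the fair-coin assumption this gives probabilities 1/8, 3/8, 3/8 and 1/8 for zero, one, two and three heads, which sum to 1.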
Binomial Probability Distribution

$$P(m, N, p) = C_{N,m}\, p^m q^{N-m}$$

or it could also be presented as

$$P(m, N, p) = \binom{N}{m} p^m q^{N-m} = \frac{N!}{m!\,(N-m)!}\, p^m q^{N-m}$$

where N is the number of trials, m is the number of successes, p is the
probability of success on a single trial, and q = 1 - p.

Characteristics of a Binomial Probability Distribution

1. There are only two possible outcomes on a particular trial of an
   experiment: success (S) or failure (F).
2. The outcomes are mutually exclusive.
3. The random variable is the result of counts.
4. Each trial is independent of any other trial.
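The binomial formula can be evaluated directly with `math.comb` from the Python standard library. In the sketch below the function name `binomial_pmf` is chosen for illustration, and the check reuses the three-coin experiment (N = 3, p = 0.5, a fair coin being an assumption).

```python
from math import comb

def binomial_pmf(m, n, p):
    """P(m, N, p) = C(N, m) * p**m * q**(N - m), with q = 1 - p."""
    q = 1 - p
    return comb(n, m) * p**m * q ** (n - m)

# Number of heads in three tosses of a fair coin.
probs = [binomial_pmf(m, 3, 0.5) for m in range(4)]
print(probs)  # [0.125, 0.375, 0.375, 0.125]
```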
Poisson Probability Distribution

$$P(m, \mu) = \frac{\mu^m e^{-\mu}}{m!}$$

μ = mean number of successes in a particular interval
e = constant 2.71828 (base of the Napierian logarithmic system)
m = total number of occurrences

The Poisson probability distribution describes the number of times
some event occurs during a specified interval. The interval may be
time, distance, area, or volume.
It is a limiting case of the binomial distribution under the following
conditions:
N → ∞ (the number of trials is infinitely large)
p → 0 (the probability of success is very small)
Np = μ (constant)
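The limiting relationship between the binomial and Poisson distributions can be seen numerically. The sketch below holds the mean fixed (trials times probability of success = μ) while the number of trials grows; the values μ = 2 and m = 3 are arbitrary choices for illustration, as are the function names.

```python
from math import comb, exp, factorial

def poisson_pmf(m, mu):
    """P(m, mu) = mu**m * e**(-mu) / m!"""
    return mu**m * exp(-mu) / factorial(m)

def binomial_pmf(m, n, p):
    """Binomial probability of m successes in n trials."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

mu = 2.0  # hypothetical mean number of occurrences per interval
m = 3     # number of occurrences of interest

# Keep n * p = mu while n grows: the binomial value approaches the Poisson value.
for n in (10, 100, 10_000):
    print(n, binomial_pmf(m, n, mu / n))
print("Poisson", poisson_pmf(m, mu))
```

The binomial probabilities move toward the Poisson value (about 0.1804) as the number of trials increases.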
Normal (Gaussian) Distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$

x = random variable
e = constant 2.71828 (base of the Napierian logarithmic system)
π = constant 3.14159
μ = mean
σ = standard deviation

For the standard normal distribution curve, μ = 0 and σ = 1.

1. The normal distribution is a descriptive model that describes
   real-world situations.
2. It is defined as a continuous frequency distribution of infinite range
   (it can take any value, not just integers as in the case of the
   binomial and Poisson distributions).
3. It is the most important probability distribution in statistics and
   an important tool in the analysis of large data sets.
Characteristics of Normal Distribution
1. It links frequency distribution to probability distribution.

2. It has a bell-shaped curve and is symmetric.

3. It is symmetric around the mean: the two halves of the curve are the
   same (mirror images).

4. The value of Z (standard normal variate), Z = (x - μ)/σ, indicates how
   many standard deviations away from the mean the point x lies.
Changing μ shifts the curve along the x-axis. The standard deviation σ determines the spread.
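The normal density and the Z value can be computed in a few lines. The sketch below writes the density by hand and compares it with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the mean, standard deviation and the point x are hypothetical spot-speed values chosen only for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma

# Hypothetical mean and standard deviation of spot speeds, in kmph.
mu, sigma = 63.43, 15.0
x = 85.0

z = z_score(x, mu, sigma)
density = normal_pdf(x, mu, sigma)
share_below = NormalDist(mu, sigma).cdf(x)  # area under the curve to the left of x

print(round(z, 3), round(density, 5), round(share_below, 3))
```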
Evaluating Normality
HYPOTHESIS TESTING
A hypothesis is an assumed statement (claim) about the population
under study, which has to be tested.

In hypothesis testing, we ascertain the truth of a statement about a
population parameter by using a proper sample statistic.
Null hypothesis
The null hypothesis (Ho) is a hypothesis of no difference. For example,
the null hypothesis for comparing means would be that "there is no
difference between the population mean and the sample mean".
Alternative hypothesis
The alternative hypothesis (Ha) is a statement relating to
the researcher's original hypothesis.
If the null hypothesis is rejected, then the alternative
hypothesis may be accepted.
Ha: "there is a difference between the population mean and the
sample mean".
Errors in hypothesis testing
When a decision is taken in hypothesis testing, it can be either a
correct decision or an incorrect one. This can be represented in the
following table:

            Fail to reject Ho    Reject Ho
Ho true     Correct decision     Type I error
Ho false    Type II error        Correct decision

• A Type I error occurs when a valid null hypothesis is wrongly
  rejected.
• A Type II error occurs when a false null hypothesis is not
  rejected.
Significance level

The significance level (α) represents the probability of making a Type I
error. It indicates which portion of the sampling distribution is
considered too unlikely to occur by chance alone. If the sample mean falls
into this region, then we reject the null hypothesis.

Typical α values are 0.10, 0.05, 0.02 and 0.01, or their percentage
equivalents 10%, 5%, 2% and 1%.
P-values

The P-value is the probability of obtaining, by chance alone, a sample
mean at least as extreme as the one observed if the null hypothesis is
true.

For a given significance level,

If P < α, then reject the null hypothesis.

If P > α, then fail to reject the null hypothesis.
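The decision rule can be illustrated with a one-sample, two-sided z-test for a mean. This is a sketch under assumptions: the population standard deviation is taken as known, the sample is taken as large, and every number below is hypothetical. The function name `one_sample_z_test` is chosen for illustration; `statistics.NormalDist` is part of the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for Ho: the population mean equals pop_mean."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

alpha = 0.05  # significance level

# Hypothetical data: 100 spot speeds with mean 66.0 kmph, tested against
# a claimed population mean of 63.43 kmph with known SD of 15 kmph.
z, p = one_sample_z_test(sample_mean=66.0, pop_mean=63.43, pop_sd=15.0, n=100)

decision = "reject Ho" if p < alpha else "fail to reject Ho"
print(round(z, 3), round(p, 4), decision)
```

With these made-up numbers P is larger than α = 0.05, so the null hypothesis is not rejected; with α = 0.10 the same data would lead to rejection, which shows how the choice of α drives the decision.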


Hypothesis testing

• Parametric test
  - One sample
    - Large sample: z-value
    - Small sample: t-value
  - Two samples
    - Independent
      - Difference of variance: F-test
      - Difference of means: large sample Z-value; small sample t-value
    - Dependent
      - Paired data test: large sample Z-value; small sample t-value
  - More than two samples: ANOVA
• Non-parametric test
  1. Mann-Whitney U test
  2. Kruskal-Wallis test
CORRELATION, REGRESSION AND STATISTICAL TESTS
Overview of Correlation and Regression

• Correlation seeks to establish whether a relationship exists between two variables.

• Regression seeks to use one variable to predict another variable.

• Statistical tests are used to determine the strength of the relationship.

[Figure: scatter plots showing perfect positive correlation, high-degree positive correlation, high-degree negative correlation, and no correlation]
Measuring the Relationship
Pearson’s Sample Correlation Coefficient, r, measures the direction
and the strength of the linear association between two paired
numerical variables.
Coefficient of Determination, r²
• To understand the strength of the relationship between
  two variables
• The correlation coefficient, r, is squared
• r² shows how much of the variation in one measure
  (travel time) is accounted for by knowing the value of
  the other measure (mode of transport used)
For example, r = 0.42 gives r² = 0.18, i.e., 18% of the variation in travel time may be
accounted for by knowing the choice of travel mode by commuters (or vice versa).
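Pearson's r and the coefficient of determination can be computed from paired data with the standard sums-of-products formula. The slides show no formula or data for r, so both the formula and the small data set below (trip distance against travel time) are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired observations.
distance_km = [2, 4, 6, 8, 10]
travel_time_min = [9, 14, 22, 25, 33]

r = pearson_r(distance_km, travel_time_min)
r_squared = r**2  # share of variation in one variable accounted for by the other
print(round(r, 4), round(r_squared, 4))
```

The same squaring step reproduces the example in the text: an r of 0.42 gives an r² of about 0.18.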
Rank correlation coefficient

• In cases where the data are not normal, or the shape of the
  distribution is not known, the rank correlation coefficient is useful.

$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

where
D = difference of rank between the two variables
N = number of data pairs
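A short sketch of the rank correlation calculation follows. It assumes the standard Spearman formula r_s = 1 - 6ΣD² / (N(N² - 1)) with D and N as defined above, and it assumes there are no tied values (ties would need average ranks, which this sketch does not handle). The data values are hypothetical.

```python
def ranks(values):
    """Rank 1 for the smallest value, 2 for the next, and so on (no ties assumed)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D = difference between paired ranks."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

# Hypothetical paired observations.
x = [35, 48, 52, 61, 77]
y = [12, 15, 14, 20, 26]

rs = spearman_rs(x, y)
print(ranks(x), ranks(y), rs)
```

Here only two neighbouring ranks are swapped between the two variables, so the sum of squared rank differences is 2 and r_s works out to 0.9.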
Regression
– Specific statistical methods for finding the “line of
best fit” for one response (dependent) numerical
variable based on one or more explanatory
(independent) variables.
– It is used:
1. To describe (or model)
2. To predict (or estimate)
3. To control (or administer)


Simple Linear Regression Model
• The equation that describes how y is related to x and
  an error term is called the regression model.
• The simple linear regression model is:

$$y = \beta_0 + \beta_1 x + e$$

The graph of the regression equation is a straight line. β0
and β1 are called the parameters of the model; e is a
random variable called the error term.
1. β0 is the y-intercept of the regression line.
2. β1 is the slope of the regression line.
3. E(y) is the expected value of y for a given x
   value.

[Figure: regression line of E(y) against x, with intercept β0 and positive slope β1]
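Standard Error of the Estimate

• The sum of squares for error is calculated as:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

• and is used in the calculation of the standard error of estimate:

$$s_e = \sqrt{\frac{SSE}{n - 2}}$$

  where ŷ_i is the value predicted by the regression line for
  observation i and n is the number of observations.

• If s_e is zero, all the points fall on the regression line.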

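The simple linear regression model and the standard error of estimate can be illustrated together. The sketch below assumes ordinary least squares estimates of the intercept and slope, SSE as the sum of squared residuals, and a standard error of sqrt(SSE / (n - 2)); the slides do not show these formulas, so they are stated here as assumptions, and the data are hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Least squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    b0 = mean_y - b1 * mean_x
    return b0, b1

def standard_error(x, y, b0, b1):
    """Return (SSE, s_e): sum of squared residuals and sqrt(SSE / (n - 2))."""
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return sse, sqrt(sse / (len(x) - 2))

# Hypothetical paired observations: trip distance (km) and travel time (min).
distance_km = [2, 4, 6, 8, 10]
travel_time_min = [9, 14, 22, 25, 33]

b0, b1 = fit_line(distance_km, travel_time_min)
sse, s_e = standard_error(distance_km, travel_time_min, b0, b1)
print(round(b0, 2), round(b1, 2), round(sse, 2), round(s_e, 4))
```

For these made-up values the fitted line is roughly time = 2.9 + 2.95 × distance, and the standard error of about 1.3 minutes indicates how far the points scatter around that line.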

Multiple Regression

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + e$$

Y = estimated value for the dependent variable
β0 = intercept
β1, β2, β3 = partial regression coefficients:
indicate how much Y changes for each unit of change in the corresponding X,
when all other variables in the model are held constant
Xi = independent (predictor) variables
e = error term
SOFTWARE FOR STATISTICAL ANALYSIS
SPSS
Statistical Package for the Social Sciences (SPSS) is software for
managing data and calculating a wide variety of statistics.
[Figure: SPSS data entry]
[Figure: SPSS output]
