
QUESTION ANSWER GUIDE

BUSINESS RESEARCH METHODS

Q1. What do you understand by research?


Research can be defined as the search for knowledge, or as any systematic investigation, with an
open mind, to establish novel facts, solve new or existing problems, prove new ideas, or develop
new theories. The primary purposes of basic research are documentation, discovery,
interpretation, or the research and development of methods and systems for the advancement of
human knowledge. Approaches to research depend on epistemologies, which vary considerably
both within and between humanities and sciences.
Scientific research relies on the application of the scientific method, a harnessing of curiosity.
This research provides scientific information and theories for the explanation of the nature and
the properties of the world. It makes practical applications possible. Scientific research is funded
by public authorities, by charitable organizations and by private groups, including many
companies. Scientific research can be subdivided into different classifications according to academic and application disciplines. Scientific research is a widely used criterion for judging the standing of an academic institution, such as a business school, but some argue that this is an inaccurate assessment of the institution.
Q2. What is the concept of Business research?
The primary task of management is effective decision making. Business research helps decision
makers shift from intuitive information gathering to systematic and objective investigation, and
therefore reduces uncertainty.
Write down the application of research in various functions of management.
The following are applications of research in management functions:
1) Identifying problems & opportunities.
2) Diagnosing problems & assessing opportunities.
3) Selecting and implementing a course of action.
4) Evaluating the course of action.

Q3. What are the characteristics of the research?


CHARACTERISTICS OF RESEARCH:
Research is a process of collecting, analyzing and interpreting information to answer questions.
But to qualify as research, the process must have certain characteristics: it must, as far as
possible, be controlled, rigorous, systematic, valid and verifiable, empirical and critical.
Controlled - in real life there are many factors that affect an outcome. The concept of control implies that, in exploring causality in relation to two variables (factors), you set up your study in a way that minimizes the effects of other factors affecting the relationship. This can be achieved to a large extent in the physical sciences (cookery, bakery), as most of the research is done in a laboratory. However, in the social sciences (Hospitality and Tourism) it is extremely difficult, as research is carried out on issues related to human beings living in society, where such controls are not possible. Therefore, in Hospitality and Tourism, as you cannot control external factors, you attempt to quantify their impact.
Rigorous - you must be scrupulous in ensuring that the procedures followed to find answers to questions are relevant, appropriate and justified. Again, the degree of rigor varies markedly between the physical and social sciences and within the social sciences.
Systematic - this implies that the procedures adopted to undertake an investigation follow a certain logical sequence. The different steps cannot be taken in a haphazard way. Some procedures must follow others.
Valid and verifiable - this concept implies that whatever you conclude on the basis of your findings is correct and can be verified by you and others.
Empirical - this means that any conclusions drawn are based upon hard evidence gathered from information collected from real-life experiences or observations.
Critical - critical scrutiny of the procedures used and the methods employed is crucial to a research enquiry. The process of investigation must be foolproof and free from drawbacks. The process adopted and the procedures used must be able to withstand critical scrutiny. For a process to be called research, it is imperative that it has the above characteristics.
Q4. Explain the different types of research?
TYPES OF RESEARCH
Research can be classified from three perspectives:
1. Application of research study
2. Objectives in undertaking the research
3. Inquiry mode employed

1. Application: From the point of view of application, there are two broad categories of
research:
- Pure research
- Applied research
i) Pure research involves developing and testing theories and hypotheses that are
intellectually challenging to the researcher but may or may not have practical
application at the present time or in the future. The knowledge produced through pure
research is sought in order to add to the existing body of research methods.
ii) Applied research is done to solve specific, practical questions; for policy
formulation, administration and understanding of a phenomenon. It can be
exploratory, but is usually descriptive. It is almost always done on the basis of basic
research. Applied research can be carried out by academic or industrial institutions.
Often, an academic institution such as a university will have a specific applied research program funded by an industrial partner interested in that program.
2. Objectives: From the viewpoint of objectives, research can be classified as:
- descriptive
- correlational
- explanatory
- exploratory
i) Descriptive research attempts to describe systematically a situation, problem, phenomenon, service or programme, or provides information about, say, the living conditions of a community, or describes attitudes towards an issue.
ii) Correlational research attempts to discover or establish the existence of a relationship/interdependence between two or more aspects of a situation (a short sketch follows this list).
iii) Explanatory research attempts to clarify why and how there is a relationship between two
or more aspects of a situation or phenomenon.
iv) Exploratory research is undertaken to explore an area where little is known or to investigate
the possibilities of undertaking a particular research study (feasibility study / pilot study). In
practice most studies are a combination of the first three categories.
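To make the correlational category concrete, here is a minimal sketch in Python that measures how strongly two aspects of a situation move together, without claiming causation. The figures are invented for illustration, and statistics.correlation requires Python 3.10+.

```python
# A short sketch of correlational research in code: quantify the strength
# of the relationship between two aspects of a situation.
import statistics

ad_spend = [10, 12, 15, 18, 21, 25]        # hypothetical monthly ad spend
sales = [100, 110, 130, 148, 160, 180]     # hypothetical monthly sales

r = statistics.correlation(ad_spend, sales)  # Pearson's r (Python 3.10+)
print(f"Pearson's r = {r:.3f}")              # near +1: strong positive link
```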

3. Inquiry Mode: From the process adopted to find answers to research questions, the two approaches are:

- Structured approach

- Unstructured approach

i) Structured approach: The structured approach to inquiry is usually classified as quantitative research. Here everything that forms the research process - objectives, design, sample, and the questions that you plan to ask of respondents - is predetermined. It is more appropriate for determining the extent of a problem, issue or phenomenon by quantifying the variation, e.g. how many people have a particular problem? How many people hold a particular attitude?

ii) Unstructured approach: The unstructured approach to inquiry is usually classified as qualitative research. This approach allows flexibility in all aspects of the research process. It is more appropriate for exploring the nature of a problem, issue or phenomenon without quantifying it. The main objective is to describe the variation in a phenomenon, situation or attitude, e.g. a description of an observed situation, the historical enumeration of events, an account of the different opinions different people have about an issue, or a description of the working conditions in a particular industry.
Both approaches have their place in research. Both have their strengths and weaknesses. In many
studies you have to combine both qualitative and quantitative approaches. For example, suppose
you have to find the types of cuisine / accommodation available in a city and the extent of their
popularity.

The types of cuisine are the qualitative aspect of the study, as finding out about them entails description of the culture and cuisine; the extent of their popularity is the quantitative aspect, as it involves estimating the number of people who visit restaurants serving such cuisine and calculating the other indicators that reflect the extent of popularity.

Q5. Describe pure research in brief.


Pure research, basic research, or fundamental research is research carried out to increase
understanding of fundamental principles. It is not intended to yield immediate commercial
benefits; pure research can be thought of as arising out of curiosity. However, in the long term it
is the basis for many commercial products and applied research. Pure research is mainly carried
out by universities.
Pure research advances fundamental knowledge about the human world. It focuses on refuting or
supporting theories that explain how this world operates, what makes things happen, why social
relations are a certain way, and why society changes. Pure research is the source of most new
scientific ideas and ways of thinking about the world. It can be exploratory, descriptive, or
explanatory; however, explanatory research is the most common.
Pure research generates new ideas, principles and theories, which may not be immediately utilized but are nonetheless the foundations of modern progress and development in different fields.
Today's computers could not exist without the pure research in mathematics conducted over a
century ago, for which there was no known practical application at that time. Pure research rarely
helps practitioners directly with their everyday concerns. Nevertheless, it stimulates new ways of
thinking about deviance that have the potential to revolutionize and dramatically improve how
practitioners deal with a problem.
A new idea or fundamental knowledge is not generated only by pure research, but pure research can build new knowledge. In any case, pure research is essential for nourishing the expansion of knowledge. Researchers at the center of the scientific community conduct most of the pure research.
Q6. Describe applied research in brief.
Applied research is a form of systematic inquiry involving the practical application of science.
It accesses and uses some part of the research community's (the academy's) accumulated theories, knowledge, methods, and techniques, for a specific, often state-, business-, or client-driven purpose. Applied research is compared to pure research (basic research) in discussions about research ideals, methodologies, programs, and projects.
Applied research deals with solving practical problems and generally employs empirical
methodologies. Because applied research resides in the messy real world, strict research
protocols may need to be relaxed. For example, it may be impossible to use a random sample.
Thus, transparency in the methodology is crucial. Implications for interpretation of results
brought about by relaxing an otherwise strict canon of methodology should also be considered.
Due to its practical focus, applied research information will be found in the literature associated
with individual disciplines.
Q7. What is the difference between basic and applied research?
Basic research
- Main motivation is to expand man's knowledge or understanding about a phenomenon, not to create or invent something or to solve a practical problem.
- There are usually no commercial applications in mind.
- Examples: How was the earth created? Why are some roses red and some pink?

Applied research
- Main motivation is to solve a practical problem, develop a new product or process, or improve an existing product or process, and not just to acquire knowledge for knowledge's sake.
- There is usually a commercial application in mind.
- Examples: How can we develop more energy-efficient dyeing processes? How can we avoid catalytic damage to cotton fabric during bleaching?

Q8. What is the difference between exploratory research and conclusive research?
  Research Project Component Exploratory Research Conclusive Research
Research Purpose General: generate insight Specific: verify insight
Data Needs Vague Clear
Data Sources Ill-defined Well-defined
Data Collection Form Open-ended, rough Usually structured
Sample Small, subjective Large, objective
Data Collection Flexible Rigid
Data Analysis Informal, typically qualitative Formal, typically quantitative
Inferences/Recommendations More tentative More final

Q9. Is there any difference between research methods and research methodology? 


The major difference between the two is their core or nature. Research methodology is the overall study and justification of the methods used, while research methods are the front line of actually conducting the research.
Another difference between the two is their purpose. While a research methodology explains what methods are planned and how appropriate they will be, research methods are the techniques themselves that permit experiments and studies to be successfully initiated, performed and concluded.
Moreover, research methods differ from research methodologies in their use. Research methodologies are typically set out at the beginning of every experiment to explain how and why the chosen methods will serve their function. Research methods, on the contrary, are more useful in the latter part of an experiment or research project, since they are utilized for the conclusions to be appropriately made.
Q10. What are the differences between scientific and nonscientific methods?
Basically, science is a specific way of analyzing information with the goal of testing claims.
What sets science apart from other modes of knowledge acquisition is the use of what is
commonly known as the scientific method. The scientific method is rooted in observation,
experimentation, and knowledge acquisition through a process of objective reasoning and logic.
According to A. Aragon, the scientific method is a "systematic process for acquiring new knowledge that uses the basic principle of deductive (and to a lesser extent inductive) reasoning. It's considered the most rigorous way to elucidate cause and effect, as well as discover and analyze less direct relationships between agents and their associated phenomena."

"Through the scientific method, we may form the following generalizations:

Hypothesis: A testable statement accounting for a set of observations.

Theory: A well-supported and well-tested hypothesis or set of hypotheses.

Fact: A conclusion confirmed to such an extent that it would be reasonable to offer provisional agreement."

When using the scientific method, one of the primary goals is objectivity. Proper use of the scientific method leads us to rationalism (basing conclusions on intellect, logic and evidence). Relying on science also helps us avoid dogmatism (adherence to doctrine over rational and enlightened inquiry, or basing conclusions on authority rather than evidence). The nonscientific approach to knowledge involves informal kinds of thinking. This approach can be thought of as an everyday, unsystematic, uncritical way of thinking. Below I will discuss the major differences between the two.

Comparing Scientific & Nonscientific Methods

                    Scientific method       Nonscientific method

General Approach    Empirical               Intuitive
Observation         Controlled              Uncontrolled
Reporting           Unbiased                Biased
Concepts            Clear definitions       Ambiguous definitions
Instruments         Accurate/precise        Inaccurate/imprecise
Measurement         Reliable/repeatable     Non-reliable
Hypotheses          Testable                Untestable
Attitude            Critical                Uncritical
Q11. "Research Is Much Concerned With Proper Fact Finding, Analysis And Evaluation".
Do You Agree With This Statement? Give Reasons to Support Your Answer.
Yes, research is about finding facts, analyzing them and then evaluating the results.
Research begins with a theory or thesis statement. This statement has to be proven or disproven as true or false. Given that research is based on proving or disproving a theory, research is concerned with finding facts, analyzing facts and evaluating the results, in order to form a conclusion regarding the research topic.

There have been occasions when research has been called into question. Corporate espionage and other issues, such as corporations skewing the facts to fit their data, have existed. This is a downside of dishonesty in people; however, the ethical, moral, and scientific belief is that all research has to be based on facts.

This does not mean the facts will not change. Research is conducted with what information and
tools are available at that time. This means that in 100 years research being conducted now could
be found false, but at the time it is true because of the limited technology or facts that could be
found.

As always, when someone learns about research and the research method, one is told that a theory is never solely factual, but proved or disproved based on what could be found at that time. It goes back to the fact that proper information, analysis, and evaluation are needed in order to conduct proper research. Inaccurate facts will skew the data and render the entire research invalid.

There is also the human interpretation of the information found. While research is concerned
with these three topics, one also has to realize the writer of the research can limit the scope of the
research and therefore change the results based on their viewpoint alone.

Q12. WHAT IS RESEARCH DESIGN?


RESEARCH DESIGN
According to Trochim (2005), research design “provides the glue that holds the research project
together. A design is used to structure the research, to show how all of the major parts of the
research project work together to try to address the central research questions.” The research
design is like a recipe. Just as a recipe provides a list of ingredients and the instructions for
preparing a dish, the research design provides the components and the plan for successfully
carrying out the study. The research design is the “backbone” of the research protocol.
Research studies are designed in a particular way to increase the chances of collecting the
information needed to answer a particular question. The information collected during research is
only useful if the research design is sound and follows the research protocol. Carefully
following the procedures and techniques outlined in the research protocol will increase the
chance that the results of the research will be accurate and meaningful to others. Following the
research protocol and thus the design of the study is also important because the results can then
be reproduced by other researchers. The more often results are reproduced, the more likely it is
that researchers and the public will accept these findings as true. Additionally, the research
design must make clear the procedures used to ensure the protection of research subjects,
whether human or animal, and to maintain the integrity of the information collected in the study.
There are many ways to design a study to test a hypothesis. The research design that is chosen
depends on the type of hypothesis (e.g. Does X cause Y? or How can I describe X and Y? or
what is the relationship between X and Y?), how much time and money the study will cost, and
whether or not it is possible to find participants. The PI has considered each of these points when
designing the study and writing the research protocol.
Q13. Describe Descriptive and Experimental research.
There are many kinds of research, however, most of them fall into two categories: descriptive
and experimental.
Descriptive
A descriptive study is one in which information is collected without changing the environment (i.e., nothing is manipulated). Sometimes these are referred to as "correlational" or "observational" studies. The Office of Human Research Protections (OHRP) defines a descriptive study as "Any
study that is not truly experimental.” In human research, a descriptive study can provide
information about the naturally occurring health status, behavior, attitudes or other
characteristics of a particular group. Descriptive studies are also conducted to demonstrate
associations or relationships between things in the world around you.
Descriptive studies can involve a one-time interaction with groups of people (cross-sectional study) or a study might follow individuals over time (longitudinal study). Descriptive studies,
in which the researcher interacts with the participant, may involve surveys or interviews to
collect the necessary information. Descriptive studies in which the researcher does not interact
with the participant include observational studies of people in an environment and studies
involving data collection using existing records (e.g., medical record review).
Case example of a descriptive study
Descriptive studies are usually the best methods for collecting information that will demonstrate
relationships and describe the world as it exists. These types of studies are often done before an
experiment to know what specific things to manipulate and include in an experiment. Bickman
and Rog (1998) suggest that descriptive studies can answer questions such as “what is” or “what
was.” Experiments can typically answer “why” or “how.”
Experimental
Unlike a descriptive study, an experiment is a study in which a treatment, procedure, or program
is intentionally introduced and a result or outcome is observed. The American Heritage
Dictionary of the English Language defines an experiment as “A test under controlled conditions
that is made to demonstrate a known truth, to examine the validity of a hypothesis, or to
determine the efficacy of something previously untried.”
True experiments have four elements: manipulation, control, random assignment, and random selection. The most important of these elements are manipulation and control. Manipulation
means that something is purposefully changed by the researcher in the environment. Control is
used to prevent outside factors from influencing the study outcome. When something is
manipulated and controlled and then the outcome happens, it makes us more confident that the
manipulation “caused” the outcome. In addition, experiments involve highly controlled and
systematic procedures in an effort to minimize error and bias which also increases our
confidence that the manipulation “caused” the outcome.
Another key element of a true experiment is random assignment. Random assignment means that if there are groups or treatments in the experiment, participants are assigned to these groups or treatments randomly (like the flip of a coin). This means that no matter who the participant is, he/she has an equal chance of getting into any of the groups or treatments in an experiment. This
process helps to ensure that the groups or treatments are similar at the beginning of the study so
that there is more confidence that the manipulation (group or treatment) “caused” the outcome.
More information about random assignment may be found in section Random assignment.
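As a hedged illustration of the random assignment idea just described, the following Python sketch (participant labels are hypothetical) gives every participant an equal chance of landing in either group:

```python
# A sketch of random assignment: shuffle participants, then split the
# shuffled list, so group membership is decided purely by chance.
import random

participants = [f"participant_{i}" for i in range(1, 21)]
random.shuffle(participants)          # the coin-flip step

half = len(participants) // 2
group_a = participants[:half]         # e.g. receives the new treatment
group_b = participants[half:]         # e.g. receives the standard one
print("Group A:", group_a)
print("Group B:", group_b)
```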
Experimental studies – Example 1
An investigator wants to evaluate whether a new technique to teach math to elementary school
students is more effective than the standard teaching method. Using an experimental design, the
investigator divides the class randomly (by chance) into two groups and calls them “group A”
and “group B.” The students cannot choose their own group. The random assignment process
results in two groups that should share equal characteristics at the beginning of the experiment.
In group A, the teacher uses a new teaching method to teach the math lesson. In group B, the
teacher uses a standard teaching method to teach the math lesson. The investigator compares test
scores at the end of the semester to evaluate the success of the new teaching method compared to
the standard teaching method.  At the end of the study, the results indicated that the students in
the new teaching method group scored significantly higher on their final exam than the students
in the standard teaching group.
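A minimal sketch, assuming SciPy is available and using invented scores, of how such an end-of-semester comparison could be tested statistically:

```python
# An independent-samples t-test comparing the two groups' exam scores.
from scipy import stats

group_a_scores = [78, 85, 92, 88, 81, 90, 87, 84]  # new teaching method
group_b_scores = [72, 75, 80, 70, 74, 78, 69, 73]  # standard method

t_stat, p_value = stats.ttest_ind(group_a_scores, group_b_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if t_stat > 0 and p_value < 0.05:
    print("Group A scored significantly higher at the 5% level.")
```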
Q14. Write down notes on the followings.
1. Quantitative methods – provide numerical estimates, often referred to as rates, risks, percentages, prevalence, incidence, trends, etc. They help answer questions such as how many,
how much, how often and so forth. This type of research enables you to establish baseline
indicators and measure change over time. It also allows you to evaluate the impact of your
program on specific indicators.
2. Qualitative methods – describe data at an in-depth level, without using statistics.
They help explain how the consumer is thinking or why something occurs. This type
of research can be used for identifying possible behavioral determinants prior to quantitative
studies, for exploring statistical data in-depth, for developing concepts or for pre-testing
communications materials.
There are four main types of quantitative methodologies.
1. Behavioral Surveillance Surveys – These are conducted against a specific target group
within a specific geographical area, for example sex workers in a red light district, to measure
their behavior. For example, FHI(Family Health International) often conducts such surveys in
cooperation with national governments, the results of which are usually considered to be
“official” national data.
2. Social Marketing Intercept Surveys – Intercept surveys can be done of consumers leaving a
retail outlet, called Consumer Intercepts, or of people leaving a PSI(Population Services
International) event or service, called Client Exit Surveys. They are useful representations of
consumers or of clients, but cannot be compared with data from household surveys (which are
representative of the general population as a whole).
3. LQAS and Project MAP – Lot Quality Assurance Sampling (LQAS) uses a small sample size to monitor levels and trends of exposure and OAM or coverage of a product in one "lot" or local geographic area. Product coverage and quality of coverage in outlets at PSI is measured through an LQAS tool called MAP (Measuring Access and Performance); a short sketch of the LQAS idea follows this list.
4. Household Social Marketing Surveys – DHS (Demographic and Health Survey) surveys are a well-known example of household surveys. TRC (Tracking Results Continuously) is the PSI tool used for national or sub-national populations. This can be used for monitoring program progress, for evaluating program effectiveness, and for segmenting the population between behavers and non-behavers to determine what OAM (Open Aid Map) indicators are linked with practicing the desired health behavior.
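The LQAS idea in item 3 can be sketched as a simple decision rule. The sample size and threshold below are illustrative assumptions, not official LQAS parameters:

```python
# A hedged sketch of LQAS: draw a small sample from one "lot" (local area)
# and compare the count of covered respondents against a pre-set rule.
import random

lot = [random.random() < 0.85 for _ in range(500)]  # True = has coverage

SAMPLE_SIZE = 19     # a sample size often cited for LQAS, used as an example
DECISION_RULE = 13   # accept the lot if at least this many are covered

sample = random.sample(lot, SAMPLE_SIZE)
covered = sum(sample)
verdict = "meets" if covered >= DECISION_RULE else "fails"
print(f"{covered}/{SAMPLE_SIZE} covered -> lot {verdict} the coverage target")
```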

There are three types of qualitative methodologies.


1. In-Depth Interviews – This technique is good for more personal and sensitive questions and is
particularly useful for understanding a process, such as decision making. The main drawback to
this method is that the resulting data can be lengthy and time consuming to analyze because
many views must be collected to represent a complete sample. For less sensitive topics,
researchers may choose to conduct key informant interviews. These are used for the least
personal and sensitive issues and are conducted with those familiar with a given topic. For
example, third party experts or stakeholders, gatekeepers or group leaders could be interviewed
about a group they are part of or very knowledgeable about.
2. Focus Group Discussion – This technique involves a single moderator leading a small group of six to
eight people from the target group. It is useful for probing sensitive questions about behavioral
determinants, especially if participants are asked to describe what others do, rather than what
they personally do. In some cases interviews may be conducted with just two or three
participants (called a dyad or triad) when keeping the interview among friends or family
members is beneficial to prompting better responses.
3. Peer Ethnographic – This is used to explore the most personal and sensitive topics,
particularly among marginalized target groups. The method trains members of the target group to
conduct interviews with their peers and then report their findings back to the research team.
Other research techniques that could fall into either category:
1. Mapping Target Groups –This can be used to create a geographical map of a target area
(e.g., a brothel district) and possibly inform a sampling frame for a quantitative study. It
generally involves mapping where target groups are and how many people are in each place.
Geographic information system (GIS) software is increasingly being used for this.
2. Mystery Client Interviews – These are usually used for evaluating the quality of a service
(e.g. clinic, VCT, other PSI-trained service provider). A person is sent in as a “Client” to take
part in the service without the knowledge of the service provider. This is seen to be the most
objective way to measure true service quality. Afterwards the client can be interviewed about
their experience or fill in a survey to produce data for evaluating and improving the service.
Q15. What are the problems encountered by researchers in India?
Some of the problems faced by researchers are:
1) Lack of training: Having little knowledge of research methodology is a great impediment for researchers. Many researchers take a leap in the dark without knowing research methods. Most of the work is not methodologically sound. Research, to many researchers and even guides, is mostly a cut-and-paste job without any insight shed on the collected materials. The consequence is obvious, viz., the research results quite often do not reflect reality.
2) Most business units do not have the confidence that the information supplied by them to researchers will not be misused, and as such they are reluctant to supply the needed information to researchers.
3) There is insufficient interaction between university research departments, business establishments, government departments and research institutions. A great deal of primary data of a non-confidential nature remains untouched by researchers for want of proper contacts.
4) Research studies overlap one another. This results in duplication and fritters away resources.
5) There is no code of conduct for researchers. As well, inter-university and inter-departmental rivalries are quite common.
6) Sometimes researchers face the difficulty of obtaining adequate and timely monetary support. This causes unnecessary delays in the completion of research studies.
7) Library management and functioning are not satisfactory in many places, and much of the time and energy of researchers is spent tracing out books, journals, reports, etc.
8) Many libraries are not able to get copies of old and new Acts/Rules, reports and other government publications in time.
Research forms a cycle. It starts with a problem and ends with a solution to the problem. The problem statement is therefore the axis around which the whole research revolves, because it explains in short the aim of the research.
 Q16. WHAT IS A RESEARCH PROBLEM?
A research problem is the situation that causes the researcher to feel apprehensive, confused and
ill at ease.  It is the demarcation of a problem area within a certain context involving the WHO or
WHAT, the WHERE, the WHEN and the WHY of the problem situation.
There are many problem situations that may give rise to research. Three sources usually contribute to problem identification. One's own experience or the experience of others may be a source of research problems. A second source could be the scientific literature. You may read about certain findings and notice that a certain field was not covered. This could lead to a research problem. Theories could be a third source. Shortcomings in theories could be researched.
Research can thus be aimed at clarifying or substantiating an existing theory, at clarifying
contradictory findings, at correcting a faulty methodology, at correcting the inadequate or
unsuitable use of statistical techniques, at reconciling conflicting opinions, or at solving existing
practical problems.
IDENTIFICATION OF THE PROBLEM
The prospective researcher should think about what caused the need to do the research (problem identification). The question that he/she should ask is: Are there questions about this problem to which answers have not been found up to the present?
Research originates from a need that arises. A clear distinction between the PROBLEM and the PURPOSE should be made. The problem is the aspect the researcher worries about, thinks about, and wants to find a solution for. The purpose is to solve the problem, i.e. to find answers to the question(s). If there is no clear problem formulation, the purpose and methods are meaningless.
Keep the following in mind:
 Outline the general context of the problem area.
 Highlight key theories, concepts and ideas current in this area.
 What appear to be some of the underlying assumptions of this area?
 Why are these issues identified important?
 What needs to be solved?
 Read round the area (subject) to get to know the background and to identify unanswered
questions or controversies, and/or to identify the most significant issues for further
exploration.
The research problem should be stated in such a way that it leads to analytical thinking on the part of the researcher, with the aim of arriving at possible solutions to the stated problem.
Research problems can be stated in the form of either questions or statements.
 The research problem should always be formulated grammatically correctly and as completely as possible. You should bear in mind the wording (expressions) you use. Avoid meaningless words. There should be no doubt in the mind of the reader about what your intentions are.
 Demarcating the research field into manageable parts by dividing the main problem into
sub problems is of the utmost importance.
SUBPROBLEM(S)
Sub problems are problems related to the main problem identified. Sub problems flow from the main problem and make up the main problem. They are the means to reach the set goal in a manageable way and contribute to solving the problem.
STATEMENT OF THE PROBLEM
The statement of the problem involves the demarcation and formulation of the problem, i.e. the
WHO/WHAT, WHERE, WHEN, WHY.  It usually includes the statement of the hypothesis.

CHECKLIST FOR TESTING THE FEASIBILITY OF THE RESEARCH PROBLEM

Answer YES or NO to each of the following:
1. Is the problem of current interest? Will the research results have social, educational or scientific value?
2. Will it be possible to apply the results in practice?
3. Does the research contribute to the science of education?
4. Will the research open up new problems and lead to further research?
5. Is the research problem important? Will you be proud of the result?
6. Is there enough scope left within the area of research (field of research)?
7. Can you find an answer to the problem through research? Will you be able to handle the research problem?
8. Will it be practically possible to undertake the research?
9. Will it be possible for another researcher to repeat the research?
10. Is the research free of any ethical problems and limitations?
11. Will it have any value?
12. Do you have the necessary knowledge and skills to do the research? Are you qualified to undertake the research?
13. Is the problem important to you and are you motivated to undertake the research?
14. Is the research viable in your situation? Do you have enough time and energy to complete the project?
15. Do you have the necessary funds for the research?
16. Will you be able to complete the project within the time available?
17. Do you have access to the administrative, statistical and computer facilities the research necessitates?
TOTAL:
Q17. What is a problem audit? What is the difference between symptoms and a problem?
How can a skillful researcher differentiate between the two and identify a true problem.
Problem Audit
It is a comprehensive examination of a marketing problem to understand its origin and nature. The problem audit provides a useful framework for interaction between a manager and a researcher in identifying the underlying causes of the problem. It is important to perform a problem audit before undertaking any research because the manager often has only a vague idea of what the problem is, and tends to focus on symptoms rather than on causes. The audit enables an understanding not only of the symptoms but also of some of the underlying causes before a decision is taken to undertake any research.
Difference between Problem and Symptom
'Problem' and 'symptom' are two words that are often confused as having a similar meaning, but they do not. A problem has a solution, whereas a symptom helps you to identify a problem.
A symptom merely alerts marketers that a problem exists. Hence it can be said that problem and symptom are related rather than synonymous in character. Both a problem and a symptom can be persistent too. The word 'problem' is used with the intention of finding a solution to it. On the other hand, the word 'symptom' is used with the intention of curing the symptom.
In other words, if a symptom is known, there will be an effort to see that the symptom ceases to exist or is cured completely. In the same way, when a problem comes to be identified, there will be an effort to find a solution for the problem. In short, there will be an effort to see that the problem is solved completely.
Thus it is understood that neither a problem nor a symptom is desired by anybody. If a problem is not solved, then the symptoms cannot be removed; they tend to remain the same and continue to exist. Moreover, if a symptom is not cured or diagnosed properly, it is bound to aggravate. A symptom does not remain the same; it tends to increase further if not cured properly.
Q18. What are the steps in research process?
The research process consists of a series of actions or steps necessary to effectively carry out research, and the desired sequence of these steps.
1. Formulating the research problem: There are two types of problems: one relates to states of nature and the other relates to relationships between variables. Initially, the researcher must single out the problem he wants to study.
2. Review the literature: At this juncture the researcher should undertake an extensive literature survey connected with the problem.
3. Formulate hypotheses: The researcher should state in clear terms the working hypothesis. A working hypothesis is a tentative assumption made in order to draw out and test its logical and empirical consequences.
4. Design research: The researcher will have to state the conceptual structure within which the research will be conducted. The preparation of such a design facilitates research that is as efficient as possible, yielding maximal information. The function of research design is to provide for the collection of relevant evidence with minimum expenditure of effort, time, and money.
5. Collect data: Generally it is found that the data at hand are inadequate, and hence it becomes necessary to collect data that are appropriate. There are several ways of collecting the appropriate data, which differ considerably in terms of money costs, time and other resources at the disposal of the researcher.
6. Analyze data: The analysis of data requires a number of closely related operations, such as the establishment of categories, the application of these categories to raw data through coding, tabulation, and then drawing statistical inferences. The unwieldy data should be condensed into a few manageable groups and tables for further analysis. The coding operation is usually done at this stage, through which the categories of data are transformed into symbols that may be tabulated and counted. Editing is the procedure that improves the quality of the data for coding. With coding, the stage is set for tabulation. Tabulation is the part of the technical procedure wherein the classified data are put in the form of tables.
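As a small illustration of the coding and tabulation operations just described, the following sketch (assuming pandas is available; column names and the coding scheme are hypothetical) condenses raw responses into a frequency table:

```python
# Coding: map raw answers to numeric codes. Tabulation: cross-tabulate
# the coded data into a table of counts for further analysis.
import pandas as pd

raw = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M"],
    "response": ["agree", "disagree", "agree", "agree", "neutral", "disagree"],
})

codes = {"agree": 1, "neutral": 2, "disagree": 3}    # coding operation
raw["response_code"] = raw["response"].map(codes)

table = pd.crosstab(raw["gender"], raw["response"])  # tabulation
print(table)
```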
7. Interpretation and report preparation: Finally, the researcher has to prepare a report of what has been done. The writing of the report must be done with great care, keeping in view the following:
i) The layout of the report should be as follows:
a) The preliminary pages
b) The main text
c) The end matter
ii) The main text of the report should have the following parts:
a) Introduction: It should contain a clear statement of the objective of the research and an explanation of the methodology adopted in accomplishing the research. The scope of the study, along with its various limitations, should also be stated in this part.
b) Summary of findings: A statement of findings and recommendations in non-technical language.
c) Main report: The main body of the report should be presented in logical sequence and broken down into readily identifiable sections.
d) Conclusion: Towards the end of the main text, the researcher should again put down the results of his research clearly and precisely.
iii) At the end of the report, appendices should be listed in respect of all technical data. The bibliography, i.e. the list of books, journals, reports, etc. consulted, should also be given at the end. An index should also be given, especially in a published research report.

Q19. What do you mean by research design?


RESEARCH DESIGNS
The next step after stating the management problem, research purpose, and research hypotheses and questions is to formulate a research design. The starting point for the research design is, in fact, the research questions and hypotheses that have been so carefully developed. In essence, the research design answers the question: How are we going to get answers to these research questions and test these hypotheses? The research design is a plan of action indicating the specific steps that are necessary to provide answers to those questions, test the hypotheses, and thereby achieve the research purpose that helps choose among the decision alternatives to solve the management problem or capitalize on the market opportunity.

DEFINITIONS OF RESEARCH DESIGN:

(1) According to David J. Luck and Ronald S. Rubin, "A research design is the determination and statement of the general research approach or strategy adopted for the particular project. It is the heart of planning. If the design adheres to the research objective, it will ensure that the client's needs will be served."

(2) According to Kerlinger, "Research design is the plan, structure and strategy of investigation conceived so as to obtain answers to research questions and to control variance."

(3) According to Green and Tull, "A research design is the specification of methods and procedures for acquiring the information needed. It is the over-all operational pattern or framework of the project that stipulates what information is to be collected from which source by what procedures."

The second definition includes three important terms - plan, structure and strategy. The plan is the outline of the research scheme on which the researcher is to work. The structure of the research work is a more specific scheme, and the strategy suggests how the research will be carried out, i.e. the methods to be used for the collection and analysis of data. In brief, research design is the blueprint of research. It is the specification of methods and procedures for acquiring the information needed for solving the problem. Questionnaires, forms and samples for investigation are decided while framing the research design. Finally, the research design enables the researcher to arrive at meaningful conclusions at the end of the proposed study.
 
PLANNING THE RESEARCH DESIGN:

There are four broad steps involved in planning the research design, as explained below:
(1) Determining the work involved in the project:
The first step in planning the research design is determining the work involved in the project and designing a workable plan to carry out the research work within a specific time limit. The work involved includes the following: (a) formulating the marketing problem; (b) determining the information requirement; (c) identifying information sources; (d) preparing a detailed plan for the execution of the research project. This preliminary step indicates the nature and volume of work involved in the research. The various forms required for the research work will be decided and finalized. The sample to be selected for the survey work will also be decided. Staff requirements will also be estimated, and details will be worked out about the training and supervision of field investigators, etc. In addition, the questionnaire will be prepared and tested. This is how the researcher prepares a blueprint of the research project, according to which the whole project will be implemented. The researcher gets a clear idea of the work involved in the project through such initial planning, which avoids confusion, misdirection and wastage of time, money and effort at later stages. The whole research project moves smoothly due to this initial planning.
(2) Estimating the costs involved:
The second step in planning the research design is estimating the costs involved in the research project. Marketing research projects are costly, as the questionnaire must be prepared in a large number of copies, interviewers must be appointed for data collection, and staff will be required for tabulation and analysis of the data collected. Finally, experts will be required for drawing conclusions and for writing the research report. The researcher has to estimate the expenditure required for the execution of the project. The sponsoring organization will approve the research project and make a suitable budget provision accordingly. The cost calculation is a complicated job, as expenditure under different heads will have to be estimated accurately. The cost of the project also needs to be viewed from the viewpoint of its utility in solving the marketing problem. A comprehensive research study for solving a comparatively minor marketing problem will be uneconomical.
(3) Preparing the time schedule:
The time factor is important in the execution of the research project. Planning of the time schedule is essential at the initial stage. The time calculation relates to the preparation of the questionnaire and its pre-testing, training of interviewers, the actual survey work, tabulation and analysis of data and, finally, report writing. The time requirement of each stage needs to be worked out systematically. Such a study will indicate the time requirement of the whole project. Too long a period for the completion of the research work is undesirable, as the conclusions and recommendations may be outdated by the time they are actually available. Similarly, time-consuming research projects are not useful for solving urgent marketing problems faced by a company. Preparing a time schedule is not by itself adequate in research design. In addition, all operations involved in the research work should be carried out strictly as per the time schedule already prepared. If necessary, remedial measures should be adopted in order to avoid any deviation from the time schedule. This brings certainty as regards the completion of the whole research project in time.
(4) Verifying results:
Research findings need to be dependable for the sponsoring organization. The researcher may create new problems for the sponsoring organization if the research work is conducted in a faulty manner; such an unreliable study is dangerous. It is therefore necessary to keep an effective check on the whole research work during the implementation stage, and suitable provisions for this need to be made in the research design. After deciding the details of the steps noted above, the background for the research design will be ready. Thereafter, the researcher has to prepare the research design of the whole project and present it to the sponsoring agency or higher authorities for detailed consideration and approval. The researcher can start the research project (as per the design) after securing the necessary approval of the research design prepared.

 
• To attribute cause and effect relationships among two or more variables so that we can better understand and predict the outcome of one variable (e.g., sales) when varying another (e.g., advertising).
This classification is frequently used and is quite popular. Before we discuss each of these design types, a cautionary note is in order. Some might think that the research design decision suggests a choice among the design types. Although there are research situations in which all the research questions might be answered by doing only one of these types (e.g., a causal research experiment to determine which of three prices results in the greatest profits), it is more often the case that the research design might involve more than one of these types performed in some sequence. The overall research design is intended to indicate exactly how the different design types will be utilized to get answers to the research questions or test the hypotheses.

Primary & Secondary Data


Primary data are data that you collect yourself using such methods as:
 direct observation - lets you focus on details of importance to you; lets you see a system
in real rather than theoretical use (other faults are unlikely or trivial in theory but quite
real and annoying in practice);
 Surveys - written surveys let you collect considerable quantities of detailed data. You have to either trust the honesty of the people surveyed or build in self-verifying questions (e.g. questions 9 and 24 ask basically the same thing but using different words - different answers may indicate the surveyed person is being inconsistent, dishonest or inattentive; a short sketch of such a check follows this list).
 Interviews - slow, expensive, and they take people away from their regular jobs, but they allow in-depth questioning and follow-up questions. They also show non-verbal communication such as face-pulling, fidgeting, shrugging, hand gestures and sarcastic expressions that add further meaning to spoken words, e.g. "I think it's a GREAT system" could mean vastly different things depending on whether the person was sneering at the time! A problem with interviews is that people might say what they think the interviewer wants to hear; they might avoid being honestly critical in case their jobs or reputation might suffer.
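As a minimal sketch of the self-verifying questions mentioned under surveys (question numbers and scale values are hypothetical, echoing the q9/q24 example), a consistency check might look like this:

```python
# Flag respondents whose answers to two paired questions (which ask the
# same thing in different words) differ by more than a tolerance.
responses = [
    {"id": 1, "q9": 4, "q24": 4},
    {"id": 2, "q9": 5, "q24": 2},   # inconsistent pair
    {"id": 3, "q9": 3, "q24": 3},
]

TOLERANCE = 1  # allow answers one scale point apart
flagged = [r["id"] for r in responses if abs(r["q9"] - r["q24"]) > TOLERANCE]
print("Respondents to review:", flagged)
```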
Primary data can be relied on because you know where it came from and what was done to it. It's
like cooking something yourself. You know what went into it.
Secondary data are collected from external sources such as:
 TV, radio, internet
 magazines, newspapers
 reviews
 research articles
 stories told by people you know
There's a lot more secondary data than primary data, and secondary data are a whole lot cheaper
and easier to acquire than primary data. The problem is that often the reliability, accuracy and
integrity of the data are uncertain. Who collected it? Can they be trusted? Did they do any
preprocessing of the data? Is it biased? How old is it? Where was it collected? Can the data be
verified, or does it have to be taken on faith?
Often secondary data have been pre-processed to give totals or averages and the original details
are lost so you can't verify it by replicating the methods used by the original data collectors.
In short, primary data are expensive and difficult to acquire, but they are trustworthy. Secondary
data are cheap and easy to collect, but must be treated with caution.
Primary Data
1. Needs more funds.
2. The investigating agency collects the data itself.
3. Requires a longer time for collection.
4. More reliable and suitable to the enquiry, because the investigator himself collects it.
5. Requires an elaborate organization.
6. No extra precaution is required.

Secondary Data
1. Needs comparatively less funds.
2. Some other investigating agency collects it for its own use.
3. Requires less time for collection.
4. Less reliable and suitable, as someone else has done the job of collection, which may not serve the purpose.
5. No organizational set-up is needed.
6. Secondary data need more care and attention.
Importance of obtaining Secondary data before primary data:
Secondary data are information that has already been collected for some purpose other than the problem in hand. When undertaking a research project, it is important to understand the value of secondary data, which may assist in resolving or partly answering the research problem. Because these data already exist, it is often more cost- and time-effective to analyze them before collecting primary data; they may in fact assist in refining the question and data needs. Secondary data should be examined first because they can provide background information that can be used for the project, such as defining, refining and developing various components of the project. They can act as a solid information base from which information gaps, which can only be filled by fieldwork, can be identified. Further, secondary data can assist in better stating problems, suggesting improved methods or data for the problem, and can be a source of comparative data by which primary data can be more insightfully interpreted.

Q20. Sampling Process / Procedure


The sampling process comprises several stages:
1. Defining the population of concern
2. Specifying a sampling frame, a set of items or events possible to measure
3. Specifying a sampling method for selecting items or events from the frame
4. Determining the sample size
5. Implementing the sampling plan
6. Sampling and data collecting
1. Population definition:
Successful statistical practice is based on focused problem definition. In sampling, this includes
defining the population from which our sample is drawn. A population can be defined as
including all people or items with the characteristic one wish to understand. Because there is
very rarely enough time or money to gather information from everyone or everything in a
population, the goal becomes finding a representative sample (or subset) of that population.
Sometimes that which defines a population is obvious. For example, a manufacturer needs to
decide whether a batch of material from production is of high enough quality to be released to
the customer, or should be sentenced for scrap or rework due to poor quality. In this case, the
batch is the population.
Although the population of interest often consists of physical objects, sometimes we need to
sample over time, space, or some combination of these dimensions.
Note also that the population from which the sample is drawn may not be the same as the
population about which we actually want information. Often there is large but not complete
overlap between these two groups due to frame issues etc. (see below). Sometimes they may be
entirely separate - for instance, we might study rats in order to get a better understanding of
human health, or we might study records from people born in 2008 in order to make predictions
about people born in 2009.
Time spent in making the sampled population and population of concern precise is often well
spent, because it raises many issues, ambiguities and questions that would otherwise have been
overlooked at this stage.
2. Sampling frame
In the most straightforward case, such as the sentencing of a batch of material from production
(acceptance sampling by lots), it is possible to identify and measure every single item in the
population and to include any one of them in our sample. However, in the more general case this
is not possible. There is no way to identify all rats in the set of all rats. Where voting is not
compulsory, there is no way to identify which people will actually vote at a forthcoming election
(in advance of the election). These imprecise populations are not amenable to sampling in any of
the ways below to which we could apply statistical theory.
As a remedy, we seek a sampling frame which has the property that we can identify every single
element and include any in our sample. The most straightforward type of frame is a list of
elements of the population (preferably the entire population) with appropriate contact
information. For example, in an opinion poll, possible sampling frames include an electoral
register and a telephone directory.
3. Specifying a sampling method
Within any of the types of frame identified above, a variety of sampling methods can be
employed, individually or in combination. Factors commonly influencing the choice between
these designs include:
 Nature and quality of the frame
 Availability of auxiliary information about units on the frame
 Accuracy requirements, and the need to measure accuracy
 Whether detailed analysis of the sample is expected
 Cost/operational concerns
The main methods are simple random sampling, systematic sampling, stratified sampling, cluster
sampling, quota sampling, accidental sampling, line-intercept sampling and panel sampling; each
of these is described in detail under probability and non-probability sampling below.
4. Determining the sample size
Formulas, tables, and power function charts are well known approaches to determine sample
size.
Steps for using sample size tables
1. Postulate the effect size of interest, α, and β.
2. Check sample size table
1. Select the table corresponding to the selected α
2. Locate the row corresponding to the desired power
3. Locate the column corresponding to the estimated effect size.
4. The intersection of the column and row is the minimum sample size required.
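Such tables are themselves derived from standard formulas. For estimating a population proportion, for example, the minimum sample size is commonly computed as n = z^2 * p * (1 - p) / e^2, where z reflects the chosen confidence level, p is the anticipated proportion and e is the margin of error. A minimal Python sketch; the function name and the illustrative inputs (95% confidence, worst-case p = 0.5, 5% margin) are our assumptions, not from the text:

import math

def sample_size_for_proportion(z, p, e):
    # n = z^2 * p * (1 - p) / e^2, rounded up to the next whole respondent
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# 95% confidence (z = 1.96), worst-case p = 0.5, margin of error 5%
print(sample_size_for_proportion(1.96, 0.5, 0.05))   # 385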
5. Implementing the sampling plan
6. Sampling and data collection
Good data collection involves:
 Following the defined sampling process
 Keeping the data in time order
 Noting comments and other contextual events
 Recording non-responses
Probability and non probability sampling
A probability sampling scheme is one in which every unit in the population has a chance
(greater than zero) of being selected in the sample, and this probability can be accurately
determined. The combination of these traits makes it possible to produce unbiased estimates of
population totals, by weighting sampled units according to their probability of selection.
Example: We want to estimate the total income of adults living in a given street. We visit each
household in that street, identify all adults living there, and randomly select one adult from each
household. (For example, we can allocate each person a random number, generated from a
uniform distribution between 0 and 1, and select the person with the highest number in each
household). We then interview the selected person and find their income. People living on their
own are certain to be selected, so we simply add their income to our estimate of the total. But a
person living in a household of two adults has only a one-in-two chance of selection. To reflect
this, when we come to such a household, we would count the selected person's income twice
towards the total. (The person who is selected from that household can be loosely viewed as also
representing the person who isn't selected.)
In the above example, not everybody has the same probability of selection; what makes it a
probability sample is the fact that each person's probability is known. When every element in the
population does have the same probability of selection, this is known as an 'equal probability of
selection' (EPS) design. Such designs are also referred to as 'self-weighting' because all sampled
units are given the same weight.
Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified
Sampling, Probability Proportional to Size Sampling, and Cluster or Multistage Sampling. These
various ways of probability sampling have two things in common:
1. Every element has a known nonzero probability of being sampled and
2. Involves random selection at some point.
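The weighting logic of the street-income example above translates directly into code. A minimal sketch with a hypothetical street of three households, weighting each selected adult's income by the inverse of his or her selection probability:

import random

# Hypothetical street: each inner list holds the incomes of the adults in one household
households = [[30000], [25000, 40000], [20000, 22000, 35000]]

total_estimate = 0
for adults in households:
    selected = random.choice(adults)    # one adult per household, chosen at random
    prob = 1 / len(adults)              # that adult's probability of selection
    total_estimate += selected / prob   # a two-adult household's pick is counted twice, etc.
print(total_estimate)                   # an unbiased estimate of the street's total income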
Non probability sampling is any sampling method where some elements of the population have
no chance of selection (these are sometimes referred to as 'out of coverage'/'under covered'), or
where the probability of selection can't be accurately determined. It involves the selection of
elements based on assumptions regarding the population of interest, which forms the criteria for
selection. Hence, because the selection of elements is nonrandom, non probability sampling does
not allow the estimation of sampling errors. These conditions give rise to exclusion bias, placing
limits on how much information a sample can provide about the population. Information about
the relationship between sample and population is limited, making it difficult to extrapolate from
the sample to the population.
Example: We visit every household in a given street, and interview the first person to answer the
door. In any household with more than one occupant, this is a non probability sample, because
some people are more likely to answer the door (e.g. an unemployed person who spends most of
their time at home is more likely to answer than an employed housemate who might be at work
when the interviewer calls) and it's not practical to calculate these probabilities.
Non probability sampling methods include accidental sampling, quota sampling and purposive
sampling. In addition, non response effects may turn any probability design into a non
probability design if the characteristics of non response are not well understood, since non
response effectively modifies each element's probability of being sampled.
Simple random sampling
In a simple random sample ('SRS') of a given size, all such subsets of the frame are given an
equal probability. Each element of the frame thus has an equal probability of selection: the frame
is not subdivided or partitioned. Furthermore, any given pair of elements has the same chance of
selection as any other such pair (and similarly for triples, and so on). This minimizes bias and
simplifies analysis of results. In particular, the variance between individual results within the
sample is a good indicator of variance in the overall population, which makes it relatively easy to
estimate the accuracy of results.
However, SRS can be vulnerable to sampling error because the randomness of the selection may
result in a sample that doesn't reflect the makeup of the population. For instance, a simple
random sample of ten people from a given country will on average produce five men and five
women, but any given trial is likely to overrepresent one sex and underrepresent the other.
Systematic and stratified techniques, discussed below, attempt to overcome this problem by
using information about the population to choose a more representative sample.
SRS may also be cumbersome and tedious when sampling from an unusually large target
population. In some cases, investigators are interested in research questions specific to subgroups
of the population. For example, researchers might be interested in examining whether cognitive
ability as a predictor of job performance is equally applicable across racial groups. SRS cannot
accommodate the needs of researchers in this situation because it does not provide subsamples of
the population. Stratified sampling, which is discussed below, addresses this weakness of SRS.
Simple random sampling is always an EPS design (equal probability of selection), but not all
EPS designs are simple random sampling.
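Drawing such a sample is straightforward once a frame exists. A minimal Python sketch with a hypothetical frame of 1,000 numbered elements:

import random

frame = list(range(1, 1001))        # a hypothetical frame of 1,000 elements
sample = random.sample(frame, 50)   # draws without replacement; every 50-element subset is equally likely
print(sorted(sample))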
Systematic sampling
Systematic sampling relies on arranging the target population according to some ordering
scheme and then selecting elements at regular intervals through that ordered list. Systematic
sampling involves a random start and then proceeds with the selection of every kth element from
then onwards. In this case, k=(population size/sample size). It is important that the starting point
is not automatically the first in the list, but is instead randomly chosen from within the first to the
kth element in the list. A simple example would be to select every 10th name from the telephone
directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10').
As long as the starting point is randomized, systematic sampling is a type of probability
sampling. It is easy to implement and the stratification induced can make it efficient, if the
variable by which the list is ordered is correlated with the variable of interest. 'Every 10th'
sampling is especially useful for efficient sampling from databases.
Example: Suppose we wish to sample people from a long street that starts in a poor district
(house #1) and ends in an expensive district (house #1000). A simple random selection of
addresses from this street could easily end up with too many from the high end and too few from
the low end (or vice versa), leading to an unrepresentative sample. Selecting (e.g.) every 10th
street number along the street ensures that the sample is spread evenly along the length of the
street, representing all of these districts. (Note that if we always start at house #1 and end at
#991, the sample is slightly biased towards the low end; by randomly selecting the start between
#1 and #10, this bias is eliminated.)
However, systematic sampling is especially vulnerable to periodicities in the list. If periodicity is
present and the period is a multiple or factor of the interval used, the sample is especially likely
to be unrepresentative of the overall population, making the scheme less accurate than simple
random sampling.
Example: Consider a street where the odd-numbered houses are all on the north (expensive) side
of the road, and the even-numbered houses are all on the south (cheap) side. Under the sampling
scheme given above, it is impossible to get a representative sample; either the houses sampled
will all be from the odd-numbered, expensive side, or they will all be from the even-numbered,
cheap side.
Another drawback of systematic sampling is that even in scenarios where it is more accurate than
SRS, its theoretical properties make it difficult to quantify that accuracy. (In the two examples of
systematic sampling that are given above, much of the potential sampling error is due to
variation between neighboring houses - but because this method never selects two neighboring
houses, the sample will not give us any information on that variation.)
As described above, systematic sampling is an EPS method, because all elements have the same
probability of selection (in the example given, one in ten). It is not 'simple random sampling'
because different subsets of the same size have different selection probabilities - e.g. the set
{4,14,24,...,994} has a one-in-ten probability of selection, but the set {4,13,24,34,...} has zero
probability of selection.
Systematic sampling can also be adapted to a non-EPS approach; for an example, see discussion
of PPS samples below.
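The procedure can be sketched in a few lines of Python; the function name is ours, and the street of 1,000 houses echoes the example above:

import random

def systematic_sample(frame, n):
    k = len(frame) // n                # sampling interval: k = population size / sample size
    start = random.randint(0, k - 1)   # random start within the first k elements
    return frame[start::k][:n]         # every kth element from the random start

houses = list(range(1, 1001))          # the street of 1,000 houses
print(systematic_sample(houses, 100))  # an 'every 10th' sample with a random start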
Stratified sampling
Where the population embraces a number of distinct categories, the frame can be organized by
these categories into separate "strata." Each stratum is then sampled as an independent sub-
population, out of which individual elements can be randomly selected. There are several
potential benefits to stratified sampling.
First, dividing the population into distinct, independent strata can enable researchers to draw
inferences about specific subgroups that may be lost in a more generalized random sample.
Second, utilizing a stratified sampling method can lead to more efficient statistical estimates
(provided that strata are selected based upon relevance to the criterion in question, instead of
availability of the samples). Even if a stratified sampling approach does not lead to increased
statistical efficiency, such a tactic will not result in less efficiency than would simple random
sampling, provided that each stratum is proportional to the group's size in the population.
Third, it is sometimes the case that data are more readily available for individual, pre-existing
strata within a population than for the overall population; in such cases, using a stratified
sampling approach may be more convenient than aggregating data across groups (though this
may potentially be at odds with the previously noted importance of utilizing criterion-relevant
strata).
Finally, since each stratum is treated as an independent population, different sampling
approaches can be applied to different strata, potentially enabling researchers to use the approach
best suited (or most cost-effective) for each identified subgroup within the population.
There are, however, some potential drawbacks to using stratified sampling. First, identifying
strata and implementing such an approach can increase the cost and complexity of sample
selection, as well as leading to increased complexity of population estimates. Second, when
examining multiple criteria, stratifying variables may be related to some, but not to others,
further complicating the design, and potentially reducing the utility of the strata. Finally, in some
cases (such as designs with a large number of strata, or those with a specified minimum sample
size per group), stratified sampling can potentially require a larger sample than would other
methods (although in most cases, the required sample size would be no larger than would be
required for simple random sampling).
A stratified sampling approach is most effective when three conditions are met:
1. Variability within strata is minimized
2. Variability between strata is maximized
3. The variables upon which the population is stratified are strongly correlated with the
desired dependent variable.
Advantages over other sampling methods
1. Focuses on important subpopulations and ignores irrelevant ones.
2. Allows use of different sampling techniques for different subpopulations.
3. Improves the accuracy/efficiency of estimation.
4. Permits greater balancing of statistical power of tests of differences between strata by
sampling equal numbers from strata varying widely in size.
Disadvantages
1. Requires selection of relevant stratification variables which can be difficult.
2. Is not useful when there are no homogeneous subgroups.
3. Can be expensive to implement.
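A proportional stratified sample can be sketched as follows. The strata and sizes are hypothetical, and simple rounding is used for the allocation, so the realized total can differ slightly from n:

import random

def proportional_stratified_sample(strata, n):
    # strata: dict mapping stratum name -> list of its members
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)       # allocate in proportion to stratum size
        sample.extend(random.sample(members, share))  # simple random sample within each stratum
    return sample

# Hypothetical population: 60 urban and 40 rural units, sample of 10
strata = {"urban": list(range(60)), "rural": list(range(60, 100))}
print(proportional_stratified_sample(strata, 10))     # about 6 urban and 4 rural units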
Post stratification
Stratification is sometimes introduced after the sampling phase in a process called "post
stratification". This approach is typically implemented due to a lack of prior knowledge of an
appropriate stratifying variable or when the experimenter lacks the necessary information to
create a stratifying variable during the sampling phase. Although the method is susceptible to the
pitfalls of post hoc approaches, it can provide several benefits in the right situation.
Implementation usually follows a simple random sample. In addition to allowing for
stratification on an ancillary variable, post stratification can be used to implement weighting,
which can improve the precision of a sample's estimates.
Oversampling
Choice-based sampling is one of the stratified sampling strategies. In choice-based sampling, the
data are stratified on the target and a sample is taken from each stratum so that the rare target
class will be more represented in the sample. The model is then built on this biased sample. The
effects of the input variables on the target are often estimated with more precision with the
choice-based sample even when a smaller overall sample size is taken compared to a random
sample. The results usually must be adjusted to correct for the oversampling.
Probability proportional to size sampling
In some cases the sample designer has access to an "auxiliary variable" or "size measure",
believed to be correlated to the variable of interest, for each element in the population. These
data can be used to improve accuracy in sample design. One option is to use the auxiliary
variable as a basis for stratification, as discussed above.
Another option is probability-proportional-to-size ('PPS') sampling, in which the selection
probability for each element is set to be proportional to its size measure, up to a maximum of 1.
In a simple PPS design, these selection probabilities can then be used as the basis for Poisson
sampling. However, this has the drawback of variable sample size, and different portions of the
population may still be over- or under-represented due to chance variation in selections. To
address this problem, PPS may be combined with a systematic approach.
Example: Suppose we have six schools with populations of 150, 180, 200, 220, 260, and 490
students respectively (total 1500 students), and we want to use student population as the basis for
a PPS sample of size three. To do this, we could allocate the first school numbers 1 to 150, the
second school 151 to 330 (= 150 + 180), the third school 331 to 530, and so on to the last school
(1011 to 1500). We then generate a random start between 1 and 500 (equal to 1500/3) and count
through the school populations by multiples of 500. If our random start was 137, we would select
the schools which have been allocated numbers 137, 637, and 1137, i.e. the first, fourth, and
sixth schools.
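The school example can be reproduced in code. A minimal sketch; the random start is fixed at 137 to match the text:

from itertools import accumulate

sizes = [150, 180, 200, 220, 260, 490]        # the six school populations (total 1500)
cum = list(accumulate(sizes))                 # cumulative limits: 150, 330, 530, 750, 1010, 1500
n = 3
k = cum[-1] // n                              # sampling interval: 1500 / 3 = 500
start = 137                                   # in practice drawn at random between 1 and k
targets = [start + i * k for i in range(n)]   # 137, 637, 1137
chosen = [next(i for i, c in enumerate(cum) if t <= c) for t in targets]
print(chosen)                                 # [0, 3, 5]: the first, fourth and sixth schools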
The PPS approach can improve accuracy for a given sample size by concentrating sample on
large elements that have the greatest impact on population estimates. PPS sampling is commonly
used for surveys of businesses, where element size varies greatly and auxiliary information is
often available - for instance, a survey attempting to measure the number of guest-nights spent in
hotels might use each hotel's number of rooms as an auxiliary variable. In some cases, an older
measurement of the variable of interest can be used as an auxiliary variable when attempting to
produce more current estimates.
Cluster sampling
Sometimes it is more cost-effective to select respondents in groups ('clusters'). Sampling is often
clustered by geography, or by time periods. (Nearly all samples are in some sense 'clustered' in
time - although this is rarely taken into account in the analysis.) For instance, if surveying
households within a city, we might choose to select 100 city blocks and then interview every
household within the selected blocks.
Clustering can reduce travel and administrative costs. In the example above, an interviewer can
make a single trip to visit several households in one block, rather than having to drive to a
different block for each household.
It also means that one does not need a sampling frame listing all elements in the target
population. Instead, clusters can be chosen from a cluster-level frame, with an element-level
frame created only for the selected clusters. In the example above, the sample only requires a
block-level city map for initial selections, and then a household-level map of the 100 selected
blocks, rather than a household-level map of the whole city.
Cluster sampling generally increases the variability of sample estimates above that of simple
random sampling, depending on how the clusters differ between themselves, as compared with
the within-cluster variation. For this reason, cluster sampling requires a larger sample than SRS
to achieve the same level of accuracy - but cost savings from clustering might still make this a
cheaper option.
Cluster sampling is commonly implemented as multistage sampling. This is a complex form of
cluster sampling in which two or more levels of units are embedded one in the other. The first
stage consists of constructing the clusters that will be used to sample from. In the second stage, a
sample of primary units is randomly selected from each cluster (rather than using all units
contained in all selected clusters). In following stages, in each of those selected clusters,
additional samples of units are selected, and so on. All ultimate units (individuals, for instance)
selected at the last step of this procedure are then surveyed. This technique, thus, is essentially
the process of taking random subsamples of preceding random samples.
Multistage sampling can substantially reduce sampling costs, where the complete population list
would need to be constructed (before other sampling methods could be applied). By eliminating
the work involved in describing clusters that are not selected, multistage sampling can reduce the
large costs associated with traditional cluster sampling.
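A two-stage version of this design can be sketched as follows, using a hypothetical city of 20 blocks with 50 households each:

import random

# Hypothetical city: 20 blocks (clusters), each holding 50 household identifiers
city = {block: [f"block{block}-house{h}" for h in range(50)] for block in range(20)}

selected_blocks = random.sample(list(city), 5)          # stage 1: sample from the block-level frame
sample = [house
          for block in selected_blocks
          for house in random.sample(city[block], 10)]  # stage 2: sample within selected blocks only
print(len(sample))                                      # 50 households, without listing all 1,000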
Quota sampling
In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as
in stratified sampling. Then judgment is used to select the subjects or units from each segment
based on a specified proportion. For example, an interviewer may be told to sample 200 females
and 300 males between the age of 45 and 60.
It is this second step which makes the technique one of non-probability sampling. In quota
sampling the selection of the sample is non-random. For example interviewers might be tempted
to interview those who look most helpful. The problem is that these samples may be biased
because not everyone gets a chance of selection. This random element is its greatest weakness
and quota versus probability has been a matter of controversy for many years.
Accidental sampling
Accidental sampling (sometimes known as grab, convenience or opportunity sampling) is a
type of non probability sampling which involves the sample being drawn from that part of the
population which is close to hand. That is, a population is selected because it is readily available
and convenient. It may be through meeting the person or including a person in the sample when
one meets them or chosen by finding them through technological means such as the internet or
through phone. The researcher using such a sample cannot scientifically make generalizations
about the total population from this sample because it would not be representative enough. For
example, if the interviewer were to conduct such a survey at a shopping center early in the
morning on a given day, the people that he/she could interview would be limited to those given
there at that given time, which would not represent the views of other members of society in such
an area, if the survey were to be conducted at different times of day and several times per week.
This type of sampling is most useful for pilot testing. Several important considerations for
researchers using convenience samples include:
1. Are there controls within the research design or experiment which can serve to lessen the
impact of a non-random convenience sample, thereby ensuring the results will be more
representative of the population?
2. Is there good reason to believe that a particular convenience sample would or should
respond or behave differently than a random sample from the same population?
3. Is the question being asked by the research one that can adequately be answered using a
convenience sample?
In social science research, snowball sampling is a similar technique, where existing study
subjects are used to recruit more subjects into the sample. Some variants of snowball sampling,
such as respondent driven sampling, allow calculation of selection probabilities and are
probability sampling methods under certain conditions.
Line-intercept sampling
Line-intercept sampling is a method of sampling elements in a region whereby an element is
sampled if a chosen line segment, called a "transect", intersects the element.
Panel sampling
Panel sampling is the method of first selecting a group of participants through a random
sampling method and then asking that group for the same information again several times over a
period of time. Therefore, each participant is given the same survey or interview at two or more
time points; each period of data collection is called a "wave". This longitudinal sampling-method
allows estimates of changes in the population, for example with regard to chronic illness to job
stress to weekly food expenditures. Panel sampling can also be used to inform researchers about
within-person health changes due to age or to help explain changes in continuous dependent
variables such as spousal interaction. There have been several proposed methods of analyzing
panel data, including MANOVA, growth curves, and structural equation modeling with lagged
effects.
Replacement of selected units
Sampling schemes may be without replacement ('WOR' - no element can be selected more than
once in the same sample) or with replacement ('WR' - an element may appear multiple times in
the one sample). For example, if we catch fish, measure them, and immediately return them to
the water before continuing with the sample, this is a WR design, because we might end up
catching and measuring the same fish more than once. However, if we do not return the fish to
the water (e.g. if we eat the fish), this becomes a WOR design.
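The distinction maps directly onto Python's standard library; a minimal sketch with five hypothetical fish:

import random

fish = ["A", "B", "C", "D", "E"]
print(random.sample(fish, 3))      # WOR: no fish can appear twice in the sample
print(random.choices(fish, k=3))   # WR: the same fish may be 'caught' more than once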
Errors in sample surveys
Survey results are typically subject to some error. Total errors can be classified into sampling
errors and non-sampling errors. The term "error" here includes systematic biases as well as
random errors.
Sampling errors and biases
Sampling errors and biases are induced by the sample design. They include:
1. Selection bias: When the true selection probabilities differ from those assumed in
calculating the results.
2. Random sampling error: Random variation in the results due to the elements in the
sample being selected at random.
Non-sampling error
Non-sampling errors are other errors which can impact the final survey estimates, caused by
problems in data collection, processing, or sample design. They include:
1. Over coverage: Inclusion of data from outside of the population.
2. Under coverage: Sampling frame does not include elements in the population.
3. Measurement error: e.g. when respondents misunderstand a question, or find it difficult
to answer.
4. Processing error: Mistakes in data coding.
5. Non-response: Failure to obtain complete data from all selected individuals.
After sampling, a review should be held of the exact process followed in sampling, rather than
that intended, in order to study any effects that any divergences might have on subsequent
analysis. A particular problem is that of non-response.
Two major types of non response exist: unit non response (referring to lack of completion of any
part of the survey) and item non response (submission or participation in survey but failing to
complete one or more components/questions of the survey). In survey sampling, many of the
individuals identified as part of the sample may be unwilling to participate, not have the time to
participate (opportunity cost), or survey administrators may not have been able to contact them.
In this case, there is a risk of differences, between respondents and non respondents, leading to
biased estimates of population parameters. This is often addressed by improving survey design,
offering incentives, and conducting follow-up studies which make a repeated attempt to contact
the unresponsive and to characterize their similarities and differences with the rest of the frame.
The effects can also be mitigated by weighting the data when population benchmarks are
available or by imputing data based on answers to other questions.
Non response is particularly a problem in internet sampling. Reasons for this problem include
improperly designed surveys, over-surveying (or survey fatigue), and the fact that potential
participants hold multiple e-mail addresses, which they don't use anymore or don't check
regularly.
Survey weights
In many situations the sample fraction may be varied by stratum and data will have to be
weighted to correctly represent the population. Thus for example, a simple random sample of
individuals in the United Kingdom might include some in remote Scottish islands who would be
inordinately expensive to sample. A cheaper method would be to use a stratified sample with
urban and rural strata. The rural sample could be under-represented in the sample, but weighted
up appropriately in the analysis to compensate.
More generally, data should usually be weighted if the sample design does not give each
individual an equal chance of being selected. For instance, when households have equal selection
probabilities but one person is interviewed from within each household, this gives people from
large households a smaller chance of being interviewed. This can be accounted for using survey
weights. Similarly, households with more than one telephone line have a greater chance of being
selected in a random digit dialing sample, and weights can adjust for this.
Weights can also serve other purposes, such as helping to correct for non-response.
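A design weight is simply the inverse of a unit's selection probability. A minimal sketch of the telephone-line case; the respondents, incomes and base probability are invented for illustration:

# Hypothetical respondents in a random-digit-dialling sample
respondents = [
    {"lines": 1, "income": 30000},
    {"lines": 2, "income": 45000},   # two lines: twice as likely to be selected
]
base_prob = 0.001                    # assumed chance of any single line being dialled

for r in respondents:
    r["weight"] = 1 / (r["lines"] * base_prob)   # design weight = 1 / selection probability

weighted_mean = (sum(r["income"] * r["weight"] for r in respondents)
                 / sum(r["weight"] for r in respondents))
print(weighted_mean)                 # 35000.0: the two-line household counts half as much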
Q21. What do you mean by primary data? What are the various methods of collecting
primary data?
Primary sources are original sources from which the researcher directly collects data that have
not been previously collected, e.g., collection of data directly by the researcher on brand
awareness, brand preference, brand loyalty and other aspects of consumer behavior from a
sample of consumers by interviewing them. Primary data are first-hand information collected
through various methods such as observation, interviewing, mailing etc.
Methods of Collecting Primary Data
Primary data are directly collected by the researcher from their original sources. In this case, the
researcher can collect the required data precisely according to his research needs; he can collect
them when he wants them and in the form he needs them. But the collection of primary data is
costly and time consuming. Yet, for several types of social science research required data are not
available from secondary sources and they have to be directly gathered from the primary sources.
In such cases where the available data are inappropriate, inadequate or obsolete, primary data
have to be gathered. Such studies include socio-economic surveys, social anthropological studies of
rural communities and tribal communities, sociological studies of social problems and social
institutions, marketing research, leadership studies, opinion polls, attitudinal surveys, readership,
radio listening and T.V. viewing surveys, knowledge-awareness-practice (KAP) studies, farm
management studies, business management studies, etc.
There are various methods of data collection. A ‘method’ is different from a ‘tool’: while a
method refers to the way or mode of gathering data, a tool is an instrument used for the method.
For example, a schedule is used for interviewing. The important methods are
(a) Observation, (b) Interviewing, (c) Mail survey, (d) Experimentation,
(a) Observation: Observation means viewing or seeing. Observation may be defined as a
systematic viewing of a specific phenomenon in its proper setting for the specific purpose of
gathering data for a particular study. Observation is a classical method of scientific study.
(b) Interviewing: Interviewing is one of the prominent methods of data collection. It may be
defined as a two way systematic conversation between an investigator and an informant, initiated
for obtaining information relevant to a specific study. It involves not only conversation, but also
learning from the respondent’s gesture, facial expressions and pauses, and his environment.
Interviewing requires face to face contact or contact over telephone and calls for interviewing
skills. It is done by using a structured schedule or an unstructured guide.
(c) Mail survey: The mail survey is a data collection method in which questionnaires are mailed
to respondents, who complete and return them. Research practitioners should recognize that this
is a viable means of collecting specific market data.
(d) Experimentation: The popularity of experimentation in marketing research has much to do
with the possibilities of establishing cause and effect. Experiments can be configured in such a
way as to allow the variable causing a particular effect to be isolated. Other methods commonly
used in marketing research, like surveys, provide much more ambiguous findings. In fact,
experimentation is the most scientific method employed in marketing research.
Q22. What are the differences between observation and interviewing as methods of data
collection? Give two specific examples of situations where either observation or
interviewing would be more appropriate.
Observation: Observation means viewing or seeing. Observation may be defined as a systematic
viewing of a specific phenomenon in its proper setting for the specific purpose of gathering data
for a particular study. Observation is a classical method of scientific study.
Observation as a method of data collection has certain characteristics.
1. It is both a physical and a mental activity: The observing eye catches many things that are
present. But attention is focused on data that are pertinent to the given study.
2. Observation is selective: A researcher does not observe anything and everything, but selects
the range of things to be observed on the basis of the nature, scope and objectives of his study.
For example, suppose a researcher desires to study the causes of city road accidents and has also
formulated a tentative hypothesis that accidents are caused by violation of traffic rules and
over-speeding. When he observes the movement of vehicles on the road, many things are before his
eyes: the type, make, size and colour of the vehicles, the persons sitting in them, their hair style,
etc. All such things which are not relevant to his study are ignored, and only over-speeding and
traffic violations are keenly observed by him.
3. Observation is purposive and not casual: It is made for the specific purpose of noting things
relevant to the study. It captures the natural social context in which persons' behaviour occurs. It
grasps the significant events and occurrences that affect the social relations of the participants.
4. Observation should be exact and based on standardized research tools, such as an
observation schedule, sociometric scale, etc., and precision instruments, if any.
Observation is suitable for a variety of research purposes. It may be used for studying
(a) The behavior of human beings in purchasing goods and services; life style, customs and
manners, interpersonal relations, group dynamics, crowd behavior, leadership styles, managerial
style, and other behaviors and actions;
(b) The behavior of other living creatures like birds, animals etc.
(c) Physical characteristics of inanimate things like stores, factories, residences etc.
(d) Flow of traffic and parking problems
(e) Movement of materials and products through a plant.
Interviewing: Interviewing is one of the prominent methods of data collection. It may be
defined as a two way systematic conversation between an investigator and an informant, initiated
for obtaining information relevant to a specific study. It involves not only conversation, but also
learning from the respondent’s gesture, facial expressions and pauses, and his environment.
Interviewing requires face to face contact or contact over telephone and calls for interviewing
skills. It is done by using a structured schedule or an unstructured guide. Interviewing may be
used either as a main method or as a supplementary one in studies of persons. Interviewing is the
only suitable method for gathering information from illiterate or less educated respondents. It is
useful for collecting a wide range of data from factual demographic data to highly personal and
intimate information relating to a person’s opinions, attitudes, values, beliefs, past experience and
future intentions. When qualitative information is required, or probing is necessary to draw out
responses fully, interviewing is required. Where the area covered by the survey is compact, or
when a sufficient number of qualified interviewers are available, personal interviewing is feasible.
Interview is often superior to other data-gathering methods. People are usually more willing to
talk than to write. Once rapport is established, even confidential information may be obtained. It
permits probing into the context and reasons for answers to questions. Interview can add flesh to
statistical information. It enables the investigator to grasp the behavioral context of the data
furnished by the respondents. As two specific examples drawn from the uses above: observation
is more appropriate for studying the flow of traffic and parking problems, where behaviour can be
watched directly as it occurs; interviewing is more appropriate for collecting opinions and
attitudes from illiterate or less educated respondents, for whom a written questionnaire is unusable.
Q23. What is a questionnaire? Discuss the main points that you will take into account while
drafting a questionnaire?
A questionnaire is a research instrument consisting of a series of questions and other prompts for
the purpose of gathering information from respondents. Although they are often designed for
statistical analysis of the responses, this is not always the case. The questionnaire was invented
by Sir Francis Galton.
Questionnaires have advantages over some other types of surveys in that they are cheap, do not
require as much effort from the questioner as verbal or telephone surveys, and often have
standardized answers that make it simple to compile data. However, such standardized answers
may frustrate users. Questionnaires are also sharply limited by the fact that respondents must be
able to read the questions and respond to them. Thus, for some demographic groups, conducting a
survey by questionnaire may not be practical.
The main points that will be taken into account while drafting a questionnaire:
* Use statements which are interpreted in the same way by members of different subpopulations
of the population of interest.
* Use statements where persons that have different opinions or traits will give different answers.
* Think of having an "open" answer category after a list of possible answers.
* Use only one aspect of the construct you are interested in per item.
* Use positive statements and avoid negatives or double negatives.
* Do not make assumptions about the respondent.
* Use clear and comprehensible wording, easily understandable for all educational levels
* Use correct spelling, grammar and punctuation.
* Avoid items that contain more than one question per item (e.g. Do you like strawberries and
potatoes?).
Below is a checklist you can use when forming your questions:
 Is this question necessary? How will it be useful? What will it tell you?
 Will you need to ask several related questions on a subject to be able to answer your critical
question?
 Do respondents have the necessary information to answer the question?
 Will the words in each question be universally understood by your target audience?
 Are abbreviations used? Will everyone in your sample understand what they mean?
 Are unconventional phrases used? If so, are they really necessary? Can they be deleted?
 Is the question too vague? Does it get directly to the subject matter?
 Can the question be misunderstood? Does it contain unclear phrases?
 Is the question misleading because of unstated assumptions or unseen implications?
Are your assumptions the same as the target audience's?
 Have you assumed that the target audience has adequate knowledge to answer the question?
 Is the question too demanding? For example, does it ask too much on the part of the respondent
in terms of mathematical calculations, or having to look up records?
 Is the question biased in a particular direction, without accompanying questions to balance the
emphasis?
 Are you asking two questions at one time?
 Does the question have a double negative?
 Is the question wording likely to be objectionable to the target audience in any way?
 Are the answer choices mutually exclusive?
 Is the question technically accurate?
 Is an appropriate referent provided? For example: per year, per acre.
Schedule Method
In case the informants are largely uneducated and non-responsive, data cannot be collected by the
mailed questionnaire method. In such cases, the schedule method is used to collect data. Here the
questionnaires are sent through enumerators to collect information. Enumerators are persons
appointed by the investigator for the purpose. They directly meet the informants with the
questionnaire. They explain the scope and objective of the enquiry to the informants and solicit
their cooperation. The enumerators ask the questions to the informants and record their answers
in the questionnaire and compile them. The success of this method depends on the sincerity and
efficiency of the enumerators. So the enumerator should be sweet-tempered, good-natured,
trained and well-behaved.
Schedule method is widely used in extensive studies. It gives fairly correct results, as the
enumerators directly collect the information. The accuracy of the information depends upon the
honesty of the enumerators. They should be unbiased. This method is relatively more costly and
time-consuming than the mailed questionnaire method.
DATA ANALYSIS
Processing of data: editing, coding, classification and tabulation
After collecting data, the researcher must convert the raw data into meaningful statements; this
includes data processing, data analysis, and data interpretation and presentation.

Data reduction or processing mainly involves the various manipulations necessary for preparing the
data for analysis. The process (of manipulation) could be manual or electronic. It involves
editing, categorizing the open-ended questions, coding, computerization and preparation of
tables and diagrams.
Editing data:

Information gathered during data collection may lack uniformity. Example: Data collected
through questionnaires and schedules may have answers which are not ticked at the proper
places, or some questions may be left unanswered. Sometimes information may be given in a
form which needs reconstruction into a category designed for analysis, e.g., converting
daily/monthly income into annual income, and so on. The researcher has to take a decision as to
how to edit it.
Editing also ensures that data are relevant and appropriate and that errors are corrected.
Occasionally, the investigator makes a mistake and records an impossible answer. “How much
red chilies do you use in a month?” The answer is written as “4 kilos”. Can a family of three
members use four kilos of chilies in a month? The correct answer could be “0.4 kilo”.
Care should be taken in editing (re-arranging) answers to open-ended questions. Example:
sometimes a “don’t know” answer is edited as “no response”. This is wrong. “Don’t know” means
that the respondent is not sure and is in two minds about his reaction, or is not familiar with the
situation/object/event/individual about which he is asked. “No response” means that the
respondent considers the question personal and does not want to answer it.
Coding of data:

Coding is translating answers into numerical values, or assigning numbers to the various
categories of a variable to be used in data analysis. Coding is done by using a code book, code
sheet, and a computer card. Coding is done on the basis of the instructions given in the
code book. The code book gives a numerical code for each variable.
Nowadays, codes are assigned before going to the field, while constructing the
questionnaire/schedule. Post data collection, pre-coded items are fed to the computer for
processing and analysis. For open-ended questions, however, post-coding is necessary. In such
cases, all answers to open-ended questions are placed in categories and each category is assigned
a code.
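In code, a code book is naturally represented as a mapping from answer categories to numbers. A minimal sketch with invented variables and codes:

# Hypothetical code book: numeric codes for each variable's answer categories
code_book = {
    "gender": {"male": 1, "female": 2},
    "employment": {"employed": 1, "unemployed": 2, "student": 3},
}

raw_response = {"gender": "female", "employment": "student"}
coded = {var: code_book[var][ans] for var, ans in raw_response.items()}
print(coded)   # {'gender': 2, 'employment': 3}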
Manual processing is employed when qualitative methods are used, when a small sample is used
in quantitative studies, when the questionnaire/schedule has a large number of open-ended
questions, or when accessibility to computers is difficult or inappropriate. However, coding is
done in manual processing also.
Data classification/distribution:

Sarantakos (1998: 343) defines distribution of data as a form of classification of scores obtained
for the various categories of a particular variable. There are four types of distributions:

1. Frequency distribution
2. Percentage distribution
3. Cumulative distribution
4. Statistical distributions
1. Frequency distribution:

In social science research, frequency distribution is very common. It presents the frequency of
occurrences of certain categories. This distribution appears in two forms:

Ungrouped: Here, the scores are not collapsed into categories; e.g., in a distribution of ages of the
students of a BJ (MC) class, each age value (e.g., 18, 19, 20, and so on) will be presented
separately in the distribution.
Grouped: Here, the scores are collapsed into categories, so that 2 or 3 scores are presented
together as a group. (For example, in the above age distribution, groups like 18-20, 21-22, etc.
can be formed.)
2. Percentage distribution:

It is also possible to give frequencies not in absolute numbers but in percentages. For instance,
instead of saying 200 respondents out of a total of 2000 had a monthly income of less than
Rs. 500, we can say 10% of the respondents have a monthly income of less than Rs. 500.
3. Cumulative distribution:

It tells how often the value of the random variable is less than or equal to a particular reference
value.
4. Statistical data distribution:

In this type of data distribution, some measure of average is found for a sample of
respondents. Several kinds of averages are available (mean, median, mode) and the researcher
must decide which is most suitable to his purpose. Once the average has been calculated, the
question arises: how representative a figure is it, i.e., how closely are the answers bunched
around it? Are most of them very close to it, or is there a wide range of variation?
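The first three distributions can be computed mechanically. A minimal Python sketch, using an invented set of class ages:

from collections import Counter
from itertools import accumulate

ages = [18, 18, 19, 19, 19, 20, 20, 21, 22, 22]   # hypothetical ages of 10 students

freq = Counter(ages)                                                  # frequency distribution
pct = {age: 100 * count / len(ages) for age, count in freq.items()}   # percentage distribution
ordered = sorted(freq)
cum = dict(zip(ordered, accumulate(freq[a] for a in ordered)))        # cumulative distribution

print(freq)   # Counter({19: 3, 18: 2, 20: 2, 22: 2, 21: 1})
print(pct)    # e.g. age 19 -> 30.0 per cent of respondents
print(cum)    # e.g. ages up to 20 account for 7 of the 10 observations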
Tabulation of data:

After editing, which ensures that the information on the schedule is accurate and categorized in a
suitable form, the data are put together in some kind of tables and may also undergo some other
forms of statistical analysis.

Tables can be prepared manually and/or by computers. For a small study of 100 to 200 persons,
there may be little point in tabulating by computer, since this necessitates putting the data on
punched cards. But for a survey analysis involving a large number of respondents and requiring
cross-tabulation involving more than two variables, hand tabulation would be inappropriate and
time consuming.
Usefulness of tables:

Tables are useful to the researchers and the readers in three ways:

1. They present an overall view of findings in a simpler way.
2. They identify trends.
3. They display relationships in a comparable way between parts of the findings.
By convention, the dependent variable is presented in the rows and the independent variable in
the columns.
Measures of Central Tendency
A measure of central tendency (also referred to as measures of centre or central location) is
a summary measure that attempts to describe a whole set of data with a single value that
represents the middle or centre of its distribution. The most common measures of central
tendency are the arithmetic mean, the median and the mode.
The mean is the sum of the value of each observation in a dataset divided by the number of
observations. This is also known as the arithmetic average.
Consider the following distribution of retirement ages, in whole years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

The mean is calculated by adding together all the values
(54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the number of observations
(11), which equals 56.6 years.
Advantage of the mean: .

The mean can be used for both continuous and discrete numeric data.
Limitations of the mean: .

The mean cannot be calculated for categorical data, as the values cannot be summed.

As the mean includes every value in the distribution the mean is influenced by outliers and
skewed distributions. .

What else do I need to know about the mean?

The population mean is indicated by the Greek symbol µ (pronounced ‘mu’). When the mean is
calculated on a distribution from a sample it is indicated by the symbol x̅ (pronounced X-bar).
The median is the middle value in a distribution when the values are arranged in ascending
or descending order.

The median divides the distribution in half (there are 50% of observations on either side of the
median value). In a distribution with an odd number of observations, the median value is the
middle value.

Looking at the retirement age distribution (which has 11 observations), the median is the middle
value, which is 57 years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

When the distribution has an even number of observations, the median value is the mean of the
two middle values. In the following distribution, the two middle values are 56 and 57, therefore
the median equals 56.5 years:

52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

Advantage of the median:

The median is less affected by outliers and skewed data than the mean, and is usually the
preferred measure of central tendency when the distribution is not symmetrical.

Limitation of the median:

The median cannot be identified for categorical nominal data, as it cannot be logically ordered.

The mode is the most commonly occurring value in a distribution.

Consider this dataset showing the retirement age of 11 people, in whole years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age data.
Age Frequency
54 3
55 1
56 1
57 2
58 2
60 2

The most commonly occurring value is 54; therefore the mode of this distribution is 54 years.

Advantage of the mode:

The mode has an advantage over the median and the mean as it can be found for both numerical
and categorical (non-numerical) data.

Limitations of the mode:

There are some limitations to using the mode. In some distributions, the mode may not reflect the
centre of the distribution very well. When the distribution of retirement age is ordered from
lowest to highest value, it is easy to see that the centre of the distribution is 57 years, but the
mode is lower, at 54 years.

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

It is also possible for there to be more than one mode for the same distribution of data (bi-modal,
or multi-modal). The presence of more than one mode can limit the ability of the mode to
describe the centre or typical value of the distribution, because a single value to describe the
centre cannot be identified.

In some cases, particularly where the data are continuous, the distribution may have no mode at
all (i.e. if all values are different).

In cases such as these, it may be better to consider using the median or mean, or to group the data
into appropriate intervals and find the modal class.
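As a quick check of the three measures discussed above, here is a short Python sketch using the same retirement-age data (standard library only):

from statistics import mean, median, mode

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(mean(ages))    # 56.636..., i.e. about 56.6 years (623 / 11)
print(median(ages))  # 57 (the middle of the 11 ordered values)
print(mode(ages))    # 54 (occurs three times, more than any other value)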
The shape of a distribution influences the measures of central tendency
Symmetrical distributions:

When a distribution is symmetrical, the mode, median and mean are all in the middle of the
distribution. The following graph shows a larger retirement age dataset with a distribution which
is symmetrical. The mode, median and mean all equal 58 years.
Skewed distributions:

When a distribution is skewed the mode remains the most commonly occurring value, the
median remains the middle value in the distribution, but the mean is generally ‘pulled’ in the
direction of the tails. In a skewed distribution, the median is often a preferred measure of central
tendency, as the mean is not usually in the middle of the distribution.

A distribution is said to be positively or right skewed when the tail on the right side of the
distribution is longer than the left side. In a positively skewed distribution it is common for the
mean to be ‘pulled’ toward the right tail of the distribution. Although there are exceptions to this
rule, generally, most of the values, including the median value, tend to be less than the mean
value.

The following graph shows a larger retirement age data set with a distribution which is right
skewed. The data has been grouped into classes, as the variable being measured (retirement age)
is continuous. The mode is 54 years, the modal class is 54-56 years, the median is 56 years and
the mean is 57.2 years.
A distribution is said to be negatively or left skewed when the tail on the left side of the
distribution is longer than the right side. In a negatively skewed distribution, it is common for the
mean to be ‘pulled’ toward the left tail of the distribution. Although there are exceptions to this
rule, generally, most of the values, including the median value, tend to be greater than the mean
value.

The following graph shows a larger retirement age dataset with a distribution which is left
The mode is 65 years, the modal class is 63-65 years, the median is 63 years and the mean is
61.8 years.

Measures of Central Tendency


Measures of central tendency are numbers that tend to cluster around the “middle” of a set of
values. Three such middle numbers are the mean, the median, and the mode.
For example, suppose your earnings for the past week were the values shown in Table 1.

Mean
The arithmetic mean is the sum of the measures in the set divided by the number of measures in
the set. Totaling all the measures and dividing by the number of measures, we get
$1,000 ÷ 5 = $200.
Median

Another measure of central tendency is the median, which is defined as the middle value when
the numbers are arranged in increasing or decreasing order. When we order the daily earnings
shown in Table, we get

$50, $100, $150, $350, $350.


The middle value is $150;
therefore, median = $150

If there is an even number of items in a set, the median is the average of the two middle values.
For example,
If we had four values— 4, 10, 12, and 26
The median would be the average of the two middle values,
10 and 12; in this case, (10 + 12) / 2 = 11, so
Median = 11
The median may sometimes be a better indicator of central tendency than the mean, especially
when there are outliers, or extreme values.

Example 1
Given the four annual salaries of a corporation shown in Table 2, determine the mean and the
median.
The mean of these four salaries is $275,000.
The median is the average of the middle two salaries, or $40,000.
In this instance, the median appears to be a better indicator of central tendency because the
CEO's salary is an extreme outlier, causing the mean to lie far from the other three salaries.

Mode
Another indicator of central tendency is the mode, or the value that occurs most often in a set of
numbers. In the set of weekly earnings in Table 1, the mode would be $350 because it appears
twice and the other values appear only once.
Notation and formulae
MEAN
The mean of a sample is typically denoted as x̄ (read as x bar). The mean of a population is
typically denoted as μ (pronounced mu). The sum (or total) of measures is typically denoted
with a Σ. The formula for a sample mean is

x̄ = Σx / n

where n is the number of values.


Mean for grouped data
Occasionally we may have data that consist not of actual values but rather of grouped measures.
For example, we may know that, in a certain working population, 32 percent earn between
$25,000 and $29,999; 40 percent earn between $30,000 and $34,999; 27 percent earn between
$35,000 and $39,999; and the remaining 1 percent earn between $80,000 and $85,000. This type
of information is similar to that presented in a frequency table. Although we do not have precise
individual measures, we still can compute measures for grouped data, data presented in a
frequency table.
The formula for a sample mean for grouped data is

x̄ = Σfx / n

where
x is the midpoint of the interval,
f is the frequency for the interval,
fx is the product of the midpoint times the frequency, and
n is the number of values.
For example, if 8 is the midpoint of a class interval and there are ten measurements in the
interval, fx = 10(8) = 80, the sum of the ten measurements in the interval.
Σ fx denotes the sum of all the products in all class intervals. Dividing that sum by the number of
measurements yields the sample mean for grouped data.
For example, consider the information shown in Table 3.

Substituting into the formula: 

Therefore, the average price of items sold was about $15.19. The value may not be the exact
mean for the data, because the actual values are not always known for grouped data.
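Table 3 itself is not reproduced here, so the class midpoints and frequencies in the following Python sketch are hypothetical, chosen only so the counts agree with the figures quoted in the text (32 measurements in total, 14 below the $11.00-$15.99 class and 4 within it); the computed mean therefore differs from $15.19:

# Hypothetical grouped data: (class midpoint, frequency) - illustrative only
classes = [(3.495, 6), (8.495, 8), (13.495, 4), (18.495, 10), (23.495, 4)]

n = sum(f for _, f in classes)                     # 32 measurements
grouped_mean = sum(f * x for x, f in classes) / n  # sum(fx) / n
print(round(grouped_mean, 2))                      # 13.18 for these made-up classes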
Median for grouped data
As with the mean, the median for grouped data may not necessarily be computed precisely
because the actual values of the measurements may not be known. In that case, you can find the
particular interval that contains the median and then approximate the median.
Using Table 3, you can see that there is a total of 32 measures. The median is between the 16th
and 17th measure; therefore, the median falls within the $11.00 to $15.99 interval. The formula
for the best approximation of the median for grouped data is

Median = L + ((n/2 − Σfb) / fmed) × w
where
L is the lower class limit of the interval that contains the median,
n is the total number of measurements,
w is the class width,
f med is the frequency of the class containing the median, and
Σ f b is the sum of the frequencies for all classes before the median class.
Consider the information in Table 4.

As we already know, the median is located in class interval $11.00 to $15.99. So L = 11, n = 32,
w = 4.99, f med = 4, and Σ f b = 14.
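This substitution can also be wrapped in a small Python helper (the helper name grouped_median is ours, not a standard library function):

def grouped_median(L, n, w, f_med, cum_before):
    # Median = L + ((n/2 - cumulative frequency before the median class) / f_med) * w
    return L + ((n / 2 - cum_before) / f_med) * w

print(grouped_median(L=11, n=32, w=4.99, f_med=4, cum_before=14))  # 13.495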
Substituting into the formula by hand gives
Median = 11 + ((32/2 − 14) / 4) × 4.99 = 11 + 2.495 ≈ 13.50
Symmetric distribution
In a distribution displaying perfect symmetry, the mean, the median, and the mode are all at the
same point, as shown in Figure 1.
Figure 1. For a symmetric distribution, mean, median, and mode are equal.

Skewed curves
As you have seen, an outlier can significantly alter the mean of a series
of numbers, whereas the median will remain at the center of the series.
In such a case, the resulting curve drawn from the values will appear to
be skewed, tailing off rapidly to the left or right. In the case of negatively skewed or positively
skewed curves, the median remains in the center of these three measures.
Figure 2 shows a negatively skewed curve.
Figure 2. A negatively skewed distribution: mean < median < mode.

Figure 3 shows a positively skewed curve.


Figure 3. A positively skewed distribution: mode < median < mean.

Dispersion
Quartile
In descriptive statistics, the quartiles of a set of values are the three points that divide the data set
into four equal groups, each representing a fourth of the population being sampled.
In epidemiology, sociology and finance, the quartiles of a population are the four subpopulations
defined by classifying individuals according to whether the value concerned falls into one of the
four ranges defined by the three values discussed above. Thus an individual item might be
described as being "in the upper quartile".

Definitions
first quartile (designated Q1) = lower quartile = splits lowest 25% of data = 25th percentile
second quartile (designated Q2) = median = cuts data set in half = 50th percentile
third quartile (designated Q3) = upper quartile = splits off the highest 25% of data (equivalently, the lowest 75%) =
75th percentile.
The difference between the upper and lower quartiles is called the inter quartile range.
If a data set of values is arranged in ascending order of magnitude, then the lower quartile is the
(n + 1)/4 th value, the median is the (n + 1)/2 th value, and the upper quartile is the 3(n + 1)/4 th value.

The inter quartile range is a more useful measure of spread than the range as it describes the
middle 50% of the data values.

Computing methods

There is no universal agreement on choosing the quartile values.


One standard formula for locating the position L of the observation at a given percentile y, with n
data points sorted in ascending order, is L = (y/100) × n.
Case 1: If L is a whole number, then the value will be found halfway between positions L and
L + 1.
Case 2: If L is a fraction, round to the nearest whole number (for example, L = 1.2 becomes 1).

Examples:

Method 1
Use the median to divide the ordered data set into two halves. Do not include the median into the
halves, or the minimum and maximum.
The lower quartile value is the median of the lower half of the data. The upper quartile value is
the median of the upper half of the data.
This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.

Method 2
Use the median to divide the ordered data set into two halves. If the median is a datum (as
opposed to being the average of the middle two data), include the median in both halves.
The lower quartile value is the median of the lower half of the data. The upper quartile value is
the median of the upper half of the data.
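The two methods differ only in whether an odd-length dataset's median is shared with the two halves. A minimal Python sketch of both (the quartiles helper is our own, written under that assumption):

from statistics import median

def quartiles(data, include_median=False):
    # include_median=False gives Method 1; True gives Method 2 (the median is
    # placed in both halves when it is an actual datum, i.e. when n is odd).
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = xs[:half + 1], xs[half:]
    else:
        lower, upper = xs[:half], xs[n - half:]
    return median(lower), median(xs), median(upper)

print(quartiles([102, 104, 105, 107, 108, 109, 110, 112, 115, 116, 118]))
# (105, 109, 115) with Method 1 - matching the inter quartile range table below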
Dispersion
A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the
same and increases as the data become more diverse.
Most measures of dispersion have the same units as the quantity being measured. In other words,
if the measurements are in meters or seconds, so is the measure of dispersion. Such measures of
dispersion include:
Dispersion (Measures of):
Measures of dispersion express quantitatively the degree of variation or dispersion of values in a
population or in a sample. Along with measures of central tendency, measures of dispersion are
widely used in practice as descriptive statistics. Some measures of dispersion are the standard
deviation, the average deviation, the range, and the inter quartile range.
For example, the dispersion in the sample of 5 values (98,99,100,101,102) is smaller than the
dispersion in the sample (80,90,100,110,120), although both samples have the same central
location - "100", as measured by, say, the mean or the median. Most measures of dispersion
would be 10 times greater for the second sample than for the first one (although the values
themselves may be different for different measures of dispersion).
1. Range:
The range is the most obvious measure of dispersion: the difference between the lowest and
highest values in a dataset. It is easy to calculate, but it is an insensitive measure of variation
(it does not change with a change in the distribution of the data) and is not very informative.
If L1 = lowest measurement and L2 = highest measurement, then
Range = L2 − L1 (largest value − smallest value)
and Coefficient of Range = (L2 − L1) / (L2 + L1)
An example of the use of the range to compare spread within datasets is provided in table 1. The
scores of individual students in the examination and coursework component of a module are
shown.
 

To find the range in marks the highest and lowest values need to be found from the table. The
highest coursework mark was 48 and the lowest was 27 giving a range of 21. In the examination,
the highest mark was 45 and the lowest 12 producing a range of 33. This indicates that there was
wider variation in the students’ performance in the examination than in the coursework for this
module.
Since the range is based solely on the two most extreme values within the dataset, if one of these
is either exceptionally high or low (sometimes referred to as an outlier) it will result in a range that
is not typical of the variability within the dataset. For example, imagine in the above example
that one student failed to hand in any coursework and was awarded a mark of zero, but sat
the exam and scored 40. The range for the coursework marks would now become 48 (48 − 0),
rather than 21; the new range is not typical of the dataset as a whole and is distorted by
the outlier in the coursework marks.
Merit
The range is an adequate measure of variation for a small set of data, like class scores for a test.
Think of other measures where range might be useful: Salaries for a particular job category; or
Indoor versus outdoor temperatures?
An Exercise in Calculating the Range
In a previous example, we examined the net worth for 8 theoretical individuals. Here again are
the numbers:
$2,000 $10,000 $25,000 $32,000 $45,000 $50,000 $80,000 $23,000,000,000
Range = Largest value - Smallest value
Range = $23,000,000,000 - $2,000
Range = $22,999,998,000
This is obviously a very broad range, but it does not tell us much about the normal circumstances
of most of the members of our data group. A more informative way to describe these numbers
would be that they have a median net worth of $38,500 with a range of $2,000 to
$23,000,000,000.
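The same arithmetic takes only a couple of lines of Python:

net_worth = [2_000, 10_000, 25_000, 32_000, 45_000,
             50_000, 80_000, 23_000_000_000]

lo, hi = min(net_worth), max(net_worth)
print(hi - lo)                # 22999998000 -- the range, dominated by one outlier
print((hi - lo) / (hi + lo))  # coefficient of range, almost 1 here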

Variance and Standard Deviation

Deviation just means how far from the normal.

Standard Deviation
The Standard Deviation is a measure of how spread out numbers are. Its symbol is σ (the Greek
letter sigma). The formula is easy: it is the square root of the Variance. So now you ask, "What is
the Variance?"

Variance
The Variance is defined as the average of the squared differences from the Mean.
To calculate the variance follow these steps:
 Work out the Mean (the simple average of the numbers)
 Then for each number: subtract the Mean and square the result (the squared difference).
 Then work out the average of those squared differences.
Example
You and your friends have just measured the heights of your dogs (in millimeters):

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.
Find out the Mean, the Variance, and the Standard Deviation.
Your first step is to find the Mean:
Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
so the mean (average) height is 394 mm. Let's plot this on the chart:

Now, we calculate each dog's difference from the Mean: 206, 76, −224, 36 and −94.

To calculate the Variance, take each difference, square it, and then average the result:

Variance: σ² = (206² + 76² + (−224)² + 36² + (−94)²) / 5 = 108,520 / 5 = 21,704

So, the Variance is 21,704.
And the Standard Deviation is just the square root of Variance, so:
Standard Deviation: σ = √21,704 = 147.32... = 147 (to the nearest mm)
 
And the good thing about the Standard Deviation is that it is useful. Now we can show which
heights are within one Standard Deviation (147mm) of the Mean:

So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what
is extra large or extra small.
Rottweilers are tall dogs, and Dachshunds are a bit short.
 But ... there is a small change with Sample Data
Our example was for a Population (the 5 dogs were the only dogs we were interested in).
But if the data is a Sample (a selection taken from a bigger Population), then the calculation
changes!
When you have "N" data values that are:
 The Population: divide by N when calculating Variance (like we did)
 A Sample: divide by N-1 when calculating Variance
All other calculations stay the same, including how we calculated the mean.
Example: if our 5 dogs were just a sample of a bigger population of dogs, we would divide by 4
instead of 5 like this:
Sample Variance = 108,520 / 4 = 27,130
Sample Standard Deviation = √27,130 = 164.71... ≈ 165 (to the nearest mm)
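Both versions of the calculation can be checked with a few lines of Python:

heights = [600, 470, 170, 430, 300]          # dog heights in mm
m = sum(heights) / len(heights)              # mean = 394

ss = sum((x - m) ** 2 for x in heights)      # sum of squared differences = 108,520
pop_sd = (ss / len(heights)) ** 0.5          # divide by N:   sqrt(21,704) ~ 147.3
samp_sd = (ss / (len(heights) - 1)) ** 0.5   # divide by N-1: sqrt(27,130) ~ 164.7
print(round(pop_sd, 1), round(samp_sd, 1))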
The formulae for the variance and standard deviation are given below; x̄ means the mean of the
data.

Variance: σ² = Σ(xr − x̄)² / n

The standard deviation, σ, is the square root of the variance.
What the formula means:
(1) (xr − x̄) means take each value in turn and subtract the mean from it.
(2) (xr − x̄)² means square each of the results obtained from step (1). This is to get rid of any
minus signs.
(3) Σ(xr − x̄)² means add up all of the results obtained from step (2).
(4) Divide the result of step (3) by n, which is the number of values.
(5) For the standard deviation, take the square root of the answer to step (4).
Example
Find the variance and standard deviation of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
The mean = 46 / 8 = 5.75
(Step 1): (1 - 5.75), (3 - 5.75), (5 - 5.75), (5 - 5.75), (6 - 5.75), (7 - 5.75), (9 - 5.75), (10 - 5.75)
= -4.75, -2.75, -0.75, -0.75, 0.25, 1.25, 3.25, 4.25
(Step 2): 22.5625, 7.5625, 0.5625, 0.5625, 0.0625, 1.5625, 10.5625, 18.0625
(Step 3): 22.5625 + 7.5625 + 0.5625 + 0.5625 + 0.0625 + 1.5625 + 10.5625 + 18.0625
= 61.5
(Step 4): n = 8, therefore variance = 61.5 / 8 = 7.69 (3 s.f.)
(Step 5): standard deviation = √7.6875 = 2.77 (3 s.f.)
Adding or Multiplying Data by a Constant
If a constant, k, is added to each number in a set of data, the mean will be increased by k and the
standard deviation will be unaltered (since the spread of the data will be unchanged).
If the data is multiplied by the constant k, the mean and standard deviation will both be
multiplied by k.
Grouped Data
There are many ways of writing the formula for the standard deviation. The one above is for a
basic list of numbers. The formula for the variance when the data is grouped is:

variance = Σfx² / Σf − (Σfx / Σf)²

The standard deviation can be found by taking the square root of this value.
Example: The table shows marks (out of 10) obtained by 20 people in a test
Mark (x) Frequency (f)
1 0
2 1
3 1
4 3
5 2
6 5
7 5
8 2
9 0
10 1
Work out the variance of this data.
In such questions, it is often easiest to set your working out in a table:

fx     fx²
0      0
2      4
3      9
12     48
10     50
30     180
35     245
16     128
0      0
10     100

Σf = 20, Σfx = 118, Σfx² = 764

variance = Σfx²/Σf − (Σfx/Σf)²
         = 764/20 − (118/20)²
         = 38.2 − 34.81
         = 3.39
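The whole working table can be reproduced in a few lines of Python:

marks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [0, 1, 1, 3, 2, 5, 5, 2, 0, 1]

n    = sum(freqs)                                    # 20
sfx  = sum(f * x for x, f in zip(marks, freqs))      # sum of fx   = 118
sfx2 = sum(f * x * x for x, f in zip(marks, freqs))  # sum of fx^2 = 764

variance = sfx2 / n - (sfx / n) ** 2                 # 38.2 - 34.81 = 3.39
print(round(variance, 2))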
Quartiles
If we divide a cumulative frequency curve into quarters, the value at the lower quarter is referred
to as the lower quartile, the value at the middle gives the median and the value at the upper
quarter is the upper quartile.
A set of numbers may be as follows: 8, 14, 15, 16, 17, 18, 19, 50. The mean of these numbers is
19.625 . However, the extremes in this set (8 and 50) distort the range. The inter-quartile range is
a method of measuring the spread of the numbers by finding the middle 50% of the values.
It is useful since it ignores the extreme values. It is a method of measuring the spread of the data.
The lower quartile is the (n + 1)/4 th value (n is the total cumulative frequency, which is 157 in the
cumulative-frequency-curve example described next) and the upper quartile is the 3(n + 1)/4 th value.
The difference between these two is the inter-quartile range (IQR).
In that example, the upper quartile is the 118.5th value and the lower quartile is the 39.5th
value. Reading from the cumulative frequency curve, the lower quartile is about 17 and the upper
quartile is about 37, so the IQR is about 20 (bear in mind that this is a rough sketch; if you plot
the values on graph paper you will get a more accurate value).
2. Inter quartile range (IQR)
In descriptive statistics, the inter quartile range (IQR), also called the mid spread or middle
fifty, is a measure of statistical dispersion, being equal to the difference between the upper and
lower quartiles,
IQR = Q3 −  Q1.
In other words, the IQR is the 1st quartile subtracted from the 3rd quartile; these quartiles can
be clearly seen on a box plot of the data. The IQR is a trimmed estimator, defined as the 25% trimmed
range, and is a commonly used robust measure of scale. It has a breakdown point
of 25%, and is thus often preferred to the total range.
The IQR is used to build box plots, simple graphical representations of a probability distribution.
For a symmetric distribution (where the median equals the midhinge, the average of the first and
third quartiles), half the IQR equals the median absolute deviation (MAD).
The median is the corresponding measure of central tendency.
Data set in a table

i    x[i]
1    102
2    104
3    105   ← Q1
4    107
5    108
6    109   ← Q2 (median)
7    110
8    112
9    115   ← Q3
10   116
11   118

For the data in this table the inter quartile range is IQR = 115 − 105 = 10.
Outliers are observations that fall below Q1 − 1.5(IQR) or above Q3 + 1.5(IQR). In a box plot,
the highest and lowest occurring values within these limits are drawn as the ends of the whiskers,
and the outliers as individual points.

Coefficient of Inter-Quartile Range = (Q3 − Q1) / (Q3 + Q1)
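A short Python sketch of the IQR and the outlier fences for this data set:

data = [102, 104, 105, 107, 108, 109, 110, 112, 115, 116, 118]

q1, q3 = 105, 115                    # quartiles read from the table above
iqr = q3 - q1                        # 10
low_fence = q1 - 1.5 * iqr           # 90
high_fence = q3 + 1.5 * iqr          # 130

outliers = [x for x in data if x < low_fence or x > high_fence]
print(iqr, outliers)                 # 10 [] -- no outliers in this data set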
3. Semi-Inter-Quartile Range OR Quartile Deviation
It is based on the lower quartile Q1 and the upper quartile Q3. The difference Q3 − Q1 is called
the inter quartile range. This difference divided by 2 is called the semi-inter-quartile range or
the quartile deviation. Thus

Q.D = (Q3 − Q1) / 2
Quartile Deviation (Q.D)

The quartile deviation is a slightly better measure of absolute dispersion than the range, but it
ignores the observations on the tails. If we take different samples from a population and
calculate their quartile deviations, their values are quite likely to be sufficiently different. This is
called sampling fluctuation, and it is why the quartile deviation is not a popular measure of
dispersion. The quartile deviation calculated from sample data does not help us to draw any
conclusion (inference) about the quartile deviation in the population.

Coefficient of Quartile Deviation:

A relative measure of dispersion based on the quartile deviation is called the coefficient of
quartile deviation:

Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1)

Example:
The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040, 1080, 1200, 1440,
1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, and 1885. Find the
quartile deviation and coefficient of quartile deviation.

Solution:
After arranging the observations in ascending order, we get
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730, 1750,
1755, 1785, 1880, 1885, 1960.

Using the quartile positions given earlier (lower quartile at the (n + 1)/4 th value, upper quartile
at the 3(n + 1)/4 th value, with n = 20):

Q1 = 5.25th value = 1240 + 0.25(1320 − 1240) = 1260 kg
Q3 = 15.75th value = 1750 + 0.75(1755 − 1750) = 1753.75 kg

Quartile Deviation (Q.D) = (Q3 − Q1) / 2 = 493.75 / 2 ≈ 246.9 kg

Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1) = 493.75 / 3013.75 ≈ 0.16
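The same calculation in Python, using the (n + 1)/4 positions quoted earlier (other quartile conventions will give slightly different numbers):

yields = sorted([1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
                 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885])

def value_at(pos):
    # Value at a 1-based, possibly fractional, position (linear interpolation)
    i = int(pos) - 1
    frac = pos - int(pos)
    return yields[i] + frac * (yields[i + 1] - yields[i])

n = len(yields)                          # 20
q1 = value_at((n + 1) / 4)               # 5.25th value  = 1260
q3 = value_at(3 * (n + 1) / 4)           # 15.75th value = 1753.75
print((q3 - q1) / 2)                     # quartile deviation ~ 246.9
print(round((q3 - q1) / (q3 + q1), 2))   # coefficient ~ 0.16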


4. The Standard Deviation
In statistics and probability theory, the standard deviation (represented by the Greek letter
sigma, σ) shows how much variation or dispersion from the average exists.[1] A low standard
deviation indicates that the data points tend to be very close to the mean (also called expected
value); a high standard deviation indicates that the data points are spread out over a large range
of values.
The standard deviation is a measure that summarizes the amount by which every value within a
dataset varies from the mean. Effectively it indicates how tightly the values in the dataset are
bunched around the mean value. It is the most robust and widely used measure of dispersion
since, unlike the range and inter-quartile range; it takes into account every variable in the dataset.
When the values in a dataset are pretty tightly bunched together the standard deviation is small.
When the values are spread apart the standard deviation will be relatively large. The standard
deviation is usually presented in conjunction with the mean and is measured in the same units.
In many datasets the values deviate from the mean value due to chance and such datasets are said
to display a normal distribution. In a dataset with a normal distribution most of the values are
clustered around the mean while relatively few values tend to be extremely high or extremely
low. Many natural phenomena display a normal distribution.
For datasets that have a normal distribution the standard deviation can be used to determine the
proportion of values that lie within a particular range of the mean value. For such distributions,
approximately 68% of values lie within one standard deviation (1SD) of the mean, about 95% of
values within two standard deviations (2SD), and about 99.7% of values within three standard
deviations (3SD). Figure 3 shows this concept in diagrammatical form.

If the mean of a dataset is 25 and its standard deviation is 1.6, then


 68% of the values in the dataset will lie between MEAN-1SD (25-1.6=23.4) and
MEAN+1SD (25+1.6=26.6)
 99% of the values will lie between MEAN-3SD (25-4.8=20.2) and MEAN+3SD
(25+4.8=29.8).
If the dataset had the same mean of 25 but a larger standard deviation (for example, 2.3) it would
indicate that the values were more dispersed. The frequency distribution for a dispersed dataset
would still show a normal distribution but when plotted on a graph the shape of the curve will be
flatter as in figure.
 
Population and sample standard deviations
There are two different calculations for the Standard Deviation. Which formula you use depends
upon whether the values in your dataset represent an entire population or whether they form a
sample of a larger population. For example, if all student users of the library were asked how
many books they had borrowed in the past month then the entire population has been studied
since all the students have been asked. In such cases the population standard deviation should be
used. Sometimes it is not possible to find information about an entire population and it might be
more realistic to ask a sample of 150 students about their library borrowing and use these results
to estimate library borrowing habits for the entire population of students. In such cases the
sample standard deviation should be used.
Formulae for the standard deviation
Whilst it is not necessary to learn the formula for calculating the standard deviation, there may
be times when you wish to include it in a report or dissertation.
The standard deviation of an entire population is known as σ (sigma) and is calculated using:

σ = √( Σ(x − μ)² / N )

where x represents each value in the population, μ is the mean value of the population, Σ is the
summation (or total), and N is the number of values in the population.
The standard deviation of a sample is known as S and is calculated using:

S = √( Σ(x − x̄)² / (n − 1) )

where x represents each value in the sample, x̄ is the mean value of the sample, Σ is the
summation (or total), and n − 1 is the number of values in the sample minus 1.
For a finite set of numbers, the standard deviation is found by taking the square root of the
average of the squared differences of the values from their average value. For example, consider
a population consisting of the following eight values:

2, 4, 4, 4, 5, 5, 7, 9

These eight data points have the mean (average) of 5:

(2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5

First, calculate the difference of each data point from the mean, and square the result of each:

(2 − 5)² = 9, (4 − 5)² = 1, (4 − 5)² = 1, (4 − 5)² = 1, (5 − 5)² = 0, (5 − 5)² = 0, (7 − 5)² = 4, (9 − 5)² = 16

Next, calculate the mean of these values, and take the square root:

σ = √( (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 ) = √4 = 2
*Footnote: Why square the differences?


If we just added up the differences from the mean ... the negatives would cancel the positives:

(4 + 4 − 4 − 4) / 4 = 0

So that won't work. How about we use absolute values?

(|4| + |4| + |−4| + |−4|) / 4 = (4 + 4 + 4 + 4) / 4 = 4

That looks good (and is the Mean Deviation), but what about this case:

(|7| + |1| + |−6| + |−2|) / 4 = (7 + 1 + 6 + 2) / 4 = 4

Oh no! It also gives a value of 4, even though the differences are more spread out.
So let us try squaring each difference (and taking the square root at the end):

√( (4² + 4² + 4² + 4²) / 4 ) = √(64/4) = 4

√( (7² + 1² + 6² + 2²) / 4 ) = √(90/4) = 4.74...

That is nice! The Standard Deviation is bigger when the differences are more spread out ... just
what we want!
In fact this method is a similar idea to distance between points, just applied in a different way.
And it is easier to use algebra on squares and square roots than absolute values, which makes the
standard deviation easy to use in other areas of mathematics.
Pie Chart:
A pie chart (or a circle graph) is a circular chart divided into sectors, illustrating proportion. In
a pie chart, the arc length of each sector (and consequently its central angle and area), is
proportional to the quantity it represents. When angles are measured with 1 turn as unit then a
number of percent is identified with the same number of centiturns. Together, the sectors create a
full disk. It is named for its resemblance to a pie which has been sliced. The earliest known pie
chart is generally credited to William Playfair's Statistical Breviary of 1801.[1][2]
The pie chart is perhaps the most widely used statistical chart in the business world and the mass
media.[3] However, it has been criticized,[4] and some recommend avoiding it,[5][6][7][8] pointing out
in particular that it is difficult to compare different sections of a given pie chart, or to compare
data across different pie charts. Pie charts can be an effective way of displaying information in
some cases, in particular if the intent is to compare the size of a slice with the whole pie, rather
than comparing the slices among them.[1] Pie charts work particularly well when the slices
represent 25 to 50% of the data,[9] but in general, other plots such as the bar chart or the dot plot,
or non-graphical methods such as tables, may be better suited to representing certain
information. A pie chart also shows the frequency within certain groups of information.
Example

A pie chart for the example data.


The following example chart is based on preliminary results of the election for the European
Parliament in 2004. The table lists the number of seats allocated to each party group, along with
the derived percentage of the total that they each make up. The values in the last column, the
derived central angle of each sector, are found by multiplying the percentage by 360°.
Group Seats Percent (%) Central angle (°)
EUL 39 5.3 19.2
PES 200 27.3 98.4
EFA 42 5.7 20.7
EDD 15 2.0 7.4
ELDR 67 9.2 33.0
EPP 276 37.7 135.7
UEN 27 3.7 13.3
Other 66 9.0 32.5
Total 732 99.9* 360.2*
*Because of rounding, these totals do not add up to 100 and 360.
The size of each central angle is proportional to the size of the corresponding quantity, here the
number of seats. Since the sum of the central angles has to be 360°, the central angle for a
quantity that is a fraction Q of the total is 360Q degrees. In the example, the central angle for the
largest group (European People's Party (EPP)) is 135.7° because 0.377 times 360, rounded to one
decimal place(s), equals 135.7.
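The percentage and central-angle columns of the table can be regenerated with a short Python loop:

seats = {"EUL": 39, "PES": 200, "EFA": 42, "EDD": 15,
         "ELDR": 67, "EPP": 276, "UEN": 27, "Other": 66}

total = sum(seats.values())                    # 732
for group, s in seats.items():
    frac = s / total
    print(f"{group}: {100 * frac:.1f}%, {360 * frac:.1f} degrees")
# e.g. EPP: 37.7%, 135.7 degrees -- matching the table above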
Use, effectiveness and visual perception

Three sets of data plotted using pie charts and bar charts.
Pie charts are common in journalism. However, statisticians generally regard pie charts as a poor
method of displaying information, and they are uncommon in scientific literature. One reason is
that it is more difficult for comparisons to be made between the size of items in a chart when
area is used instead of length and when different items are shown as different shapes.
Further, in research performed at AT&T Bell Laboratories, it was shown that comparison by
angle was less accurate than comparison by length. This can be illustrated with the diagram to
the right, showing three pie charts, and, below each of them, the corresponding bar chart
representing the same data. Most subjects have difficulty ordering the slices in the pie chart by
size; when the bar chart is used the comparison is much easier. Similarly, comparisons between
data sets are easier using the bar chart. However, if the goal is to compare a given category (a
slice of the pie) with the total (the whole pie) in a single chart and the multiple is close to 25 or
50 percent, then a pie chart can often be more effective than a bar graph. However, the research
of Spence and Lewandowsky did not find pie charts to be inferior. Participants were able to
estimate values with pie charts just as well as with other presentation forms.
Variants and similar charts
Exploded pie chart

An exploded pie chart for the example data, with the largest party group exploded.
A chart with one or more sectors separated from the rest of the disk is known as an exploded pie
chart. This effect is used to either highlight a sector, or to highlight smaller segments of the chart
with small proportions.
Bar Charts
Bar charts compare distinct items or show single items at distinct intervals. Usually, a bar chart is
laid out with categories along the vertical axis and values along the horizontal axis. In other
words, the bars are horizontally placed on the page. Bar charts are useful for comparing data
items that are in competition, so it makes sense to place the longest bars on top and the others in
descending order beneath the longest one.
A bar chart or bar graph is a chart with rectangular bars with lengths proportional to the values
that they represent. The bars can be plotted vertically or horizontally.
Bar charts are used for plotting data which has discrete values. Some examples of
discontinuous data include 'shoe size' or 'eye color', for which a bar chart is appropriate. In
contrast, some examples of continuous data would be 'height' or 'weight'. A bar chart can
nevertheless be useful for recording certain information, whether the underlying data is continuous
or not. Bar charts also look a lot like histograms, and the two are often mistaken for each other.
The first bar graph appeared in the 1786 book The Commercial and Political Atlas, by William Playfair (1759-
1823). Playfair was a pioneer in the use of graphical displays and wrote extensively about them.

Example of a bar chart, with 'Country' as the discrete data set.


Central Tendency
In statistics, the term central tendency relates to the way in which quantitative data tend to
cluster around some value.[1] A measure of central tendency is any of a number of ways of
specifying this "central value". In practical statistical analyses, the terms are often used before
one has chosen even a preliminary form of analysis: thus an initial objective might be to "choose
an appropriate measure of central tendency".
In the simplest cases, the measure of central tendency is an average of a set of measurements, the
word average being variously construed as mean, median, or other measure of location,
depending on the context. However, the term is applied to multidimensional data as well as to
uni-variate data and in situations where a transformation of the data values for some or all
dimensions would usually be considered necessary: in the latter cases, the notion of a "central
location" is retained in converting an "average" computed for the transformed data back to the
original units. In addition, there are several different kinds of calculations for central tendency,
where the kind of calculation depends on the type of data (level of measurement).
Both "central tendency" and "measure of central tendency" apply to either statistical populations
or to samples from a population.
Basic measures of central tendency
The following may be applied to individual dimensions of multidimensional data, after
transformation, although some of these involve their own implicit transformation of the data.
 Arithmetic mean – the sum of all measurements divided by the number of observations in
the data set
 Median – the middle value that separates the higher half from the lower half of the data
set
 Mode – the most frequent value in the data set
 Geometric mean – the nth root of the product of the data values
 Harmonic mean – the reciprocal of the arithmetic mean of the reciprocals of the data
values
 Weighted mean – an arithmetic mean that incorporates weighting to certain data elements
 Midrange – the arithmetic mean of the maximum and minimum values of a data set.
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics. The
mean (often called the average) is most likely the measure of central tendency that you are most
familiar with, but there are others, such as, the median and the mode.
The mean, median and mode are all valid measures of central tendency but, under different
conditions, some measures of central tendency become more appropriate to use than others. In
the following sections we will look at the mean, mode and median and learn how to calculate
them and under what conditions they are most appropriate to be used.
Mean (Arithmetic)
The mean (or average) is the most popular and well known measure of central tendency. It can
be used with both discrete and continuous data, although its use is most often with continuous
data (see our Types of Variable guide for data types). The mean is equal to the sum of all the
values in the data set divided by the number of values in the data set. So, if we have n values in a
data set and they have values x1, x2, ..., xn, then the sample mean, usually denoted by x̄
(pronounced x bar), is:

x̄ = (x1 + x2 + ... + xn) / n

This formula is usually written in a slightly different manner using the Greek capital letter Σ,
pronounced "sigma", which means "sum of...":

x̄ = Σx / n
You may have noticed that the above formula refers to the sample mean. So, why have we
called it a sample mean? This is because, in statistics, samples and populations have very
different meanings, and these differences are very important, even if, in the case of the mean,
they are calculated in the same way. To acknowledge that we are calculating the population
mean and not the sample mean, we use the Greek lower case letter "mu", denoted as µ:

µ = Σx / N, where N is the population size.
The mean is essentially a model of your data set: a single value that summarises it. You will
notice, however, that the mean is not often one of the actual values that you have observed in
your data set. However, one of its important properties is that it minimises error in the prediction
of any one value in your data set. That is, it is the value that produces the lowest amount of error
from all other values in the data set.
An important property of the mean is that it includes every value in your data set as part of the
calculation. In addition, the mean is the only measure of central tendency where the sum of the
deviations of each value from the mean is always zero.
When not to use the mean
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers.
These are values that are unusual compared to the rest of the data set by being especially small or
large in numerical value. For example, consider the wages of staff at a factory below:
Staff 1 2 3 4 5 6 7 8 9 10
Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this
mean value might not be the best way to accurately reflect the typical salary of a worker, as most
workers have salaries in the $12k to 18k range. The mean is being skewed by the two large
salaries. Therefore, in this situation we would like to have a better measure of central tendency.
As we will find out later, taking the median would be a better measure of central tendency in this
situation.
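A two-line check of this example in Python (salaries in $k):

from statistics import mean, median

salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(mean(salaries))    # 30.7 -- pulled upwards by the two large salaries
print(median(salaries))  # 15.5 -- much closer to the typical salary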
Another time when we usually prefer the median over the mean (or mode) is when our data is
skewed (i.e. the frequency distribution for our data is skewed). If we consider the normal
distribution - as this is the most frequently assessed in statistics - when the data is perfectly
normal then the mean, median and mode are identical. Moreover, they all represent the most
typical value in the data set. However, as the data becomes skewed the mean loses its ability to
provide the best central location for the data as the skewed data is dragging it away from the
typical value. However, the median best retains this position and is not as strongly influenced by
the skewed values. This is explained in more detail in the skewed distribution section later in this
guide.
Median
The median is the middle score for a set of data that has been arranged in order of magnitude.
The median is less affected by outliers and skewed data. In order to calculate the median,
suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark - in this case 56 (highlighted in bold). It is the middle mark
because there are 5 scores before it and 5 scores after it. This works fine when you have an odd
number of scores but what happens when you have an even number of scores? What if you had
only 10 scores? Well, you simply have to take the middle two scores and average the result. So,
if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th scores in our data set and average them to get a median
of 55.5.
Mode
The mode is the most frequent score in our data set. On a histogram or bar chart it is represented
by the highest bar. You can, therefore, sometimes consider the mode as being the most
popular option. An example of a mode is presented below:

Normally, the mode is used for categorical data where we wish to know which is the most
common category as illustrated below:
We can see above that the most common form of transport, in this particular data set, is the bus.
However, one of the problems with the mode is that it is not unique, so it leaves us with
problems when we have two or more values that share the highest frequency, such as below:

We are now stuck as to which mode best describes the central tendency of the data. This is
particularly problematic when we have continuous data, as we are more likely not to have any
one value that is more frequent than the other. For example, consider measuring 30 peoples'
weight (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly
the same weight, e.g. 67.4 kg? The answer is: probably very unlikely. Many people might be
close, but with such a small sample (30 people) and a large range of possible weights you are
unlikely to find two people with exactly the same weight, that is, to the nearest 0.1 kg. This is
why the mode is very rarely used with continuous data.
Another problem with the mode is that it will not provide us with a very good measure of central
tendency when the most common mark is far away from the rest of the data in the data set, as
depicted in the diagram below:

In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is
not representative of the data, which is mostly concentrated around the 20 to 30 value range. To
use the mode to describe the central tendency of this data set would be misleading.
Skewed Distributions and the Mean and Median
We often test whether our data is normally distributed as this is a common assumption
underlying many statistical tests. An example of a normally distributed set of data is presented
below:
When you have a normally distributed sample you can legitimately use both the mean or the
median as your measure of central tendency. In fact, in any symmetrical distribution the mean,
median and mode are equal. However, in this situation, the mean is widely preferred as the best
measure of central tendency as it is the measure that includes all the values in the data set for its
calculation, and any change in any of the scores will affect the value of the mean. This is not the
case with the median or mode.
However, when our data is skewed, for example, as with the right-skewed data set below:

we find that the mean is being dragged in the direction of the skew. In these situations, the median is
generally considered to be the best representative of the central location of the data. The more
skewed the distribution the greater the difference between the median and mean, and the greater
emphasis should be placed on using the median as opposed to the mean. A classic example of the
above right-skewed distribution is income (salary), where higher-earners provide a false
representation of the typical income if expressed as a mean and not a median.
If dealing with a normal distribution and tests of normality show that the data is non-normal,
then it is customary to use the median instead of the mean. This is more a rule of thumb than a
strict guideline however. Sometimes, researchers wish to report the mean of a skewed
distribution if the median and mean are not appreciably different (a subjective assessment) and if
it allows easier comparisons to previous research to be made.
Summary of when to use the mean, median and mode
Please use the following summary table to know what the best measure of central tendency is
with respect to the different types of variable.
Type of Variable Best measure of central tendency
Nominal Mode
Ordinal Median
Interval/Ratio (not skewed) Mean
Interval/Ratio (skewed) Median

Contents of Research Report


The researcher must keep in mind that his research report must contain the following aspects:
1. Purpose of study
2. Significance of his study or statement of the problem
3. Review of literature
4. Methodology
5. Interpretation of data
6. Conclusions and suggestions
7. Bibliography
8. Appendices
These can be discussed in detail as under:
(1) Purpose of study:
Research is a direction-oriented study. The researcher should discuss the problem of his study
and give the background of the problem. He must lay down the hypothesis of the study: a
statement indicating the nature of the problem. He should be able to collect data, analyse it and
test the hypothesis. The importance of the problem for the advancement of knowledge, or for the
removal of some evil, may also be explained. He must use the review of literature, or data from
secondary sources, for explaining the statement of the problem.
(2) Significance of study:
The researcher may highlight earlier research in a new manner or establish a new theory. He
must refer to earlier research work and distinguish his own research from it. He must explain
how his research is different and why his research topic is important. In the statement of his
problem, he must be able to explain in brief the historical account of the topic and the way in
which he intends to attempt, in his study, to conduct research on his topic.
(3) Review of Literature:
Research is a continuous process, and the researcher cannot avoid earlier research work; he must
start with it. He should note down all such research work published in books, journals or
unpublished theses. He will get guidelines for his research from taking a review of the literature.
He should collect information in respect of earlier research work and list it as given below:
1. Author/researcher
2. Title of research /Name of book
3. Publisher
4. Year of publication
5. Objectives of his study
6. Conclusion/suggestions
Then he can compare this information with his own study to show its separate identity. He must
be honest in pointing out the similarities and differences of his study from earlier research work.
(4) Methodology:
It is related to the collection of data. There are two sources for collecting data: primary and
secondary. Primary data are original and are collected in field work, either through
questionnaires or interviews. Secondary data rely on library work. Primary data are collected by
sampling methods, and the procedure for selecting the sample must be mentioned. The
methodology must cover the various aspects of the problem that are studied for valid
generalization about the phenomena. The scales of measurement must be explained along with
the different concepts used in the study.
While conducting research based on field work, procedural matters like the definition of the
universe and the preparation of the source list must be given. The researcher may use the case
study method, historical research, etc.; he must make it clear which method is used in his
research work. When a questionnaire is prepared, a copy of it must be given in an appendix.
(5) Interpretation of data:
Mainly, the data collected from primary sources need to be interpreted in a systematic manner.
The tabulation must be completed to draw conclusions. Not all the questions are useful for report
writing; one has to select or club them according to the hypothesis or objectives of the study.
(6) Conclusions/suggestions:
Data analysis forms the crux of the problem. The information collected in field work is useful to
draw the conclusions of the study. In relation to the objectives of the study, the analysis of data
may lead the researcher to pinpoint his suggestions. This is the most important part of the study.
The conclusions must be based on logical and statistical reasoning. The report should contain not
only the generalization of inference but also the basis on which the inferences are drawn. All
sorts of proofs, numerical and logical, must be given in support of any theory that has been
advanced. He should also point out the limitations of his study.
(7) Bibliography:
The list of references must be arranged in alphabetical order and presented in an appendix.
Books should be given in the first section, articles in the second, and research projects in
the third. This pattern of bibliography is considered convenient and satisfactory from the point of
view of the reader.
(8) Appendices:
The general information in tabular form which is not directly used in the analysis of data but
which is useful to understand the background of study can be given in Appendices.
Layout of the Research Report
There is scientific method for the layout of the research report. The layout of the report means as
to what the research report should contain. The contents of the research report are noted below:
1. Preliminary Page
2. Main Text
3. End Matter
(1) Preliminary Pages:
These must contain the title of the research topic and the date. There must be a preface or
foreword to the research work, followed by a table of contents. Lists of tables and maps should
also be given.
(2) Main Text:
It provides the complete outline of research report along with all details. The title page is
reported in the main text. Details of text are given continuously as divided in different chapters.
 (a)    Introduction
 (b)   Statement of the problem
 (c)    The analysis of data
 (d)   The implications drawn from the results
 (e)    The summary
(a)    Introduction:
Its purpose is to introduce the research topic to readers. It must cover statement of the problem,
hypotheses, objectives of study, review of literature, and the methodology to cover primary and
secondary data, limitations of the study and the chapter scheme. Some researchers give in brief, in
the first chapter, an introduction to the research project highlighting the importance of the study.
This is followed by the research methodology in a separate chapter.
The methodology should point out the method of study, the research design and method of data
collection.
(b)   Statement of the problem:
This is the crux of his research. It highlights the main theme of his study. It must be in
nontechnical language, and in a simple manner, so that an ordinary reader may follow it. Social
research must be made available to the common man; research on agricultural problems, for
example, must be easy for farmers to read.
(c)    Analysis of data:
Data so collected should be presented in a systematic manner so that, with its help, conclusions
can be drawn. This helps to test the hypothesis. Data analysis must be made to confirm the
objectives of the study.
(d)   Implications of Data:
The results based on the analysis of data must be valid. This is the main body of research. It
contains statistical summaries and analysis of data. There should be a logical sequence in the
analysis of data. The primary data may lead to the establishment of the results. He must have a
separate chapter on conclusions and recommendations, and the conclusions must be based on data analysis.
The conclusions must be such which may lead to generalization and its applicability in similar
circumstances. The conditions of research work limiting its scope for generalization must be
made clear by the researcher.
(e)    Summary:
This is the conclusive part of the study. It enables the reader to grasp, by reading the summary
alone, the substance of the research work. It is also a synopsis of the study.
(3) End Matter:
It covers relevant appendices covering general information, the concepts and bibliography. The
index may also be added to the report.
