BRM Notes

The document provides an overview of marketing research, detailing its systematic process for identifying and solving marketing problems. It covers various research methodologies, including problem-identification and problem-solving research, as well as the roles of internal and external suppliers in the marketing research industry. Additionally, it discusses qualitative and quantitative research designs, data collection methods, and the importance of defining research problems accurately.

CHAPTER 1 – Introduction to Marketing Research

Marketing research is the systematic and objective identification, collection, analysis, dissemination, and
use of information for the purpose of improving decision making related to the identification and solution of
problems and opportunities in marketing.
Problem-identification research is undertaken to help identify problems that are, perhaps, not apparent on
the surface and yet exist or are likely to arise in the future.

Once a problem or opportunity has been identified, problem-solving research is undertaken to arrive at a
solution.
MARKETING RESEARCH PROCESS
1. Problem definition
2. Development of an approach to the problem
3. Research Design formulation
a. Definition of the information needed
b. Secondary data analysis
c. Qualitative research
d. Methods of collecting quantitative data (survey, observation, and experimentation)
e. Measurement and scaling procedures
f. Questionnaire design
g. Sampling process and sample size
h. Plan of data analysis.
4. Data Collection
5. Data Prep & Analysis
6. Report Preparation
THE MARKETING RESEARCH INDUSTRY
Competitive intelligence (CI) may be defined as the process of enhancing marketplace competitiveness
through a greater understanding of a firm’s competitors and the competitive environment.
An internal supplier is a marketing research department within the firm.
External suppliers are outside firms hired to supply marketing research data.
Full-service suppliers offer the entire range of marketing research services, from problem definition,
approach development, questionnaire design, sampling, data collection, data analysis, and interpretation, to
report preparation and presentation
Syndicated services collect information of known commercial value that they provide to multiple clients on
a subscription basis. (This is pooling of data)
Customized services offer a wide variety of marketing research services customized to suit a client’s
specific needs
Internet services are offered by several marketing research firms, including some that have specialized in
conducting marketing research on the Internet.
Limited-service suppliers specialize in one or a few phases of the marketing research project. Services
offered by such suppliers are classified as field services, focus groups and qualitative research, technical and
analytical services, and other services.
Field services collect data through mail, personal, telephone, or electronic interviewing, and firms that
specialize in interviewing are called field service organizations.
Focus groups and qualitative services provide facilities and recruitment of respondents for focus groups
and other forms of qualitative research such as one-on-one depth interviews
Technical and analytical services are offered by firms that specialize in design issues and computer
analysis of quantitative data, such as those obtained in large surveys

A marketing information system (MIS) is a formalized set of procedures for generating, analyzing,
storing, and distributing information to marketing decision makers on an ongoing basis
Decision support systems (DSS) are integrated systems including hardware, communications network,
database, model base, software base, and the DSS user (decision maker) that collect and interpret
information for decision making.
CHAPTER 2 – Defining the marketing research problem & developing an approach
Problem definition involves stating the general problem and identifying the specific components of the
marketing research problem
PROBLEM DEFINITION PROCESS

The problem audit, like any other type of audit, is a comprehensive examination of a marketing problem
with the purpose of understanding its origin and nature.
Interaction between decision maker and researcher should be characterized by
1. Communication
2. Cooperation
3. Confidence (Mutual Trust)
4. Candor (No hidden agendas, openness)
5. Closeness
6. Continuity (Interact regularly, not sporadically)
7. Creativity (Not a formulaic interaction)
Expert interviews can be used to formulate the marketing research problem. If experts are generalized to
include knowledgeable individuals, it is known as an experience survey or key-informant technique.
Lead-User Survey obtains information from lead-users, individuals who work intensively in a field.
Secondary data are data collected for some purpose other than the problem at hand.
Primary data are originated by the researcher for the specific purpose of addressing the research problem
Qualitative research is unstructured, exploratory in nature, based on small samples, and may utilize
popular qualitative techniques such as focus groups (group interviews), word association (asking
respondents to indicate their first responses to stimulus words), and depth interviews (one-on-one interviews
that probe the respondents’ thoughts in detail)
Pilot surveys tend to be less structured than large-scale surveys in that they generally contain more open-
ended questions, and the sample size is much smaller. Case studies involve an intensive examination of a
few selected cases of the phenomenon of interest.
ENVIRONMENTAL CONTEXT

A conceptual map involves the following three components:
1. Management wants to (take an action).
2. Therefore, we should study (topic).
3. So that we can explain (question).
PROBLEM STATEMENT DEFINITION


Researchers make two common errors in problem definition. The first arises when the research problem is
defined too broadly. The second type of error is just the opposite: The marketing research problem is defined
too narrowly
The broad statement provides perspective on the problem and acts as a safeguard against committing the
second type of error. The specific components focus on the key aspects of the problem and provide clear
guidelines on how to proceed further, thereby reducing the likelihood of the first type of error.
ANALYTICAL MODEL
An analytical model is a set of variables and their interrelationships designed to represent, in whole or in
part, some real system or process.
In verbal models, the variables and their relationships are stated in prose form.
Graphical models are visual. They are used to isolate variables and to suggest directions of relationships
but are not designed to provide numerical results.
Mathematical models explicitly specify the relationships among variables, usually in equation form.
CHAPTER 3 – Research Design
A research design is a framework or blueprint for conducting the marketing research project.
Exploratory research is used in cases when you must define the problem more precisely, identify relevant
courses of action, or gain additional insights before an approach can be developed
Conclusive research is typically more formal and structured than exploratory research. It is based on large,
representative samples, and the data obtained are subjected to quantitative analysis
Causal research is used to obtain evidence of cause-and-effect (causal) relationships
SAMPLE SURVEY RESEARCH DESIGNS

Cross-sectional designs involve the collection of information from any given sample of population
elements only once. In single cross-sectional designs, only one sample of respondents is drawn from the
target population, and information is obtained from this sample only once. In multiple cross-sectional
designs, there are two or more samples of respondents, and information from each sample is obtained only
once.
Cohort analysis consists of a series of surveys conducted at appropriate time intervals, where the cohort
serves as the basic unit of analysis. A cohort is a group of respondents who experience the same event within
the same time interval
In longitudinal designs, a fixed sample (or samples) of population elements is measured repeatedly on the
same variables. A longitudinal design differs from a cross-sectional design in that the sample or samples
remain the same over time
Disadvantages of Panels:
1. Refusal to cooperate: Many individuals or households do not wish to be bothered with the panel operation
and refuse to participate. Consumer panels requiring members to keep a record of purchases have a
cooperation rate of 60 percent or less.
2. Mortality: Panel members who agree to participate may subsequently drop out because they move away
or lose interest. Mortality rates can be as high as 20 percent per year.
3. Payment: Payment may cause certain types of people to be attracted, making the group unrepresentative
of the population.
POSSIBLE ERROR SOURCES
Total error is the variation between the true mean value in the population of the variable of interest and the
observed mean value obtained in the marketing research project.
Random sampling error occurs because the sample selected is an imperfect representation of the
population
Nonsampling errors can be attributed to sources other than sampling, and they may be random or
nonrandom
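The idea of random sampling error can be made concrete with a small simulation. The sketch below uses an invented population (all values are illustrative, not from the text) and shows that each sample's mean misses the true population mean by a different random amount; that gap is the random sampling error for that sample.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of 10,000 monthly spend values (invented data).
population = [random.gauss(100, 15) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Each independent sample's mean misses the true mean by a different
# random amount; that gap is the random sampling error for that sample.
sample_errors = []
for _ in range(5):
    sample = random.sample(population, 100)
    sample_errors.append(statistics.mean(sample) - true_mean)

print([round(e, 2) for e in sample_errors])
```

With a sample of 100 the errors are small but never exactly zero; a larger sample shrinks them on average, but only a census eliminates them entirely.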
A useful approach for managing a project is the critical path method (CPM). An advanced version of CPM
is the program evaluation and review technique (PERT), which is a probability-based scheduling
approach that recognizes and measures the uncertainty of the project completion times. An even more
advanced scheduling technique is the graphical evaluation and review technique (GERT), in which both
the completion probabilities and the activity costs can be built into a network representation
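The core of CPM is a longest-path calculation over the network of activities and their predecessors. A minimal sketch, using hypothetical research-project activities and durations (not from the text):

```python
# Hypothetical activities: name -> (duration_in_days, list_of_predecessors).
activities = {
    "problem definition": (5, []),
    "questionnaire design": (7, ["problem definition"]),
    "sampling plan": (4, ["problem definition"]),
    "data collection": (14, ["questionnaire design", "sampling plan"]),
    "analysis & report": (6, ["data collection"]),
}

def critical_path(acts):
    """Longest path through the activity network; returns
    (project duration, activities on the critical path)."""
    finish = {}

    def ef(name):  # earliest finish time, computed recursively
        if name not in finish:
            dur, preds = acts[name]
            finish[name] = dur + max((ef(p) for p in preds), default=0)
        return finish[name]

    duration = max(ef(a) for a in acts)
    # Walk back from the last-finishing activity along binding predecessors.
    path, cur = [], max(acts, key=lambda a: finish[a])
    while cur:
        path.append(cur)
        dur, preds = acts[cur]
        cur = next((p for p in preds if finish[p] == finish[cur] - dur), None)
    return duration, list(reversed(path))

print(critical_path(activities))
# → (32, ['problem definition', 'questionnaire design', 'data collection', 'analysis & report'])
```

Activities off the critical path (here, the sampling plan) have slack; delaying them within that slack does not delay the project.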
CHAPTER 4 – Exploratory Research Design
Primary data are originated by a researcher for the specific purpose of addressing the problem at hand
Secondary data are data that have already been collected for purposes other than the problem at hand.

Criteria for Evaluating Secondary Data (Acronym: SECOND):
1. Specifications: Methodology Used to Collect the Data
a. size and nature of the sample, response rate and quality, questionnaire design and
administration, procedures used for fieldwork, and data analysis and reporting procedures
2. Error: Accuracy of the Data
a. Secondary data can have several sources of error, or inaccuracy, including errors in the
approach, research design, sampling, data collection, analysis, and reporting stages of the
project.
3. Currency: When the Data Were Collected
a. Secondary data may not be current, and the time lag between data collection and publication
may be long, as is the case with much census data. Marketing research requires current data;
therefore, the value of secondary data is diminished as they become dated.
4. Objective: The Purpose for Which the Data Were Collected
a. Data collected with a specific objective in mind may not be appropriate in another situation
5. Nature: The Content of the Data
a. If the key variables have not been defined or are defined in a manner inconsistent with the
researcher’s definition, then the usefulness of the data is limited.
6. Dependability: How Dependable Are the Data?
a. An overall indication of the dependability of data may be obtained by examining the
expertise, credibility, reputation, and trustworthiness of the source.
Database marketing involves the use of computers to capture and track customer profiles
Online databases consist of a central data bank, which is accessed with a computer via a telecom network.
Internet databases can be accessed, searched, and analyzed on the Internet. It is also possible to download
data from the Internet and store it in the computer or an auxiliary storage device.
Offline databases make the information available on diskettes and CD-ROM disks.
Bibliographic databases are composed of citations to articles in journals, magazines, newspapers,
marketing research studies, technical reports, government documents, and the like. Numeric databases
contain numerical and statistical information, such as survey and time-series data. Full-text databases
contain the complete text of the source documents comprising the database. Directory databases provide
information on individuals, organizations, and services
Syndicated services, also referred to as syndicated sources, are companies that collect and sell common
pools of data of known commercial value, designed to serve information needs shared by a number of
clients. These data are not collected for the purpose of marketing research problems specific to individual
clients, but the data and reports supplied to client companies can be personalized to fit particular needs.
Surveys involve interviews with a large number of respondents using a predesigned questionnaire
Syndicated panel surveys measure the same group of respondents over time but not necessarily on the
same variable
Psychographics refer to the psychological profiles of individuals and to psychologically based measures of
lifestyle. Lifestyles refer to the distinctive modes of living of a society or some of its segments
AIOs: Activities, Interests and Opinions
In purchase panels, respondents record their purchases of a variety of different products
In media panels, electronic devices automatically record viewing behavior, thus supplementing a diary or
an online panel
Scanner data reflect some of the latest technological developments in the marketing research industry.
Scanner data are collected by passing merchandise over a laser scanner, which optically reads the bar-coded
description (the universal product code or UPC) printed on the merchandise
Volume tracking data provide information on purchases by brand, size, price, and flavor or formulation,
based on sales data collected from the checkout scanner tapes.
An audit is a formal examination and verification of product movement traditionally carried out by auditors
who make in-person visits to retail and wholesale outlets and examine physical records or analyze inventory.
Industry services provide syndicated data about industrial firms, businesses, and other institutions
Computer mapping combines geography with demographic information and a company’s sales data or
other proprietary information to develop thematic maps.
CHAPTER 5 – Qualitative Research
Qualitative research provides insights and understanding of the problem setting, while quantitative
research seeks to quantify the data and, typically, applies some form of statistical analysis.

In a direct procedure, the interviewee knows the purpose of the study. This is not the case in an indirect
procedure.
A focus group is an interview conducted by a trained moderator in a nonstructured and natural manner with
a small group of respondents. The moderator leads the discussion
Depth interviews are an unstructured and direct way of obtaining information, but unlike focus groups,
depth interviews are conducted on a one-on-one basis.

TECHNIQUES:
In laddering, the line of questioning proceeds from product characteristics to user characteristics. This
technique allows the researcher to tap into the consumer’s network of meanings.
In hidden issue questioning, the focus is not on socially shared values but rather on personal “sore spots”;
not on general lifestyles but on deeply felt personal concerns.
Symbolic analysis attempts to analyze the symbolic meaning of objects by comparing them with their
opposites. To learn what something is, the researcher attempts to learn what it is not

Grounded theory uses an inductive and more structured approach in which each subsequent depth
interview is adjusted based on the cumulative findings from previous depth interviews with the purpose of
developing general concepts or theories.
In a protocol interview, a respondent is placed in a decision-making situation and asked to verbalize the
process and the activities that he or she would undertake to make the decision

PROJECTIVE TECHNIQUES
A projective technique is an unstructured, indirect form of questioning that encourages respondents to
project their underlying motivations, beliefs, attitudes, or feelings regarding the issues of concern.
In association techniques, an individual is presented with a stimulus and asked to respond with the first
thing that comes to mind. Word association is the best known of these techniques. In word association,
respondents are presented with a list of words, one at a time, and asked to respond to each with the first
word that comes to mind.
In completion techniques, the respondent is asked to complete an incomplete stimulus situation. Sentence
completion is similar to word association. Respondents are given incomplete sentences and asked to
complete them. In story completion, respondents are given part of a story—enough to direct attention to a
particular topic but not to hint at the ending. They are required to give the conclusion in their own words.
Construction techniques require the respondent to construct a response in the form of a story, dialogue, or
description. Picture response techniques can be traced to the Thematic Apperception Test (TAT), which
consists of a series of pictures of ordinary as well as unusual events. In some of these pictures, the persons or
objects are clearly depicted, while in others they are relatively vague. The respondent is asked to tell stories
about these pictures. In cartoon tests, cartoon characters are shown in a specific situation related to the
problem. The respondents are asked to indicate what one cartoon character might say in response to the
comments of another character
In expressive techniques, respondents are presented with a verbal or visual situation and asked to relate the
feelings and attitudes of other people to the situation. The respondents express not their own feelings or
attitudes, but those of others. See: Role-playing. In third-person technique, the respondent is presented
with a verbal or visual situation and asked to relate the beliefs and attitudes of a third person rather than
directly expressing personal beliefs and attitudes. This third person may be a friend, neighbor, colleague, or
a “typical” person
There are three general steps that should be followed when analyzing qualitative data.
1. Data reduction. In this step, the researcher chooses which aspects of the data are emphasized, minimized,
or set aside for the project at hand.
2. Data display. In this step, the researcher develops a visual interpretation of the data with the use of such
tools as a diagram, chart, or matrix. The display helps to illuminate patterns and interrelationships in data.
3. Conclusion drawing and verification. In this step, the researcher considers the meaning of analyzed data
and assesses its implications for the research question at hand
CHAPTER 6 – Descriptive Research Design
The survey method of obtaining information is based on the questioning of respondents. Respondents are
asked a variety of questions regarding their behavior, intentions, attitudes, awareness, motivations, and
demographic and lifestyle characteristics
In structured data collection, a formal questionnaire is prepared and the questions are asked in a
prearranged order; thus the process is also direct. (Note: direct, as previously described, means the
interviewee knows the purpose of the study.) Structured here refers to the degree of standardization imposed on the
data collection process
Fixed-alternative questions require the respondent to select from a predetermined set of responses


A mail panel consists of a large, nationally representative sample of households that have agreed to
participate in periodic mail questionnaires and product tests
EVALUATING SURVEY METHODS
1. Task Factors:
a. Diversity of questions and flexibility
b. Use of Physical Stimuli
c. Sample Control - Sample control is the ability of the survey mode to reach the units specified
in the sample effectively and efficiently.
d. Quantity of data
e. Response Rate (Also see: Nonresponse Bias)
2. Situational Factors:
a. Control of the data collection environment
b. Control of field force (Personnel conducting the interview/survey)
c. Potential for interviewer bias
d. Speed
e. Cost
3. Respondent factors
a. Perceived Anonymity: Perceived anonymity refers to the respondents’ perceptions that the
interviewer or the researcher will not discern their identities
b. Social Desirability/Sensitive information: Social desirability is the tendency of the
respondents to give answers that are socially acceptable, whether or not they are true.
c. Low Incidence Rate: Incidence rate refers to the % of persons eligible to participate in the
study
d. Respondent Control: Control over when to answer the survey, and the flexibility to even
answer it in parts at different times, especially if the survey is long.
In structured observation, the researcher specifies in detail what is to be observed and how the
measurements are to be recorded
In unstructured observation, the observer monitors all aspects of the phenomenon that seem relevant to the
problem at hand
Natural observation involves observing behavior as it takes place in the environment. For example, one
could observe the behavior of respondents eating fast food at Burger King. In contrived observation,
respondents’ behavior is observed in an artificial environment, such as a test kitchen set up in a mall.
MODE OF ADMINISTRATION

In mechanical observation, mechanical devices, rather than human observers, record the phenomenon
being observed (Think of tech being used, so heart rate monitoring, neural response monitoring, etc).
Response latency is the time a respondent takes before answering a question. It is used as a measure of the
relative preference for various alternatives.
Content analysis is an appropriate method when the phenomenon to be observed is communication, rather
than behavior or physical objects. It is defined as the objective, systematic, and quantitative description of
the manifest content of a communication
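The quantitative side of content analysis can be sketched as counting how many responses mention each coding category. The responses and the keyword coding scheme below are invented for illustration only.

```python
from collections import Counter
import re

# Invented open-ended responses about a snack brand (illustrative only).
responses = [
    "Love the taste, but the price is too high.",
    "Great taste and fun packaging.",
    "The price keeps going up; taste is just okay.",
]

# Coding scheme: map each category to the keywords that count toward it.
categories = {
    "taste": {"taste", "flavor"},
    "price": {"price", "cost", "expensive"},
    "packaging": {"packaging", "package"},
}

counts = Counter()
for text in responses:
    words = set(re.findall(r"[a-z]+", text.lower()))
    for cat, keywords in categories.items():
        if words & keywords:
            counts[cat] += 1  # count each response at most once per category

print(dict(counts))  # → {'taste': 3, 'price': 2, 'packaging': 1}
```

Real content-analysis software uses richer coding units (themes, sentences, images), but the output is the same kind of objective, systematic frequency description.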
In trace analysis, data collection is based on physical traces, or evidence, of past behavior. These traces
may be left intentionally or unintentionally by the respondents.
Chapter 7 not included in course reading list
CHAPTER 8 – Measurement & Scaling (Comparative techniques)
SCALE CHARACTERISTICS
Description means the unique labels or descriptors that are used to designate each value of the scale.
Order means the relative sizes or positions of the descriptors. There are no absolute values associated with
order, only relative values.
The characteristic of distance means that absolute differences between the scale descriptors are known and
may be expressed in units.
The origin characteristic means that the scale has a unique or fixed beginning or true zero point
PRIMARY SCALES OF MEASUREMENT

A nominal scale is a figurative labelling scheme in which the numbers serve only as labels or tags for
identifying and classifying objects. The only characteristic possessed by these scales is description.
An ordinal scale is a ranking scale in which numbers are assigned to objects to indicate the relative extent
to which the objects possess some characteristic. An ordinal scale allows you to determine whether an object
has more or less of a characteristic than some other object, but not how much more or less.
In an interval scale, numerically equal distances on the scale represent equal values in the characteristic
being measured. An interval scale contains all the information of an ordinal scale, but it also allows you to
compare the differences between objects. The difference between any two adjacent scale values is identical to the
difference between any other two adjacent values of an interval scale.
A ratio scale possesses all the properties of the nominal, ordinal, and interval scales and, in addition, an
absolute zero point. Thus, ratio scales possess the characteristic of origin (and distance, order, and
description). Thus, in ratio scales we can identify or classify objects, rank the objects, and compare intervals
or differences. It is also meaningful to compute ratios of scale values
SCALING TECHNIQUES

Comparative scales involve the direct comparison of stimulus objects. For example, respondents might be
asked whether they prefer Coke or Pepsi
In noncomparative scales, also referred to as monadic or metric scales, each object is scaled independently
of the others in the stimulus set
In paired comparison scaling, a respondent is presented with two objects and asked to select one according
to some criterion
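Paired-comparison data are typically summarized by counting how often each object is chosen over the others, which yields an overall preference order. A sketch with invented choice data:

```python
from collections import Counter

# Invented paired-comparison results: each tuple is (chosen, rejected)
# for one respondent judging one pair of soft-drink brands.
choices = [
    ("Coke", "Pepsi"), ("Coke", "Sprite"), ("Pepsi", "Sprite"),
    ("Pepsi", "Coke"), ("Coke", "Sprite"), ("Sprite", "Pepsi"),
]

brands = {b for pair in choices for b in pair}
wins = Counter(winner for winner, _ in choices)

# Rank brands by number of paired-comparison wins (most preferred first).
ranking = sorted(brands, key=lambda b: wins[b], reverse=True)
print(ranking, dict(wins))
```

This win-count conversion is one simple option; it assumes preferences are reasonably transitive across respondents.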
In rank order scaling, respondents are presented with several objects simultaneously and asked to order or
rank them according to some criterion. For example, respondents may be asked to rank brands of toothpaste
according to overall preference.
In constant sum scaling, respondents allocate a constant sum of units, such as points, dollars, or chips,
among a set of stimulus objects with respect to some criterion.
Q-sort scaling was developed to discriminate quickly among a relatively large number of objects. This
technique uses a rank order procedure in which objects are sorted into piles based on similarity with respect
to some criterion. For example, respondents are given 100 attitude statements on individual cards and asked
to place them into 11 piles, ranging from “most highly agreed with” to “least highly agreed with.”
CHAPTER 9 - Measurement & Scaling (Non-Comparative techniques)
Respondents using a noncomparative scale employ whatever rating standard seems appropriate to them.
They do not compare the object being rated either to another object or to some specified standard, such as
“your ideal brand.” They evaluate only one object at a time, and for this reason noncomparative scales are
often referred to as monadic scales.
In a continuous rating scale, also referred to as a graphic rating scale, respondents rate the objects by
placing a mark at the appropriate position on a line that runs from one extreme of the criterion variable to the
other. Thus, the respondents are not restricted to selecting from marks previously set by the researcher
In an itemized rating scale, the respondents are provided with a scale that has a number or brief description
associated with each category. There are 3 commonly used itemized rating scales: Likert scale, semantic
differential scale, Stapel scale
The Likert scale is a widely used rating scale that requires the respondents to indicate the degree of
agreement or disagreement with each of a series of statements about the stimulus objects. Typically, each
scale item has five response categories, ranging from “strongly disagree” to “strongly agree.”
The semantic differential is a 7-point rating scale with endpoints associated with bipolar labels that have
semantic meaning. In a typical application, respondents rate objects on a number of itemized, 7-point rating
scales bounded at each end by one of two bipolar adjectives, such as “cold” and “warm.”
The Stapel scale, named after its developer, Jan Stapel, is a unipolar rating scale with 10 categories
numbered from -5 to +5, without a neutral point (zero). This scale is usually presented vertically.
In a balanced scale, the numbers of favorable and unfavorable categories are equal; in an unbalanced scale,
they are unequal

On forced rating scales, the respondents are forced to express an opinion, because a “no opinion” option is
not provided. In such a case, respondents without an opinion may mark the middle scale position.
A multi-item scale consists of multiple items, where an item is a single question or statement to be
evaluated.
Systematic error affects the measurement in a constant way. It represents stable factors that affect the
observed score in the same way each time the measurement is made, such as mechanical factors. Random
error, on the other hand, is not constant. It represents transient factors that affect the observed score in
different ways each time the measurement is made, such as transient personal or situational factors
Reliability refers to the extent to which a scale produces consistent results if repeated measurements are
made.
a. In test-retest reliability, respondents are administered identical sets of scale items at two different
times under as nearly equivalent conditions as possible. The degree of similarity between the two
measurements is determined by computing a correlation coefficient. The higher the correlation
coefficient, the greater the reliability
b. In alternative-forms reliability, two equivalent forms of the scale are constructed. The same
respondents are measured at two different times, usually two to four weeks apart, with a different
scale form being administered each time. The scores from the administration of the alternative-scale
forms are correlated to assess reliability.
c. Internal consistency reliability is used to assess the reliability of a summated scale where several
items are summed to form a total score.
a. The simplest measure of internal consistency is split-half reliability. The items on the scale
are divided into two halves and the resulting half scores are correlated. High correlations
between the halves indicate high internal consistency.
b. The coefficient alpha, or Cronbach’s alpha, is the average of all possible split-half
coefficients resulting from different ways of splitting the scale items. This coefficient varies
from 0 to 1, and a value of 0.6 or less generally indicates unsatisfactory internal consistency
reliability
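In practice, coefficient alpha is computed directly from the variances of the individual items and of the summed scale, rather than by averaging every split-half coefficient. A minimal sketch with invented 5-point ratings (illustrative only):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Coefficient alpha from the variance formula.
    items: one list of respondent scores per scale item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summed scale per respondent
    item_var = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

# Invented ratings: 3 items x 6 respondents on a 5-point scale.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 3, 5],
]
print(round(cronbach_alpha(items), 2))  # → 0.89
```

Here alpha is about 0.89, well above the 0.6 threshold mentioned above; perfectly correlated items would give exactly 1.0.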
The validity of a scale may be defined as the extent to which differences in observed scale scores reflect
true differences among objects on the characteristic being measured, rather than systematic or random error
a. Content validity, sometimes called face validity, is a subjective but systematic evaluation of how
well the content of a scale represents the measurement task at hand
b. Criterion validity reflects whether a scale performs as expected in relation to other variables
selected as meaningful criteria
c. Construct validity addresses the question of what construct or characteristic the scale is, in fact,
measuring
d. Convergent validity is the extent to which the scale correlates positively with other measures of the
same construct.
e. Discriminant validity is the extent to which a measure does not correlate with other constructs from
which it is supposed to differ
f. Nomological validity is the extent to which the scale correlates in theoretically predicted ways with
measures of different but related constructs
Generalizability refers to the extent to which one can generalize from the observations at hand to a universe
of generalizations. The set of all conditions of measurement over which the investigator wishes to generalize
is the universe of generalization
SUMMARY
Chapter 10 – Questionnaire & Form Design
 Questionnaire - a formalized set of questions for obtaining information from respondents
(A questionnaire is one element of a data collection package.)
Other components that a data collection package may include:
- Fieldwork procedures
- Rewards offered to respondents
- Communication aids
 Objectives of a Questionnaire:
o Translate the information needed into a set of specific questions that the respondents can
and will answer
o Uplift, motivate, and encourage the respondent to become involved in the interview, to
cooperate, and to complete the interview
o Minimize response error (refer Ch 3)
 Questionnaire Design Process (each step feeds the next):
1. Specify the information needed + form an idea of the target group
2. Consider how the questions are administered in each interviewing method
3. Is any question required for a particular piece of info? + How many?
4. Is the respondent informed? + able to remember? + able to articulate?
5. Effort required + suitable context + valid purpose + sensitivity of the info
6. Choose (un)structured questions & the major types of structured questions
7. Word the questions so as to avoid item nonresponse & response errors
8. Translate the questions into clear & understandable words
9. Format, spacing & positioning of questions affect the results
10. Use good-quality paper; prefer booklets; avoid question splitting; avoid overcrowding; keep directions close to the questions
11. Pretest on a small sample of respondents
 Additional Information on each step
o Specify the type of interviewing method
Some types of interviewing methods are:
 Personal Interviews – lengthy, detailed questions can be asked
 Telephone Interviews – Interaction with interviewer, but no visible questionnaire
(opposed to personal interviews) ⇒ short, simple questions
 Mail questionnaire – simple questions, detailed instructions
 Computer assisted
 Internet assisted
o Determine the content of individual questions
 Is a question needed? – Purpose may include specific info, filter questions for disguise,
etc.
 Are multiple questions needed? – Double-barrelled question – “Do you think Coca-
Cola is a tasty and refreshing soft drink?”, “Why” questions, etc.
o Decide on the question structure
Types of Question Structure:
 Unstructured - open-ended questions that respondents answer in their own words –
Free-response/Free-answer questions (Good as first questions)
Advantages – less response bias, rich insights in answers, free expression of views
Disadvantages – high interviewer bias; response coding is costly & time-consuming
 Structured - Specify the set of response alternatives and the response format. (A
structured question may be multiple choice, dichotomous, or a scale)
 MCQs – suffer from order/position bias (respondents’ tendency to check an alternative
merely because it occupies a certain position or is listed in a certain order)
 Dichotomous qns – only 2 response options (responses are easiest to code)
(Cannot capture uncertainty; the neutral-alternative dilemma)

o Determine the question wording
Recommended guidelines:
 Define the issue –
“Which brand of shampoo do you use?” (Incorrect)
“Which brand or brands of shampoo have you personally used at home during the last
month? In case of more than one brand, please list all the brands that apply.” (Correct)
 Use ordinary words –
“Do you think the distribution of soft drinks is adequate?” (Incorrect)
“Do you think soft drinks are readily available when you want to buy them?” (Correct)
 Use unambiguous words
 Avoid leading questions –
“Do you think that patriotic Americans should buy imported automobiles when that
would put American labor out of work?” (Incorrect)
“Do you think that Americans should buy imported automobiles?” (Correct)
 Avoid implicit alternatives –
“Do you like to fly when traveling short distances?” (Incorrect)
“Do you like to fly when traveling short distances, or would you rather drive?” (Correct)
 Avoid implicit assumptions –
“Are you in favor of a balanced budget?” (Incorrect)
“Are you in favor of a balanced budget if it would result in an increase in the personal
income tax?” (Correct)
 Avoid generalizations and estimates –
“What is the annual per capita expenditure on groceries in your household?” (Incorrect)
“What is the monthly (or weekly) expenditure on groceries in your household?” and
“How many members are there in your household?” (Correct)
 Use positive and negative statements
o Arrange the questions in proper order
 Opening question – crucial to gaining confidence; should be simple, interesting, non-threatening, e.g. opinion questions.
 Type of information
 Basic info (should be first): relates directly to the research
 Classification information (should be 2nd amongst these): includes socio-economic & demographic info
 Identification information (should be last amongst these): includes name, postal address, e-mail address, telephone number, etc.
 Difficult questions – sensitive, embarrassing, complex, or dull questions should be placed late in the sequence
 Effect on subsequent qns – general questions should precede specific questions
 Logical order – all the questions that deal with a particular topic should be asked before beginning a new topic
 Branching qns – direct respondents to different places in the questionnaire based on how they respond to the question at hand
(These questions ensure that all possible contingencies are covered)

o Identify the form and layout
 Form and layout become important in self-administered questionnaires.
(e.g., top questions receive more attention; instructions printed in red make the questionnaire appear more complicated)
 Precoding – assigning a code to every conceivable response before data collection
(Facilitates control, coding, and analysis of the questionnaire and the responses)

o Eliminate bugs by pretesting
 Pretesting – testing of the questionnaire on a small sample of respondents to identify and eliminate potential problems
 Important points to note:
o All aspects of the questionnaire should be tested, including question content, wording,
sequence, form and layout, question difficulty, and instructions.
o Respondents for the pretest and for the actual survey should be drawn from the same
population.
o Pretests are best done by personal interviews.
o A variety of interviewers should be used for pretests.
o 2 common procedures in pretesting:
 Protocol Analysis – Respondents “think-aloud” while answering the questions.
 Debriefing (Occurs post the questionnaire) - Respondents are told that the
questionnaire they just completed was a pretest and the objectives of pretesting are
described to them. They are then asked to describe the meaning of each question,
to explain their answers, and to state any problems they encountered while
answering the questionnaire
CHAPTER 11 – Sampling: Design and Procedures
A population is the aggregate of all the elements that share some common set of characteristics and that
comprise the universe for the purposes of the marketing research problem
The target population is the collection of elements or objects that possess the information sought by the
researcher and about which inferences are to be made
An element is the object about which or from which the information is desired. In survey research, the
element is usually the respondent. A sampling unit is an element, or a unit containing the element, that is
available for selection at some stage of the sampling process.
A sampling frame is a representation of the elements of the target population. It consists of a list or set of
directions for identifying the target population
In the Bayesian approach, the elements are selected sequentially. After each element is added to the
sample, the data are collected, sample statistics computed, and sampling costs determined. The Bayesian
approach explicitly incorporates prior information about population parameters as well as the costs and
probabilities associated with making wrong decisions
Nonprobability sampling relies on the personal judgment of the researcher rather than chance to select
sample elements
In probability sampling, sampling units are selected by chance.
NON-PROBABILITY TECHNIQUES
Convenience sampling attempts to obtain a sample of convenient elements. The selection of sampling units
is left primarily to the interviewer.

Judgmental sampling is a form of convenience sampling in which the population elements are selected
based on the judgment of the researcher

Quota sampling may be viewed as two-stage restricted judgmental sampling. The first stage consists of
developing control categories, or quotas, of population elements. To develop these quotas, the researcher
lists relevant control characteristics and determines the distribution of these characteristics in the target
population. Once the quotas have been assigned, there is considerable freedom in selecting the elements to
be included in the sample. The only requirement is that the elements selected fit the control characteristics.
In snowball sampling, an initial group of respondents is selected, usually at random. After being
interviewed, these respondents are asked to identify others who belong to the target population of interest.
Subsequent respondents are selected based on the referrals.
PROBABILISTIC TECHNIQUES
In simple random sampling (SRS), each element in the population has a known and equal probability of
selection
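A quick sketch of SRS with Python's standard library, drawing from a hypothetical frame of 500 loyalty-card holders (the frame size and seed are invented for illustration):

```python
import random

# Hypothetical sampling frame: 500 loyalty-card holders, numbered 1-500
frame = list(range(1, 501))

random.seed(42)                    # fixed seed so the illustration is repeatable
sample = random.sample(frame, 50)  # each element has an equal chance of selection

print(len(sample), len(set(sample)))   # → 50 50 (no element is drawn twice)
```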
In systematic sampling, the sample is chosen by selecting a random starting point and then picking every
ith element in succession from the sampling frame. The sampling interval, i, is determined by dividing the
population size N by the sample size n and rounding to the nearest integer
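The interval-based selection can be sketched as follows, on a hypothetical frame of N = 100 customer IDs with a target sample of n = 10:

```python
import random

# Hypothetical frame of N = 100 customer IDs; target sample size n = 10
frame = [f"cust_{k:03d}" for k in range(1, 101)]
N, n = len(frame), 10

i = round(N / n)             # sampling interval: N/n, rounded to the nearest integer

random.seed(7)
start = random.randrange(i)  # random starting point within the first interval
sample = frame[start::i]     # then every ith element in succession

print(i, len(sample))        # → 10 10
```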
Stratified sampling is a two-step process in which the population is partitioned into subpopulations, or
strata. (NOTE: Groups should be internally homogeneous and heterogeneous between groups. This helps
increase accuracy because the strata represent the population more accurately.)
In cluster sampling, the target population is first divided into mutually exclusive and collectively
exhaustive subpopulations, or clusters. Then a random sample of clusters is selected, based on a probability
sampling technique such as SRS. (NOTE: Groups should be internally heterogeneous & representative,
and homogeneous between groups. This helps reduce the costs of sampling because smaller sample sizes suffice.)
A common form of cluster sampling is area sampling, in which the clusters consist of geographic areas,
such as counties, housing tracts, or blocks. If only one level of sampling takes place in selecting the basic
elements, the design is called single-stage area sampling. If two (or more) levels of sampling take place
before the basic elements are selected, the design is called two-(multi)stage area sampling.
In probability proportionate to size sampling, the clusters are sampled with probability proportional to
size. The size of a cluster is defined in terms of the number of sampling units within that cluster
In sequential sampling, the population elements are sampled sequentially, data collection and analysis are
done at each stage, and a decision is made as to whether additional population elements should be sampled.
In double sampling, also called two-phase sampling, certain population elements are sampled twice. In the
first phase, a sample is selected and some information is collected from all the elements in the sample. In the
second phase, a subsample is drawn from the original sample and additional information is obtained from
the elements in the subsample
SUMMARY
Chapter 15: Frequency Distribution, Cross Tabulation and Hypothesis Testing
In a frequency distribution, one variable is considered at a time. The objective is to obtain a count of the
number of responses associated with different values of the variable. The relative occurrence, or frequency,
of different values of the variable is then expressed in percentages.

Statistics associated with Frequency Distribution:

The most commonly used statistics associated with frequencies are –

i. Measures of location

The mean, or average value, is the most commonly used measure of central tendency. It is appropriate
when the data have been collected using an interval or ratio scale.

The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The
mode is a good measure of location when the variable is inherently categorical or has otherwise been
grouped into categories. Used when variable is measured on a nominal scale.

The median of a sample is the middle value when the data are arranged in ascending or descending order. If
the number of data points is even, the median is usually estimated as the midpoint between the two middle
values—by adding the two middle values and dividing their sum by 2. The median is the 50th percentile.
The median is an appropriate measure of central tendency for ordinal data.
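The three measures of location are one-liners with Python's statistics module; the visit counts below are invented for illustration:

```python
from statistics import mean, median, mode

# Hypothetical responses: number of store visits in the past month
visits = [2, 3, 3, 4, 5, 3, 6, 2, 3, 9]

print(mean(visits))    # interval/ratio data -> 4
print(mode(visits))    # most frequent value -> 3
print(median(visits))  # 50th percentile; midpoint of the two middle values here -> 3.0
```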

ii. Measures of variability

The range measures the spread of the data. It is simply the difference between the largest and smallest
values in the sample. As such, the range is directly affected by outliers.

Range = Xlargest - Xsmallest

The interquartile range is the difference between the 75th and 25th percentile.

The difference between the mean and an observed value is called the deviation from the mean. The variance
is the mean squared deviation from the mean. The variance can never be negative. When the data points are
clustered around the mean, the variance is small. When the data points are scattered, the variance is large.
Standard Deviation is square root of Variance.

The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage,
and it is a unitless measure of relative variability. Used only when the variable is measured on a ratio scale.
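The measures of variability can be sketched the same way, on hypothetical monthly grocery spends (ratio-scaled, so the coefficient of variation is meaningful):

```python
from statistics import mean, pstdev, pvariance

# Hypothetical ratio-scaled data: monthly grocery spend (in dollars)
spend = [180, 220, 200, 260, 140]

rng = max(spend) - min(spend)       # range = largest value - smallest value
var = pvariance(spend)              # mean squared deviation from the mean
sd = pstdev(spend)                  # standard deviation = square root of variance
cv = sd / mean(spend) * 100         # coefficient of variation, as a percentage

print(rng, var, sd, cv)   # → 120 1600 40.0 20.0
```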

iii. Measures of shape

Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other.

Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency
distribution. The kurtosis of a normal distribution is zero. If the kurtosis is positive, then the distribution is
more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal
distribution.
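Both shape measures follow from the central moments. The sketch below (pure Python, invented income data) computes skewness as m3/m2^1.5 and excess kurtosis as m4/m2^2 - 3, so that a normal distribution scores zero, matching the convention above:

```python
from statistics import mean

def skew_kurtosis(data):
    # Central moments m2, m3, m4 about the mean
    m, n = mean(data), len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    # Skewness: m3 / m2^1.5; excess kurtosis: m4 / m2^2 - 3 (normal -> 0)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical incomes (in $000s): one large value stretches the right tail
skew, kurt = skew_kurtosis([20, 22, 23, 24, 25, 26, 60])
print(skew > 0)   # → True: deviations are larger in the positive direction
```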
General Procedure for Hypothesis Testing:

1. Formulate the null hypothesis H0 and the alternative hypothesis H1.

A null hypothesis is a statement of the status quo, one of no difference or no effect. If the null
hypothesis is not rejected, no changes will be made. An alternative hypothesis is one in which some
difference or effect is expected. Accepting the alternative hypothesis will lead to changes in opinions
or actions. Thus, the alternative hypothesis is the opposite of the null hypothesis.
The null hypothesis is always the hypothesis that is tested. The null hypothesis refers to a specified
value of the population parameter (e.g., μ, σ, π), not a sample statistic (e.g., X̄, s, p).

2. Select an appropriate statistical technique and the corresponding test statistic.


3. Choose the level of significance, α
Type I error occurs when the sample results lead to the rejection of the null hypothesis when it is in
fact true.
Type II error occurs when, based on the sample results, the null hypothesis is not rejected when it is
in fact false; its probability is denoted by β (beta).
The complement (1 - β) of the probability of Type II error is called the power of a statistical test.
4. Determine the sample size and collect the data. Calculate the value of the test statistic.
5. Determine the probability associated with the test statistic under the null hypothesis, using the
sampling distribution of the test statistic. Alternatively, determine the critical values associated with
the test statistic that divide the rejection and nonrejection regions.
6. Compare the probability associated with the test statistic with the level of significance specified.
Alternatively, determine whether the test statistic has fallen into the rejection or the nonrejection
region.
7. Make the statistical decision to reject or not reject the null hypothesis.
8. Express the statistical decision in terms of the marketing research problem.
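The eight steps can be walked through for a one-sample t test. The ratings are invented, and the critical value is the standard t-table entry for α = 0.05 (two-tailed) with 4 degrees of freedom:

```python
from math import sqrt
from statistics import mean, stdev

# Steps 1-3: H0: mu = 4 vs H1: mu != 4; one-sample t test; alpha = 0.05 (two-tailed)
mu0 = 4.0
t_crit = 2.776                      # t-table value for alpha/2 = 0.025, df = 4

# Step 4: hypothetical satisfaction ratings from n = 5 respondents
x = [6, 6, 4, 7, 5]

# Steps 5-6: compute the test statistic and compare with the critical value
t = (mean(x) - mu0) / (stdev(x) / sqrt(len(x)))

# Steps 7-8: make the statistical decision, then state it in research terms
reject_h0 = abs(t) > t_crit
print(round(t, 2), reject_h0)       # → 3.14 True
```

Since the test statistic falls in the rejection region, the researcher would conclude that the mean rating differs from 4.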
A cross-tabulation is the merging of the frequency distribution of two or more variables in a single table,
also called a contingency table.

Cross-tabulation with two variables is also known as bivariate cross-tabulation.

Effects of Introduction of a 3rd variable in cross-tabulation:

Statistics Associated with Cross-Tabulations

1. Chi-square statistic is used to test the statistical significance of the observed association in a cross-
tabulation. It assists us in determining whether a systematic association exists between the two
variables.
The chi-square distribution is a skewed distribution whose shape depends solely on the number of
degrees of freedom. As the number of degrees of freedom increases, the chi-square distribution
becomes more symmetrical.

2. The phi coefficient is used as a measure of the strength of association in the special case of a table
with two rows and two columns (a 2 X 2 table). The phi coefficient is proportional to the square root
of the chi-square statistic. For a sample of size n, this statistic is calculated as:
φ = √(χ² / n)

3. Whereas the phi coefficient is specific to a 2 X 2 table, the contingency coefficient (C) can be used
to assess the strength of association in a table of any size. This index is also related to chi-square, as
follows:
C = √(χ² / (χ² + n))
4. Cramer’s V is a modified version of the phi correlation coefficient, and is used in tables larger than
2 X 2. When phi is calculated for a table larger than 2 X 2, it has no upper limit. Cramer’s V is
obtained by adjusting phi for either the number of rows or the number of columns in the table, based
on which of the two is smaller. The adjustment is such that V will range from 0 to 1. A large value of
V merely indicates a high degree of association. It does not indicate how the variables are associated.
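Chi-square, phi, the contingency coefficient, and Cramer's V can all be computed directly from the observed table. A sketch on a hypothetical 2 x 2 cross-tabulation (note that for a 2 x 2 table, Cramer's V reduces to phi):

```python
from math import sqrt

# Hypothetical 2 x 2 cross-tabulation, e.g. gender (rows) x brand usage (columns)
obs = [[20, 10],
       [10, 20]]

rows = [sum(r) for r in obs]
cols = [sum(c) for c in zip(*obs)]
n = sum(rows)

# Chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / n
chi2 = sum((obs[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
           for i in range(len(rows)) for j in range(len(cols)))

phi = sqrt(chi2 / n)                                       # 2 x 2 tables only
c = sqrt(chi2 / (chi2 + n))                                # contingency coefficient
v = sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))     # Cramer's V

print(round(chi2, 2), round(phi, 3), round(c, 3), round(v, 3))   # → 6.67 0.333 0.316 0.333
```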

5. Lambda assumes that the variables are measured on a nominal scale. Asymmetric lambda measures
the percentage improvement in predicting the value of the dependent variable, given the value of the
independent variable. Lambda also varies between 0 and 1. A value of 0 means no improvement in
prediction. A value of 1 indicates that the prediction can be made without error. This happens when
each independent variable category is associated with a single category of the dependent variable.
Asymmetric lambda is computed for each of the variables (treating it as the dependent variable).

Usage of Cross-Tabulation with Hypothesis Testing

1. Test the null hypothesis that there is no association between the variables using the chi-square statistic. If
you fail to reject the null hypothesis, then there is no relationship.

2. If H0 is rejected, then determine the strength of the association using an appropriate statistic (phi
coefficient, contingency coefficient, Cramer’s V, lambda coefficient, or other statistics).

3. If H0 is rejected, interpret the pattern of the relationship by computing the percentages in the direction of
the independent variable, across the dependent variable.

4. If the variables are treated as ordinal rather than nominal, use tau b, tau c, or gamma as the test statistic. If
H0 is rejected, then determine the strength of the association using the magnitude, and the direction of the
relationship using the sign of the test statistic.

5. Translate the results of hypothesis testing, strength of association, and pattern of association into
managerial implications and recommendations where meaningful.

Parametric and Non-Parametric Tests

Hypothesis-testing procedures can be broadly classified as parametric or nonparametric, based on the
measurement scale of the variables involved. Parametric tests assume that the variables of interest are
measured on at least an interval scale. Nonparametric tests assume that the variables are measured on a
nominal or ordinal scale.
Chapter 17: Correlation and Regression
Product moment correlation, r, summarizes the strength of association between two metric (interval or
ratio scaled) variables, say X and Y. It is an index used to determine whether a linear, or straight-line,
relationship exists between X and Y. It indicates the degree to which the variation in one variable, X, is
related to the variation in another variable, Y. Because it was originally proposed by Karl Pearson, it is also
known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate
correlation, or merely the correlation coefficient.
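Computed directly from its definition, r is the covariation of X and Y divided by the product of their variations. A sketch on invented metric data:

```python
from math import sqrt
from statistics import mean

# Hypothetical metric data: advertising spend (X) and sales (Y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # covariation of X and Y
ssx = sum((xi - mx) ** 2 for xi in x)                      # variation in X
ssy = sum((yi - my) ** 2 for yi in y)                      # variation in Y

r = sxy / sqrt(ssx * ssy)
print(round(r, 3))   # → 0.775
```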

Whereas the product moment or simple correlation is a measure of association describing the linear
association between two variables, a partial correlation coefficient measures the association between two
variables after controlling for or adjusting for the effects of one or more additional variables.

Partial correlations have an order associated with them. The order indicates how many variables are being
adjusted or controlled.

Regression analysis is a powerful and flexible procedure for analysing associative relationships between a
metric dependent variable and one or more independent variables. It can be used in the following ways:

1. Determine whether the independent variables explain a significant variation in the dependent variable:
whether a relationship exists

2. Determine how much of the variation in the dependent variable can be explained by the independent
variables: strength of the relationship

3. Determine the structure or form of the relationship: the mathematical equation relating the independent
and dependent variables

4. Predict the values of the dependent variable

5. Control for other independent variables when evaluating the contributions of a specific variable or set of
variables

Although the independent variables may explain the variation in the dependent variable, this does not
necessarily imply causation. The use of the terms dependent or criterion variables, and independent or
predictor variables, in regression analysis arises from the mathematical relationship between the variables.

Bivariate Regression Model - Statistics

Bivariate regression model. The basic regression equation is Yi = b0 + b1 Xi + ei, where

Y = dependent or criterion variable, X = independent or predictor variable, b0 = intercept
of the line, b1 = slope of the line, and ei is the error term associated with the ith observation.

Coefficient of determination. The strength of association is measured by the coefficient of determination,
r². It varies between 0 and 1 and signifies the proportion of the total variation in Y that is accounted for by
the variation in X.

Estimated or predicted value. The estimated or predicted value of Yi is Y’i = a + bXi, where Y’i is the
predicted value of Yi, and a and b are estimators of b0 and b1, respectively.

Regression coefficient. The estimated parameter b is usually referred to as the non-standardized regression
coefficient.
Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or
observations.

Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y values from the
predicted Y’ values.

Standard error. The standard deviation of b, SEb, is called the standard error.

Standardized regression coefficient. Also termed the beta coefficient or beta weight, this is the slope
obtained by the regression of Y on X when the data are standardized.

Sum of squared errors. The distances of all the points from the regression line are squared and added
together to arrive at the sum of squared errors, which is a measure of total error, Σej².

t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear
relationship exists between X and Y, or H0: b1 = 0, where t = b/SEb.

OLS Method of Regression

The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure.
This technique determines the best-fitting line by minimizing the square of the vertical distances of all the
points from the line and the procedure is called ordinary least squares (OLS) regression. The best-fitting line
is called the regression line.
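For the bivariate case, the least-squares estimates have closed forms: b = Sxy/Sxx and a = Ȳ - bX̄. A sketch on invented data:

```python
from statistics import mean

# Hypothetical data: fit the regression line Y' = a + bX by least squares
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)

b = sxy / sxx      # slope: minimizes the sum of squared vertical distances
a = my - b * mx    # intercept: the line passes through (mean of X, mean of Y)

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared errors
r2 = 1 - sse / sum((yi - my) ** 2 for yi in y)               # coefficient of determination

print(round(a, 2), round(b, 2), round(r2, 2))   # → 2.2 0.6 0.6
```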

Assumptions

The regression model makes a number of assumptions in estimating the parameters and in significance
testing:

1. The error term is normally distributed. For each fixed value of X, the distribution of Y is normal.
2. The means of all these normal distributions of Y, given X, lie on a straight line with slope b.
3. The mean of the error term is 0.
4. The variance of the error term is constant. This variance does not depend on the values assumed by X.
5. The error terms are uncorrelated. In other words, the observations have been drawn independently.

Multiple Regression Model - Statistics

Multiple regression involves a single dependent variable and two or more independent variables.

Adjusted R2. R2, coefficient of multiple determination, is adjusted for the number of independent variables
and the sample size to account for diminishing returns. After the first few variables, the additional
independent variables do not make much contribution.
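The adjustment is 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of independent variables. A small sketch (the R² values and sample size are invented) shows how a model with more predictors can post a higher raw R² yet a lower adjusted R²:

```python
# Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n = sample size
# and k = number of independent variables
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical comparison: n = 30 observations in both models
print(round(adjusted_r2(0.60, 30, 2), 3))   # 2 predictors → 0.57
print(round(adjusted_r2(0.62, 30, 5), 3))   # 5 predictors → 0.541
```

Despite the higher raw R², the five-predictor model scores lower after adjustment, illustrating the diminishing-returns penalty.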

Coefficient of multiple determination. The strength of association in multiple regression is measured by the
square of the multiple correlation coefficient, R2, which is also called the coefficient of multiple
determination.

F test. The F test is used to test the null hypothesis that the coefficient of multiple determination in the
population, R²pop, is zero. This is equivalent to testing the null hypothesis H0: b1 = b2 = b3 = … = bk = 0.
The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.
Partial F test. The significance of a partial regression coefficient, bi, of Xi may be tested using an
incremental F statistic. The incremental F statistic is based on the increment in the explained sum of squares
resulting from the addition of the independent variable Xi to the regression equation after all the other
independent variables have been included.

Partial regression coefficient. The partial regression coefficient, b1, denotes the change in the predicted
value, Y’, per unit change in X1 when the other independent variables, X2 to Xk, are held constant.

Step-wise Regression

The purpose of stepwise regression is to select, from a large number of predictor variables, a small subset
of variables that account for most of the variation in the dependent or criterion variable. In this procedure,
the predictor variables enter or are removed from the regression equation one at a time. There are several
approaches to stepwise regression.

1. Forward inclusion. Initially, there are no predictor variables in the regression equation. Predictor
variables are entered one at a time, only if they meet certain criteria specified in terms of the F ratio. The
order in which the variables are included is based on the contribution to the explained variance.

2. Backward elimination. Initially, all the predictor variables are included in the regression equation.
Predictors are then removed one at a time based on the F ratio.

3. Stepwise solution. Forward inclusion is combined with the removal of predictors that no longer meet the
specified criterion at each step.

Stepwise procedures do not result in regression equations that are optimal, in the sense of producing the
largest R2, for a given number of predictors.

Multicollinearity arises when intercorrelations among the predictors are very high. Multicollinearity can
result in several problems, including:

1. Partial regression coefficients may not be estimated precisely. The standard errors are likely to be high.
2. Magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
3. It becomes difficult to assess the relative importance of the independent variables in explaining the
variation in the dependent variable.
4. Predictor variables may be incorrectly included or removed in stepwise regression.
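A crude first screen for multicollinearity is the pairwise correlation matrix of the predictors; very high intercorrelations flag trouble (a fuller diagnosis would use variance inflation factors). The predictor values below are invented, with x2 constructed as a near-copy of x1:

```python
from math import sqrt
from statistics import mean

def corr(x, y):
    # Pearson correlation between two lists of equal length
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical predictors: x2 is built as a near-linear copy of x1
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
x3 = [5, 3, 6, 2, 7, 4]

r12 = corr(x1, x2)
print(abs(r12) > 0.9)   # → True: x1 and x2 are nearly collinear
```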
