Business Research Methods Question Guide
1. Application: From the point of view of application, there are two broad categories of
research:
- Pure research
- Applied research
i) Pure research involves developing and testing theories and hypotheses that are
intellectually challenging to the researcher but may or may not have practical
application at the present time or in the future. The knowledge produced through pure
research is sought in order to add to the existing body of knowledge.
ii) Applied research is done to solve specific, practical questions; for policy
formulation, administration and understanding of a phenomenon. It can be
exploratory, but is usually descriptive. It is almost always done on the basis of basic
research. Applied research can be carried out by academic or industrial institutions.
Often, an academic institution such as a university will have a specific applied research
program funded by an industrial partner interested in that program.
2. Objectives: From the viewpoint of objectives, research can be classified as:
- descriptive
- correlational
- explanatory
- exploratory
i) Descriptive research attempts to describe systematically a situation, problem, phenomenon,
service or programme, or provides information about, say, the living conditions of a community,
or describes attitudes towards an issue.
ii) Correlational research attempts to discover or establish the existence of a relationship/
interdependence between two or more aspects of a situation.
iii) Explanatory research attempts to clarify why and how there is a relationship between two
or more aspects of a situation or phenomenon.
iv) Exploratory research is undertaken to explore an area where little is known or to investigate
the possibilities of undertaking a particular research study (feasibility study / pilot study). In
practice most studies are a combination of the first three categories.
3. Inquiry Mode: From the process adopted to find answers to research questions, the two
approaches are:
- Structured approach
- Unstructured approach
Types of cuisine constitute the qualitative aspect of such a study, as finding out about them
entails describing the culture and cuisine; the extent of their popularity is the quantitative
aspect, as it involves estimating the number of people who visit restaurants serving such
cuisine and calculating other indicators that reflect the extent of popularity.
Q8. What is the difference between exploratory research and conclusive research?
Research Project Component   Exploratory Research              Conclusive Research
Research Purpose             General: generate insight         Specific: verify insight
Data Needs                   Vague                             Clear
Data Sources                 Ill-defined                       Well-defined
Data Collection Form         Open-ended, rough                 Usually structured
Sample                       Small, subjective                 Large, objective
Data Collection              Flexible                          Rigid
Data Analysis                Informal, typically qualitative   Formal, typically quantitative
Inferences/Recommendations   More tentative                    More final
When using the scientific method one of the primary goals is objectivity. Proper use of the
scientific method leads us to rationalism (basing conclusion on intellect, logic and evidence).
Relying on science also helps us avoid dogmatism (adherence to doctrine over rational and
enlightened inquiry, or basing conclusion on authority rather than evidence). The nonscientific
approach to knowledge involves informal kinds of thinking. This approach can be thought of as
an everyday, unsystematic, uncritical way of thinking. Below I will discuss the major
differences between the two.
There have been occasions when research has been called into question. Corporate espionage
and other issues, such as corporations skewing the facts to fit their data, have occurred. This
is a downside of dishonesty in people; however, the ethical, moral, and scientific position is
that all research has to be based on facts.
This does not mean the facts will not change. Research is conducted with the information and
tools available at the time. This means that in 100 years, research being conducted now could
be found to be false, but at the time it is held true because of the limited technology or
facts that could be found.
As anyone learning about research and the research method is told, a theory is never simply
factual, but is supported or refuted based on what can be found at the time. It comes back to
the fact that proper information, analysis, and evaluation are needed in order to conduct
proper research. Inaccurate facts will skew the data and render the entire research invalid.
There is also the human interpretation of the information found. While research is concerned
with these three topics, one also has to realize that the writer of the research can limit the
scope of the research and therefore change the results based on their viewpoint alone.
There are four broad steps involved in planning the research design as explained below:
(1) Determining the work involved in the project:
The first step in planning the research design is determining the work involved in the project
and designing a workable plan to carry out the research work within a specific time limit.
The work involved includes the following: (a) formulating the marketing problem; (b)
determining the information requirement; (c) identifying information sources; (d) preparing a
detailed plan for the execution of the research project. This preliminary step indicates the
nature and volume of work involved in the research. The various forms required for the
research work will be decided and finalized. The sample to be selected for the survey work
will also be decided. Staff requirements will also be estimated, and details will be worked
out about the training and supervision of field investigators, etc. In addition, the
questionnaire will be prepared and tested. In this way the researcher prepares a blueprint of
the research project, according to which the whole project will be implemented. The researcher
gets a clear idea of the work involved in the project through such initial planning. Such
planning avoids confusion, misdirection and wastage of time, money and effort at later stages
of the research work, and the whole research project moves smoothly due to this initial
planning.
(2) Estimating the costs involved:
The second step in planning the research design is estimating the costs involved in the
research project. MR projects are costly, as the questionnaire has to be prepared in a large
number of copies, interviewers have to be appointed for data collection, and staff will be
required for tabulation and analysis of the data collected. Finally, experts will be required
for drawing conclusions and for writing the research report. The researcher has to estimate
the expenditure required for the execution of the project. The sponsoring organization will
approve the research project and make suitable budget provision accordingly. The cost
calculation is a complicated job, as expenditure under different heads has to be estimated
accurately. The cost of the project also needs to be viewed from the viewpoint of its utility
in solving the marketing problem. A comprehensive research study for solving a comparatively
minor marketing problem will be uneconomical.
(3) Preparing the time schedule:
The time factor is important in the execution of the research project, and planning the time
schedule is essential at the initial stage. The time calculation relates to the preparation of
the questionnaire and its pre-testing, training of interviewers, actual survey work,
tabulation and analysis of data, and finally report writing. The time requirement of each
stage needs to be worked out systematically; such a study will indicate the time requirement
of the whole project. Too long a period for the completion of research work is undesirable, as
the conclusions and recommendations may become outdated by the time they are actually
available. Similarly, time-consuming research projects are not useful for solving urgent
marketing problems faced by a company. Preparing the time schedule alone is not adequate in
research design: in addition, all operations involved in the research work should be carried
out strictly as per the time schedule already prepared. If necessary, remedial measures should
be adopted in order to avoid any deviation from the time schedule. This brings certainty as
regards the completion of the whole research project in time.
(4) Verifying results:
Research findings need to be dependable for the sponsoring organization. The researcher may
create new problems for the sponsoring organization if the research work is conducted in a
faulty manner; such an unreliable study is dangerous. It is therefore necessary to keep an
effective check on the whole research work during the implementation stage, and suitable
provisions for this need to be made in the research design. After deciding the details of the
steps noted above, the background for the research design will be ready. Thereafter, the
researcher has to prepare the research design of the whole project and present it to the
sponsoring agency or higher authorities for detailed consideration and approval. The
researcher can start the research project (as per the design) after securing the necessary
approval of the research design prepared.
• To attribute cause-and-effect relationships among two or more variables so that we
can better understand and predict the outcome of one variable (e.g., sales) when
varying another (e.g., advertising). This classification is frequently used and is quite
popular. Before we discuss each of these design types, a cautionary note is in order.
Some might think that the research design decision suggests a choice among the
design types. Although there are research situations in which all the research
questions might be answered by doing only one of these types (e.g., a causal research
experiment to determine which of three prices results in the greatest profits), it is more
often the case that the research design might involve more than one of these types
performed in some sequence. The overall research design is intended to indicate exactly how the
different design types will be utilized to get answers to the research questions or test the
hypothesis.
Panel sampling
Panel sampling is the method of first selecting a group of participants through a random
sampling method and then asking that group for the same information again several times over a
period of time. Therefore, each participant is given the same survey or interview at two or more
time points; each period of data collection is called a "wave". This longitudinal sampling
method allows estimates of changes in the population, for example with regard to anything from
chronic illness to job stress to weekly food expenditures. Panel sampling can also be used to
inform researchers about
within-person health changes due to age or to help explain changes in continuous dependent
variables such as spousal interaction. There have been several proposed methods of analyzing
panel data, including MANOVA, growth curves, and structural equation modeling with lagged
effects.
Replacement of selected units
Sampling schemes may be without replacement ('WOR' - no element can be selected more than
once in the same sample) or with replacement ('WR' - an element may appear multiple times in
the one sample). For example, if we catch fish, measure them, and immediately return them to
the water before continuing with the sample, this is a WR design, because we might end up
catching and measuring the same fish more than once. However, if we do not return the fish to
the water (e.g. if we eat the fish), this becomes a WOR design.
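The two schemes can be sketched with Python's standard library, where `random.sample` draws without replacement and `random.choices` draws with replacement (the fish population and the sample size of 5 are invented for illustration):

```python
import random

population = ["fish_%d" % i for i in range(1, 21)]  # 20 tagged fish

random.seed(0)  # fixed seed so the draw is reproducible

# Without replacement (WOR): each fish appears at most once in the sample
wor_sample = random.sample(population, k=5)

# With replacement (WR): the same fish may be "caught" more than once
wr_sample = random.choices(population, k=5)

print(len(set(wor_sample)))  # 5: all WOR draws are distinct
```

A WOR sample of size k is always k distinct elements; a WR sample of the same size may contain repeats.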
Sample size
Formulas, tables, and power function charts are well-known approaches to determining sample
size.
Steps for using sample size tables:
1. Postulate the effect size of interest, α, and β.
2. Locate the table corresponding to the selected α.
3. Locate the row corresponding to the desired power.
4. Locate the column corresponding to the estimated effect size.
5. The intersection of the column and row is the minimum sample size required.
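The table-lookup steps above follow the same normal-approximation formula that generates such tables. As a sketch only, and assuming a one-sample, two-sided z-test of a standardized effect size (the function name and defaults are my own, not from the text), the minimum n can be computed with Python's standard library:

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate minimum n for a one-sample two-sided z-test
    (normal approximation; an illustrative assumption)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)

print(min_sample_size(0.5))  # medium effect size -> 32
```

Smaller effect sizes require much larger samples, which is exactly the pattern the tables encode.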
Q21. What do you mean by primary data? What are the various methods of collecting
primary data?
Primary sources are original sources from which the researcher directly collects data that have
not been previously collected, e.g., collection of data directly by the researcher on brand
awareness, brand preference, brand loyalty and other aspects of consumer behaviour from a
sample of consumers by interviewing them. Primary data are first-hand information collected
through various methods such as observation, interviewing, mailing, etc.
Methods of Collecting Primary Data
Primary data are directly collected by the researcher from their original sources. In this
case, the researcher can collect the required data precisely according to his research needs;
he can collect them when he wants them and in the form he needs them. But the collection of primary data is
costly and time consuming. Yet, for several types of social science research required data are not
available from secondary sources and they have to be directly gathered from the primary sources.
In such cases where the available data are inappropriate, inadequate or obsolete, primary data
have to be gathered. Such studies include socio-economic surveys, social anthropological
studies of rural and tribal communities, sociological studies of social problems and social
institutions, marketing research, leadership studies, opinion polls, attitudinal surveys,
readership, radio listening and T.V. viewing surveys, knowledge-awareness-practice (KAP)
studies, farm management studies, business management studies, etc.
There are various methods of data collection. A 'method' is different from a 'tool': while a
method refers to the way or mode of gathering data, a tool is an instrument used for the
method. For example, a schedule is used for interviewing. The important methods are:
(a) Observation, (b) Interviewing, (c) Mail survey, (d) Experimentation.
(a) Observation: Observation means viewing or seeing. Observation may be defined as a
systematic viewing of a specific phenomenon in its proper setting for the specific purpose of
gathering data for a particular study. Observation is a classical method of scientific study.
(b) Interviewing: Interviewing is one of the prominent methods of data collection. It may be
defined as a two way systematic conversation between an investigator and an informant, initiated
for obtaining information relevant to a specific study. It involves not only conversation, but also
learning from the respondent’s gesture, facial expressions and pauses, and his environment.
Interviewing requires face to face contact or contact over telephone and calls for interviewing
skills. It is done by using a structured schedule or an unstructured guide.
(c) Mail survey: The mail survey is a data collection process for researchers. Research
practitioners should recognize that this is a viable means of collecting specific market data.
(d) Experimentation: The popularity of experimentation in marketing research has much to do
with the possibilities of establishing cause and effect. Experiments can be configured in such a
way as to allow the variable causing a particular effect to be isolated. Other methods commonly
used in marketing research, like surveys, provide much more ambiguous findings. In fact,
experimentation is the most scientific method employed in marketing research.
Q22. What are the differences between observation and interviewing as methods of data
collection? Give two specific examples of situations where either observation or
interviewing would be more appropriate.
Observation: Observation means viewing or seeing. Observation may be defined as a systematic
viewing of a specific phenomenon in its proper setting for the specific purpose of gathering data
for a particular study. Observation is a classical method of scientific study.
Observation as a method of data collection has certain characteristics.
1. It is both a physical and a mental activity: The observing eye catches many things that are
present. But attention is focused on data that are pertinent to the given study.
2. Observation is selective: A researcher does not observe anything and everything, but selects
the range of things to be observed on the basis of the nature, scope and objectives of his study.
For example, suppose a researcher desires to study the causes of city road accidents and has
formulated a tentative hypothesis that accidents are caused by violation of traffic rules and
over-speeding. When he observes the movements of vehicles on the road, many things are before
his eyes: the type, make, size and colour of the vehicles, the persons sitting in them, their
hair styles, etc. All such things that are not relevant to his study are ignored, and only
over-speeding and traffic violations are keenly observed by him.
3. Observation is purposive and not casual: It is made for the specific purpose of noting
things relevant to the study. It captures the natural social context in which persons'
behaviour occurs. It grasps the significant events and occurrences that affect social
relations of the participants.
4. Observation should be exact and be based on standardized tools of research, such as an
observation schedule, socio-metric scale, etc., and precision instruments, if any.
Observation is suitable for a variety of research purposes. It may be used for studying
(a) The behavior of human beings in purchasing goods and services. Life style, customs, and
manner, interpersonal relations, group dynamics, crowd behavior, leadership styles, managerial
style, other behaviors and actions;
(b) The behavior of other living creatures like birds, animals etc.
(c) Physical characteristics of inanimate things like stores, factories, residences etc.
(d) Flow of traffic and parking problems
(e) Movement of materials and products through a plant.
Q23. What is questionnaire? Discuss the main points that you will take into account while
drafting a questionnaire?
A questionnaire is a research instrument consisting of a series of questions and other prompts for
the purpose of gathering information from respondents. Although they are often designed for
statistical analysis of the responses, this is not always the case. The questionnaire was invented
by Sir Francis Galton.
Questionnaires have advantages over some other types of surveys in that they are cheap, do not
require as much effort from the questioner as verbal or telephone surveys, and often have
standardized answers that make it simple to compile data. However, such standardized answers
may frustrate users. Questionnaires are also sharply limited by the fact that respondents must be
able to read the questions and respond to them. Thus, for some demographic groups conducting a
survey by questionnaire may not be practical.
The main points that will be taken into account while drafting a questionnaire:
* Use statements which are interpreted in the same way by members of different subpopulations
of the population of interest.
* Use statements where persons that have different opinions or traits will give different answers.
* Think of having an "open" answer category after a list of possible answers.
* Use only one aspect of the construct you are interested in per item.
* Use positive statements and avoid negatives or double negatives.
* Do not make assumptions about the respondent.
* Use clear and comprehensible wording, easily understandable for all educational levels.
* Use correct spelling, grammar and punctuation.
* Avoid items that contain more than one question per item (e.g. "Do you like strawberries and
potatoes?").
Below is a checklist you can use when forming your questions:
- Is this question necessary? How will it be useful? What will it tell you?
- Will you need to ask several related questions on a subject to be able to answer your
critical question?
- Do respondents have the necessary information to answer the question?
- Will the words in each question be universally understood by your target audience?
- Are abbreviations used? Will everyone in your sample understand what they mean?
- Are unconventional phrases used? If so, are they really necessary? Can they be deleted?
- Is the question too vague? Does it get directly to the subject matter?
- Can the question be misunderstood? Does it contain unclear phrases?
- Is the question misleading because of unstated assumptions or unseen implications?
Are your assumptions the same as the target audience's?
- Have you assumed that the target audience has adequate knowledge to answer the question?
- Is the question too demanding? For example, does it ask too much on the part of the
respondent in terms of mathematical calculations, or having to look up records?
- Is the question biased in a particular direction, without accompanying questions to balance
the emphasis?
- Are you asking two questions at one time?
- Does the question have a double negative?
- Is the question wording likely to be objectionable to the target audience in any way?
- Are the answer choices mutually exclusive?
- Is the question technically accurate?
- Is an appropriate referent provided? For example: per year, per acre.
Schedule Method
In case the informants are largely uneducated and non-responsive, data cannot be collected by
the mailed questionnaire method. In such cases, the schedule method is used to collect data.
Here the questionnaires are sent through enumerators to collect information. Enumerators are
persons
appointed by the investigator for the purpose. They directly meet the informants with the
questionnaire. They explain the scope and objective of the enquiry to the informants and solicit
their cooperation. The enumerators ask the questions to the informants and record their answers
in the questionnaire and compile them. The success of this method depends on the sincerity and
efficiency of the enumerators. So the enumerator should be sweet-tempered, good-natured,
trained and well-behaved.
Schedule method is widely used in extensive studies. It gives fairly correct results, as the
enumerators directly collect the information. The accuracy of the information depends upon the
honesty of the enumerators. They should be unbiased. This method is relatively more costly and
time-consuming than the mailed questionnaire method.
DATA ANALYSIS
Processing of data--editing, coding, classification and tabulation
After collecting data, raw data must be converted into meaningful statements; this includes
data processing, data analysis, and data interpretation and presentation.
Data reduction or processing mainly involves the various manipulations necessary for preparing
the data for analysis. The process (of manipulation) could be manual or electronic. It
involves editing, categorizing the open-ended questions, coding, computerization and
preparation of tables and diagrams.
Editing data:
Information gathered during data collection may lack uniformity. Example: Data collected
through questionnaire and schedules may have answers which may not be ticked at proper
places, or some questions may be left unanswered. Sometimes information may be given in a
form which needs reconstruction in a category designed for analysis, e.g., converting
daily/monthly income into annual income, and so on. The researcher has to take a decision as
to how to edit it.
Editing also ensures that data are relevant and appropriate and that errors are corrected.
Occasionally, the investigator makes a mistake and records an impossible answer. "How much red
chilies do you use in a month?" The answer is written as "4 kilos". Can a family of three
members use four kilos of chilies in a month? The correct answer could be "0.4 kilo".
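A minimal sketch of such an editing check, with hypothetical records and an assumed plausibility bound (both invented for illustration):

```python
# hypothetical edited records from a consumption survey
records = [
    {"id": 1, "chilies_kg_per_month": 0.5},
    {"id": 2, "chilies_kg_per_month": 4.0},  # implausibly high: likely 0.4 miskeyed
]

PLAUSIBLE_MAX_KG = 2.0  # assumed upper bound for a small household

# flag records for the editor to review, rather than silently "correcting" them
flagged = [r for r in records if r["chilies_kg_per_month"] > PLAUSIBLE_MAX_KG]
print([r["id"] for r in flagged])  # [2]
```

Flagging rather than auto-correcting mirrors the text's point: the researcher must take a decision on how to edit each impossible answer.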
Coding of data:
Coding is translating answers into numerical values or assigning numbers to the various
categories of a variable to be used in data analysis. Coding is done by using a code book, code
sheet, and a computer card. Coding is done on the basis of the instructions given in the
codebook. The code book gives a numerical code for each variable.
Now-a-days, codes are assigned before going to the field, while constructing the
questionnaire/schedule. Post data collection, pre-coded items are fed to the computer for
processing and analysis. For open-ended questions, however, post-coding is necessary. In such
cases, all answers to open-ended questions are placed in categories and each category is
assigned a code.
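A minimal illustration of coding against a code book — the code book below (1 = yes, 2 = no, 9 = don't know) is hypothetical:

```python
# hypothetical code book: a numerical code for each answer category
codebook = {"yes": 1, "no": 2, "don't know": 9}

responses = ["yes", "no", "yes", "don't know"]
coded = [codebook[r] for r in responses]  # translate answers into codes
print(coded)  # [1, 2, 1, 9]
```

The same lookup works for post-coded open-ended answers once they have been placed into categories.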
Manual processing is employed when qualitative methods are used or when in quantitative
studies, a small sample is used, or when the questionnaire/schedule has a large number of open-
ended questions, or when accessibility to computers is difficult or inappropriate. However,
coding is done in manual processing also.
Data classification/distribution:
Sarantakos (1998: 343) defines distribution of data as a form of classification of scores
obtained for the various categories of a particular variable. There are four types of
distributions:
1. Frequency distribution
2. Percentage distribution
3. Cumulative distribution
4. Statistical distribution
1. Frequency distribution:
In social science research, frequency distribution is very common. It presents the frequency of
occurrences of certain categories. This distribution appears in two forms:
Ungrouped: Here, the scores are not collapsed into categories. For example, in a distribution
of the ages of the students of a BJ (MC) class, each age value (e.g., 18, 19, 20, and so on)
will be presented separately in the distribution.
Grouped: Here, the scores are collapsed into categories, so that 2 or 3 scores are presented
together as a group. For example, in the above age distribution, groups like 18-20, 21-22,
etc., can be formed.
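Both forms can be sketched with Python's `collections.Counter`; the ages and the class boundaries below are hypothetical:

```python
from collections import Counter

ages = [18, 19, 19, 20, 18, 21, 22, 19, 20, 18]  # hypothetical class ages

# Ungrouped: each age value is counted separately
ungrouped = Counter(ages)

# Grouped: scores collapsed into classes such as 18-20 and 21-22
grouped = Counter("18-20" if a <= 20 else "21-22" for a in ages)

print(ungrouped[18], grouped["18-20"])  # 3 8
```

Grouping trades detail for readability: the grouped counts can always be derived from the ungrouped ones, but not the reverse.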
2. Percentage distribution:
It is also possible to give frequencies not in absolute numbers but in percentages. For
instance, instead of saying that 200 respondents out of a total of 2000 had a monthly income
of less than Rs. 500, we can say that 10% of the respondents have a monthly income of less
than Rs. 500.
3. Cumulative distribution:
It tells how often the value of the random variable is less than or equal to a particular reference
value.
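A cumulative distribution can be built from a frequency distribution with running totals; this sketch uses the retirement-age data that appears later in this section:

```python
from collections import Counter
from itertools import accumulate

values = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

counts = sorted(Counter(values).items())          # [(54, 3), (55, 1), ...]
running = accumulate(freq for _, freq in counts)  # 3, 4, 5, 7, 9, 11
cumulative = dict(zip((v for v, _ in counts), running))

print(cumulative[56])  # 5 observations are <= 56
```

Dividing each cumulative count by the total number of observations gives the cumulative proportion at or below each value.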
4. Statistical distribution:
In this type of data distribution, some measure of average is found out of a sample of
respondents. Several kinds of averages are available (mean, median, mode) and the researcher
must decide which is most suitable to his purpose. Once the average has been calculated, the
question arises: how representative a figure it is, i.e., how closely the answers are bunched
around it. Are most of them very close to it or is there a wide range of variation?
Tabulation of data:
After editing, which ensures that the information on the schedule is accurate and categorized in a
suitable form, the data are put together in some kinds of tables and may also undergo some other
forms of statistical analysis.
Tables can be prepared manually and/or by computer. For a small study of 100 to 200 persons,
there may be little point in tabulating by computer, since this necessitates putting the data
on punched cards. But for a survey analysis involving a large number of respondents and
requiring cross-tabulation involving more than two variables, hand tabulation will be
inappropriate and time consuming.
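A simple two-variable cross-tabulation can be sketched by counting (row, column) pairs; the respondents and the question below are hypothetical:

```python
from collections import Counter

# hypothetical coded responses: (gender, "would buy again?")
rows = [("F", "yes"), ("M", "no"), ("F", "yes"), ("M", "yes"), ("F", "no")]

crosstab = Counter(rows)  # cell counts of the 2 x 2 table
print(crosstab[("F", "yes")])  # 2
```

Each key of the counter is one cell of the table; printing the counts by row reproduces the familiar cross-tab layout.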
Usefulness of tables:
Tables are useful to the researchers and the readers in three ways.

Mean
Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean can be used for both continuous and discrete numeric data.
Limitations of the mean:
The mean cannot be calculated for categorical data, as the values cannot be summed.
As the mean includes every value in the distribution, the mean is influenced by outliers and
skewed distributions.
The population mean is indicated by the Greek symbol µ (pronounced ‘mu’). When the mean is
calculated on a distribution from a sample it is indicated by the symbol x̅ (pronounced X-bar).
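Using the retirement-age data from this section, the sample mean x̅ can be computed with Python's `statistics` module:

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]  # retirement ages

sample_mean = statistics.mean(ages)  # sum of values / number of values
print(round(sample_mean, 2))  # 56.64
```

Here 623 / 11 ≈ 56.64 years, which sits near, but not exactly at, the middle value of the distribution.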
Median
The median is the middle value in a distribution when the values are arranged in ascending
or descending order.
The median divides the distribution in half (there are 50% of observations on either side of the
median value). In a distribution with an odd number of observations, the median value is the
middle value.
Looking at the retirement age distribution (which has 11 observations), the median is the middle
value, which is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value is the mean of the
two middle values. In the following distribution, the two middle values are 56 and 57, therefore
the median equals 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The median is less affected by outliers and skewed data than the mean, and is usually the
preferred measure of central tendency when the distribution is not symmetrical.
The median cannot be identified for categorical nominal data, as it cannot be logically ordered.
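Both the odd and even cases can be checked with `statistics.median`, using the two retirement-age datasets above:

```python
import statistics

odd = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]  # 11 observations
even = [52] + odd                                   # 12 observations

print(statistics.median(odd))   # 57: the single middle value
print(statistics.median(even))  # 56.5: mean of the two middle values
```

With an odd count the median is one of the observed values; with an even count it may fall between two of them.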
Mode
Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age data.
Age Frequency
54 3
55 1
56 1
57 2
58 2
60 2
The most commonly occurring value is 54; therefore the mode of this distribution is 54 years.
The mode has an advantage over the median and the mean as it can be found for both numerical
and categorical (non-numerical) data.
There are some limitations to using the mode. In some distributions, the mode may not reflect
the centre of the distribution very well. When the distribution of retirement age is ordered
from lowest to highest value, it is easy to see that the centre of the distribution is 57
years, but the mode is lower, at 54 years.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
It is also possible for there to be more than one mode for the same distribution of data
(bi-modal, or multi-modal). The presence of more than one mode can limit the ability of the
mode to describe the centre or typical value of the distribution, because a single value to
describe the centre cannot be identified.
In some cases, particularly where the data are continuous, the distribution may have no mode at
all (i.e. if all values are different).
In cases such as these, it may be better to consider using the median or mean, or to group the
data into appropriate intervals and find the modal class.
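The mode, and the multi-modal case, can be illustrated with the `statistics` module (`multimode` requires Python 3.8+; the bi-modal list is a made-up example):

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(statistics.mode(ages))                  # 54: the most frequent value
print(statistics.multimode([1, 1, 2, 2, 3])) # [1, 2]: a bi-modal distribution
```

Unlike `mean` and `median`, `mode` and `multimode` also work on categorical data such as strings.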
The shape of a distribution influences the measures of central tendency
Symmetrical distributions:
When a distribution is symmetrical, the mode, median and mean are all in the middle of the
distribution. The following graph shows a larger retirement age dataset with a distribution which
is symmetrical. The mode, median and mean all equal 58 years.
Skewed distributions:
When a distribution is skewed the mode remains the most commonly occurring value, the
median remains the middle value in the distribution, but the mean is generally ‘pulled’ in the
direction of the tails. In a skewed distribution, the median is often a preferred measure of central
tendency, as the mean is not usually in the middle of the distribution.
A distribution is said to be positively or right skewed when the tail on the right side of the
distribution is longer than the left side. In a positively skewed distribution it is common for the
mean to be ‘pulled’ toward the right tail of the distribution. Although there are exceptions to this
rule, generally, most of the values, including the median value, tend to be less than the mean
value.
The following graph shows a larger retirement age data set with a distribution which is right
skewed. The data has been grouped into classes, as the variable being measured (retirement age)
is continuous. The mode is 54 years, the modal class is 54-56 years, the median is 56 years and
the mean is 57.2 years.
A distribution is said to be negatively or left skewed when the tail on the left side of the
distribution is longer than the right side. In a negatively skewed distribution, it is common for the
mean to be ‘pulled’ toward the left tail of the distribution. Although there are exceptions to this
rule, generally, most of the values, including the median value, tend to be greater than the mean
value.
The following graph shows a larger retirement age dataset with a distribution which is left skewed.
The mode is 65 years, the modal class is 63-65 years, the median is 63 years and the mean is
61.8 years.
Mean
The arithmetic mean is the sum of the measures in the set divided by the number of measures in
the set. Totaling all the measures and dividing by the number of measures, we get
$1,000 ÷ 5 = $200.
Median
Another measure of central tendency is the median, which is defined as the middle value when
the numbers are arranged in increasing or decreasing order. When we order the daily earnings
shown in Table, we get
If there is an even number of items in a set, the median is the average of the two middle values.
For example,
If we had four values— 4, 10, 12, and 26
The median would be the average of the two middle values,
10 and 12; in this case, (10 + 12) / 2 = 11
Median = 11
The median may sometimes be a better indicator of central tendency than the mean, especially
when there are outliers, or extreme values.
Example 1
Given the four annual salaries of a corporation shown in Table 2, determine the mean and the
median.
The mean of these four salaries is $275,000.
The median is the average of the middle two salaries, or $40,000.
In this instance, the median appears to be a better indicator of central tendency because the
CEO's salary is an extreme outlier, causing the mean to lie far from the other three salaries.
Mode
Another indicator of central tendency is the mode, or the value that occurs most often in a set of
numbers. In the set of weekly earnings in Table 1, the mode would be $350 because it appears
twice and the other values appear only once.
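These definitions map directly onto Python's `statistics` module. The even-count median example above, (10 + 12)/2 = 11, can be checked like this; the earnings list used for the mode is hypothetical, since Table 1 is not reproduced here:

```python
import statistics as st

# Even number of items: the median averages the two middle values.
print(st.median([4, 10, 12, 26]))      # 11.0

# Odd number of items: the median is the single middle value.
print(st.median([4, 10, 12]))          # 10

# The mode is the most frequent value (hypothetical earnings data).
print(st.mode([350, 350, 200, 500]))   # 350
```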
Notation and formulae
MEAN
The mean of a sample is typically denoted as x̄ (read as "x bar"). The mean of a population is
typically denoted as μ (pronounced "mew"). The sum (or total) of measures is typically denoted
with a Σ. The formula for a sample mean for grouped data is
x̄ = Σfx / n
Where
x is the midpoint of the interval,
f is the frequency for the interval,
fx is the product of the midpoint times the frequency, and
n is the number of values.
For example, if 8 is the midpoint of a class interval and there are ten measurements in the
interval, fx = 10(8) = 80, the sum of the ten measurements in the interval.
Σ fx denotes the sum of all the products in all class intervals. Dividing that sum by the number of
measurements yields the sample mean for grouped data.
For example, consider the information shown in Table 3.
Therefore, the average price of items sold was about $15.19. The value may not be the exact
mean for the data, because the actual values are not always known for grouped data.
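The grouped-data mean Σfx / n can be sketched in a few lines of Python. The midpoints and frequencies below are hypothetical, since Table 3 itself is not reproduced here:

```python
# Hypothetical grouped data (not Table 3): class midpoints and frequencies.
midpoints   = [3, 8, 13, 18, 23]
frequencies = [4, 10, 8, 6, 4]

n = sum(frequencies)                                        # total number of values
total = sum(f * x for f, x in zip(frequencies, midpoints))  # sum of all fx products
grouped_mean = total / n
print(grouped_mean)  # 12.375
```

As the text notes, this is only an approximation of the true mean, because every value in a class is represented by the class midpoint.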
Median for grouped data
As with the mean, the median for grouped data may not necessarily be computed precisely
because the actual values of the measurements may not be known. In that case, you can find the
particular interval that contains the median and then approximate the median.
Using Table 3, you can see that there is a total of 32 measures. The median is between the 16th
and 17th measure; therefore, the median falls within the $11.00 to $15.99 interval. The formula
for the best approximation of the median for grouped data is
Median = L + ((n/2 − Σf b) / f med) × w
where
L is the lower class limit of the interval that contains the median,
n is the total number of measurements,
w is the class width,
f med is the frequency of the class containing the median, and
Σ f b is the sum of the frequencies for all classes before the median class.
Consider the information in Table 4.
As we already know, the median is located in the class interval $11.00 to $15.99. So L = 11, n = 32,
w = 4.99, f med = 4, and Σ f b = 14.
Substituting into the formula: Median = 11 + ((32/2 − 14) / 4) × 4.99 = 11 + 2.495 ≈ $13.50.
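The substitution can be checked with a short script using the values given above (L = 11, n = 32, w = 4.99, f med = 4, Σ f b = 14):

```python
# Grouped-data median: Median = L + ((n/2 - sum_fb) / f_med) * w
L, n, w, f_med, sum_fb = 11, 32, 4.99, 4, 14

median = L + ((n / 2 - sum_fb) / f_med) * w
print(round(median, 3))  # 13.495, i.e. about $13.50
```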
Symmetric distribution
In a distribution displaying perfect symmetry, the mean, the median, and the mode are all at the
same point, as shown in Figure 1.
Figure 1. For a symmetric distribution, the mean, median, and mode are equal.
Skewed curves
As you have seen, an outlier can significantly alter the mean of a series
of numbers, whereas the median will remain at the center of the series.
In such a case, the resulting curve drawn from the values will appear to
be skewed, tailing off rapidly to the left or right. In the case of negatively skewed or positively
skewed curves, the median remains in the center of these three measures.
Figure 2 shows a negatively skewed curve.
Figure 2. A negatively skewed distribution: mean < median < mode.
Dispersion
Quartile
In descriptive statistics, the quartiles of a set of values are the three points that divide the data set
into four equal groups, each representing a fourth of the population being sampled.
In epidemiology, sociology and finance, the quartiles of a population are the four subpopulations
defined by classifying individuals according to whether the value concerned falls into one of the
four ranges defined by the three values discussed above. Thus an individual item might be
described as being "in the upper quartile".
Definitions
first quartile (designated Q1) = lower quartile = splits off the lowest 25% of the data = 25th percentile
second quartile (designated Q2) = median = cuts the data set in half = 50th percentile
third quartile (designated Q3) = upper quartile = splits off the highest 25% of the data (or the lowest 75%) = 75th percentile
The difference between the upper and lower quartiles is called the inter quartile range.
If a data set is arranged in ascending order of magnitude, the three quartiles divide it into four equal groups.
The inter quartile range is a more useful measure of spread than the range, as it describes the
middle 50% of the data values.
Computing methods
Method 1
Use the median to divide the ordered data set into two halves. Do not include the median itself in
either half.
The lower quartile value is the median of the lower half of the data. The upper quartile value is
the median of the upper half of the data.
This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.
Method 2
Use the median to divide the ordered data set into two halves. If the median is a datum (as
opposed to being the average of the middle two data), include the median in both halves.
The lower quartile value is the median of the lower half of the data. The upper quartile value is
the median of the upper half of the data.
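The two methods can be sketched with the standard `statistics.median` function. The seven-value list below is hypothetical, chosen so that the two methods give different quartiles (they only differ when the data set has an odd number of values):

```python
from statistics import median

def quartiles_method1(data):
    """Method 1: exclude the median itself from both halves (TI-83 style)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(upper)

def quartiles_method2(data):
    """Method 2: if the median is a datum, include it in both halves."""
    s = sorted(data)
    half = len(s) // 2
    if len(s) % 2:                       # odd count: the median is a data point
        lower, upper = s[:half + 1], s[half:]
    else:                                # even count: same split as Method 1
        lower, upper = s[:half], s[half:]
    return median(lower), median(upper)

data = [1, 2, 3, 4, 5, 6, 7]             # hypothetical ordered data, median = 4
print(quartiles_method1(data))           # (2, 6)
print(quartiles_method2(data))           # (2.5, 5.5)
```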
Dispersion
A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the
same and increases as the data become more diverse.
Most measures of dispersion have the same units as the quantity being measured. In other words,
if the measurements are in meters or seconds, so is the measure of dispersion.
Measures of dispersion express quantitatively the degree of variation or dispersion of values in a
population or in a sample. Along with measures of central tendency, measures of dispersion are
widely used in practice as descriptive statistics. Common measures of dispersion include the
standard deviation, the average deviation, the range and the inter quartile range.
For example, the dispersion in the sample of 5 values (98,99,100,101,102) is smaller than the
dispersion in the sample (80,90,100,110,120), although both samples have the same central
location - "100", as measured by, say, the mean or the median . Most measures of dispersion
would be 10 times greater for the second sample than for the first one (although the values
themselves may be different for different measures of dispersion).
1. Range:
The range is the most obvious measure of dispersion and is the difference between the lowest
and highest values in a dataset. The range is easy to calculate, but it is an insensitive measure of
variation (it does not change with a change in the distribution of the data between the extremes)
and is not very informative.
If L1 = lowest measurement and L2 = highest measurement, then
Range = L2 − L1 (largest value − smallest value)
Coefficient of range = (L2 − L1) / (L2 + L1)
An example of the use of the range to compare spread within datasets is provided in table 1. The
scores of individual students in the examination and coursework component of a module are
shown.
To find the range in marks the highest and lowest values need to be found from the table. The
highest coursework mark was 48 and the lowest was 27 giving a range of 21. In the examination,
the highest mark was 45 and the lowest 12 producing a range of 33. This indicates that there was
wider variation in the students’ performance in the examination than in the coursework for this
module.
Since the range is based solely on the two most extreme values within the dataset, if one of these
is either exceptionally high or low (sometimes referred to as an outlier) it will result in a range that
is not typical of the variability within the dataset. For example, imagine in the above example
that one student failed to hand in any coursework and was awarded a mark of zero, but sat the
exam and scored 40. The range for the coursework marks would now become 48 (48 − 0),
rather than 21; the new range is not typical of the dataset as a whole and is distorted by
the outlier in the coursework marks.
Merit
The range is an adequate measure of variation for a small set of data, like class scores for a test.
Think of other measures where range might be useful: Salaries for a particular job category; or
Indoor versus outdoor temperatures?
An Exercise in Calculating the Range
In a previous example, we examined the net worth for 8 theoretical individuals. Here again are
the numbers:
$2,000 $10,000 $25,000 $32,000 $45,000 $50,000 $80,000 $23,000,000,000
Range = Largest value - Smallest value
Range = $23,000,000,000 - $2,000
Range = $22,999,998,000
This is obviously a very broad range, but it does not tell us much about the normal circumstances
of most of the members of our data group. A more informative way to describe these numbers
would be that they have a median net worth of $38,500 with a range of $2,000 to
$23,000,000,000.
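The range and the coefficient of range defined earlier can be computed directly from the net-worth data above:

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Coefficient of range = (L2 - L1) / (L2 + L1)."""
    lo, hi = min(values), max(values)
    return (hi - lo) / (hi + lo)

net_worth = [2_000, 10_000, 25_000, 32_000, 45_000,
             50_000, 80_000, 23_000_000_000]
print(data_range(net_worth))            # 22999998000
print(coefficient_of_range(net_worth))  # very close to 1, dominated by the outlier
```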
Example: The heights (at the shoulders) of five dogs are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.
Find out the Mean, the Variance, and the Standard Deviation.
Your first step is to find the Mean:
Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
so the mean (average) height is 394 mm.
To calculate the Variance, take each difference from the mean, square it, and then average the result:
The differences from the mean are 206, 76, −224, 36 and −94. Their squares are 42,436, 5,776,
50,176, 1,296 and 8,836, which sum to 108,520; dividing by 5 gives 21,704.
So, the Variance is 21,704.
And the Standard Deviation is just the square root of Variance, so:
Standard Deviation: σ = √21,704 = 147.32... = 147 (to the nearest mm)
And the good thing about the Standard Deviation is that it is useful. Now we can show which
heights are within one Standard Deviation (147mm) of the Mean:
So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what
is extra large or extra small.
Rottweilers are tall dogs, and Dachshunds are a bit short.
But ... there is a small change with Sample Data
Our example was for a Population (the 5 dogs were the only dogs we were interested in).
But if the data is a Sample (a selection taken from a bigger Population), then the calculation
changes!
When you have "N" data values that are:
The Population: divide by N when calculating Variance (like we did)
A Sample: divide by N-1 when calculating Variance
All other calculations stay the same, including how we calculated the mean.
Example: if our 5 dogs were just a sample of a bigger population of dogs, we would divide by 4
instead of 5 like this:
Sample Variance = 108,520 / 4 = 27,130
Sample Standard Deviation = √27,130 = 164.7... = 165 (to the nearest mm)
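Python's `statistics` module implements exactly this population/sample distinction: `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by N − 1. Applied to the dog heights:

```python
import statistics as st

heights = [600, 470, 170, 430, 300]   # the five dog heights, in mm

print(st.mean(heights))               # 394
print(st.pvariance(heights))          # 21704  (population: divide by N = 5)
print(round(st.pstdev(heights)))      # 147
print(st.variance(heights))           # 27130  (sample: divide by N - 1 = 4)
print(round(st.stdev(heights)))       # 165
```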
The formulae for the variance and standard deviation are given below; x̄ means the mean of the
data.
Variance: σ² = Σ(xr − x̄)² / n
The standard deviation, σ, is the square root of the variance.
What the formula means:
(1) xr − x̄ means take each value in turn and subtract the mean from each value.
(2) (xr − x̄)² means square each of the results obtained from step (1). This is to get rid of any
minus signs.
(3) Σ(xr − x̄)² means add up all of the results obtained from step (2).
(4) Divide step (3) by n, which is the number of values.
(5) For the standard deviation, take the square root of the answer to step (4).
Example
Find the variance and standard deviation of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10 .
The mean = 46/ 8 = 5.75
(Step 1): (1 - 5.75), (3 - 5.75), (5 - 5.75), (5 - 5.75), (6 - 5.75), (7 - 5.75), (9 - 5.75), (10 - 5.75)
= -4.75, -2.75, -0.75, -0.75, 0.25, 1.25, 3.25, 4.25
(Step 2): 22.563, 7.563, 0.563, 0.563, 0.063, 1.563, 10.563, 18.063
(Step 3): 22.563 + 7.563 + 0.563 + 0.563 + 0.063 + 1.563 + 10.563 + 18.063
= 61.504
(Step 4): n = 8, therefore variance = 61.504 / 8 = 7.69 (3 s.f.)
(Step 5): standard deviation = √7.69 = 2.77 (3 s.f.)
Adding or Multiplying Data by a Constant
If a constant, k, is added to each number in a set of data, the mean will be increased by k and the
standard deviation will be unaltered (since the spread of the data will be unchanged).
If the data is multiplied by the constant k, the mean and standard deviation will both be
multiplied by k.
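Both properties are easy to verify numerically; the four-value data set below is hypothetical, chosen only for illustration:

```python
import statistics as st

data = [2, 4, 6, 8]
k = 10

shifted = [x + k for x in data]   # add a constant to every value
scaled  = [x * k for x in data]   # multiply every value by a constant

print(st.mean(data),    round(st.pstdev(data), 3))     # 5 2.236
print(st.mean(shifted), round(st.pstdev(shifted), 3))  # 15 2.236  (mean shifts, spread unchanged)
print(st.mean(scaled),  round(st.pstdev(scaled), 3))   # 50 22.361 (both scaled by k)
```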
Grouped Data
There are many ways of writing the formula for the standard deviation. The one above is for a
basic list of numbers. When the data are grouped, the variance is
Variance = Σfx² / Σf − (Σfx / Σf)²
The standard deviation can be found by taking the square root of this value.
Example: The table shows marks (out of 10) obtained by 20 people in a test
Mark (x) Frequency (f)
1 0
2 1
3 1
4 3
5 2
6 5
7 5
8 2
9 0
10 1
Work out the variance of this data.
In such questions, it is often easiest to set your working out in a table:
x     f     fx     fx²
1     0     0      0
2     1     2      4
3     1     3      9
4     3     12     48
5     2     10     50
6     5     30     180
7     5     35     245
8     2     16     128
9     0     0      0
10    1     10     100
      Σf = 20   Σfx = 118   Σfx² = 764

Variance = Σfx² / Σf − (Σfx / Σf)²
= 764/20 − (118/20)²
= 38.2 − 34.81 = 3.39
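The same working can be done programmatically from the marks and frequencies in the table:

```python
marks = list(range(1, 11))                 # x values 1..10
freqs = [0, 1, 1, 3, 2, 5, 5, 2, 0, 1]     # frequencies from the table

n       = sum(freqs)                                    # 20
sum_fx  = sum(f * x for f, x in zip(freqs, marks))      # 118
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))  # 764

variance = sum_fx2 / n - (sum_fx / n) ** 2
print(round(variance, 2))  # 3.39
```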
Quartiles
If we divide a cumulative frequency curve into quarters, the value at the lower quarter is referred
to as the lower quartile, the value at the middle gives the median and the value at the upper
quarter is the upper quartile.
A set of numbers may be as follows: 8, 14, 15, 16, 17, 18, 19, 50. The mean of these numbers is
19.625 . However, the extremes in this set (8 and 50) distort the range. The inter-quartile range is
a method of measuring the spread of the numbers by finding the middle 50% of the values.
It is useful since it ignores the extreme values. It is a method of measuring the spread of the data.
For grouped data shown on a cumulative frequency curve, the lower quartile is the (n+1)/4 th
value (n is the total cumulative frequency, i.e. 157 in this case) and the upper quartile is the
3(n+1)/4 th value. The difference between these two is the inter-quartile range (IQR).
In this example, the upper quartile is the 118.5th value and the lower quartile is the 39.5th
value. Reading from the cumulative frequency curve, the lower quartile is about 17 and the upper
quartile is about 37, so the IQR is about 20 (bear in mind that this is a rough sketch; plotting the
values on graph paper gives a more accurate value).
2. Inter quartile range (IQR)
In descriptive statistics, the inter quartile range (IQR), also called the mid spread or middle
fifty, is a measure of statistical dispersion, being equal to the difference between the upper and
lower quartiles,
IQR = Q3 − Q1.
In other words, the IQR is the 1st quartile subtracted from the 3rd quartile; these quartiles can
be clearly seen on a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed
range, and is a commonly used robust measure of scale. The inter quartile range has a breakdown
point of 25%, and is thus often preferred to the total range.
The IQR is used to build box plots, simple graphical representations of a probability distribution.
For a symmetric distribution (where the median equals the midline, the average of the first and
third quartiles), half the IQR equals the median absolute deviation (MAD).
The median is the corresponding measure of central tendency.
Data set in a table
i     x[i]    Quartile
1     102
2     104
3     105     Q1
4     107
5     108
6     109     Q2 (median)
7     110
8     112
9     115     Q3
10    116
11    118
For the data in this table the inter quartile range is IQR = 115 − 105 = 10.
Outliers are observations that fall below Q1 − 1.5(IQR) or above Q3 + 1.5(IQR). In a box plot,
the highest and lowest occurring values within these limits are drawn as the ends of the whiskers,
and the outliers as individual points.
Coefficient of Inter-Quartile Range = (Q3 − Q1) / (Q3 + Q1)
3. Semi-Inter-Quartile Range OR Quartile Deviation
It is based on the lower quartile Q1 and the upper quartile Q3. The difference Q3 − Q1 is called
the inter quartile range. This difference divided by 2 is called the semi-inter-quartile range or the
quartile deviation. Thus Q.D = (Q3 − Q1) / 2.
Solution:
After arranging the observations in ascending order, we get
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730, 1750,
1755, 1785, 1880, 1885, 1960.
The standard deviation of a population is calculated using
σ = √( Σ(x − μ)² / N )
Where x represents each value in the population, μ is the mean value of the population, Σ is the
summation (or total), and N is the number of values in the population.
The standard deviation of a sample is known as S and is calculated using:
S = √( Σ(x − x̄)² / (n − 1) )
Where x represents each value in the sample, x̄ is the mean value of the sample, Σ is the
summation (or total), and n − 1 is the number of values in the sample minus 1.
For a finite set of numbers, the standard deviation is found by taking the square root of the
average of the squared differences of the values from their average value. For example, consider
a population consisting of the following eight values:
These eight data points have the mean (average) of 5:
First, calculate the difference of each data point from the mean, and square the result of each:
Next, calculate the mean of these values, and take the square root:
If we simply average the differences from the mean, the negatives cancel the positives:
(4 + 4 − 4 − 4) / 4 = 0
So that will not work. Using absolute values instead:
(|4| + |4| + |−4| + |−4|) / 4 = 16 / 4 = 4
That looks good (and is the Mean Deviation), but what about this case:
(|7| + |1| + |−6| + |−2|) / 4 = 16 / 4 = 4
Oh No! It also gives a value of 4, even though the differences are more spread out!
So let us try squaring each difference (and taking the square root at the end):
√((4² + 4² + 4² + 4²) / 4) = √(64 / 4) = 4
√((7² + 1² + 6² + 2²) / 4) = √(90 / 4) = 4.74...
That is nice! The Standard Deviation is bigger when the differences are more spread out ... just
what we want!
In fact this method is a similar idea to distance between points, just applied in a different way.
And it is easier to use algebra on squares and square roots than absolute values, which makes the
standard deviation easy to use in other areas of mathematics.
Pie Chart:
A pie chart (or a circle graph) is a circular chart divided into sectors, illustrating proportion. In
a pie chart, the arc length of each sector (and consequently its central angle and area), is
proportional to the quantity it represents. When angles are measured with one turn as the unit, a
number of percent corresponds to the same number of centiturns. Together, the sectors create a
full disk. It is named for its resemblance to a pie which has been sliced. The earliest known pie
chart is generally credited to William Playfair's Statistical Breviary of 1801.[1][2]
The pie chart is perhaps the most widely used statistical chart in the business world and the mass
media.[3] However, it has been criticized,[4] and some recommend avoiding it,[5][6][7][8] pointing out
in particular that it is difficult to compare different sections of a given pie chart, or to compare
data across different pie charts. Pie charts can be an effective way of displaying information in
some cases, in particular if the intent is to compare the size of a slice with the whole pie, rather
than comparing the slices among them.[1] Pie charts work particularly well when the slices
represent 25 to 50% of the data,[9] but in general, other plots such as the bar chart or the dot plot,
or non-graphical methods such as tables, may be more adapted for representing certain
information. It also shows the frequency within certain groups of information.
Example
Three sets of data plotted using pie charts and bar charts.
Pie charts are common in journalism. However statisticians generally regard pie charts as a poor
method of displaying information, and they are uncommon in scientific literature. One reason is
that it is more difficult for comparisons to be made between the size of items in a chart when
area is used instead of length and when different items are shown as different shapes.
Further, in research performed at AT&T Bell Laboratories, it was shown that comparison by
angle was less accurate than comparison by length. This can be illustrated with the diagram to
the right, showing three pie charts, and, below each of them, the corresponding bar chart
representing the same data. Most subjects have difficulty ordering the slices in the pie chart by
size; when the bar chart is used the comparison is much easier. Similarly, comparisons between
data sets are easier using the bar chart. However, if the goal is to compare a given category (a
slice of the pie) with the total (the whole pie) in a single chart and the multiple is close to 25 or
50 percent, then a pie chart can often be more effective than a bar graph. However, the research
of Spence and Lewandowsky did not find pie charts to be inferior. Participants were able to
estimate values with pie charts just as well as with other presentation forms.
Variants and similar charts
Exploded pie chart
An exploded pie chart for the example data, with the largest party group exploded.
A chart with one or more sectors separated from the rest of the disk is known as an exploded pie
chart. This effect is used to either highlight a sector, or to highlight smaller segments of the chart
with small proportions.
Bar Charts
Bar charts compare distinct items or show single items at distinct intervals. Usually, a bar chart is
laid out with categories along the vertical axis and values along the horizontal axis. In other
words, the bars are horizontally placed on the page. Bar charts are useful for comparing data
items that are in competition, so it makes sense to place the longest bars on top and the others in
descending order beneath the longest one.
A bar chart or bar graph is a chart with rectangular bars with lengths proportional to the values
that they represent. The bars can be plotted vertically or horizontally.
Bar charts are used for plotting data which has discrete values. Some examples of
discontinuous data include 'shoe size' or 'eye color', for which a bar chart is appropriate. In
contrast, some examples of continuous data would be 'height' or 'weight'. A bar chart is very
useful for recording certain information whether it is continuous or not. Bar
charts also look a lot like histograms, and they are often mistaken for each other. The first bar graph
appeared in the 1786 book The Commercial and Political Atlas, by William Playfair (1759-
1823). Playfair was a pioneer in the use of graphical displays and wrote extensively about them.
This formula is usually written in a slightly different manner using the Greek capital letter Σ,
pronounced "sigma", which means "sum of...":
You may have noticed that the above formula refers to the sample mean. So, why have we
called it a sample mean? This is because, in statistics, samples and populations have very
different meanings and these differences are very important, even if, in the case of the mean,
they are calculated in the same way. To acknowledge that we are calculating the population
mean and not the sample mean, we use the Greek lower case letter "mu", denoted as µ:
The mean is essentially a model of your data set. It is the value that is most common. You will
notice, however, that the mean is not often one of the actual values that you have observed in
your data set. However, one of its important properties is that it minimises error in the prediction
of any one value in your data set. That is, it is the value that produces the lowest amount of error
from all other values in the data set.
An important property of the mean is that it includes every value in your data set as part of the
calculation. In addition, the mean is the only measure of central tendency where the sum of the
deviations of each value from the mean is always zero.
When not to use the mean
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers.
These are values that are unusual compared to the rest of the data set by being especially small or
large in numerical value. For example, consider the wages of staff at a factory below:
Staff 1 2 3 4 5 6 7 8 9 10
Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this
mean value might not be the best way to accurately reflect the typical salary of a worker, as most
workers have salaries in the $12k to 18k range. The mean is being skewed by the two large
salaries. Therefore, in this situation we would like to have a better measure of central tendency.
As we will find out later, taking the median would be a better measure of central tendency in this
situation.
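The salary example can be checked directly; the two large salaries pull the mean far above the typical wage, while the median stays in the $12k to $18k cluster:

```python
import statistics as st

salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]  # in $k

print(st.mean(salaries))    # 30.7 -- pulled up by the two large salaries
print(st.median(salaries))  # 15.5 -- closer to the typical worker's salary
```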
Another time when we usually prefer the median over the mean (or mode) is when our data is
skewed (i.e. the frequency distribution for our data is skewed). If we consider the normal
distribution - as this is the most frequently assessed in statistics - when the data is perfectly
normal then the mean, median and mode are identical. Moreover, they all represent the most
typical value in the data set. However, as the data becomes skewed the mean loses its ability to
provide the best central location for the data as the skewed data is dragging it away from the
typical value. However, the median best retains this position and is not as strongly influenced by
the skewed values. This is explained in more detail in the skewed distribution section later in this
guide.
Median
The median is the middle score for a set of data that has been arranged in order of magnitude.
The median is less affected by outliers and skewed data. In order to calculate the median,
suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark - in this case 56 (highlighted in bold). It is the middle mark
because there are 5 scores before it and 5 scores after it. This works fine when you have an odd
number of scores but what happens when you have an even number of scores? What if you had
only 10 scores? Well, you simply have to take the middle two scores and average the result. So,
if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th scores in our data set and average them to get a median
of 55.5.
Mode
The mode is the most frequent score in our data set. On a bar chart or histogram it is represented
by the highest bar. You can, therefore, sometimes consider the mode as being the most
popular option. An example of a mode is presented below:
Normally, the mode is used for categorical data where we wish to know which is the most
common category as illustrated below:
We can see above that the most common form of transport, in this particular data set, is the bus.
However, one of the problems with the mode is that it is not unique, so it leaves us with
problems when we have two or more values that share the highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data. This is
particularly problematic when we have continuous data, as we are unlikely to have any
one value that is more frequent than another. For example, consider measuring 30 people's
weights (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly
the same weight, e.g. 67.4 kg? The answer is: probably very unlikely. Many people might be
close, but with such a small sample (30 people) and a large range of possible weights you are
unlikely to find two people with exactly the same weight, that is, to the nearest 0.1 kg. This is
why the mode is very rarely used with continuous data.
Another problem with the mode is that it will not provide us with a very good measure of central
tendency when the most common mark is far away from the rest of the data in the data set, as
depicted in the diagram below:
In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is
not representative of the data, which is mostly concentrated around the 20 to 30 value range. To
use the mode to describe the central tendency of this data set would be misleading.
Skewed Distributions and the Mean and Median
We often test whether our data is normally distributed as this is a common assumption
underlying many statistical tests. An example of a normally distributed set of data is presented
below:
When you have a normally distributed sample you can legitimately use either the mean or the
median as your measure of central tendency. In fact, in any symmetrical distribution the mean,
median and mode are equal. However, in this situation, the mean is widely preferred as the best
measure of central tendency as it is the measure that includes all the values in the data set for its
calculation, and any change in any of the scores will affect the value of the mean. This is not the
case with the median or mode.
However, when our data is skewed, for example, as with the right-skewed data set below:
we find that the mean is being dragged in the direction of the skew. In these situations, the median is
generally considered to be the best representative of the central location of the data. The more
skewed the distribution the greater the difference between the median and mean, and the greater
emphasis should be placed on using the median as opposed to the mean. A classic example of the
above right-skewed distribution is income (salary), where higher-earners provide a false
representation of the typical income if expressed as a mean and not a median.
If dealing with a normal distribution and tests of normality show that the data is non-normal,
then it is customary to use the median instead of the mean. This is more a rule of thumb than a
strict guideline however. Sometimes, researchers wish to report the mean of a skewed
distribution if the median and mean are not appreciably different (a subjective assessment) and if
it allows easier comparisons to previous research to be made.
Summary of when to use the mean, median and mode
Please use the following summary table to know what the best measure of central tendency is
with respect to the different types of variable.
Type of Variable Best measure of central tendency
Nominal Mode
Ordinal Median
Interval/Ratio (not skewed) Mean
Interval/Ratio (skewed) Median