Module 1: Nature of Statistics

Uploaded by

Hannah Bea Lindo


MODULE 1: NATURE OF STATISTICS

Introduction

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read
and write.” (H. G. Wells)

In 2017, The Economist reported one of the most striking changes in the world economy: the world’s
most valuable resource is no longer oil, but data. The five biggest tech giants, Google, Amazon, Apple,
Facebook, and Microsoft, had been profiting from consumer and customer data. This phenomenon prompted
professionals to make greater use of statistics and later helped popularize the concept of data science.

Today, companies all over the world hire statisticians and data scientists to strengthen their
competitive advantage. Since data is everywhere and everyone needs it analyzed, you need to learn the
essential knowledge and skills of statistics. As Florence Nightingale put it,

“Statistics… is the most important science in the whole world: for upon it depends the practical
application of every other science and of every art; the one science essential to all political and social
administration, all education, all organization based upon experience, for it only gives the results of
our experience.”

Lesson 1: History and Development of Statistics

Statistics as a science and an art has undergone a series of developments and refinements
over time all over the world. Experts from different fields such as medicine and health,
philosophy, mathematics, and science contributed to strengthening the foundations of the field of
statistics. Some of these notable developments are highlighted in the Timeline of Statistics designed
by Tom Fryer and in Sweetland’s notes on the history of statistics.

• 450 BC Hippias of Elis uses the average length of a king’s reign (the mean) to work out the date of the first Olympic Games, some 300 years before his time.
• 431 BC Attackers besieging Plataea in the Peloponnesian War calculate the height of the wall by counting the number of bricks. The count was repeated several times by different soldiers, and the most frequent value (the mode) was taken to be the most likely. Multiplying it by the height of a brick allowed them to calculate the length of the ladders needed to scale the walls.
• 400 BC In the Indian epic Mahabharata, King Rtuparna estimates the number of fruits and leaves (2095 fruits and 50 000 000 leaves) on two great branches of a vibhitaka tree by counting the number on a single twig, then multiplying by the number of twigs. The estimate is found to be very close to the actual number. This is the first recorded example of sampling – “but this knowledge is kept secret”, says the account.
• 2 AD A Chinese census under the Han Dynasty finds 57.67 million people in 12.36 million households – the first census from which data survives, and still considered by scholars to have been accurate.
• 7 AD A census by Quirinius, governor of the Roman province of Judea, is mentioned in Luke’s Gospel as causing Joseph and Mary to travel to Bethlehem to be taxed.
• 840 Islamic mathematician Al-Kindi uses frequency analysis – the most common symbols in a coded message stand for the most common letters – to break secret codes. Al-Kindi also introduces Arabic numerals to Europe.
• 10th century The earliest known graph, in a commentary on a book by Cicero, shows the movements of the planets through the zodiac. It was apparently intended for use in monastery schools.
• 1086 Domesday Book: a survey for William the Conqueror of farms, villages, and livestock in his new kingdom – the start of official statistics in England.
• 1150 The Trial of the Pyx, an annual test of the purity of coins from the Royal Mint, begins. Coins are drawn at random, in fixed proportions to the number minted. It continues to this day.
• 1188 Gerald of Wales completes the first population census of Wales.
• 1303 A Chinese diagram entitled “The Old Method Chart of the Seven Multiplying Squares” shows the binomial coefficients up to the eighth power – the numbers that are fundamental to the mathematics of probability, and that appeared five hundred years later in the West as Pascal’s triangle.
• 1346 Giovanni Villani’s Nuova Cronica gives statistical information on the population and trade of Florence.
• 1560 Gerolamo Cardano calculates probabilities of different dice throws for gamblers.
• 1570 Astronomer Tycho Brahe uses the arithmetic mean to reduce errors in his estimates of the locations of stars and planets.
• 1644 Michael van Langren draws the first known graph of statistical data that shows the size of possible errors. It plots different estimates of the distance between Toledo and Rome.
• 1654 Pascal and Fermat correspond about dividing stakes in gambling games and together create the mathematical theory of probability.
• 1657 Huygens’s On Reasoning in Games of Chance is the first book on probability theory. He also invented the pendulum clock.
• 1663 John Graunt uses parish records to estimate the population of London.
• 1693 Edmond Halley prepares the first mortality tables statistically relating death rates to age – the foundation of life insurance. He also drew a stylized map of the path of a solar eclipse over England – one of the first data visualization maps.
• 1713 Jacob Bernoulli’s Ars Conjectandi derives the law of large numbers – the more often you repeat an experiment, the more accurately you can predict the result.
• 1728 Voltaire and his mathematician friend de la Condamine spot that a Paris bond lottery is offering more prize money than the total cost of the tickets; they corner the market and win themselves a fortune.
• 1749 Gottfried Achenwall coins the word statistics (in German, Statistik); he means the information you need to run a nation-state.
• 1757 Casanova becomes a trustee of, and may have had a hand in devising, the French national lottery.
• 1761 The Rev. Thomas Bayes proves Bayes’ theorem – the cornerstone of conditional probability and the testing of beliefs and hypotheses.
• 1786 William Playfair introduces graphs and bar charts to show economic data.
• 1789 Gilbert White and other clergymen-naturalists keep records of temperatures, dates of the first snowdrops and cuckoos, etc.; the data later proves useful for the study of climate change.
• 1790 The first US census, taken by men on horseback directed by Thomas Jefferson, counts 3.9 million Americans.
• 1791 First use of the word statistics in English, by Sir John Sinclair in his Statistical Account of Scotland.
• 1805 Adrien-Marie Legendre introduces the method of least squares for fitting a curve to a given set of observations.
• 1808 Gauss, with contributions from Laplace, derives the normal distribution – the bell-shaped curve fundamental to the study of variation and error.
• 1833 The British Association for the Advancement of Science sets up a statistics section. Thomas Malthus, who analyzed population growth, and Charles Babbage are members. It later becomes the Royal Statistical Society.
• 1835 Belgian Adolphe Quetelet’s Treatise on Man introduces social science statistics and the concept of the “average man” – his height, body mass index, and earnings.
• 1839 The American Statistical Association is formed. Alexander Graham Bell, Andrew Carnegie, and President Martin Van Buren will become members.
• 1840 William Farr sets up the official system for recording causes of death in England and Wales. This allows epidemics to be tracked and diseases compared – the start of medical statistics.
• 1849 Charles Babbage designs his difference engine, embodying the ideas of data handling and the modern computer. Ada Lovelace, Lord Byron’s daughter, writes the world’s first computer program for Babbage’s Analytical Engine.
• 1854 John Snow’s cholera map pins down the source of an outbreak as a water pump in Broad Street, London, beginning the modern study of epidemics.
• 1859 Florence Nightingale uses statistics of Crimean War casualties to influence public opinion and the War Office. She shows casualties month by month on a circular chart she devises, the Nightingale rose, the forerunner of the pie chart. She is the first woman member of the Royal Statistical Society and the first overseas member of the American Statistical Association.
• 1869 Minard’s graphic diagram of Napoleon’s March on Moscow shows on one diagram the distance covered, the number of men still alive at each kilometer of the march, and the temperatures they encountered on the way.
• 1877 Francis Galton, Darwin’s cousin, describes regression to the mean. In 1888 he introduces the concept of correlation. At a “Guess the Weight of an Ox” contest in Devon he describes the Wisdom of Crowds – that the average of many uninformed guesses is close to the correct value.
• 1886 Philanthropist Charles Booth begins his survey of the London poor, to produce his poverty map of London. Areas were colored black, for the poorest, through to yellow for the upper-middle class and wealthy.
• 1894 Karl Pearson introduces the term standard deviation. If errors are normally distributed, about 68% of observations will lie within one standard deviation of the mean. Later he develops chi-squared tests for whether two variables are independent of each other.
• 1898 Von Bortkiewicz’s data on deaths of soldiers in the Prussian army from horse kicks shows that apparently rare events follow a predictable pattern, the Poisson distribution.
• 1900 Louis Bachelier shows that fluctuations in stock market prices behave in the same way as the random Brownian motion of molecules – the start of financial mathematics.
• 1908 William Sealy Gosset, chief brewer for Guinness in Dublin, describes the t-test. It uses a small number of samples to ensure that every brew tastes equally good.
• 1911 Herman Hollerith, inventor of punch-card devices used to analyze data in the US census, merges his company to form what will become IBM, pioneers of machines to handle business data, and early computers.
• 1916 During the First World War, car designer Frederick Lanchester develops statistical laws to predict the outcomes of aerial battles: if you double their size, land armies are only twice as strong, but air forces are four times as strong.
• 1924 Walter Shewhart invents the control chart to aid industrial production and management.
• 1935 George Zipf finds that many phenomena – river lengths, city populations – obey a power law, so that the largest is twice the size of the second largest, three times the size of the third, and so on.
• 1935 R. A. Fisher revolutionizes modern statistics. His Design of Experiments gives ways of deciding which results of scientific experiments are significant and which are not.
• 1937 Jerzy Neyman introduces confidence intervals in statistical testing. His work leads to modern scientific sampling.
• 1940-45 Alan Turing at Bletchley Park cracks the German wartime Enigma code, using advanced Bayesian statistics and Colossus, the first programmable electronic computer.
• 1944 The German tank problem: the Allies desperately need to know how many Panther tanks they will face in France on D-Day. Statistical analysis of the serial numbers on gearboxes from captured tanks indicates how many are being produced each month. Statisticians predict 270 a month; reports from intelligence sources predict many more. The actual total turned out to be 276. Statistics had outperformed spies.
• 1948 Claude Shannon introduces information theory and the bit – fundamental to the digital age.
• 1948-53 The Kinsey Report gathers objective data on human sexual behavior. A large-scale survey of 5000 men and, later, 5000 women causes outrage.
• 1950 Richard Doll and Bradford Hill establish the link between cigarette smoking and lung cancer. Despite fierce opposition, the result is conclusively proved, to huge public health benefit.
• 1950s Genichi Taguchi’s statistical methods to improve the quality of automobile and electronics components revolutionize Japanese industry, which far overtakes its Western European and American competitors.
• 1958 The Kaplan-Meier estimator gives doctors a simple statistical way of judging which treatments work best. It has saved millions of lives.
• 1972 David Cox introduces the proportional hazards model and the concept of partial likelihood.
• 1977 John Tukey introduces the box plot, or box-and-whisker diagram, which shows the quartiles, median, and spread in a single diagram.
• 1979 Bradley Efron introduces bootstrapping, a simple way to estimate the sampling distribution of almost any statistic.
• 1982 Edward Tufte self-publishes The Visual Display of Quantitative Information, setting new standards for the graphic visualization of data.
• 1988 Margaret Thatcher becomes the first world leader to call for action on climate change.
• 1993 The statistical programming language R is released; it is now a standard statistical tool.
• 1997 The term Big Data first appears in print.
• 2002 The amount of information stored digitally surpasses non-digital storage. Paul DePodesta uses statistics – sabermetrics – to transform the fortunes of the Oakland Athletics baseball team; the film Moneyball tells the story.
• 2004 Launch of Significance magazine.
• 2008 Hal Varian, chief economist at Google, says that statistics will be the sexy profession of the next ten years.
• 2012 Statistician Nate Silver successfully predicts the result in all 50 states in the US presidential election. He becomes a media star and sets up what may be an over-reliance on statistical analysis for the 2016 election. The Large Hadron Collider confirms the existence of a Higgs boson at a significance of five standard deviations: if no such particle existed, data this extreme would arise by chance only about once in 3.5 million times.
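Two of the timeline entries describe calculations that are easy to reproduce. The sketch below uses hypothetical brick counts and a hypothetical brick height (the actual Plataea figures are not given here) to take the mode, and the standard library's `statistics.NormalDist` to check the 68% figure quoted for the normal distribution.

```python
from statistics import NormalDist, mode

# Hypothetical brick counts reported by seven soldiers (illustrative values only).
brick_counts = [52, 53, 52, 51, 52, 54, 53]
most_likely = mode(brick_counts)          # most frequent count, as at Plataea
brick_height_m = 0.09                     # hypothetical height of one brick, in meters
wall_height_m = most_likely * brick_height_m

# Share of a normal distribution lying within one standard deviation of the mean.
z = NormalDist()                          # standard normal: mean 0, standard deviation 1
within_one_sd = z.cdf(1) - z.cdf(-1)

print(f"mode = {most_likely} bricks, wall height = {wall_height_m:.2f} m")
print(f"P(-1 < Z < 1) = {within_one_sd:.4f}")
```

The mode is used here rather than the mean because a single soldier's miscount should not pull the estimate away from the value most soldiers agreed on.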

Lesson 2: Basic Concepts of Statistics

Statistics refers to the scientific study that deals with the collection, organization,
presentation, analysis, and interpretation of data.

Two Divisions of Statistics

1. Descriptive Statistics. This refers to the statistical procedures concerned with
describing the characteristics and properties of a group of persons, places, or things. It
organizes, presents, describes, and interprets the data gathered without trying to infer
anything that goes beyond the data. The most common measures used to describe data
include the measures of central tendency (mean, median, mode), measures of variation
(range, variance, standard deviation, etc.), skewness, and kurtosis.

Sample Research Questions (Objectives):

1. What is the demographic profile of the respondents? (describe the
demographic profile of the respondents)
2. What characteristics and qualifications do school principals look for in
a potential teacher applicant? (determine the characteristics and qualifications
that principals look for in a potential teacher applicant)
3. Which group of learners has the best performance in the national
achievement test? (identify which group of learners has the best
performance in the national achievement test)
4. How did the graduates of teacher education institutions perform in the
licensure examinations? (assess the licensure examination performance of the
graduates from teacher education institutions)
5. What are the factors that affect the implementation of the school program?
(determine the different factors that affect the implementation of the school
program)

2. Inferential Statistics. This refers to statistical procedures that are used to draw
inferences about a large group of people, places, or things (population) based on the
information obtained from a small portion (sample) taken from that group. The
most common procedures include tests of difference between and among
groups, tests of relationship and association, and tests of effects.

Sample Research Questions (Objectives):
1. To what degree do NCAE ratings predict freshman college GPA? (ascertain if
freshman college GPA can be predicted by NCAE ratings)
2. To what extent do entry-level qualifications of graduates of teacher education
programs increase the likelihood of developing proficient teachers? (ascertain if
entry-level qualifications of graduates of teacher education programs increase
the likelihood of developing proficient teachers)
3. How do K-3 pupils from different socio-economic status compare in their
reading and mathematics achievement after adjusting for family type? (compare
the reading and mathematics achievement of the K-3 pupils from different socio-
economic status after adjusting for family type)
4. How do male and female learners differ in the national achievement test?
(ascertain if results of the national achievement test differ between sexes)

Population and Sample

Population refers to a large collection of people, objects, places, or things. Any numerical
value that describes a population is called a parameter. A sample refers to a small portion or subset
of the population. Any numerical value that describes a sample is called a statistic.

Example: The Department of Education, in a press brief, stated that the average rating of the 1 000 000
high school students all over the country who took the national achievement test was 94%. A division
supervisor would like to study the performance of the high school students from their schools division
in the national achievement test. The 18 000 high school students from the division had an average
rating of 92%.

Population: All high school students who took the national achievement test (N = 1 000 000)
Parameter: Average rating of 94%
Sample: High school students from the specific schools division who took the national achievement
test (n = 18 000)
Statistic: Average rating of 92%
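A small sketch with hypothetical numbers, chosen to mirror the 94% and 92% averages above, shows how the same kind of summary yields a parameter when computed on the population and a statistic when computed on a sample. The symbols mu, sigma, x_bar, and s follow the usual textbook convention and are not defined in this module. Note that the sample standard deviation divides by n − 1, while the population standard deviation divides by N.

```python
from statistics import mean, pstdev, stdev

# Hypothetical ratings for a tiny "population" of seven students.
population = [88, 90, 92, 94, 96, 98, 100]
# A hypothetical sample of three of those students (for example, one division).
sample = [88, 92, 96]

mu = mean(population)        # parameter: population mean
sigma = pstdev(population)   # parameter: population standard deviation (divides by N)
x_bar = mean(sample)         # statistic: sample mean
s = stdev(sample)            # statistic: sample standard deviation (divides by n - 1)

print(f"N = {len(population)}, mu = {mu}, sigma = {sigma}")
print(f"n = {len(sample)}, x_bar = {x_bar}, s = {s}")
```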

Variable, Data, and Indicators

A variable is a characteristic or property of a population or sample which makes the members
different from each other. Variables can be classified as follows.

1. Independent Variable. This is the one thing you change. It is the variable that affects
another variable.
2. Dependent Variable. It is the variable being affected by another variable. The change that
happens is due to the influence of the independent variable.
3. Controlled Variable. This is the variable that you want to remain constant and unchanging.
4. Quantitative Variable. This is a variable that is expressed as a number or can be
quantified. Types of Quantitative Variable:
    1. Discrete Variable. This variable has a countable number of possible values that
       can be counted in a finite amount of time.
    2. Continuous Variable. This variable can take on any value between two specified values.
5. Qualitative Variable. This is information that cannot be expressed as a number; thus,
these variables are not numerical.

Example: To what degree do NCAE ratings predict college freshman GPA?
Independent Variable: NCAE ratings
Dependent Variable: college freshman GPA
Controlled Variable: Type of examination and test items

Both the NCAE ratings and college freshman GPA are quantitative, continuous variables. The number
of college freshman students is a discrete variable. The profile of the college freshman students such
as the program of study, sex, and school last attended are qualitative variables.

Data are facts or values gathered or observed from the samples or population being
studied. Indicators are data that directly measure the variable being studied. To gather significant
and relevant data, indicators for each variable of interest must be established first. This makes
analysis and interpretation much easier and more convenient.

Example: The school principal wants to know if the feeding program implemented among K-3 pupils
for the past 6 months has been successful. The data/indicators that she may look for are the pupils'
weight and height before and after the feeding program.

Example: What is the socio-economic status of pupils and students in different private and public
schools in Batangas?
The variable socio-economic status is broad in scope and data may vary depending on the group of
persons being studied. Data or indicators may include parents’ educational attainment, parents’
occupation, household income, and other household conditions (house ownership,
appliances/gadgets, etc.)

Most research data can be classified into one of three basic categories.

Category 1: A single group of participants with one score per participant. This type of data often
exists in research studies that are conducted simply to describe individual variables as they exist
naturally. Although several variables may be measured, the intent is to look at each of them one at a
time, and there is no attempt to examine relationships or differences between variables.

Category 2: A single group of participants with two or more variables measured for each participant.
The research study is specifically intended to examine relationships or differences between variables.
However, there is no attempt to control or manipulate the variables.

Category 3: Two or more groups of scores, with each score a measurement of the same variable. This
category involves independent-measures and repeated-measures designs.

Levels of Measurement

Levels of measurement describe the kind of values a variable takes, which in turn determines how the
variable can be analyzed.

1. Nominal. It is a variable whose values don’t have an undisputed order. It may have two or
more exhaustive, non-overlapping categories, but there is no intrinsic ordering of the
categories. Examples: sex, socioeconomic status, civil status, school division, religious
affiliation, mother tongue
2. Ordinal. It holds values that have an undisputed order but no fixed unit of measurement. An
ordinal variable is similar to a nominal variable except that there is a clear ordering of
the categories. However, the difference between adjacent categories cannot be stated with
certainty. Examples: rating scales (Likert scales), shoe/shirt sizes, rankings, monthly
income (in ranges)
3. Interval. An interval variable is similar to an ordinal variable except that the values are equally
spaced. It has a fixed unit of measurement, but zero is an arbitrary point and does not mean the
absence of the quantity. Examples: temperature (in °C or °F), pressure, IQ score, mental ability ratings
4. Ratio. A ratio variable is an interval variable with a true zero. It has a fixed unit of
measurement, and zero means a complete absence of the quantity. Examples: weight, height, age, income
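The level of measurement decides which summaries make sense. This sketch, with hypothetical values, uses the mode for a nominal variable and the median for an ordinal Likert item, then contrasts a ratio taken on the Celsius scale (interval) with the same ratio on the Kelvin scale and a ratio of weights (both ratio level).

```python
from statistics import median, mode

# Nominal: categories without order, so the mode is the meaningful "center".
sex = ["female", "male", "female", "female", "male"]
most_common_sex = mode(sex)

# Ordinal: Likert codes (1 = strongly disagree ... 5 = strongly agree).
# The median respects the order without assuming equal spacing between codes.
likert = [1, 2, 2, 3, 4, 4, 5]
likert_median = median(likert)

# Interval: 20 °C is not "twice as hot" as 10 °C, because 0 °C is an arbitrary zero.
celsius_ratio = 20 / 10
kelvin_ratio = (20 + 273.15) / (10 + 273.15)   # Kelvin has a true zero, so this ratio is meaningful

# Ratio: with a true zero, "twice as much" makes sense (60 kg is twice 30 kg).
weight_ratio = 60 / 30

print(most_common_sex, likert_median)
print(f"Celsius ratio = {celsius_ratio}, Kelvin ratio = {kelvin_ratio:.4f}, weight ratio = {weight_ratio}")
```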

Lesson 3: Summation Notation

Summation notation is a convenient way to write a concise expression for the sum of the values of
a variable. It is commonly used to express statistical formulas. It involves the following symbols.
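In the usual convention, which is assumed here, the capital Greek letter Σ (sigma) means "add up", the index i runs from the lower limit 1 to the upper limit n, and x_i is the i-th value of the variable. The definition and three basic rules that follow from it (c is a constant) read:

```latex
\begin{aligned}
\sum_{i=1}^{n} x_i &= x_1 + x_2 + \cdots + x_n \\
\sum_{i=1}^{n} c\,x_i &= c \sum_{i=1}^{n} x_i \\
\sum_{i=1}^{n} c &= nc \\
\sum_{i=1}^{n} (x_i + y_i) &= \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i
\end{aligned}
```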
Example: Express the following summation as a sum of individual terms.

Evaluate the following summations.
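Summations like these can be evaluated mechanically. The sketch below uses a hypothetical data set x = 2, 4, 5, 7 to evaluate several common summations and to check two basic rules, Σcx_i = cΣx_i and Σ(x_i + c) = Σx_i + nc. It also shows that Σx_i² and (Σx_i)² are different quantities, a frequent source of errors in statistical formulas.

```python
# Hypothetical data: x_1 = 2, x_2 = 4, x_3 = 5, x_4 = 7 (so n = 4).
x = [2, 4, 5, 7]
n = len(x)

sum_x = sum(x)                                   # Σx_i        = 2 + 4 + 5 + 7
sum_x_squared = sum(xi ** 2 for xi in x)         # Σx_i²       = 4 + 16 + 25 + 49
square_of_sum = sum(x) ** 2                      # (Σx_i)²     differs from Σx_i² in general
sum_2x = sum(2 * xi for xi in x)                 # Σ2x_i       equals 2Σx_i
sum_x_plus_3 = sum(xi + 3 for xi in x)           # Σ(x_i + 3)  equals Σx_i + 3n
x_bar = sum_x / n                                # the mean, Σx_i / n
sum_sq_dev = sum((xi - x_bar) ** 2 for xi in x)  # Σ(x_i − x̄)², the building block of variance

print(sum_x, sum_x_squared, square_of_sum, sum_2x, sum_x_plus_3, x_bar, sum_sq_dev)
```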

MODULE 2: DATA COLLECTION AND SAMPLING DESIGN

Introduction

After identifying your research problem, the next step is to collect appropriate and relevant
data. Data collection is crucial to the success of any investigation or study. If the investigator is not
able to collect enough relevant data, the findings and results of the study will be affected; thus, the
conclusions, generalizations, or implications derived from the available data may not be reliable or
valid. Becoming an expert in data collection methods and techniques requires time and effort.
Guidance from an experienced researcher or statistician may help you plan your data collection
and sampling design.

Lesson 1: Sources of Data and Data Collection Methods

Data collection is a methodical process of gathering and analyzing specific information to answer
relevant research questions.

Characteristics of Good Data

Ortega (2017) outlines seven (7) characteristics that define quality data.

1. Accuracy and Precision: This characteristic refers to the exactness of the data. It cannot have any
erroneous elements and must convey the correct message without being misleading. Accuracy and
precision also have a component that relates to the data’s intended use. Without understanding how
the data will be consumed, efforts to ensure accuracy and precision could be off-target or more costly
than necessary. For example, accuracy in healthcare might be more important than in another
industry (which is to say, inaccurate data in healthcare could have more serious consequences) and,
therefore, justifiably worth higher levels of investment.
2. Legitimacy and Validity: Requirements governing the data set the boundaries of this characteristic.
For example, on surveys, items such as gender, ethnicity, and nationality are typically limited to
a set of options, and open answers are not accepted. Any answers other than these would not be
considered valid or legitimate based on the survey’s requirements. This is the case for most data
and must be carefully considered when determining its quality. The people in each department
of an organization understand what data is valid to them, so their requirements must be
leveraged when evaluating data quality.
3. Reliability and Consistency: Many systems in today’s environments use and/or collect the same
source data. Regardless of which source collected the data or where it resides, it cannot
contradict a value residing in a different source or collected by a different system. There must be a
stable and steady mechanism that collects and stores the data without contradiction or
unwarranted variance.
4. Timeliness and Relevance: There must be a valid reason to collect the data to justify the effort
required, which also means it has to be collected at the right moment in time. Data collected too soon
or too late could misrepresent a situation and drive inaccurate decisions.
5. Completeness and Comprehensiveness: Incomplete data is as dangerous as inaccurate data.
Gaps in data collection give only a partial view of the overall picture. Without a
complete picture of how operations are running, uninformed actions will occur. It is important to
understand the complete set of requirements that constitute a comprehensive set of data to
determine whether or not the requirements are being met.
6. Availability and Accessibility: This characteristic can be tricky at times due to legal and regulatory
constraints. Regardless of the challenge, though, individuals need the right level of access to the
data to perform their jobs. This presumes that the data exists and is available for access to be
granted.
7. Granularity and Uniqueness: The level of detail at which data is collected is important because
confusion and inaccurate decisions can otherwise occur. Aggregated, summarized, and
manipulated collections of data could offer a different meaning than the data implied at a lower
level. An appropriate level of granularity must be defined to provide sufficient uniqueness for
distinctive properties to become visible. This is a requirement for operations to function
effectively.
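Several of these characteristics can be checked mechanically. The sketch below runs completeness, validity, and uniqueness checks on a few hypothetical survey records; the field names, the allowed options, and the helper `check_quality` are illustrative assumptions, not part of any standard tool.

```python
# Hypothetical survey records and rules (field names and options are illustrative).
records = [
    {"id": 1, "sex": "female", "age": 15},
    {"id": 2, "sex": "male", "age": None},       # incomplete: age is missing
    {"id": 3, "sex": "prefer not", "age": 16},   # invalid: not one of the allowed options
    {"id": 1, "sex": "female", "age": 15},       # duplicate: id 1 appears twice
]
ALLOWED_SEX = {"female", "male"}   # validity: the survey's fixed set of options
REQUIRED = ("id", "sex", "age")    # completeness: fields every record must have


def check_quality(rows):
    """Return the ids of records that fail completeness, validity, and uniqueness checks."""
    incomplete, invalid, duplicates = [], [], []
    seen = set()
    for row in rows:
        if any(row.get(field) is None for field in REQUIRED):
            incomplete.append(row["id"])
        if row.get("sex") not in ALLOWED_SEX:
            invalid.append(row["id"])
        if row["id"] in seen:
            duplicates.append(row["id"])
        seen.add(row["id"])
    return {"incomplete": incomplete, "invalid": invalid, "duplicates": duplicates}


report = check_quality(records)
print(report)
```

Checks like these cover only the characteristics that can be tested from the records themselves; timeliness, relevance, and accessibility still depend on how and when the data were collected.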

Types of Data
1. Primary Data. These are data collected by the investigator himself/herself for a
specific purpose. For instance, the data collected by investigators for their own research
projects are primary data.
2. Secondary Data. These are data collected by someone else for some other purpose but
used by the current investigator for another purpose. For instance, using census data
to analyze the impact of education on career choice and earnings is an example of
working with secondary data.

Data Collection Tools and Instruments (Bhat, 2020)

1. Interview Method. Interviews conducted to collect quantitative data are more structured:
the researchers ask only a standard set of questions and nothing more. There are three major
types of interviews conducted for data collection:
• Telephone interviews: For years, telephone interviews ruled the charts of data collection
methods. Nowadays, however, there is a significant rise in video interviews conducted over the internet
using Skype or similar online video calling platforms.
• Face-to-face interviews: This is a proven technique for collecting data directly from the participants. It
helps in acquiring quality data, as it provides scope to ask detailed questions and to probe
further to collect rich and informative data. The literacy of the participant is
irrelevant, and face-to-face interviews offer ample opportunities to collect non-verbal data
through observation or to explore complex and unknown issues. Although it can be an
expensive and time-consuming method, the response rates for face-to-face interviews are often higher.
• Computer-Assisted Personal Interviewing (CAPI): This is essentially the same setup as the face-to-face
interview, except that the interviewer carries a desktop or laptop at the time of the
interview to upload the data directly into the database. CAPI saves
a lot of time in updating and processing the data and also makes the entire process paperless, as
the interviewer does not carry a bunch of papers and questionnaires.

2. Survey or Questionnaire Method. Checklist and rating-scale questions make up the bulk of
quantitative surveys because they help simplify and quantify the attitudes or behavior of
the respondents.
• Web-based questionnaire: This is one of the dominant and most trusted methods for internet-based
or online research. In a web-based questionnaire, the respondents receive an email containing the survey link;
clicking it takes the respondent to a secure online survey tool where he/she can
take the survey or fill in the survey questionnaire.
• Mail questionnaire: In a mail questionnaire, the survey is mailed out to members of the sample
population, enabling the researcher to connect with a wide range of audiences. The mail
questionnaire typically consists of a packet containing a cover sheet that introduces the
audience to the type of research and the reason it is being conducted, along with a prepaid
return envelope for sending back the completed questionnaire.
3. Observation Method. In this method, researchers collect quantitative data through systematic
observations, using techniques such as counting the number of people present at a specific
event at a particular time and venue, or the number of people attending an event in a
designated place. Structured observation is used more to collect quantitative than
qualitative data.
• Structured observation: In this type of observation, the researcher makes careful
observations of one or more specific behaviors in a more comprehensive or structured setting
than in naturalistic or participant observation. In a structured observation, the
researchers, rather than observing everything, focus only on very specific behaviors of interest. This allows
them to quantify the behaviors they are observing. When the observations require a judgment
on the part of the observers, the process is often described as coding, which requires clearly defining a
set of target behaviors.
4.Documents and Records. Document review is a process used to collect data after reviewing the
existing documents. It is an efficient and effective way of gathering data as documents are
manageable and are the practical resource to get qualified data from the past. Three primary document
types are being analyzed for collecting supporting quantitative research
 Public Records: Under this document review, official, ongoing records of an organization are
analyzed for further research. For example, annual reports policy manuals, student activities,
game activities in the university,
- Personal Documents: In contrast to public records, this type of document review deals with
individuals' personal accounts of their actions, behavior, health, physique, etc. For
example: the height and weight of the students, or the distance students travel to attend
school.
- Physical Evidence: Physical evidence, or physical documents, deals with the previous achievements of
an individual or an organization in terms of monetary and scalable growth.
Lesson 2: Sampling Design
Sampling is a statistical procedure that is concerned with the selection of individual
observations. It allows us to make statistical inferences about the population.

Approaches to Determine the Sample Size

1. Using a census for a small population (N ≤ 200). This eliminates sampling error and provides data
on all the members or elements in the population.
2. Using the sample size of a similar study. The disadvantage of using the same sample size as other
research is the possibility of repeating the same errors that were made in determining the sample
size for that study.
3. Using published tables. (research-advisors.com/tools/SampleSize.htm)
4. Using a formula
a) http://www.raosoft.com/samplesize.html
b) https://www.surveymonkey.com/mp/sample-size-calculator/
c) http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_power/BS704_Power_print.html
In using a formula to compute the sample size, the following basic information is needed.
a) Margin of error. It is the amount of error that you can tolerate. If 90% of respondents
answer yes while 10% answer no, you may be able to tolerate a larger amount of error than
if the respondents are split 50-50 or 45-55. A lower margin of error requires a larger sample size.
b) Confidence level. It is the amount of uncertainty you can tolerate. Suppose that you have 20
yes-no questions in your survey. With a confidence level of 95%, you would expect that for
one of the questions (1 in 20), the percentage of people who answer yes would be more than the
margin of error away from the true answer. The true answer is the percentage you would
get if you exhaustively interviewed everyone in the population. A higher confidence level requires a larger sample size.
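The margin of error and confidence level above can be turned into a sample size once an expected proportion p is chosen. Below is a minimal Python sketch, assuming Cochran's formula $n_0 = z^2 p(1-p)/e^2$ with the finite population correction $n = n_0 / (1 + (n_0 - 1)/N)$. This is one common textbook formula, not necessarily the exact one behind the calculators listed above. The function name `sample_size` and the default p = 0.5 (the most conservative 50-50 split) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size(margin_of_error, confidence=0.95, p=0.5, population=None):
    """Hypothetical helper: Cochran's sample size with optional finite population correction."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for a very large (effectively infinite) population
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: smaller populations need fewer respondents
        n0 = n0 / (1 + (n0 - 1) / population)
    # Round up so that the target margin of error is not exceeded
    return math.ceil(n0)


print(sample_size(0.05))                   # large population, 50-50 split
print(sample_size(0.05, population=1000))  # population of N = 1000
print(sample_size(0.05, p=0.9))            # lopsided 90-10 split
print(sample_size(0.05, confidence=0.99))  # higher confidence level
```

Under these assumptions the four calls give 385, 278, 139, and 664. This matches the points in the text: a lopsided split tolerates a smaller sample, and a higher confidence level requires a larger one.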

Sampling Techniques

1. Probability Sampling. It is a sampling technique wherein every member of the population is
given a known, nonzero (often equal) chance of being included in the sample.
- Simple Random Sampling. All members of the population have an equal chance of being included in the sample.
Example: lottery method, random numbers
- Systematic Random Sampling (with a random start). It selects every kth member of the
population, with a starting point chosen at random from the first k members. Example: Selecting every 5th member
of N = 1000 to get 200 sample members. For instance, if the random start is the 2nd member, we select the 2nd, 7th, 12th, 17th,
22nd members, and so on.
- Stratified Random Sampling. This is used when the population can be divided into several smaller non-overlapping
groups (strata); the sample is then randomly selected from each group.
- Cluster Sampling. Also called area sampling, in which groups or clusters, instead of individuals,
are randomly selected as the sample.
- Multi-stage Sampling. If the population is too big, two or more sampling techniques may be
used until the desired sample is obtained.

2. Non-probability Sampling. It is a sampling technique wherein the sample is determined by set
criteria, purpose, or personal judgment.
1. Purposive or Judgment Sampling. The sample is selected based on predetermined criteria set by
the researcher. Example: To determine the difficulties encountered by students in the
2017 national achievement test, only the Grade 6 pupils of the said school
will be included as a sample.
2. Convenience or Accidental Sampling. It relies on collecting data from population members who
are conveniently available to participate in the study. Facebook polls or questions are
a popular example of convenience sampling.
3. Quota Sampling. It is a non-probability sampling technique in which researchers look for a
specific characteristic in their respondents and then take a tailored sample that is in
proportion to the population of interest.
4. Snowball Sampling. The sample is determined by referrals made by previous members of the sample.
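Three of the probability techniques lend themselves directly to code: simple random sampling, systematic sampling with a random start, and stratified sampling. Below is a minimal Python sketch using the standard `random` module. It assumes a population of members numbered 1 to 1000 and two hypothetical strata ("Grade 11" and "Grade 12"), and the function names are illustrative. The stratified version uses proportional allocation (the same sampling fraction in every stratum), which is one common choice rather than the only one.

```python
import random


def simple_random_sample(population, n, rng):
    # Every member has the same chance of selection; drawn without replacement
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    # Sampling interval k = N / n, e.g. 1000 / 200 = 5
    k = len(population) // n
    # Random start among the first k members (0-based index 0..k-1)
    start = rng.randrange(k)
    # Take every kth member from the random start
    return population[start::k][:n]


def stratified_sample(strata, fraction, rng):
    # strata: dict of stratum name -> list of members (non-overlapping groups)
    # Proportional allocation: the same fraction is drawn at random from each stratum
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }


rng = random.Random(2021)          # fixed seed so the example is reproducible
population = list(range(1, 1001))  # members numbered 1..1000

print(simple_random_sample(population, 5, rng))
systematic = systematic_sample(population, 200, rng)
print(len(systematic), systematic[:5])  # 200 members, each 5 apart
strata = {"Grade 11": list(range(1, 601)), "Grade 12": list(range(601, 1001))}
print({name: len(s) for name, s in stratified_sample(strata, 0.1, rng).items()})
```

With a 10% fraction, the 600-member stratum contributes 60 members and the 400-member stratum contributes 40, so each group is represented in proportion to its size.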
MODULE 3: DATA PRESENTATION AND VISUALIZATION

Introduction

Data visualization is the graphical representation of information and data. Data visualization
tools provide an accessible way to see and understand trends, outliers, and patterns in
data. As a form of visual art, data visualization draws the audience's interest and attention to the
message. It helps tell important stories by curating data into a form that is easier to
understand, highlighting the most important aspects of the data set. However, data presentation and
visualization are not as simple as creating graphs and tables. Effective presentation and visualization
of data involve a balance between form (aesthetics) and function.

Lesson 1: Graphical Presentation of Data

A statistical graph (or chart) is a tool that helps readers understand the characteristics of the
distribution of a sample or a population. Effective data presentation follows these principles.

Five Essential Elements of Data Visualization (Data Craze, 2020)

1. Consistent Style and Colors. Carefully choose and maintain the same style across your
visualizations. Remember that the true meaning and value of data are not just in
2. Select the Right Visualization. A bar or pie chart is not the only visualization method in your
arsenal. Adjust what you present based on the purpose and type of data you have.
3. Less is More. Focus on the quality of what you want to present. An excessive number of charts
or indicators is distracting. Simplicity comes at a price – the less information to analyze the
4. Effective Visualization. The difference between effective and impressive visualization can be
huge. The data presented in the application should foremost give a value – effect in the form of
specific
5. Data Quality. The trust of users is difficult to build but easy to lose. Unexpected
information is desirable; errors are not. Try to detect errors at an early stage.

What Graphs Should You Use?

Data should be matched appropriately to the right information visualization. The following are
some of the most common graphs used to present data (Klipfolio, Inc., 2020).

1. Bar Graph. It organizes data into rectangular bars that make it convenient to
compare related data.
When to Use: to compare two or more values in the same category; to compare parts of a whole
when there are not too many groups (fewer than 10); and to relate multiple similar data sets.
When Not to Use: the category you are visualizing has only one value associated with it, or the data
are continuous.
Design Best Practice: Use consistent colors and labelling throughout to identify
relationships more easily. Keep the y-axis labels short, and don't forget to start the y-axis
from 0.

2. Line Chart. It organizes data so that information can be scanned rapidly to understand
trends.

When to Use: to understand trends, patterns, and fluctuations; to compare different yet
related data sets with multiple series; and to make projections beyond your data.
When Not to Use: to demonstrate an in-depth view of your data.
Design Best Practice: Use different colors for each category you are comparing. Use solid
lines to keep the line chart clear and concise. Try not to compare more than four categories
in one line chart.
3. Scatter Plot. It organizes many different data points to highlight similarities in the
given data set. It is useful for spotting outliers and identifying correlation
between two variables.
When to Use: to show the relationship between variables and to have a compact data visualization.
When Not to Use: to scan information rapidly or to show clear and precise data points.
Design Best Practice: Use only one or two trend lines to avoid confusion. Start the y-axis at 0.
4. Histogram. It shows the distribution of data over a continuous interval or certain
period. It gives an estimate of where values are concentrated, what the extremes are,
and whether there are any gaps or unusual values in the data.
When to Use: to compare data sets over an interval or time and to show the
distribution of data.
When Not to Use: to compare three or more variables in data sets.
Design Best Practice: Avoid bars so wide that they hide important details or so
narrow that they create a lot of noise. Use equal, round-number widths for the bars. Use
consistent colors and labelling throughout.
5. Box Plot. Also known as a box-and-whisker diagram, it is a visual representation that
displays a distribution of data, usually across groups, based on a five-number
summary: minimum, first quartile, median, third quartile, and maximum. It also
shows the
When to Use: to display or compare a distribution of data and identify the minimum,
maximum, and median of the data.
When Not to Use: to visualize individual, unconnected data sets
Design Best Practice: Ensure font sizes for labels and legends are big enough and line widths
are thick enough. Use different symbols, line styles or colors to differentiate multiple data
sets. Remove unnecessary clutter from the plots.
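The numbers behind a histogram and a box plot can be computed without a plotting library. Below is a minimal Python sketch using the 50 Mathematics test scores from the frequency distribution example in this module, with text bars standing in for drawn bars. It assumes equal, round-number bins of width 10 starting at 0. For the box plot it also assumes the exclusive (n + 1) quartile method, which `statistics.quantiles` supports with `method="exclusive"` (the same method as Excel's QUARTILE.EXC), and the common 1.5 × IQR rule for flagging outliers. Other software may use a different quartile method and draw slightly different boxes. The function names are illustrative.

```python
from statistics import quantiles


def histogram_bins(data, start, width):
    """Count the values in each equal-width bin [lower, lower + width)."""
    counts = {}
    for x in data:
        lower = start + (x - start) // width * width  # left edge of x's bin
        counts[lower] = counts.get(lower, 0) + 1
    # Keep empty bins so that gaps in the data stay visible
    return [(lower, lower + width, counts.get(lower, 0))
            for lower in range(start, max(data) + 1, width)]


def box_plot_summary(data):
    """Five-number summary plus values beyond the 1.5 x IQR fences."""
    q1, median, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {"min": min(data), "q1": q1, "median": median, "q3": q3,
            "max": max(data), "outliers": [x for x in data if x < low or x > high]}


scores = [43, 30, 35, 37, 42, 19, 26, 48, 34, 15,
          35, 18, 46, 41, 27, 18, 13, 40, 29, 14,
          40, 17, 10, 21, 28, 13, 14, 39, 30, 5,
          19, 50, 36, 20, 31, 28, 48, 32, 20, 38,
          25, 12, 33, 31, 28, 16, 40, 32, 26, 35]

for lower, upper, count in histogram_bins(scores, start=0, width=10):
    print(f"[{lower}, {upper}) {'#' * count}")
print(box_plot_summary(scores))
```

For these scores the five-number summary is 5, 18.75, 29.5, 37.25, and 50, and no score falls outside the fences.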

Other useful graphs and charts, with their description, use, and other important features may be
found at The Data Visualization Catalogue via datavizcatalogue.com

Here are some tips on improving your charts and graphs (Visme, 2020).
1. Our eyes do not follow a specific order, so you need to create that order. Create a visualization
that deliberately takes viewers along a predefined visual path.
2. Our eyes first focus on what stands out, so be intentional with your focal point. Create charts
and graphs with one clear message that can be effortlessly understood.
3. Our eyes can only handle a few things at once, so do not overcrowd your design. Simplify your
charts so that they highlight the one main point you want to convey.
4. Our brains are designed to immediately look for connections and try to find meaning in the data.
Assign colors deliberately to improve the functionality of your visualization.
5. We are guided by cultural

Lesson 2: Tabular Presentation of Data

Almost all research and technical reports use tables to present data. Tabular presentation of
data is a systematic and logical arrangement of data into rows and columns with respect to the
characteristics of data.

Components of Tables

1. Table Number and Title. It is included for easy reference and identification. It should indicate
the nature of the information included in the table.
2. Stub (Row Labels). It is placed on the left side of the table and indicates the specific items in the rows.
3. Captions (Column Headings). They are placed at the top of the columns of a table to explain the figures in
the columns.
4. Body. It is the most important part of the table; it comprises the numerical contents and reveals the
whole story of the investigated data.
5. Footnote. It provides any further explanation that may be needed for an item included in the table.
6. Source note. It is placed at the bottom of the table to indicate the sources of the data.
Tabular Presentation of Nominal and Ordinal Data

Nominal or ordinal data are presented using a frequency table or frequency distribution
table. The table displays frequency count and percentages for each value of a variable.
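The frequency and percentage columns of such a table can be produced with `collections.Counter`. Below is a minimal Python sketch. The respondents' sex values are hypothetical; they stand in for the profile data, whose table is not reproduced in this text.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (category, frequency, percentage), most frequent category first."""
    n = len(values)
    return [(category, f, round(100 * f / n, 2))
            for category, f in Counter(values).most_common()]


# Hypothetical profile data: sex of 10 respondents
sex = ["Female", "Male", "Female", "Female", "Male",
       "Female", "Male", "Female", "Female", "Male"]

print(f"{'Sex':<8}{'f':>4}{'%':>8}")
for category, f, pct in frequency_table(sex):
    print(f"{category:<8}{f:>4}{pct:>8.2f}")
print(f"{'Total':<8}{len(sex):>4}{100:>8.2f}")
```

With this data, Female has a frequency of 6 (60%) and Male has 4 (40%).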

Example: Suppose your research objective is to determine the profile of the respondents. The data
may be presented as follows.
A contingency table or crosstabulation can also be used to display the relationship between
categorical variables. This type of presentation allows us to examine a hypothesis regarding the
independence or dependence between variables.
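A crosstabulation counts each combination of categories of two variables. Below is a minimal Python sketch with `collections.Counter`. The sex and year-level values are hypothetical, and the function name `crosstab` is illustrative.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Nested dict {row category: {column category: count}} for two categorical variables."""
    counts = Counter(zip(row_values, col_values))
    rows = sorted(set(row_values))
    cols = sorted(set(col_values))
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Hypothetical data: sex and year level of 8 respondents
sex = ["F", "M", "F", "F", "M", "M", "F", "M"]
year = ["1st", "1st", "2nd", "1st", "2nd", "2nd", "1st", "1st"]

table = crosstab(sex, year)
columns = sorted(set(year))
print("Sex " + "".join(f"{c:>6}" for c in columns) + f"{'Total':>7}")
for r, counts in table.items():
    print(f"{r:<4}" + "".join(f"{counts[c]:>6}" for c in columns) + f"{sum(counts.values()):>7}")
```

Each cell holds the number of respondents with that particular pair of categories. Comparing the rows shows whether the distribution of year levels differs by sex.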

Example: Suppose your research objective is to determine the profile of the respondents. The data
may be presented in crosstabulation as follows.
Tabular Presentation of Interval and Ratio Data

The data on the interval or ratio scale are organized using a frequency distribution table.
These are the steps in constructing a frequency distribution table.

1. Determine the number of class intervals, $k = 1 + 3.322 \log n$, the range, $R = x_{\max} - x_{\min}$, and the class size, $c = R/k$.
2. Construct the class intervals based on the class size. The first and last class intervals should contain
the minimum and maximum values, respectively. It is advisable to start the first class interval
with the minimum value.
3. Arrange the data in either ascending or descending order. Then tally the scores based on the
class intervals in step 2.
4. Add columns for class boundaries, class mark or class midpoint, relative frequency, and
cumulative frequencies.

The class interval contains the lower (L) and upper (U) limits (e.g., in the class interval 46
– 65, the lower limit is 46 and the upper limit is 65).

The class mark or class midpoint (X) is the value in the middle of the class interval (e.g., in the
class interval 46 – 65, the class mark is 55.5; that is, $X = \dfrac{46 + 65}{2} = 55.5$).

The class boundaries are the true class limits of the class intervals. They lie half a unit below the
lower limit and half a unit above the upper limit (e.g., in the class interval 46 – 65, the class
boundaries are 45.5 – 65.5).

The relative frequency (also known as percentage frequency) is computed using the formula

$rf = \dfrac{f}{n} \times 100\%$

where f is the frequency of the class interval and n is the total of the frequencies.

The less than cumulative frequency (<cf) and greater than cumulative frequency (>cf) are
obtained by adding the frequencies from top to bottom and from bottom to top,
respectively.

Example: Using the scores of 50 students in a 55-item Mathematics test, construct a frequency
distribution table.
43 30 35 37 42 19 26 48 34 15

35 18 46 41 27 18 13 40 29 14

40 17 10 21 28 13 14 39 30 5

19 50 36 20 31 28 48 32 20 38

25 12 33 31 28 16 40 32 26 35

Solution:

Step 1: Determine the number of class intervals, the range, and the class size.

$k = 1 + 3.322 \log n = 1 + 3.322 \log 50 = 6.643978 \approx 7$

$R = x_{\max} - x_{\min} = 50 - 5 = 45$

$c = R/k = 45/7 = 6.43 \approx 7$
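Step 1 can be checked in Python using the 50 Mathematics test scores listed above. Below is a minimal sketch. It assumes both k and c are rounded up, as in the solution (6.64 becomes 7 and 6.43 becomes 7), so that k classes of size c always cover the whole range. The helper name `class_setup` is illustrative.

```python
import math


def class_setup(data):
    """Sturges' rule for k, the range R, and the class size c."""
    n = len(data)
    k_raw = 1 + 3.322 * math.log10(n)   # 1 + 3.322 log n
    k = math.ceil(k_raw)                # 6.64 -> 7 classes (rounded up)
    r = max(data) - min(data)           # R = highest - lowest value
    c = math.ceil(r / k)                # 45 / 7 = 6.43 -> 7 (rounded up)
    return k_raw, k, r, c


scores = [43, 30, 35, 37, 42, 19, 26, 48, 34, 15,
          35, 18, 46, 41, 27, 18, 13, 40, 29, 14,
          40, 17, 10, 21, 28, 13, 14, 39, 30, 5,
          19, 50, 36, 20, 31, 28, 48, 32, 20, 38,
          25, 12, 33, 31, 28, 16, 40, 32, 26, 35]

print(class_setup(scores))
```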

Step 2: Construct the class intervals based on the class size.
Since our minimum value is 5 and the class size is 7, the first class interval is 5 – 11. Note
that this class interval contains 7 values – 5, 6, 7, 8, 9, 10, 11.
To construct the succeeding intervals, add the class size to the lower and upper limits.

Class Intervals
5 – 11
12 – 18
19 – 25
26 – 32
33 – 39
40 – 46
47 – 53

Step 3: Arrange the data in either ascending or descending order. Then tally the scores based on the
class intervals in step 2.

5 14 18 20 27 30 32 35 40 43

10 14 18 21 28 30 33 36 40 46

12 15 19 25 28 31 34 37 40 48

13 16 19 26 28 31 35 38 41 48

13 17 20 26 29 32 35 39 42 50

This data set can also be organized or sorted using a stem-and-leaf plot. A stem-and-leaf plot is a special
table in which each data value is split into a stem (the first digit or digits) and a leaf (the last digit).

Stem Leaf

0 5

1 0233445678899

2 00156678889

3 001122345556789

4 000123688

5 0
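The stem-and-leaf plot above can be generated from the raw scores. Below is a minimal Python sketch, assuming non-negative integer data whose stem is the tens digit (or digits) and whose leaf is the ones digit. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf plot for non-negative integers."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # stem = tens digit(s), leaf = ones digit
        leaves[stem].append(str(leaf))
    first, last = min(leaves), max(leaves)
    # Keep empty stems so that gaps in the data remain visible
    return [f"{stem} | {''.join(leaves.get(stem, []))}" for stem in range(first, last + 1)]


scores = [43, 30, 35, 37, 42, 19, 26, 48, 34, 15,
          35, 18, 46, 41, 27, 18, 13, 40, 29, 14,
          40, 17, 10, 21, 28, 13, 14, 39, 30, 5,
          19, 50, 36, 20, 31, 28, 48, 32, 20, 38,
          25, 12, 33, 31, 28, 16, 40, 32, 26, 35]

for row in stem_and_leaf(scores):
    print(row)
```

Because the leaves are written in sorted order, each row doubles as the ordered array needed for tallying in Step 3.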

Step 4. Add columns for class boundaries, class mark or class midpoint, relative frequency, and
cumulative frequencies.
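The completed table for Step 4 is not reproduced in this text, but its columns follow from the definitions given earlier. Below is a minimal Python sketch. It assumes integer scores, class limits that start at the minimum value 5 with class size 7 and 7 classes (from Steps 1 and 2), and class boundaries half a unit beyond the class limits. The function name `frequency_distribution` is illustrative.

```python
def frequency_distribution(data, lowest, c, k):
    """Build the columns of a frequency distribution table for integer data."""
    n = len(data)
    rows = []
    for i in range(k):
        lower = lowest + i * c          # lower limit L
        upper = lower + c - 1           # upper limit U
        f = sum(lower <= x <= upper for x in data)
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # half a unit beyond each limit
            "midpoint": (lower + upper) / 2,            # class mark X
            "f": f,
            "rf": 100 * f / n,                          # relative frequency in %
        })
    running = 0
    for row in rows:                    # <cf: add frequencies from top to bottom
        running += row["f"]
        row["<cf"] = running
    running = 0
    for row in reversed(rows):          # >cf: add frequencies from bottom to top
        running += row["f"]
        row[">cf"] = running
    return rows


scores = [43, 30, 35, 37, 42, 19, 26, 48, 34, 15,
          35, 18, 46, 41, 27, 18, 13, 40, 29, 14,
          40, 17, 10, 21, 28, 13, 14, 39, 30, 5,
          19, 50, 36, 20, 31, 28, 48, 32, 20, 38,
          25, 12, 33, 31, 28, 16, 40, 32, 26, 35]

for row in frequency_distribution(scores, lowest=5, c=7, k=7):
    lo, hi = row["limits"]
    lb, ub = row["boundaries"]
    print(f"{lo:>2} - {hi:<3} {lb:>4} - {ub:<5} X={row['midpoint']:<5} "
          f"f={row['f']:<3} rf={row['rf']:>4.0f}% <cf={row['<cf']:<3} >cf={row['>cf']}")
```

The resulting frequencies are 2, 10, 6, 13, 9, 7, and 3, for a total of 50.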
10/15/21, 5:06 PM STAT.APP(BSA3-4_ 7:00-10:00 SATURDAY) - https://ubian.ub.edu.ph/student_lesson/show/2750742?from=%2Fstudent_lesson…

STAT.APP(BSA3-4_ 7:00-10:00 SATURDAY)

MODULE 4 : DESCRIPTIVE STATISTICS


Lesson 1: Measures of Central Tendency
The measure of central tendency, or average, is a value that best characterizes or describes a set of data.

A. MEAN - It is the balance point of a data set and the most reliable and most sensitive measure of average. It is always unique and is affected by extreme values. It is used when
the data are on an interval or ratio scale and are normally distributed.
For ungrouped data, the mean can be computed as follows.

$\bar{x} = \dfrac{\sum x}{n}$

Example: The following are the scores of 10 students in a 50-item Qualifying Examination.
45 44 42 40 45 48 49 50 50 47

The mean score of 10 students in a 50-item Qualifying Examination is 46. In general, the group of students did well in the examination.
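The ungrouped mean for this example can be verified in a few lines of Python. Applying the formula by hand and calling `statistics.mean` from the standard library give the same result.

```python
from statistics import mean

# Scores of 10 students in a 50-item Qualifying Examination
scores = [45, 44, 42, 40, 45, 48, 49, 50, 50, 47]

# Formula: sum of the values divided by the number of values
x_bar = sum(scores) / len(scores)
print(x_bar)          # 46.0
print(mean(scores))   # 46, the same result from the standard library
```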

Example: A high school teacher conducts a semester evaluation of the Math textbook used in Calculus. The following are the data collected from 40 students.

https://ubian.ub.edu.ph/student_lesson/show/2750742?from=%2Fstudent_lesson%2Fshow%2F2750742%3Flesson_id%3D12458339%26router%3Dtr… 1/7

Table 1 shows the textbook evaluation results for Calculus. The students strongly agreed that the Calculus book is organized appropriately for its users (M = 3.63). The students also agreed that the Calculus book has relevant content (M = 3.38), developmentally appropriate supplementary activities (M = 3.25), and
other features that promote higher-order thinking skills (M = 3.08).

For grouped data (data arranged in a frequency distribution table), we compute the mean using the following formula, where X is the class mark and f is the class frequency.

$\bar{x} = \dfrac{\sum fX}{n}$

Example: Compute and interpret the mean of the data set presented in the following frequency distribution table.


Alternative Solution: Assumed Mean Formula

Choose any class mark and designate it as $X_0$ (the assumed mean). For the deviations column, d, assign 0 to $X_0$ and consecutive integers to the rest of the class marks (negative integers for class marks lower than $X_0$ and positive integers for class marks higher than $X_0$).
Compute fd, that is, the product of the frequencies and deviations. Get the total for the column fd. The mean is then $\bar{x} = X_0 + \dfrac{\sum fd}{n} \times c$.
For this problem, let $X_0 = 29$.

The mean score of the 50 students who took the 55-item Mathematics Test is 29. The mean score suggests that the students did not perform satisfactorily in the mathematics test.
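Both routes to the grouped mean can be checked in Python. Below is a minimal sketch. It assumes the class marks (8, 15, ..., 50) and frequencies (2, 10, 6, 13, 9, 7, 3) from the frequency distribution of the 50 Mathematics scores, class size c = 7, and the assumed mean $X_0 = 29$. The deviations d are coded as $(X - X_0)/c$, which produces the consecutive integers described above.

```python
class_marks = [8, 15, 22, 29, 36, 43, 50]
freqs = [2, 10, 6, 13, 9, 7, 3]
c = 7
n = sum(freqs)

# Direct formula: sum of f * X divided by n (1450 / 50)
direct = sum(f * x for f, x in zip(freqs, class_marks)) / n

# Assumed mean formula with coded deviations d = (X - X0) / c
x0 = 29
deviations = [(x - x0) // c for x in class_marks]          # -3, -2, ..., 3
sum_fd = sum(f * d for f, d in zip(freqs, deviations))
assumed = x0 + sum_fd / n * c

print(direct, deviations, sum_fd, assumed)
```

Here $\sum fd = 0$, so the assumed mean is already the actual mean. Both formulas give 29.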

B. MEDIAN. It is the middle value in an ordered set of data; hence, it is unique and is not affected by outliers. It is used when the data are ordinal, when there are a few extreme values
in the data set, when some values are missing or underestimated, or when the distribution is open-ended.
For the ungrouped data, the median can be determined as follows:

1. Arrange the data in ascending (or descending) order.

2. If n is odd, the middle entry is the median. If n is even, get the average of the two middle numbers.

Example: Consider the following data set. Compute the median values for each group.

Group A 22 33 21 18 19 15 16 18 16

Group B 15 15 15 16 17 20 21 20 20 28 25 27

Solution: Arrange the data in ascending order.

Group A 15 16 16 18 18 19 21 22 33

Group B 15 15 15 16 17 20 20 20 21 25 27 28
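The two-step rule above is sketched below in Python and compared against `statistics.median` from the standard library. With n = 9, Group A's median is the 5th ordered value, 18. With n = 12, Group B's median is the average of the 6th and 7th ordered values, (20 + 20)/2 = 20.

```python
from statistics import median as stats_median


def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


group_a = [22, 33, 21, 18, 19, 15, 16, 18, 16]
group_b = [15, 15, 15, 16, 17, 20, 21, 20, 20, 28, 25, 27]
print(median(group_a), stats_median(group_a))  # n = 9 (odd): the 5th value
print(median(group_b), stats_median(group_b))  # n = 12 (even): mean of 6th and 7th values
```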


Example: Compute and interpret the median of the data set presented in the following frequency distribution table.

The median, equivalent to 29.27, means that 50% of the students have scores below 29.27 and 50% have scores above 29.27.
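The grouped-data formula for the median is not reproduced in this text. Below is a minimal Python sketch, assuming the standard textbook formula $\tilde{x} = LB + \dfrac{n/2 - {<}cf}{f} \times c$. Here LB is the lower class boundary of the median class, <cf is the less-than cumulative frequency of the class just before it, f is the frequency of the median class, and c is the class size. Applied to the frequency distribution of the 50 Mathematics scores, it reproduces the 29.27 reported above. The function name `grouped_median` is illustrative.

```python
def grouped_median(lower_boundaries, freqs, c):
    """Median of grouped data: LB + (n/2 - <cf) / f * c."""
    n = sum(freqs)
    half = n / 2
    cf = 0  # less-than cumulative frequency of the classes before the current one
    for lb, f in zip(lower_boundaries, freqs):
        if cf + f >= half:  # the first class whose <cf reaches n/2 is the median class
            return lb + (half - cf) / f * c
        cf += f
    raise ValueError("frequencies must be positive")


# Lower class boundaries and frequencies of the 50 Mathematics scores (class size 7)
lower_boundaries = [4.5, 11.5, 18.5, 25.5, 32.5, 39.5, 46.5]
freqs = [2, 10, 6, 13, 9, 7, 3]
print(round(grouped_median(lower_boundaries, freqs, 7), 2))
```

The median class is 26 – 32 (boundaries 25.5 – 32.5), because its less-than cumulative frequency, 31, is the first to reach n/2 = 25.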

C. MODE. It is the most frequently occurring value in the data set; hence, it is not necessarily unique. That is, a data set may have no mode, one mode, or multiple modes. For this
reason, it is the least reliable of the three averages. It is used when data are on a nominal scale.

For the ungrouped data, the mode can be determined by inspection.

Example: Consider the following data set. Determine the mode for each group.

Group A 22 33 21 18 19 15 16 18 16

Group B 15 15 15 16 17 20 21 20 20 28 25 27

Solution: By inspection, the modes of Group A are 16 and 18; hence, the distribution is bimodal. The modes of Group B are 15 and 20, so it is also a bimodal
distribution.
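Finding the mode by inspection can be double-checked with `statistics.multimode` (Python 3.8 or later). It returns every value tied for the highest frequency, so it handles bimodal data sets like these.

```python
from statistics import multimode

group_a = [22, 33, 21, 18, 19, 15, 16, 18, 16]
group_b = [15, 15, 15, 16, 17, 20, 21, 20, 20, 28, 25, 27]

# multimode lists the tied values in order of first appearance; sort for readability
print(sorted(multimode(group_a)))  # [16, 18]: each occurs twice
print(sorted(multimode(group_b)))  # [15, 20]: each occurs three times
```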
For grouped data, we can compute the mode using the following formula.

Example: Compute and interpret the mode of the data set presented in the following frequency distribution table.


The mode, approximately 30, means that the scores of the students who took the math test are most concentrated around 30.
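The grouped-data formula for the mode is not reproduced in this text. Below is a minimal Python sketch, assuming the common formula $Mo = LB + \dfrac{d_1}{d_1 + d_2} \times c$. Here LB is the lower boundary of the modal class (the class with the highest frequency), $d_1$ and $d_2$ are the differences between the modal frequency and the frequencies of the classes just before and just after it, and c is the class size. It also assumes a missing neighboring class counts as frequency 0. For the 50 Mathematics scores it gives about 29.95, which rounds to the 30 reported above. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_boundaries, freqs, c):
    """Mode of grouped data: LB + d1 / (d1 + d2) * c."""
    m = freqs.index(max(freqs))                        # index of the modal class
    before = freqs[m - 1] if m > 0 else 0              # assumed 0 if no class precedes it
    after = freqs[m + 1] if m + 1 < len(freqs) else 0  # assumed 0 if no class follows it
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return lower_boundaries[m] + d1 / (d1 + d2) * c


lower_boundaries = [4.5, 11.5, 18.5, 25.5, 32.5, 39.5, 46.5]
freqs = [2, 10, 6, 13, 9, 7, 3]
print(round(grouped_mode(lower_boundaries, freqs, 7), 2))
```

Here the modal class is 26 – 32 with f = 13, so $d_1 = 13 - 6 = 7$, $d_2 = 13 - 9 = 4$, and $Mo = 25.5 + \frac{7}{11}(7) \approx 29.95$.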

Watch the following videos on measures of central tendency to understand more about these concepts.
Khan Academy. (14 November 2011). Finding mean, median, and mode [Video clip]. Retrieved 25 July 2020 from

The Organic Chemistry Tutor. (26 January 2019). Mean, median, and mode of grouped data & frequency distribution tables statistics [Video clip]. Retrieved 25 July 2020 from

Emmanuel, E. (11 February 2019). Mean, Median, and Mode (grouped data) [Video clip]. Retrieved 25 July 2020 from


Lesson 2: Measures of Position

The three commonly used measures of position (also known as quantiles) are quartiles, deciles, and percentiles, which divide the distribution into 4, 10,
and 100 equal parts, respectively.
To determine the different measures of position for ungrouped data we will use the following procedures.
(Note: The procedures outlined in the succeeding discussion are based on MS Excel calculations.)

Quartiles

Method 1: Conventional Method

1. Arrange the values in ascending (or descending) order.

2. Determine the median and separate the lower and upper 50%.
3. To determine the first quartile, determine the “median” of the lower 50%.
4. To determine the third quartile, determine the “median” of the upper 50%.

Example: Determine the first and third quartiles of the following data sets.

Group A: 85, 88, 87, 89, 86, 86, 85, 87, 89, 88
Group B: 94, 80, 79, 88, 96, 86, 83, 81, 85, 99, 92

Solution
For Group A, we arrange the values in an array: 85, 85, 86, 86, 87, 87, 88, 88, 89, 89

The median is $\tilde{x} = \dfrac{87 + 87}{2} = 87$.

The lower 50% includes 85, 85, 86, 86, 87. The middle value, 86, is the first quartile.

The upper 50% includes 87, 88, 88, 89, 89. The middle value, 88, is the third quartile.

For Group B, we arrange the values in an array: 79, 80, 81, 83, 85, 86, 88, 92, 94, 96, 99. The median is 86.
The lower 50% includes 79, 80, 81, 83, 85. The middle value, 81, is the first quartile.
The upper 50% includes 88, 92, 94, 96, 99. The middle value, 94, is the third quartile.
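Method 1 can be written directly in Python. Below is a minimal sketch. It assumes that when n is odd the median itself is left out of both halves, as in the Group B solution, where 86 belongs to neither half. The helper names are illustrative.

```python
def middle(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 == 1 else (values[mid - 1] + values[mid]) / 2


def quartiles_conventional(data):
    """Q1, median, Q3 as the medians of the lower half, the whole array, and the upper half."""
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower = x[:half]             # lower 50%
    upper = x[half + n % 2:]     # upper 50% (skips the median itself when n is odd)
    return middle(lower), middle(x), middle(upper)


group_a = [85, 88, 87, 89, 86, 86, 85, 87, 89, 88]
group_b = [94, 80, 79, 88, 96, 86, 83, 81, 85, 99, 92]
print(quartiles_conventional(group_a))  # (86, 87.0, 88)
print(quartiles_conventional(group_b))  # (81, 86, 94)
```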

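Method 2: Using Formula

For the first quartile,

1. Compute $k = \dfrac{n+1}{4}$.

2. If k is an integer, the first quartile is $Q_1 = x_k$, the kth value in the ordered array. If k is not an integer, let i be its integer part and d its decimal part; the first quartile is $Q_1 = x_i + (x_{i+1} - x_i) \times d$.

For the third quartile,

1. Compute $t = \dfrac{3(n+1)}{4}$.
2. If t is an integer, the third quartile is $Q_3 = x_t$. If t is not an integer, let i be its integer part and d its decimal part; the third quartile is $Q_3 = x_i + (x_{i+1} - x_i) \times d$.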

Example: Determine the first and third quartiles of the following data sets.

Group A: 85, 88, 87, 89, 86, 86, 85, 87, 89, 88

Group B: 94, 80, 79, 88, 96, 86, 83, 81, 85, 99, 92

Solution:
For Group A, we arrange the values in an array: 85, 85, 86, 86, 87, 87, 88, 88, 89, 89

For the first quartile, we compute $k = \dfrac{n+1}{4} = \dfrac{10+1}{4} = 2.75$. Since k is not an integer, the first quartile is computed as follows.

Let $x_2 = 85$, $x_3 = 86$, and the decimal part of k be 0.75. Thus, $Q_1 = 85 + (86 - 85)(0.75) = 85.75$.

For the third quartile, we compute $t = \dfrac{3(n+1)}{4} = \dfrac{3(10+1)}{4} = 8.25$. Since t is not an integer, the third quartile is computed as follows.
Let $x_8 = 88$, $x_9 = 89$, and the decimal part of t be 0.25. Thus, $Q_3 = 88 + (89 - 88)(0.25) = 88.25$.

Using MS Excel

To compute the values of the quartiles using Excel, we use the formula

=QUARTILE.EXC(array, quart). The “quart” argument may be 1, 2, or 3 for the first, second, or third quartile, respectively.

For Group B, we arrange the values in an array: 79, 80, 81, 83, 85, 86, 88, 92, 94, 96, 99

For the first quartile, we compute $k = \dfrac{11+1}{4} = 3$. Since k is an integer, the first quartile is
$Q_1 = x_3 = 81$.

For the third quartile, we compute $t = \dfrac{3(11+1)}{4} = 9$. Since t is an integer, the third quartile is
$Q_3 = x_9 = 94$.
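Method 2 reduces to reading the value at a possibly fractional position in the ordered array. Below is a minimal Python sketch of that rule; the helper names are illustrative. As a cross-check it also calls the standard library's `statistics.quantiles(..., method="exclusive")`, which interpolates at the same (n + 1) positions. The sketch should therefore agree with both the hand computation and Excel's QUARTILE.EXC.

```python
from statistics import quantiles


def value_at(sorted_values, position):
    """Value at a 1-based, possibly fractional position in an ordered array."""
    n = len(sorted_values)
    if not 1 <= position <= n:
        raise ValueError("position must lie between 1 and n")
    i = int(position)      # integer part
    d = position - i       # decimal part
    if d == 0:
        return sorted_values[i - 1]
    # x_i + (x_{i+1} - x_i) * d, written with Python's 0-based indexing
    return sorted_values[i - 1] + (sorted_values[i] - sorted_values[i - 1]) * d


def quartile(data, q):
    """q-th quartile (q = 1, 2, 3) at position q(n + 1)/4."""
    x = sorted(data)
    return value_at(x, q * (len(x) + 1) / 4)


group_a = [85, 88, 87, 89, 86, 86, 85, 87, 89, 88]
group_b = [94, 80, 79, 88, 96, 86, 83, 81, 85, 99, 92]
for name, data in [("A", group_a), ("B", group_b)]:
    print(name, quartile(data, 1), quartile(data, 3),
          quantiles(data, n=4, method="exclusive"))
```

Both approaches give $Q_1 = 85.75$ and $Q_3 = 88.25$ for Group A, and $Q_1 = 81$ and $Q_3 = 94$ for Group B.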


The computation for deciles and percentiles follows the same procedures as the computation of quartiles using the formula.

Example: Determine the 10th, 80th, and 90th percentiles of the following data set.

Group A: 85, 88, 87, 89, 86, 86, 85, 87, 89, 88

Solution:

For Group A, we arrange the values in an array: 85, 85, 86, 86, 87, 87, 88, 88, 89, 89

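For the 10th percentile: $k = \dfrac{10(n+1)}{100} = \dfrac{10(11)}{100} = 1.1$. Since k is not an integer, $P_{10} = x_1 + (x_2 - x_1)(0.1) = 85 + (85 - 85)(0.1) = 85$.

For the 80th percentile: $k = \dfrac{80(n+1)}{100} = \dfrac{80(11)}{100} = 8.8$. Since k is not an integer, $P_{80} = x_8 + (x_9 - x_8)(0.8) = 88 + (89 - 88)(0.8) = 88.8$.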

Figure 8 shows the MS Excel output using the formula =PERCENTILE.EXC(array, k), where k is a value between 0 and 1. For example, for the 80th percentile, k = 0.8.
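The same position rule gives percentiles, with position $k = p(n+1)/100$. Below is a minimal Python sketch that also computes the 90th percentile asked for in the example; the function names are illustrative. It raises an error when the position falls outside 1 to n, which is also when PERCENTILE.EXC returns an error. As a cross-check, `statistics.quantiles(data, n=100, method="exclusive")` returns the 99 cut points $P_1$ to $P_{99}$ under the same rule.

```python
def value_at(sorted_values, position):
    """Value at a 1-based, possibly fractional position in an ordered array."""
    n = len(sorted_values)
    if not 1 <= position <= n:
        raise ValueError("position must lie between 1 and n")
    i = int(position)      # integer part
    d = position - i       # decimal part
    if d == 0:
        return sorted_values[i - 1]
    return sorted_values[i - 1] + (sorted_values[i] - sorted_values[i - 1]) * d


def percentile(data, p):
    """p-th percentile at position p(n + 1)/100."""
    x = sorted(data)
    return value_at(x, p * (len(x) + 1) / 100)


group_a = [85, 88, 87, 89, 86, 86, 85, 87, 89, 88]
for p in (10, 80, 90):
    print(p, round(percentile(group_a, p), 2))
```

For Group A this gives $P_{10} = 85$, $P_{80} = 88.8$, and $P_{90} = 89$ (position 9.9, where $x_9 = x_{10} = 89$).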


For the grouped data, we will use the following formulas.

Note that these formulas are very similar to the formula for the median.

Example: Compute and interpret the first and third quartiles, the 3rd decile, and the 10th and 90th percentiles of the data set presented in the following frequency
distribution table.
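The grouped-data formulas for quartiles, deciles, and percentiles are not reproduced in this text. Below is a minimal Python sketch. It assumes they follow the median formula with n/2 replaced by the position $kn/4$, $kn/10$, or $kn/100$ (for $Q_k$, $D_k$, and $P_k$), and it uses the frequency distribution of the 50 Mathematics scores. The function name `grouped_quantile` is illustrative. The printed values are what this assumed formula gives, not values taken from the module.

```python
def grouped_quantile(lower_boundaries, freqs, c, k, parts):
    """k-th quantile when the data are divided into `parts` equal parts (4, 10, or 100)."""
    n = sum(freqs)
    target = k * n / parts          # position kn/4, kn/10, or kn/100
    cf = 0                          # less-than cumulative frequency before the class
    for lb, f in zip(lower_boundaries, freqs):
        if cf + f >= target:        # first class whose <cf reaches the position
            return lb + (target - cf) / f * c
        cf += f
    raise ValueError("k must not exceed parts")


lower_boundaries = [4.5, 11.5, 18.5, 25.5, 32.5, 39.5, 46.5]
freqs = [2, 10, 6, 13, 9, 7, 3]
for label, k, parts in [("Q1", 1, 4), ("Q3", 3, 4), ("D3", 3, 10),
                        ("P10", 10, 100), ("P90", 90, 100)]:
    print(label, round(grouped_quantile(lower_boundaries, freqs, 7, k, parts), 2))
```

Under this assumption, $Q_1 \approx 19.08$ and $Q_3 \approx 37.56$: about 25% of the students scored below 19.08 and about 25% scored above 37.56.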

Lesson 3: Measures of Variability
To get a complete description of the data set, it is not enough to compute averages. We may also consider other statistics called measures of variability to
determine homogeneity or heterogeneity of the data in a distribution. Measures of variability describe how spread out or scattered a set of data is.

As shown in Figure 6, data sets may have the same average but very different interpretations because of the spread (variation or differences) of the
data values.

A. Absolute Variability

Range. This refers to the difference between the maximum and minimum values.

Mean Absolute Deviation. This refers to the average of the absolute deviations of the values from the mean.

Ungrouped data: $MAD = \dfrac{\sum |x - \bar{x}|}{n}$

Grouped data: $MAD = \dfrac{\sum f|X - \bar{x}|}{n}$

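Variance and Standard Deviation. The variance is the average of the squared deviations of the values from the mean; for sample data, the sum of the squared deviations is divided by n - 1. The standard deviation is the square root
of the variance.

Ungrouped data: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$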


Interquartile Range and Quartile Deviation. They measure the dispersion of the middle 50% of the data: $IQR = Q_3 - Q_1$ and $QD = \dfrac{Q_3 - Q_1}{2}$.

Example: Consider the following raw data on performance ratings taken from a sample of 10 respondents.

i 1 2 3 4 5 6 7 8 9 10

xi 84 86 83 81 87 90 93 85 80 94
Compute the range, mean absolute deviation, variance, standard deviation, interquartile range, and quartile deviation.

Solution:
a. Range = 94 − 80 = 14

The ratings span 14 points from the minimum to the maximum value.

To compute the mean absolute deviation, variance, and standard deviation, the mean must be determined first. Then, the column for $|x_i - \bar{x}|$ is computed by getting the absolute value of the difference between each rating and the mean. The column for $(x_i - \bar{x})^2$ is computed by squaring the values in the column for $|x_i - \bar{x}|$. Finally, get the total for each column.
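All six measures for the 10 performance ratings can be computed with Python's `statistics` module. Below is a minimal sketch. It assumes the sample variance and standard deviation (dividing by n - 1), which is consistent with the standard deviations in the coefficient-of-variation example of this lesson. It also assumes the exclusive (n + 1) quartile method of the Measures of Position lesson. `statistics.pvariance` and `statistics.pstdev` would give the population versions instead.

```python
from statistics import mean, quantiles, stdev, variance

ratings = [84, 86, 83, 81, 87, 90, 93, 85, 80, 94]
n = len(ratings)

data_range = max(ratings) - min(ratings)                 # 94 - 80
x_bar = mean(ratings)
mad = sum(abs(x - x_bar) for x in ratings) / n           # mean absolute deviation
s2 = variance(ratings)                                   # sample variance (divides by n - 1)
s = stdev(ratings)                                       # sample standard deviation
q1, _, q3 = quantiles(ratings, n=4, method="exclusive")  # (n + 1) quartile method
iqr = q3 - q1
qd = iqr / 2

print(f"Range = {data_range}, mean = {x_bar:.1f}, MAD = {mad:.2f}")
print(f"s^2 = {s2:.4f}, s = {s:.4f}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, QD = {qd}")
```

Under these assumptions the mean is 86.3, MAD = 3.76, $s^2 \approx 22.68$, $s \approx 4.76$, $Q_1 = 82.5$, $Q_3 = 90.75$, IQR = 8.25, and QD = 4.125.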


CLASS INTERVALS   CLASS BOUNDARIES    f    x    fx     fx²
5 – 11            4.5 – 11.5          2    8    16     128
12 – 18           11.5 – 18.5        10   15    150    2250
19 – 25           18.5 – 25.5         6   22    132    2904
26 – 32           25.5 – 32.5        13   29    377    10933
33 – 39           32.5 – 39.5         9   36    324    11664
40 – 46           39.5 – 46.5         7   43    301    12943
47 – 53           46.5 – 53.5         3   50    150    7500
                                  n = 50        1450   48322
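The totals in this table (n = 50, Σfx = 1450, Σfx² = 48322) are exactly what the computational formula for the variance of grouped data needs. Below is a minimal Python sketch, assuming the sample formula $s^2 = \dfrac{n\sum fx^2 - (\sum fx)^2}{n(n-1)}$, where x is the class mark. The population version, $\sum fx^2/n - \bar{x}^2$, would give 125.44 instead.

```python
import math

class_marks = [8, 15, 22, 29, 36, 43, 50]
freqs = [2, 10, 6, 13, 9, 7, 3]

n = sum(freqs)                                                # 50
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))       # 1450
sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))  # 48322

s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))   # sample variance of grouped data
s = math.sqrt(s2)                                   # sample standard deviation
print(n, sum_fx, sum_fx2, s2, round(s, 4))
```

Under the sample formula, $s^2 = (50 \cdot 48322 - 1450^2)/(50 \cdot 49) = 128$ and $s \approx 11.31$.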

Example. Which of the following groups is the most varied in terms of performance ratings? Compute their coefficients of variation.

Group 1: 85, 88, 87, 89, 86, 86, 85, 87, 89, 88

Group 2: 94, 80, 79, 88, 96, 85, 83, 81, 85, 99

Group 3: 70, 80, 79, 88, 89, 89, 85, 100, 90, 100

Solution: Notice that the groups have the same mean of 87. Hence, we need another measure to compare them. First, compute the standard deviation.

You can verify the following values using MS Excel or your calculator.

Group    Mean    Standard Deviation    Sample Size
1        87      1.490712              10
2        87      7.055337              10
3        87      9.201449              10

Notice that Group 3 is more varied than the other two groups. The coefficient of variation, $CV = \dfrac{s}{\bar{x}} \times 100\%$, leads to the same conclusion.
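Below is a minimal Python sketch of the coefficient of variation using the sample standard deviation. It assumes the ten-value versions of Groups 1 and 2 (85, 88, 87, 89, 86, 86, 85, 87, 89, 88 and 94, 80, 79, 88, 96, 85, 83, 81, 85, 99). These are the data sets that reproduce the mean of 87, the sample size of 10, and the standard deviations in the table above.

```python
from statistics import mean, stdev

groups = {
    "Group 1": [85, 88, 87, 89, 86, 86, 85, 87, 89, 88],
    "Group 2": [94, 80, 79, 88, 96, 85, 83, 81, 85, 99],
    "Group 3": [70, 80, 79, 88, 89, 89, 85, 100, 90, 100],
}


def coefficient_of_variation(data):
    """CV in percent: sample standard deviation relative to the mean."""
    return stdev(data) / mean(data) * 100


for name, data in groups.items():
    print(f"{name}: mean = {mean(data)}, s = {stdev(data):.6f}, "
          f"CV = {coefficient_of_variation(data):.2f}%")
```

The CVs come out to about 1.71%, 8.11%, and 10.58%, so Group 3 has the most variation relative to its mean.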


Example. Which of the following groups is the most varied in terms of performance ratings? Compute their quartile deviations.

Group 1: 85, 85, 87, 89, 86, 86, 85, 87, 89, 88, 87

Group 2: 94, 80, 79, 88, 96, 85, 83, 81, 85, 99, 87

Group 3: 70, 80, 79, 88, 89, 89, 85, 100, 90, 100

Solution: After arranging each group's values in ascending order (an array), we get the following values for the first and third quartiles.

Group | First Quartile (Q1) | Third Quartile (Q3)
1 | 85.75 | 88.25
2 | 80.75 | 94.5
3 | 79.75 | 92.5

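Since some groups have outliers, let us focus on the middle 50% of each distribution using the quartile deviation, QD = (Q3 − Q1) / 2. This gives QD = (88.25 − 85.75) / 2 = 1.25 for Group 1, QD = (94.5 − 80.75) / 2 = 6.875 for Group 2, and QD = (92.5 − 79.75) / 2 = 6.375 for Group 3.

In terms of the variation in the middle 50%, Group 2 is the most varied of the three groups.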
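The Group 3 quartiles in the table can be reproduced by placing the q-th quartile at position k = q(n + 1) / 4 in the ordered data and interpolating linearly between neighboring values. This positional method is an assumption inferred from the tabulated results; it appears to match Python's `statistics.quantiles(..., method="exclusive")` (Python 3.8+) and Excel's QUARTILE.EXC, while other conventions such as QUARTILE.INC give different values. The helper name `quartiles_and_qd` is hypothetical.

```python
import statistics

def quartiles_and_qd(values):
    """Return (Q1, Q3, QD), where QD = (Q3 - Q1) / 2."""
    # method="exclusive" puts the q-th quartile at position q(n + 1)/4
    # in the sorted data and interpolates between neighboring values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, q3, (q3 - q1) / 2

group3 = [70, 80, 79, 88, 89, 89, 85, 100, 90, 100]
print(quartiles_and_qd(group3))   # (79.75, 92.5, 6.375)

# QD for all three groups from the tabulated quartiles
table = {1: (85.75, 88.25), 2: (80.75, 94.5), 3: (79.75, 92.5)}
for group, (q1, q3) in table.items():
    print(f"Group {group}: QD = {(q3 - q1) / 2}")
```

For Group 3 (n = 10), Q1 sits at position 2.75, three quarters of the way from 79 to 80, and Q3 sits at position 8.25, a quarter of the way from 90 to 100.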
