EDA MODULE 1 Nature of Statistics
1 Nature of Statistics
Learning Objectives
After completing this chapter, the students will be able to:
• Define Statistics.
• Distinguish between descriptive statistics and inferential statistics.
• Differentiate parameter and statistic.
• Compare and contrast the sources of data.
• Differentiate constant and variable.
• Identify and explain the types of data.
• Differentiate experimental and mathematical variables.
• Classify variables as discrete and continuous.
• List and describe the four levels of measurement.
• Identify and explain the sampling techniques.
• Discuss the methods of collecting and presenting data.
• Evaluate expressions written in summation notation.
Chapter Outline
1.1 Introduction
1.2 Division of Statistics
1.3 Parameter and Statistic
1.4 Sources of Data
1.5 Constant and Variable
1.6 Types of Data
1.7 Classification of Variables
1.8 Levels of Measurement
1.9 Sampling Techniques
1.10 Methods of Collecting Data
1.11 Methods of Presenting Data
1.12 Summation Notation, Sigma ∑
1.1 Introduction
The origin of modern statistics may be traced to two areas of interest which, on
the surface, have very little in common: games of chance and what we now call political science.
In the eighteenth century, studies in probability led to the mathematical treatment of errors of
measurement and to the theory which now forms the foundation of statistics. In the same
century, interest in the numerical description of political units led to the development of
methods which nowadays come under the heading of descriptive statistics.
Although descriptive statistics is an important branch of statistics and it continues
to be widely used, statistical information usually arises from samples, and this means that
its analysis will require generalizations which go beyond the data. As a result, the most
important feature of the recent growth of statistics has been a shift in emphasis from the
methods which merely describe to methods which serve to make generalizations: that is,
a shift in emphasis from descriptive statistics to the methods of inferential statistics.
Descriptive Statistics is the totality of methods and treatments employed in the
collection, description, and analysis of numerical data. The purpose of descriptive
statistics is to tell something about a particular group of observations. On the other hand,
Inferential Statistics is the logical process of moving from sample analysis to a generalization or
conclusion about a population. It is also called statistical inference or inductive statistics.
A population consists of all members of the group about which we want to draw a
conclusion, while a sample is a portion, or part, of the population of interest selected for
analysis.
The relation between a sample and a population is portrayed in Figure 1.1.
Figure 1.1: Relation between Population and Sample
1.3 Parameter and Statistic
The major advantage of descriptive statistics is that they permit researchers to describe
the information contained in many scores with just a few indices.
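As a minimal sketch of this idea, the snippet below condenses a small set of scores into a few indices using Python's standard `statistics` module. The scores and the helper name `describe` are hypothetical, invented only for illustration.

```python
import statistics

# Hypothetical exam scores, made up for illustration only.
scores = [70, 75, 80, 85, 90]

def describe(values):
    """Summarize a list of numbers with a few descriptive indices."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
    }

summary = describe(scores)
print(summary)
```

The four indices returned here stand in for the full list of scores; which indices are appropriate depends on the data and on the level of measurement.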
There are basically two types of random variables yielding two types of data:
qualitative and quantitative.
Qualitative Variable. A variable that is conceptualized and analyzed as distinct
categories, with no continuum implied. Also termed a categorical variable; its values are put in
the same or different classes, each class being considered as possessing some common
characteristic that is not shared by those in other classes.
Example: eye color, gender, occupation, religious preference, etc.
yield frequencies when counted, giving rise to a discrete variable, or, when measured, yield a
metric or continuous variable.
Example: height, weight, math aptitude, lust of life, etc.
[Diagram labels: Qualitative, Quantitative; Discrete, Continuous]
1.8 Levels of Measurement
In the broadest sense, all collected data are “measured” in some form. For example,
even discrete quantitative data can be thought of as arising by a process of “measurement
through counting.” The four widely recognized levels of measurement are the nominal,
ordinal, interval, and ratio levels.
A. Nominal level of measurement is used to differentiate classes or categories purely for
classification or identification purposes. Its categories are mutually exclusive and
exhaustive. It is the weakest form of measurement because no attempt can be made to
account for differences within a particular category or to specify any ordering or
direction across the various categories. Nominal data are discrete variables.
Exhaustive is a property of a set of categories such that each individual or object must
appear in a category.
Example:
Qualitative Variable            Categories
Gender                          Male, Female
Automobile Ownership            Yes, No
Type of Life Insurance Owned    Term, Endowment, Straight-Life, Others, None
B. Ordinal level of measurement is used in ranking. It is a somewhat stronger form of
measurement because an observed value classified into one category is said to possess
more of the property being scaled than does an observed value classified into another
category. Nevertheless, within a particular category no attempt is made to account for
differences between the classified values. Moreover, ordinal scaling is still a weak form
of measurement, because no meaningful numerical statements can be made about
differences between categories. That is, the ordering implies only which category is
“greater” or “lesser,” not how much “greater” or “lesser.” Ordinal data are discrete
variables.
Example:
Qualitative Variable         Categories
Student class designation    Freshman, Sophomore, Junior, Senior
Product satisfaction         Unsatisfied, Neutral, Satisfied, Very Satisfied
Movie classification         G, PG, PG-13, R-18, X
Faculty Rank                 Professor, Associate Prof., Assistant Prof., Instructor
Hotel Ratings                ★, ★★, ★★★, ★★★★, ★★★★★
Student Grades               1.0, 1.25, 1.50, 1.75, 2.00, …
C. Interval level of measurement is used to classify, order, and differentiate between classes
or categories in terms of degrees of difference. Interval data are either discrete or
continuous variables.
Example:
Quantitative Variable
Temperature (in degrees ℃ or ℉)
Calendar Time (Gregorian, Hebrew, or Islamic)
D. Ratio level of measurement differs from interval measurement in only one aspect: it
has a true zero point (complete absence of the attribute being measured). With an absolute
zero point, it can be said that one observation is “twice as fast” or “half as
long” as another, and so on. Ratio data are either discrete or continuous variables.
Example:
Quantitative Variable
Weight (in pounds or kilograms)
Age (in years or days)
Salary (in Philippine pesos)
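The contrast between the interval and ratio levels can be sketched numerically. The following is a minimal illustration, assuming hypothetical temperature and weight values; the conversion to kelvins (a temperature scale with a true zero) is brought in only for illustration and is not part of the module.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to kelvins (ratio scale)."""
    return c + 273.15

# Interval level: differences are meaningful, ratios are not.
t1_c, t2_c = 10.0, 20.0   # hypothetical temperatures in degrees Celsius
naive_ratio = t2_c / t1_c  # 2.0, yet 20 degrees C is not "twice as hot" as 10
kelvin_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Ratio level: weight has a true zero, so ratios are meaningful.
w1_kg, w2_kg = 30.0, 60.0  # hypothetical weights in kilograms
weight_ratio = w2_kg / w1_kg  # 2.0: the second weight is twice the first

print(naive_ratio, round(kelvin_ratio, 4), weight_ratio)
```

The ratio of the two Celsius readings changes once the zero point is moved, which is why "twice as hot" is not a valid statement for interval data, while the weight ratio stays 2 regardless of the unit chosen.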
Figure 1.3 illustrates the classification of numerical data.
Figure 1.3: Numerical Data (Qualitative, Quantitative)
One of the most important steps in the research process is to select the sample of
individuals who will participate as a part of the study. Sampling refers to the process of
selecting these individuals.
Example: Suppose we have the data shown below and we want to
select every 5th value on the list.
23 34 12 14 13 23 24 39 27 23
12 15 16 23 26 28 23 22 19 34
25 22 18 30 23 24 17 18 15 12
Therefore, taking every 5th value from left to right, the sample consists of 13, 23, 26, 34, 23, and 12.
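The selection above can be reproduced with a short sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes the list is read row by row from left to right, starting the count at the first value.

```python
def systematic_sample(items, k):
    """Return every k-th item of the list, beginning with the k-th one."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Python indexes from 0, so the k-th item sits at index k - 1.
    return items[k - 1::k]

# The 30 values from the example, read row by row.
data = [
    23, 34, 12, 14, 13, 23, 24, 39, 27, 23,
    12, 15, 16, 23, 26, 28, 23, 22, 19, 34,
    25, 22, 18, 30, 23, 24, 17, 18, 15, 12,
]

sample = systematic_sample(data, 5)
print(sample)  # [13, 23, 26, 34, 23, 12]
```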
Example: Given the population of a certain university and a target sample
size of 5,455, determine the sample size for each subgroup (course).
Field of Specialization    Population
Nursing                     6,000
Accountancy                   500
Management                  2,000
Marketing                   1,000
Education                   2,500
Total                      12,000
To determine the sample size for each subgroup, we simply multiply the target sample
size by each subgroup's percentage share of the population. The
computation is shown in the last column of the table below.
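As a rough sketch of this proportional computation, the snippet below multiplies the target sample size by each subgroup's share of the population. The function name `proportional_allocation` is hypothetical; the figures are those given in the example. The results are left unrounded, since the rule for rounding to whole respondents is a choice this sketch does not assume.

```python
# Population of each subgroup, as given in the example.
population = {
    "Nursing": 6000,
    "Accountancy": 500,
    "Management": 2000,
    "Marketing": 1000,
    "Education": 2500,
}
target_sample = 5455

def proportional_allocation(strata, n):
    """Split a sample of size n across strata in proportion to their sizes."""
    total = sum(strata.values())
    return {name: n * size / total for name, size in strata.items()}

allocation = proportional_allocation(population, target_sample)
for name, share in allocation.items():
    print(f"{name:12s} {population[name]:6d} {share:10.2f}")
```

For instance, Nursing makes up 6,000 of the 12,000 students (50%), so its unrounded share of the sample is 5,455 × 0.50 = 2,727.5.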
Example: A researcher may include only close friends and clients
in the sample.
Example: Imagine attempting to obtain a frame that includes all the
homeless people in Metro Manila. To obtain a sample of homeless
individuals, the researcher will instead interview individuals on the street
or at a homeless shelter.
[Diagram labels: Random, Non-random]
1.10 Methods of Collecting Data
After the research problem has been laid out, the next step is to determine the methods for
collecting data. Here are the five basic methods of collecting data.
Observation Method. This method is used for data pertaining to the behaviors of an
individual or a group of individuals at the time a given situation occurs; such data are best
obtained by observation. One limitation of this method is that observation can be made only at the
time the appropriate events occur.
Experiment Method. This is used to determine the cause-and-effect relationship of
certain phenomena under controlled conditions. This method is usually employed by
scientific researchers.
There are different ways of presenting data. Three of them are as follows:
Textual Method. This method presents the collected data in narrative and paragraph
form.
Tabular Method. This method presents the collected data in tables, arranged in orderly
rows and columns for an easier and more comprehensive comparison of
figures.
Graphical Method. This method presents the collected data in visual or pictorial form to
give a clear view of the data (e.g., histogram, pie chart, Pareto chart, pictograph, etc.).
1.12 Summation Notation, Sigma ∑
The symbol
$$\sum_{i=1}^{n} X_i$$
is used to denote the sum of all the $X_i$'s from $i=1$ to $i=n$; by definition,
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \cdots + X_n$$
We often denote this sum simply by $\sum X$ or $\sum X_i$. The symbol $\sum$ is the Greek capital letter
sigma, denoting sum.
Solution:
1. $\sum_{i=1}^{4} (2X_i Y_i) = 2X_1Y_1 + 2X_2Y_2 + 2X_3Y_3 + 2X_4Y_4$
$= 2(1)(0) + 2(3)(8) + 2(2)(1) + 2(5)(6)$
$= 0 + 48 + 4 + 60$
$= 112$
3. $\sum_{i=1}^{3} (X_i + Z_i)^2 = (X_1+Z_1)^2 + (X_2+Z_2)^2 + (X_3+Z_3)^2$
$= (1+4)^2 + (3+7)^2 + [2+(-2)]^2$
$= 5^2 + 10^2 + 0^2$
$= 25 + 100 + 0$
$= 125$
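Summation notation maps directly onto a loop over the index i. The sketch below is a minimal illustration that re-evaluates solution 1; the helper name `summation` is hypothetical, and the values of X and Y are the ones read off the substitution step of that solution.

```python
def summation(term, n):
    """Return term(1) + term(2) + ... + term(n)."""
    return sum(term(i) for i in range(1, n + 1))

# Values of X_i and Y_i as substituted in solution 1, keyed by the index i.
X = {1: 1, 2: 3, 3: 2, 4: 5}
Y = {1: 0, 2: 8, 3: 1, 4: 6}

# Sum of 2 * X_i * Y_i for i = 1 to 4.
total = summation(lambda i: 2 * X[i] * Y[i], 4)
print(total)  # 112
```

Passing the term as a function keeps the index range separate from the expression being summed, mirroring how the limits and the summand are written separately in the notation.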