EDA MODULE 1 Nature of Statistics

Uploaded by

Mish Albedo
Copyright
© © All Rights Reserved
Chapter

1 Nature of Statistics

Learning Objectives
After completing this chapter, the students will be able to:

• Define Statistics.
• Distinguish between descriptive statistics and inferential statistics.
• Differentiate parameter and statistic.
• Compare and contrast the sources of data.
• Differentiate constant and variable.
• Identify and explain the types of data.
• Differentiate experimental and mathematical variables.
• Classify variables as discrete and continuous.
• List and describe the four levels of measurements.
• Identify and explain the sampling techniques.
• Discuss the methods of collecting and presenting data.
• Evaluate summation notations.
Chapter Outline
1.1 Introduction
1.2 Division of Statistics
1.3 Parameter and Statistic
1.4 Sources of Data
1.5 Constant and Variable
1.6 Types of Data
1.7 Classification of Variables
1.8 Levels of Measurement
1.9 Sampling Techniques
1.10 Methods of Collecting Data
1.11 Methods of Presenting Data
1.12 Summation Notation, Sigma ∑

Statistics is the grammar of science.


-Karl Pearson

1.1 Introduction

Every day we encounter statistics. Some company advertisements use statistics to claim
that more customers prefer their product over those of competitors; for example, a certain
petroleum company claims that 60% of fuel consumers prefer its products over those of
other fuel companies.
Statistics is also used to show the quality of a product. Safeguard soap, for example, is
advertised as being able to kill 99.99% of germs. Statistics is widely applied in different
fields such as astronomy, business, education, the sciences, etc.

We define statistics as a branch of mathematics that examines and investigates
ways to process and analyze the data gathered. Statistics provides procedures for data
collection, presentation, organization, and interpretation that lead to meaningful ideas
useful to decision makers.
1.2 Division of Statistics

The origin of modern statistics may be traced to two areas of interest which, on
the surface, have very little in common: games of chance and what we now call political
science. In the eighteenth century, studies in probability led to the mathematical treatment
of errors of measurement and to the theory which now forms the foundation of statistics.
In the same century, interest in the numerical description of political units led to the
development of methods which nowadays come under the heading of descriptive statistics.
Although descriptive statistics is an important branch of statistics and continues
to be widely used, statistical information usually arises from samples, and this means that
its analysis will require generalizations which go beyond the data. As a result, the most
important feature of the recent growth of statistics has been a shift in emphasis from
methods which merely describe to methods which serve to make generalizations: that is,
a shift in emphasis from descriptive statistics to the methods of inferential statistics.
Descriptive Statistics is the totality of methods and treatments employed in the
collection, description, and analysis of numerical data. The purpose of descriptive
statistics is to tell something about a particular group of observations. On the other hand,
Inferential Statistics is the logical process of moving from sample analysis to a
generalization or conclusion about a population. It is also called statistical inference or
inductive statistics.
A population consists of all members of the group about which we want to draw a
conclusion, while a sample is a portion, or part, of the population of interest selected for
analysis.
The relation between a sample and a population is portrayed in Figure 1.1.

Figure 1.1: Relation between Population and Sample

1.3 Parameter and Statistic

The major advantage of descriptive statistics is that they permit researchers to describe
the information contained in many scores with just a few indices.

Parameter is a numerical index describing a characteristic of a population.


Statistic is a numerical index describing a characteristic of a sample.
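
The distinction between the two indices can be illustrated with a short Python sketch. The population of 1,000 scores below is hypothetical and generated only for illustration (it is not data from this module); the population mean plays the role of a parameter and the mean of a random sample of 30 plays the role of a statistic.

```python
import random
from statistics import mean

# Fixed seed so that the illustration is repeatable (an arbitrary choice).
rng = random.Random(1)

# Hypothetical population: 1,000 exam scores between 50 and 100.
population = [rng.randint(50, 100) for _ in range(1000)]

# A sample is a part of the population selected for analysis.
sample = rng.sample(population, 30)

parameter = mean(population)  # describes the whole population
statistic = mean(sample)      # describes only the sample

print("Population mean (parameter):", round(parameter, 2))
print("Sample mean (statistic):    ", round(statistic, 2))
```

The two values are usually close but not identical, which is why a statistic is used only as an estimate of the corresponding parameter.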
1.4 Sources of Data

There are two main sources of data: primary and secondary.

Primary data are data that come from an original source and are intended to answer
specific research questions. They can be gathered by interview, mail-in questionnaire,
survey, or experimentation.
Secondary data are data taken from previously recorded data, such as information in
research already conducted, industry financial statements, business periodicals, and
government reports. They can also be obtained electronically (e.g., via internet websites,
compact disks, etc.).

1.5 Constant and Variable

There are two major kinds of characteristics of objects, people, or events: constant and
variable.
Constant. A constant is a characteristic of objects, people, or events that does not
vary. For example, the temperature at which water boils (100 degrees Celsius) is a
constant.
Variable. A variable is a characteristic of objects, people, or events that can take
on different values. It can vary in quantity (e.g., weight of people) or in quality (e.g.,
hair color of people).
Variables can be classified in different ways.

1.6 Types of Data

There are basically two types of random variables yielding two types of data:
qualitative and quantitative.
Qualitative Variable. A variable that is conceptualized and analyzed as distinct
categories, with no continuum implied. Also termed categorical variable: observations are
put in the same or different classes, each class being considered as possessing some common
characteristic that is not shared by those in other classes.
Example: eye color, gender, occupation, religious preference, etc.

Quantitative Variable. A variable that is conceptualized and analyzed along an implied
continuum. It differs in amount or degree. Also termed numerical variable: variables that
yield frequencies when counted, giving rise to discrete variables, or that, when measured,
yield metric or continuous variables.
Example: height, weight, math aptitude, lust for life, etc.

Figure 1.2 illustrates the types of variables.

Figure 1.2: Types of Variables
Variable
    Qualitative
    Quantitative: Discrete, Continuous

1.7 Classification of Variables

Variables can be classified in two ways according to purpose: experimental or
mathematical.
Experimental Classification. A researcher may classify variables according to the
function they serve in the experiment.
1. Independent variables are variables controlled by the experimenter/researcher and
expected to have an effect on the behavior of the subjects. The independent variable is
also called the explanatory variable.
2. Dependent variable is some measure of the behavior of subjects that is expected to be
influenced by the independent variable. The dependent variable is also called the outcome
variable.
Example. In a study of the effect of fertilizer on the growth of plants, the dependent
variable is the growth of the plants, while the independent variable is the amount of
fertilizer used.
Mathematical Classification. Variables may also be classified in terms of the
mathematical values they may take on within a given interval.
1. Continuous Variable is a variable which can assume any of an infinite number of
values and can be associated with points on a continuous line interval.
Example: height, weight, volume, etc.
2. Discrete Variable is a variable which consists of either a finite number of values or
a countable number of values.
Example: gender, courses, Olympic Games, etc.

1.8 Levels of Measurement

In the broadest sense, all collected data are “measured” in some form. For example,
even discrete quantitative data can be thought of as arising by a process of “measurement
through counting.” There are four widely recognized levels of measurement: nominal,
ordinal, interval, and ratio.
A. Nominal level of measurement is used to differentiate classes or categories purely for
classification or identification purposes; its categories are mutually exclusive and
exhaustive. It is the weakest form of measurement because no attempt can be made to
account for differences within a particular category or to specify any ordering or
direction across the various categories. Nominal data are discrete variables.

Mutually Exclusive is a property of a set of categories such that an individual or object
is included in only one category.

Exhaustive is a property of a set of categories such that each individual or object must
appear in a category.

Example:
Qualitative Variable            Categories
Gender                          Male, Female
Automobile Ownership            Yes, No
Type of Life Insurance Owned    Term, Endowment, Straight-Life, Others, None
B. Ordinal level of measurement is used in ranking. It is a somewhat stronger form of
measurement because an observed value classified into one category is said to possess
more of the property being scaled than does an observed value classified into another
category. Nevertheless, within a particular category no attempt is made to account for
differences between the classified values. Moreover, ordinal scaling is still a weak form
of measurement, because no meaningful numerical statements can be made about
differences between categories. That is, the ordering implies only which category is
“greater” or “lesser,” not how much “greater” or “lesser.” Ordinal data are discrete
variables.

Example:
Qualitative Variable            Categories
Student class designation       Freshman, Sophomore, Junior, Senior
Product satisfaction            Unsatisfied, Neutral, Satisfied, Very Satisfied
Movie classification            G, PG, PG-13, R-18, X
Faculty Rank                    Professor, Associate Prof., Assistant Prof., Instructor
Hotel Ratings                   , , , ,
Student Grades                  1.0, 1.25, 1.50, 1.75, 2.00, …

C. Interval level of measurement is used to classify, order, and differentiate between
classes or categories in terms of degrees of difference. Interval data are either discrete
or continuous variables.
Example:
Quantitative Variable
Temperature (in degrees ℃ or ℉)
Calendar Time (Gregorian, Hebrew, or Islamic)

D. Ratio level of measurement differs from interval measurement in only one aspect: it
has a true zero point (complete absence of the attribute being measured). With an absolute
zero point, it can be said that one observation is “twice as fast” or “half as long” as
another. Ratio data are either discrete or continuous variables.

Example:

Quantitative Variable
Weight (in pounds or kilograms)
Age (in years or days)
Salary (in Philippine pesos)

Table 1.1 shows the characteristics of the levels of measurement.

Table 1.1 Characteristics of Levels of Measurement

Level of Measurement    Properties
Nominal                 Indicates a distinction
Ordinal                 Indicates a distinction
                        Indicates the direction of the distinction (e.g., less than or more than)
Interval                Indicates a distinction
                        Indicates the direction of the distinction
                        Indicates the amount of distinction (in equal intervals)
Ratio                   Indicates a distinction
                        Indicates the direction of the distinction
                        Indicates the amount of distinction
                        Indicates an absolute zero

Figure 1.3 illustrates the classification of numerical data.

Figure 1.3: Classification of Numerical Data
Numerical Data
    Qualitative: Nominal, Ordinal
    Quantitative: Interval, Ratio

1.9 Sampling Techniques

A sample is a group in a research study from which information is obtained. A
population is a group to which the results of the study are intended to apply. In almost all
research, the sample is smaller than the population, since researchers rarely have access
to all the members of the population.

One of the most important steps in the research process is to select the sample of
individuals who will participate in the study. Sampling refers to the process of
selecting these individuals.

A. Random Sampling is a process in which every member of the population has an equal
chance of being selected; it is also called probability sampling.

1. Simple Random Sampling is a process of selecting a sample of size n from the
population via random numbers or through a lottery.
2. Systematic Sampling is a process of selecting every kth element in the population
until the desired number of subjects or respondents is attained.

Example: Suppose we have the data shown below and we want to take every 5th
value on the list.
23 34 12 14 13 23 24 39 27 23
12 15 16 23 26 28 23 22 19 34
25 22 18 30 23 24 17 18 15 12

Counting from left to right, row by row, the sample made up of every 5th value is
13, 23, 26, 34, 23, and 12.
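
The selection above can be checked with a minimal Python sketch. The function name systematic_sample is a hypothetical helper introduced only for illustration, and the sketch assumes, as in the example, that counting starts at the first value so that the 5th value is the first one selected.

```python
# The 30 values from the example, read left to right, row by row.
data = [
    23, 34, 12, 14, 13, 23, 24, 39, 27, 23,
    12, 15, 16, 23, 26, 28, 23, 22, 19, 34,
    25, 22, 18, 30, 23, 24, 17, 18, 15, 12,
]

def systematic_sample(items, k):
    """Return every k-th element: the k-th, 2k-th, 3k-th, and so on."""
    return items[k - 1::k]

sample = systematic_sample(data, 5)
print(sample)  # [13, 23, 26, 34, 23, 12]
```

In practice the starting point is often chosen at random among the first k elements; that variation is not shown here.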

3. Stratified Sampling is a process of subdividing the population into subgroups
or strata and drawing members at random from each subgroup or stratum.

Example: Given the population of a certain university and a target sample
size of 5,455, determine the sample size for each subgroup or course.

Field of Specialization    Population
Nursing                    6,000
Accountancy                500
Management                 2,000
Marketing                  1,000
Education                  2,500
Total                      12,000

To determine the sample size for each subgroup, we simply multiply the target sample
size by each subgroup's share (percentage) of the population. The computation is shown
in the last column of the table below.

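Field of Specialization    Population    Percentage    Sample Size    Found by
Nursing                    6,000         50.00         2,728          0.5000 x 5,455
Accountancy                500           4.17          227            0.0417 x 5,455
Management                 2,000         16.67         909            0.1667 x 5,455
Marketing                  1,000         8.33          455            0.0833 x 5,455
Education                  2,500         20.83         1,136          0.2083 x 5,455
Total                      12,000        100.00        5,455

Note: The sample sizes are computed from the unrounded proportions (e.g., 1,000/12,000 x
5,455 = 454.58, which rounds to 455), so that they add up to 5,455.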
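
The proportional allocation in the table can be reproduced with a short Python sketch. The function name proportional_allocation is a hypothetical helper, and rounding halves upward is an assumption made here so that 2,727.5 becomes 2,728 as in the table.

```python
import math

def proportional_allocation(strata, sample_total):
    """Give each stratum a share of the sample equal to its share of the population."""
    population_total = sum(strata.values())
    allocation = {}
    for name, size in strata.items():
        exact = size * sample_total / population_total
        # Round half up (assumption): 2727.5 -> 2728.
        allocation[name] = math.floor(exact + 0.5)
    return allocation

strata = {
    "Nursing": 6000,
    "Accountancy": 500,
    "Management": 2000,
    "Marketing": 1000,
    "Education": 2500,
}

allocation = proportional_allocation(strata, 5455)
for name, n in allocation.items():
    print(f"{name:12s} {n:5d}")
print("Total".ljust(12), sum(allocation.values()))
```

For these figures the rounded sizes add up to exactly 5,455. With other figures, rounding can leave the total one or two units off, and a small adjustment to one stratum may be needed.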

4. Cluster Sampling is a process of selecting clusters from a population which
is very large or spread out over a wide geographical area.

Example: Suppose we want to know the opinion of the residents of Manila
regarding the improvement of living in the city. We may use cluster
sampling by subdividing the city into districts and then selecting at random the
districts to be used as the sample.
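
A minimal Python sketch of the idea follows. The district names and resident IDs are hypothetical placeholders, not actual Manila districts, and the sketch assumes that every resident of a selected district is included in the sample, which the example above does not specify.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and return them with all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    respondents = [member for name in chosen for member in clusters[name]]
    return chosen, respondents

# Hypothetical city with six districts of five residents each.
clusters = {
    f"District {d}": [f"{d}-{i}" for i in range(1, 6)]
    for d in "ABCDEF"
}

rng = random.Random(7)  # fixed seed so that the illustration is repeatable
chosen, respondents = cluster_sample(clusters, 2, rng)

print("Selected districts:", chosen)
print("Number of respondents:", len(respondents))
```

The randomization is applied to the districts, not to the individual residents, which is what distinguishes cluster sampling from simple random sampling.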

B. Non-random sampling is a sampling procedure in which samples are selected in a
deliberate manner with little or no attention to randomization; it is also called
non-probability sampling.

1. Convenience sampling is a process of selecting a group of individuals who
are (conveniently) available for study.

Example: A researcher may include only close friends and clients in the
sample.

2. Purposive sampling is a process of using judgment to select a
sample which the researcher believes, based on prior information, will
provide the data needed. The disadvantage of purposive sampling is that the
researcher's judgment may be in error: he or she may not be correct in
estimating the representativeness of a sample or the respondents' expertise regarding the
information needed. It is also called judgment sampling.

Example: A human resource director interviews the qualified applicants for a
supervisory position. (Note: The qualified applicants are selected by the HR
Director based on his or her own judgment.)

3. Quota sampling is applied when an investigator collects information
from an assigned number, or quota, of individuals from one of several sample
units fulfilling certain prescribed criteria or belonging to one stratum. Its
advantage is that it is cheaper to administer.

Example: The respondents may be composed of men aged over 30, or of 20
people who have bought cellular phones in the last week. It is at the
interviewer's discretion which men or cellular phone buyers are selected.

4. Snowball sampling is a technique in which one or more members of a
population are located and used to lead the researchers to other members
of the population.

Example: Imagine attempting to obtain a frame that includes all the
homeless people in Metro Manila. To obtain a sample of homeless
individuals, the researcher will interview individuals on the street
or at a homeless shelter and ask them to lead the researcher to other homeless
individuals.

Figure 1.4 shows the division of sampling techniques.

Figure 1.4: Sampling Techniques
Sampling Techniques
    Random: Simple Random, Systematic, Stratified, Cluster
    Non-random: Convenience, Purposive, Quota, Snowball

1.10 Methods of Collecting Data
After the research problem has been laid out, the next step is to determine the methods
for collecting data. Here are the five basic methods of collecting data.

Direct or Interview Method. It is a face-to-face encounter between the interviewer and
the interviewee. The interview may vary according to the preference of either or both
parties. However, this method is time-consuming, expensive, and has limited field
coverage.

Indirect or Questionnaire Method. Unlike the direct method, this method utilizes a
questionnaire to obtain information. The questionnaire can be mailed or hand-carried to the
intended respondents.

Registration Method. This method of gathering information is governed by laws.

Example: birth certificates, death certificates, licenses, etc.

Observation Method. This method is used for data pertaining to the behavior of an
individual or a group of individuals at the time a given situation occurs, which are best
obtained by observation. One limitation of this method is that observation can be made
only at the time the appropriate events occur.

Experiment Method. This is used to determine the cause-and-effect relationship of
certain phenomena under controlled conditions. This method is usually employed by
scientific researchers.

1.11 Methods of Presenting Data

There are different ways of presenting data. Three of them are as follows:

Textual Method. This method presents the collected data in narrative or paragraph
form.

Tabular Method. This method presents the collected data in a table, orderly
arranged in rows and columns for an easier and more comprehensive comparison of
figures.

Graphical Method. This method presents the collected data in visual or pictorial form to
give a clear view of the data (e.g., histogram, pie chart, Pareto chart, pictograph, etc.).

1.12 Summation Notation, Sigma ∑

The symbol

$\sum_{i=1}^{n} X_i$

is used to denote the sum of all the $X_i$'s from $i = 1$ to $i = n$; by definition,

$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \cdots + X_n$

We often denote this sum simply by $\sum X$ or $\sum X_i$. The symbol ∑ is the Greek
capital letter sigma, denoting sum.

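Example: Write the following expressions in expanded form:

1. $\sum_{i=1}^{4} (X_i)^3$    2. $\sum_{i=1}^{3} (X_i + 2)$    3. $\sum_{i=1}^{2} (X_i + Y_i)^3$

Solution:

1. $\sum_{i=1}^{4} (X_i)^3 = X_1^3 + X_2^3 + X_3^3 + X_4^3$
2. $\sum_{i=1}^{3} (X_i + 2) = (X_1 + 2) + (X_2 + 2) + (X_3 + 2)$
3. $\sum_{i=1}^{2} (X_i + Y_i)^3 = (X_1 + Y_1)^3 + (X_2 + Y_2)^3$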

Example: Evaluate the following notations using the values below:

$X_1 = 1$    $X_2 = 3$    $X_3 = 2$     $X_4 = 5$
$Y_1 = 0$    $Y_2 = 8$    $Y_3 = 1$     $Y_4 = 6$
$Z_1 = 4$    $Z_2 = 7$    $Z_3 = -2$    $Z_4 = 3$

1. $\sum_{i=1}^{4} 2X_iY_i$    2. $\sum_{i=1}^{4} Z_i(Y_i - X_i)$    3. $\sum_{i=1}^{3} (X_i + Z_i)^2$

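Solution:
1. $\sum_{i=1}^{4} 2X_iY_i = 2X_1Y_1 + 2X_2Y_2 + 2X_3Y_3 + 2X_4Y_4$
   $= 2(1)(0) + 2(3)(8) + 2(2)(1) + 2(5)(6)$
   $= 0 + 48 + 4 + 60$
   $= 112$

2. $\sum_{i=1}^{4} Z_i(Y_i - X_i) = Z_1(Y_1 - X_1) + Z_2(Y_2 - X_2) + Z_3(Y_3 - X_3) + Z_4(Y_4 - X_4)$
   $= 4(0 - 1) + 7(8 - 3) + (-2)(1 - 2) + 3(6 - 5)$
   $= 4(-1) + 7(5) + (-2)(-1) + 3(1)$
   $= (-4) + 35 + 2 + 3$
   $= 36$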

3. $\sum_{i=1}^{3} (X_i + Z_i)^2 = (X_1 + Z_1)^2 + (X_2 + Z_2)^2 + (X_3 + Z_3)^2$
   $= (1 + 4)^2 + (3 + 7)^2 + [2 + (-2)]^2$
   $= 5^2 + 10^2 + 0^2$
   $= 25 + 100 + 0$
   $= 125$
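
The three evaluations can be checked with a short Python sketch that uses the given values of X, Y, and Z. The variable names sum_1, sum_2, and sum_3 are introduced only for illustration. Note that the third sum, as stated in the exercise, stops at i = 3; including the i = 4 term, (5 + 3)^2 = 64, would give 189 instead of 125.

```python
# Values given in the example (index i = 1..4 maps to list positions 0..3).
X = [1, 3, 2, 5]
Y = [0, 8, 1, 6]
Z = [4, 7, -2, 3]

# 1. Sum of 2 * X_i * Y_i for i = 1 to 4.
sum_1 = sum(2 * x * y for x, y in zip(X, Y))

# 2. Sum of Z_i * (Y_i - X_i) for i = 1 to 4.
sum_2 = sum(z * (y - x) for x, y, z in zip(X, Y, Z))

# 3. Sum of (X_i + Z_i)^2 for i = 1 to 3 (upper limit 3, as the exercise is stated).
sum_3 = sum((x + z) ** 2 for x, z in zip(X[:3], Z[:3]))

print(sum_1)  # 112
print(sum_2)  # 36
print(sum_3)  # 125
```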
