Module For Stat 121E (BSECE)
STAT 121
For Electronics Communication Engineering Students
2021
MARILYN S. PAINAGAN
USM MISSION
This module was developed for engineering data analysis, with emphasis
on data related to Electronics and Communication Engineering. It
aims to provide a better understanding of engineering data: its nature,
forms and kinds, the different methods of data collection, and
an introduction to experimental design.
To assess the users' understanding, self-assessment questions
are included at the end of every chapter.
A course guide is also included in this module to guide
users on the content, grading system and class policies of this subject.
Chapter 1: Data Collection
1.1. Methods of primary data collection
1.2. Methods of secondary data collection
Chapter 2: Introduction to design of experiments
References
Course Information
Course Title ENGINEERING DATA ANALYSIS
Course Code STAT 121
Pre-requisite/Co-requisite Mathematics in Modern World
Course Description
This course is designed for undergraduate engineering students with emphasis
on problem solving related to societal issues that engineers and scientists are
called upon to solve. It introduces different methods of data collection and the
suitability of using a particular method for a given situation.
The relationship of probability to statistics is also discussed, providing students
with the tools they need to understand how "chance" plays a role in statistical
analysis. Probability distributions of random variables and their uses are also
considered, along with a discussion of linear functions of random variables
within the context of their application to data analysis and inference. The course
also includes estimation techniques for unknown parameters; hypothesis
testing used in making inferences from sample to population; and inference for
regression parameters, including building models for estimating means and predicting
future values of key variables under study. Finally, statistically based
experimental design techniques and analysis of outcomes of experiments are
discussed with the aid of statistical software.
Course Objectives/Outcomes
Upon passing the course, you must be able to:
1. Apply statistical methods in the analysis of data.
2. Design experiments involving several factors.
In most cases, data collection is the first and most important step in
research, irrespective of the field. The approach to data collection
differs between fields of study, depending on the information required.
The goal of all data collection is to capture quality evidence that
supports rich data analysis and yields credible answers to the questions posed. Regardless of
the field of study or preference for defining data, accurate data collection is essential
to maintaining the integrity of research. Appropriate data collection instruments
and clear instructions for their correct use reduce the likelihood of errors.
Data collection is one of the most important stages in conducting research. You can
have the best research design, but if you cannot collect the required data you will not
be able to complete your research (Kabir, 2016).
Types of Data
1. Qualitative Data
1. Primary data
Data that has been collected from first-hand experience
Has not been published yet and is more reliable, authentic and
objective.
Has not been changed or altered by other people; therefore its
validity is greater than that of secondary data
In statistical surveys, it is necessary to get information from primary sources and work
on primary data. Research can be conducted without secondary data, but research
based only on secondary data is less reliable and may be biased, because secondary
data has already been handled, and possibly altered, by other people.
The researcher has to contend with all the hassles of data collection
Ensuring the data collected is of a high standard
Cost of obtaining the data is often the major expense in studies
2. Secondary Data
Secondary data is essential, since it is impossible to conduct a new survey that can
adequately capture past change and/or developments.
Books
Records
Biographies
Newspapers
Published censuses or other statistical data
Data archives
Internet articles
Research articles by other researchers (journals)
Databases, etc.
The third party that collected the data may not be reliable, so
the reliability and accuracy of the data go down.
Data collected in one location may not be suitable for another
because of differing environmental factors.
With the passage of time, the data becomes obsolete.
Secondary data can distort the results of the research. Special
care is required to amend or modify secondary data before
use.
Secondary data can also raise issues of authenticity and copyright.
In primary data collection, you collect the data yourself using qualitative and
quantitative methods. The key point here is that the data you collect is unique to you
and your research and, until you publish, no one else has access to it.
A. Structured
B. Semi-structured
C. Unstructured
A. Structured Interviews
According to Bernard (1988), semi-structured interviews are best used when you won't
get more than one chance to interview someone and when you will be sending
several interviewers out into the field to collect data.
The inclusion of open-ended questions, which allow interviewers to follow relevant topics that
may stray from the interview guide, does, however, still provide the opportunity for
identifying new ways of seeing and understanding the topic at hand.
The object – which refers to the activity or any type of operation that is being
observed.
Observations in which the observer is an entity apart from the
thing being observed are referred to as objective
observation.
The accuracy of activity sampling will depend on the number of observations. Few
and infrequent observations will provide a low level of accuracy, while many and
frequent observations will give highly accurate but more expensive information. It is,
therefore, particularly important that the observer knows the optimum number of
observations necessary for a particular study.
To determine the required number of observations, the following formula is used:
$$N = \frac{4P(100 - P)}{L^2}$$
(Equation 1)
This formula will give the accuracy of the study within 95% confidence limits.
For example:
A group of workers is being studied using activity sampling, and 32 observations are
noted. Of these, 75% showed that the workers were performing useful work. Suppose
we would like to check that the workers are performing at this level to within an
accuracy limit of L = 10%, the value used in the solution.
Solution:
$$N = \frac{4(75)(100 - 75)}{10^2}$$
$$N = 75$$
However, after performing 75 checks, the value of P was found to be only 70% so the
extra data could be used to assess the new requirement for the number of checks.
$$N = \frac{4(70)(100 - 70)}{10^2}$$
$$N = 84$$
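The calculation in Equation 1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original module: the function name `required_observations` and its parameter names are hypothetical, and it assumes that P is the observed percentage of the activity and L is the accuracy limit in percentage points, as the worked example suggests.

```python
import math


def required_observations(p_percent, limit_percent):
    """Number of observations from N = 4P(100 - P) / L^2 (Equation 1).

    p_percent: observed percentage of time the activity occurs (assumed meaning of P).
    limit_percent: accuracy limit in percentage points (assumed meaning of L).
    """
    if not 0 <= p_percent <= 100:
        raise ValueError("p_percent must be between 0 and 100")
    if limit_percent <= 0:
        raise ValueError("limit_percent must be positive")
    n = 4 * p_percent * (100 - p_percent) / limit_percent ** 2
    return math.ceil(n)  # round up to a whole observation


print(required_observations(75, 10))  # first estimate, P = 75%
print(required_observations(70, 10))  # revised estimate, P = 70%
```

With the values from the example, the two calls reproduce N = 75 and N = 84. Rounding up is a design choice made here so that a fractional result never understates the number of observations.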
Define the process boundaries that mark the entry points of the process inputs
and the exit points of the process outputs.
Construct a process flow diagram that illustrates the various process activities
and their interrelationships.
When you want to understand a work process or some part of a process, these tools
can help:
Process capacity - the capacity of the process is its maximum output rate,
measured in units produced per unit of time. The capacity of a series of tasks
is determined by the lowest capacity task in the string. The capacity of parallel
strings of tasks is the sum of the capacities of the two strings, except for cases
in which the two strings have different outputs that are combined. In such
cases, the capacity of the two parallel strings of tasks is that of the lowest
capacity parallel string.
Capacity utilization - the percentage of the process capacity that actually is
being used.
Throughput rate (also known as flow rate ) - the average rate at which units
flow past a specific point in the process. The maximum throughput rate is the
process capacity.
Flow time (also known as throughput time or lead time) - the average time
that a unit requires to flow through the process from the entry point to the
exit point.
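The capacity rules described above can be sketched as small helper functions. This is only an illustration under stated assumptions: the function names and the task capacities (in units per hour) are hypothetical and do not come from the module.

```python
def series_capacity(task_capacities):
    """Capacity of tasks in series: limited by the lowest-capacity task."""
    return min(task_capacities)


def parallel_capacity(string_capacities, outputs_combined=False):
    """Capacity of parallel strings of tasks.

    If the strings produce the same output, their capacities add.
    If their different outputs are combined, the lowest string limits the rate.
    """
    if outputs_combined:
        return min(string_capacities)
    return sum(string_capacities)


def capacity_utilization(actual_output_rate, capacity):
    """Percentage of the process capacity that is actually being used."""
    return 100.0 * actual_output_rate / capacity


# Hypothetical capacities in units per hour.
string_a = series_capacity([12, 8, 10])
string_b = series_capacity([9, 15])
total = parallel_capacity([string_a, string_b])
combined = parallel_capacity([string_a, string_b], outputs_combined=True)

print(string_a, string_b)                # capacity of each string
print(total, combined)                   # same output vs. combined outputs
print(capacity_utilization(8.5, total))  # utilization in percent
```

In this sketch the slowest task in each string sets that string's capacity, which is why improving any other task would not raise the output rate.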
Link analysis is a data analysis technique from network theory that is used
to evaluate the relationships or connections between network nodes. These
relationships can be between various types of objects (nodes), including people,
organizations and even transactions. Link analysis is essentially a kind of knowledge
discovery that can be used to visualize data to allow for better analysis, especially in
the context of links, whether Web links or relationship links between people or
between different entities.
Link analysis is literally about analyzing the links between objects, whether
they are physical, digital or relational. This requires diligent data gathering. For
example, in the case of a website where all of the links and backlinks that are present
must be analyzed, a tool has to sift through all of the HTML codes and various scripts
in the page and then follow all the links it finds in order to determine what sort of links
are present and whether they are active or dead.
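The data-gathering step described above can be sketched with the Python standard library. This is a simplified illustration, not a full link-analysis tool: the class name `LinkCollector`, the function `classify_links`, the sample HTML and the URLs are all hypothetical, and the sketch only collects and classifies links. Checking whether each link is active or dead would additionally require a network request, which is left out here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, value))


def classify_links(html, base_url):
    """Split the links found in html into internal and external links."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    site = urlparse(base_url).netloc
    internal = [u for u in collector.links if urlparse(u).netloc == site]
    external = [u for u in collector.links if urlparse(u).netloc != site]
    return internal, external


# Hypothetical page content used only for illustration.
SAMPLE_HTML = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
internal, external = classify_links(SAMPLE_HTML, "https://example.com/index.html")
print(internal)
print(external)
```

Each page and each link target can then be treated as a node, and each collected link as a connection between two nodes.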
The prime method of inquiry in science is the experiment. The key features
are control over variables, careful measurement, and establishing cause and effect
relationships. An experiment is an investigation in which a hypothesis is scientifically
tested. In an experiment, an independent variable (the cause) is manipulated and the
dependent variable (the effect) is measured; any extraneous variables are controlled.
An advantage is that experiments should be objective: the views and opinions of the
researcher should not affect the results of a study. This makes the data
more valid and less biased.
Limitation: The artificiality of the setting may produce unnatural behavior that does
not reflect real life, i.e. low ecological validity. This means it would not be possible to
generalize the findings to a real life setting. Demand characteristics or experimenter
effects may bias the results and become confounding variables.
2. Field Experiments: Field experiments are done in the everyday (i.e. real life)
environment of the participants. The experimenter still manipulates the
independent variable, but in a real-life setting (so cannot really control
extraneous variables).
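Strength: Behavior in a field experiment is more likely to reflect real life because of its
natural setting, i.e. higher ecological validity than a lab experiment. There is less
likelihood of demand characteristics affecting the results, as participants may not
know they are being studied.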
Limitation: There is less control over extraneous variables that might bias the
results. This makes it difficult for another researcher to replicate the study in exactly
the same way.
Strength: Behavior in a natural experiment is more likely to reflect real life because of
its natural setting, i.e. very high ecological validity. There is less likelihood of demand
characteristics affecting the results, as participants may not know they are being
studied. Can be used in situations in which it would be ethically unacceptable to
manipulate the independent variable, e.g. researching stress.
Limitation: They may be more expensive and time consuming than lab experiments.
There is no control over extraneous variables that might bias the results. This makes
it difficult for another researcher to replicate the study in exactly the same way.
Secondary data is data that others originally collected from primary sources and that can be
used in the current research study. Collecting secondary data often takes
considerably less time than collecting primary data, where you would have to gather
all the information from scratch. It is thus possible to gather more data this way.
Formative Assessment
You should begin with a specific research question in mind. You may need to spend
time reading about your field of study to identify knowledge gaps and to find
questions that interest you.
Example:
We will work with two research questions, one from electronics & health and one
from sensors & ecology:
You want to know how phone use before bedtime affects sleep patterns. Specifically,
you ask how the number of minutes a person uses their phone before sleep affects
the number of hours they sleep.
You want to know what type of temperature sensor can accurately detect changes
in temperature.
Research question: Phone use and sleep
Independent variable: Minutes of phone use before sleep
Dependent variable: Hours of sleep per night
Then you need to think about possible confounding variables and consider how you
might control for them in your experiment.
Finally, put these variables together into a diagram. Use arrows to show the possible
relationships between variables and include signs to show the expected direction of
the relationships. (This is now your hypothesis).
Figure 1. We predict that increasing phone use is negatively correlated with
hours of sleep, and that natural variation in sleep has an unknown (?) influence
on hours of sleep.
[Figure: diagram linking temperature sensors (+) to temperature readings, with
environmental factors as a possible influence.]
Now that you have a strong conceptual understanding of the system you are
studying, you should be able to write a specific, testable hypothesis that addresses
your research question.
Research question: Phone use and sleep
Null hypothesis: Phone use before sleep does not correlate with the amount
of sleep a person gets.
Alternate hypothesis: Increasing phone use before sleep leads to a
decrease in sleep.
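As a rough illustration of how the direction of the hypothesized relationship could be checked, the sketch below computes the Pearson correlation coefficient between phone use and sleep. The function name `pearson_r` is introduced here for illustration, and the data values are hypothetical, made up only to show the calculation; they are not results from any study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, for illustration only.
minutes_of_phone_use = [0, 15, 30, 45, 60, 90]
hours_of_sleep = [8.0, 7.8, 7.5, 7.1, 6.9, 6.2]

r = pearson_r(minutes_of_phone_use, hours_of_sleep)
print(f"r = {r:.3f}")  # a negative r is consistent with the alternate hypothesis
```

The sign of r only describes the sample. Deciding whether to reject the null hypothesis still requires a formal hypothesis test.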
The next steps will describe how to design a controlled experiment. In a controlled
experiment, you must be able to:
If your study system doesn’t match these criteria, there are other types of
research you can use to answer your research question.
How you manipulate the independent variable can affect the experiment’s external
validity – that is, the extent to which the results can be generalized and applied to the
broader world.
First, you may need to decide how widely to vary your independent variable.
Second, you may need to choose how finely to vary your independent variable.
Sometimes this choice is made for you by your experimental system, but often you
will need to decide, and this will affect how much you can infer from your results.
Phone-use experiment:
First, you need to consider the study size: how many individuals will be included in the
experiment? In general, the more subjects you include, the greater your
experiment’s statistical power, which determines how much confidence you can have
in your results.
Then you need to randomly assign your subjects to treatment groups. Each group
receives a different level of the treatment (e.g. no phone use, low phone use, high
phone use).
You should also include a control group, which receives no treatment. The control
group tells us what would have happened to your test subjects without any
experimental intervention.
When assigning your subjects to groups, there are two main choices you need to
make:
Randomization
An experiment can be completely randomized or randomized within blocks:
Research question: Phone use and sleep
Completely randomized design: Subjects are all randomly assigned a level of
phone use using a random number generator.
Randomized block design: Subjects are first grouped by age, and then phone use
treatments are randomly assigned within these groups.
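The two randomization schemes can be sketched with Python's `random` module. This is a minimal sketch under stated assumptions: the function names, subject labels, age blocks and treatment labels are hypothetical, and the fixed seed is used only to make the illustration repeatable.

```python
import random


def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out evenly across treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}


def randomized_block(subjects_by_block, treatments, seed=None):
    """Shuffle and assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for members in subjects_by_block.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment


# Hypothetical treatment levels and subjects.
TREATMENTS = ["no phone use", "low phone use", "high phone use"]
subjects = [f"S{i}" for i in range(1, 13)]
blocks = {
    "age 18-25": ["S1", "S2", "S3"],
    "age 26-40": ["S4", "S5", "S6"],
    "age 41-60": ["S7", "S8", "S9"],
}

print(completely_randomized(subjects, TREATMENTS, seed=1))
print(randomized_block(blocks, TREATMENTS, seed=1))
```

In the blocked version every age group receives each treatment level, so age cannot be confounded with the treatment.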
In medical or social research, you might also use matched pairs within your
independent measures design to make sure that each treatment group
contains the same variety of test subjects in the same proportions.
Write a hypothesis for each statement and identify the variables and the control and
experimental groups.
1. Cigarette smoking increases the risk of lung cancer.
Hypothesis: If _______________________then _______________________________
Independent Variable: _______________Dependent Variable:___________________
Control Group: _____________________Experimental Group:___________________
2. Internet traffic increases during daytime due to many users.
Hypothesis: If _____________________________then__________________________
Independent Variable: _______________Dependent Variable:___________________
References:
A Guide to experimental design. Retrieved from www.scribbr.com/methodology/
Analysis of Variance. Retrieved from
http://www.cimt.org.uk/projects/mepres/alevel/fstats_ch7.pdf
Analysis of variance. Retrieved from
https://sites.calvin.edu/scofield/courses/m143/materials/handouts/anova1And2.pdf
Gomez and Gomez. (1983). Statistical Procedures for Agricultural Research. Wiley
Inter-Science Publication.
Hypothesis testing and ANOVA. Retrieved from
https://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture7.pd
Hypothesis testing. Retrieved from
http://math.ucdenver.edu/~ssantori/MATH2830SP13/Math2830-Chapter-08.pdf
Kabir, Syed Muhammad. (2016). Methods of Data Collection. pp 201-275.
Retrieved from https://www.researchgate.net/publication/325846997