Module For Stat 121E (BSECE)

Uploaded by Salvacion Bandoy
Copyright © All Rights Reserved

ENGINEERING DATA ANALYSIS

STAT 121
For Electronics Communication Engineering Students

MARILYN S. PAINAGAN, PhD

2021

UNIVERSITY OF SOUTHERN MINDANAO


Kabacan, Cotabato
Author’s Declaration

Ideas, concepts, diagrams and/or illustrations depicted in this learning material are excerpts from established references and are properly noted in the list of literature cited herein. The author/compiler of this learning material does not claim full and authentic ownership of all the contents of this material, nor does the author/compiler in any manner willfully infringe copyright law and other existing provisions appertaining thereto.

This learning material is printed for the sole use of classroom or distance/remote learning at USM and is not intended for commercial purposes. Any use or reproduction in part or in full, whether electronic or mechanical, by photocopying, by recording or in any information storage and retrieval system, other than what it is intended for, requires the consent of the authorized and competent authority of the University of Southern Mindanao.

Stat 121: Engineering Data Analysis


USM VISION

Quality and relevant education for its clientele to be globally competitive, culture sensitive and morally responsive human resources for sustainable development

USM MISSION

Help accelerate socio-economic development, promote harmony among diverse communities and improve quality of life through instruction, research, extension and resource generation in Southern Philippines.

UNIVERSITY QUALITY POLICY STATEMENT

The University of Southern Mindanao, as a premier university, is committed to providing quality instruction, research, development and extension services, and resource generation that exceed stakeholders’ expectations through the management of continual improvement efforts on the following initiatives:
1. Establish Key Result Areas and performance indicators across all
mandated functions;
2. Implement quality educational programs;
3. Guarantee competent educational service providers;
4. Spearhead need-based research outputs for commercialization,
publication, patenting, and develop technologies for food security,
climate change mitigation and improvement in the quality of life;
5. Facilitate transfer of technologies generated from research to the
community for sustainable development;
6. Strengthen relationship with stakeholders;
7. Sustain good governance and culture sensitivity; and
8. Comply with customer, regulatory and statutory requirements.

PREFACE

This module was developed for engineering data analysis, with emphasis on data related to Electronics and Communication Engineering. It aims to provide a better understanding of engineering data, including its nature, forms and kinds, the different methods of data collection, and an introduction to experimental design.
To help users assess their understanding, self-assessment questions are included at the end of every chapter.
A course guide is also included in this module to guide users on the content, grading system and class policies of this subject.

TABLE OF CONTENTS

Chapter 1: Data Collection
1.1 Methods of primary data collection
1.2 Methods of secondary data collection
Chapter 2: Introduction to design of experiments
References

COURSE GUIDE

Course Information
Course Title: ENGINEERING DATA ANALYSIS
Course Code: STAT 121
Pre-requisite/Co-requisite: Mathematics in Modern World

Course Description
This course is designed for undergraduate engineering students, with emphasis on solving problems related to the societal issues that engineers and scientists are called upon to address. It introduces different methods of data collection and the suitability of using a particular method for a given situation.
The relationship of probability to statistics is also discussed, giving students the tools they need to understand how "chance" plays a role in statistical analysis. Probability distributions of random variables and their uses are also considered, along with a discussion of linear functions of random variables within the context of their application to data analysis and inference. The course also covers estimation techniques for unknown parameters; hypothesis testing used in making inferences from sample to population; and inference for regression parameters, including building models for estimating means and predicting future values of key variables under study. Finally, statistically based experimental design techniques and the analysis of outcomes of experiments are discussed with the aid of statistical software.

Course Objectives/Outcomes
Upon passing the course, you must be able to:
1. Apply statistical methods in the analysis of data.
2. Design experiments involving several factors.

Course Learning/Study Plan/Schedule


Weeks 2-3: Data Collection
  • Methods of Data Collection
  • Planning and Conducting Surveys
  • Planning and Conducting Experiments: Introduction to Design of Experiments
  Teaching and Learning Activities: Facilitate the use of the IM and assist the students; students study the IM
  Learning Materials: IM for STAT 121
  Assessment: Exercise #1: Data Collection

Weeks 4-5: Probability; Discrete Probability Distributions; Continuous Probability Distribution
Weeks 6-7: Joint Probability Distribution
  Teaching and Learning Activities: Facilitate the use of the IM and assist the students; students study the module
  Learning Materials: IM for STAT 121
  Assessment: Exercise #2: Illustrate the understanding of probability through illustrations

Weeks 8-9: Sampling Distributions and Point Estimation of Parameters; Statistical Intervals
  Teaching and Learning Activities: Assist the students and facilitate the use of the IM; students study the module
  Learning Materials: IM for STAT 121
  Assessment: Exercise #3: Finding the critical interval of a population

Week 9: MIDTERM EXAM

Course Requirements/Assessment and Evaluation Scheme/Grading System
Course Evaluation

CO 1: Apply statistical methods in the analysis of data
  Assessment tasks and weights: Exercise #1 (10%), Exercise #2 (10%), Exercise #3 (10%), Midterm Exam (15%)
  Satisfactory rating: 75%
  Target standard: at least 60% of the students have a score of at least 75%

CO 2: Design experiments involving several factors
  Assessment tasks and weights: Exercise #4 (10%), Exercise #5 (10%), Exercise #6 (10%), Exercise #7 (10%), Final Exam (15%)
  Satisfactory rating: 75%
  Target standard: for Exercises #4 to #7, at least 60% of the students have a score of at least 75%; for the Final Exam, at least 75% of the class obtain a rating of 60% and above

House Rules/Class Policies


Classroom Policies
1. Attendance will be checked every class meeting through the VLE. Provisions of the USM Code regarding student attendance will be strictly implemented.
2. Cheating is strictly prohibited. Anyone caught submitting a copied requirement will be sanctioned in accordance with the USM Student Handbook.
3. All requirements must be submitted on or before the set deadline.

Intended Learning Outcomes


By the end of this chapter, you must be able to:
1. Understand the processes of data collection.
2. Illustrate the processes of collecting data.
3. Understand the basic concepts of probability.
4. Solve probability problems.
5. Understand the general concepts of point estimation and statistical intervals.
6. Demonstrate the ability to compute point estimates of parameters.

CHAPTER 1: DATA COLLECTION

Introduction

Data collection means gathering information to address critical evaluation questions that the researcher has identified earlier in the evaluation process. Data collection is defined as the procedure of collecting, measuring and analyzing accurate insights for research using standard validated techniques.

In the field of Electronics and Communication Engineering, most of the data being collected are related to data science. According to the Purdue School of Engineering and Technology, the field of data science is indebted to electrical engineering, because data science has adopted many techniques from the signal processing field, ranging from signal analysis to neural networks, deep learning and many more. Many consider machine learning an outgrowth of statistical signal processing techniques.
In ECE, data science research is conducted on both data analysis (knowledge generation) and data production.

In most cases, data collection is the primary and most important step in research, irrespective of the field. The approach to data collection differs across fields of study, depending on the information required.

The goal of all data collection is to capture quality evidence that translates into rich data analysis and yields credible answers to questions. Regardless of the field of study or preference for defining data, accurate data collection is essential to maintaining the integrity of research. Appropriate data collection instruments, together with clear instructions for their correct use, reduce the likelihood of errors. Data collection is one of the most important stages in conducting research: you can have the best research design, but if you cannot collect the required data, you will not be able to complete your research (Kabir, 2016).

Types of Data
1. Qualitative Data

• Data are in the form of words, pictures or objects.
• Subjective: individuals' interpretation of events is important, e.g., participant observation, in-depth interviews, etc.
• Qualitative data are mostly non-numerical and usually descriptive or nominal in nature.

2. Quantitative Data

• Data are in the form of numbers and statistics.
• Quantitative data are measured on different scales, which can be classified as nominal, ordinal, interval and ratio scales.
• Objective: seeks precise measurement and analysis of target concepts, e.g., surveys, questionnaires, etc.

Classification of data based upon who collected the data

1. Primary data
• Data that have been collected from first-hand experience
• Have not been published yet and are more reliable, authentic and objective
• Have not been changed or altered by human beings; therefore, their validity is greater than that of secondary data

In statistical surveys, it is necessary to get information from primary sources and work on primary data. Research can be conducted without secondary data, but research based only on secondary data is the least reliable and may be biased, because secondary data have already been manipulated by human beings.

Sources of Primary Data

• Experiments
• Survey
• Questionnaire
• Interview
• Observations

Advantages of Using Primary Data

• The investigator collects data specific to the problem under study.
• There is no doubt about the quality of the data collected (for the investigator).
• If required, it may be possible to obtain additional data during the study period.

Disadvantages of Using Primary Data

• The researcher has to contend with all the hassles of data collection.
• The researcher must ensure that the data collected are of a high standard.
• The cost of obtaining the data is often the major expense in studies.

2. Secondary Data

• Data collected from a source that has already been published in any form
• The review of literature in any research is based on secondary data
• Collected by someone else for some other purpose (but utilized by the researcher for another purpose)

Secondary data are essential, since it is impossible to conduct a new survey that can adequately capture past changes and/or developments.

Sources of Secondary Data

• Books
• Records
• Biographies
• Newspapers
• Published censuses or other statistical data
• Data archives
• Internet articles
• Research articles by other researchers (journals)
• Databases, etc.

Advantages of Using Secondary Data

• No hassles of data collection.
• It is less expensive.
• The investigator is not personally responsible for the quality of the data.

Disadvantages of Using Secondary Data

• The third party that collected the data may not be reliable, so the reliability and accuracy of the data go down.
• Data collected in one location may not be suitable for another location due to variable environmental factors.
• With the passage of time, the data become obsolete.
• Secondary data can distort the results of the research. Special care is required to amend or modify secondary data for use.
• Secondary data can also raise issues of authenticity and copyright.

1.1 Methods of Primary Data Collection

In primary data collection, you collect the data yourself using qualitative and
quantitative methods. The key point here is that the data you collect is unique to you
and your research and, until you publish, no one else has access to it.

The main methods include the following:

• Questionnaires
• Interviews
• Focus Group Interviews
• Observation
• Survey
• Case Studies
• Diaries
• Activity Sampling Technique
• Memo Motion Study
• Process Analysis
• Link Analysis
• Time and Motion Study
• Experimental Method
• Statistical Method, etc.

Of all these methods, we will discuss only those commonly used for data collection in Electronics and Communication Engineering.

1.1.1 Questionnaire Method

A questionnaire is a research instrument consisting of a series of questions and other prompts for the purpose of gathering information from respondents.

Two types of questionnaires

1. Questionnaires with questions that measure separate variables
• commonly part of surveys
These could, for instance, include questions on:
• preferences (e.g., political party)
• behaviors (e.g., food consumption)
• facts (e.g., gender)

2. Questionnaires with questions that are aggregated into either a scale or an index
• commonly part of tests
These include, for instance, questions that measure:
• latent traits (e.g., personality traits such as extroversion)
• attitudes (e.g., towards immigration)
• an index (e.g., Socio-Economic Status)

Question Types:
1. Open-ended question: asks the respondent to formulate his/her own answer
2. Closed-ended question: has the respondent pick an answer from a given number of options

Four types of response scales for closed-ended questions are distinguished:
• Dichotomous, where the respondent has two options.
• Nominal-polytomous, where the respondent has more than two unordered options.
• Ordinal-polytomous, where the respondent has more than two ordered options.
• Continuous (bounded), where the respondent is presented with a continuous scale.
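To make the four closed-ended response scales concrete, the Python sketch below shows one way responses might be checked and coded before analysis. The class name `ClosedQuestion`, its fields and the example questions are hypothetical illustrations, not part of any survey tool; the only rules taken from the text are the option counts and ordering of each scale. Coding ordinal answers as ranks is an assumption chosen because it preserves their order, while nominal answers are kept as plain labels.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union

SCALES = ("dichotomous", "nominal", "ordinal", "continuous")


@dataclass(frozen=True)
class ClosedQuestion:
    """A closed-ended item and its response scale (hypothetical structure)."""
    text: str
    scale: str                                    # one of SCALES
    options: Sequence[str] = ()                   # listed answer options
    bounds: Optional[Tuple[float, float]] = None  # (low, high) for "continuous"

    def __post_init__(self):
        if self.scale not in SCALES:
            raise ValueError(f"unknown scale: {self.scale}")
        if self.scale == "dichotomous" and len(self.options) != 2:
            raise ValueError("a dichotomous item has exactly two options")
        if self.scale in ("nominal", "ordinal") and len(self.options) <= 2:
            raise ValueError("a polytomous item has more than two options")
        if self.scale == "continuous" and self.bounds is None:
            raise ValueError("a continuous item needs (low, high) bounds")

    def code(self, answer: Union[str, float]):
        """Check an answer against the scale and return a value for analysis."""
        if self.scale == "continuous":
            low, high = self.bounds
            value = float(answer)
            if not low <= value <= high:
                raise ValueError(f"{value} is outside {self.bounds}")
            return value
        if answer not in self.options:
            raise ValueError(f"{answer!r} is not one of {list(self.options)}")
        if self.scale == "ordinal":
            return list(self.options).index(answer) + 1  # rank keeps the order
        return answer  # dichotomous or nominal: keep the category label


# Hypothetical items from a survey of mobile network users
owns_phone = ClosedQuestion("Do you own a smartphone?", "dichotomous", ("Yes", "No"))
provider = ClosedQuestion("Which provider do you use?", "nominal",
                          ("Provider A", "Provider B", "Provider C"))
quality = ClosedQuestion("Rate the signal quality at home.", "ordinal",
                         ("Poor", "Fair", "Good", "Excellent"))
dropouts = ClosedQuestion("Percentage of calls dropped last week?", "continuous",
                          bounds=(0, 100))

print(owns_phone.code("No"))        # No
print(provider.code("Provider B"))  # Provider B
print(quality.code("Good"))         # 3
print(dropouts.code(12.5))          # 12.5
```

An answer outside the allowed options or bounds raises `ValueError`, which is one simple way to catch data-entry errors before analysis.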

Basic Rules for Questionnaire Item Construction

The basic rules are:

• Use statements that are interpreted in the same way by members of different subpopulations of the population of interest.
• Use statements to which persons with different opinions or traits will give different answers.
• Consider including an ‘open’ answer category after a list of possible answers.
• Address only one aspect of the construct you are interested in per item.
• Use positive statements and avoid negatives or double negatives.
• Do not make assumptions about the respondent.
• Use clear and comprehensible wording, easily understandable for all educational levels.
• Use correct spelling, grammar and punctuation.
• Avoid items that contain more than one question (e.g., Do you like strawberries and potatoes?).
• Questions should not be biased or lead the participant towards an answer.

Main modes of questionnaire administration are:

• Face-to-face questionnaire administration, where an interviewer presents the items orally.
• Paper-and-pencil questionnaire administration, where the items are presented on paper.
• Computerized questionnaire administration, where the items are presented on the computer.
• Adaptive computerized questionnaire administration, where a selection of items is presented on the computer and, based on the answers to those items, the computer selects subsequent items optimized for the testee’s estimated ability or trait.
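The adaptive mode can be illustrated with a small selection loop. The Python sketch below is a simplified staircase heuristic assumed for illustration only. Each item carries a hypothetical difficulty value. The next item is the unused one closest to the current ability estimate. The estimate moves up after a correct answer and down after an incorrect one, with a shrinking step. Operational adaptive tests normally rely on item response theory models, which this sketch does not implement.

```python
from typing import Callable, Dict, List, Tuple


def adaptive_session(
    difficulty: Dict[str, float],
    answer_is_correct: Callable[[str], bool],
    n_items: int = 3,
    start: float = 0.0,
    step: float = 1.0,
) -> Tuple[float, List[Tuple[str, bool]]]:
    """Simplified staircase item selection (an assumption, not an IRT-based test).

    difficulty: hypothetical difficulty value for each item in the bank.
    answer_is_correct: reports whether the respondent answers an item correctly.
    """
    estimate = start
    remaining = dict(difficulty)
    administered = []
    for _ in range(min(n_items, len(remaining))):
        # Choose the unused item whose difficulty is closest to the estimate.
        item = min(remaining, key=lambda name: abs(remaining[name] - estimate))
        correct = answer_is_correct(item)
        administered.append((item, correct))
        # Move the estimate toward the respondent's level, then shrink the step.
        estimate += step if correct else -step
        step /= 2
        del remaining[item]
    return estimate, administered


# Hypothetical item bank and a simulated respondent who answers correctly
# every item with a difficulty of 1.2 or less.
bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0}
estimate, administered = adaptive_session(bank, lambda item: bank[item] <= 1.2)
print(administered)  # [('q3', True), ('q4', True), ('q5', False)]
print(estimate)      # 1.25
```

The respondent is never shown the very easy items q1 and q2, which is the practical point of adaptive administration: fewer items, each chosen to be informative for that respondent.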

1.1.2 Interview Method

Interviewing involves asking questions and getting answers from participants in a study.

Interviews can be:

A. Structured
B. Semi-structured
C. Unstructured

A. Structured Interviews

Characteristics of the structured interview:

• The interviewer asks a particular set of predetermined questions.
• All candidates are asked the same questions in the same order.
• The questions are created prior to the interview and often have a limited set of response categories.

When to use a structured interview? Structured questions are best used when the interviewer fully understands the topic and has created in advance a set of questions that provide respondents with relevant, meaningful and appropriate response categories to choose from for each question.

B. Semi-structured Interviews
Characteristics of Semi-structured Interviews

• The interviewer and respondents engage in a formal interview.
• The interviewer develops and uses an ‘interview guide’. This is a list of questions and topics that need to be covered during the conversation, usually in a particular order.
• The interviewer follows the guide but is able to follow topical trajectories in the conversation that may stray from the guide when she/he feels this is appropriate.

According to Bernard (1988), semi-structured interviews are best used when you won’t get more than one chance to interview someone and when you will be sending several interviewers out into the field to collect data.
The inclusion of open-ended questions, and the freedom of interviewers to follow relevant topics that may stray from the interview guide, still provide the opportunity for identifying new ways of seeing and understanding the topic at hand.

1.1.3 OBSERVATIONAL METHOD

As a method of data collection for research purposes, observation is more than just looking or listening. According to Stenhouse (1975), research is simply defined as ‘systematic enquiry made public’.
• To be systematic, observation must in some way be selective. Observation harnesses this ability; systematic observation entails careful planning of what we want to observe.
• To make observation ‘public’, what we see or hear has to be recorded in some way to allow the information to be analysed and interpreted.

Classification of Observational Method

1. Casual Observation - An observation with a casual approach involves observing the right thing at the right place and at the right time by chance.
2. Scientific Observation - Scientific observation involves the use of measurement tools; a very important point to keep in mind, however, is that not all observations are scientific in nature.
3. Natural Observation - Natural observation involves observing behavior in a normal setting; in this type of observation, no effort is made to bring about any change in the behavior of those being observed.

Two main components of observation

• The subject, which refers to the observer. Subjective observation involves the observation of one’s own immediate experience.
• The object, which refers to the activity or any type of operation that is being observed. Observations in which the observer is an entity apart from the thing being observed are referred to as objective observation.

1.1.4 ACTIVITY SAMPLING

Activity sampling is a technique in which a number of successive observations are made over a period of time of one worker or a group of workers, machines or processes. The activity sampling technique was devised to obtain information on the time spent by groups of workers or machines on various activities or delays. It is conducted over a representative period of work by taking samples of the activity of the operators and machines included in the study; the samples are then analyzed using statistical tolerance procedures.

The accuracy of activity sampling will depend on the number of observations. Few
and infrequent observations will provide a low level of accuracy, while many and
frequent observations will give highly accurate but more expensive information. It is,
therefore, particularly important that the observer knows the optimum number of
observations necessary for a particular study.

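To determine the required number of observations, the following formula is used:

$$N = \frac{4P(100 - P)}{L^2} \qquad \text{(Equation 1)}$$

Where: N = number of observations;
P = approximate occurrence of the factor, as a percentage of N;
L = acceptable accuracy (limit of error) in the occurrence of the factor being studied, in percentage points

This formula gives the accuracy of the study within 95% confidence limits. The 95% limits are taken as about two standard errors on either side of P (z = 1.96, rounded to 2). Setting L = 2√(P(100 - P)/N) and solving for N gives Equation 1, which is why the factor 4 = 2² appears.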

For example:
A group of workers is being studied using activity sampling, and 32 observations are noted. Of these, 75% showed that the workers were performing useful work. Suppose we want to check that the workers are performing at this level continuously, plus or minus 10 percentage points, i.e., between 65% and 85%. How many observations would we need to provide 95% confidence in the result?

Solution:

Here, P = 75% and L = 10%.

$$N = \frac{4(75)(100 - 75)}{10^2} = \frac{7500}{100} = 75$$

However, after 75 checks had been performed, the value of P was found to be only 70%, so the extra data could be used to reassess the required number of checks:

$$N = \frac{4(70)(100 - 70)}{10^2} = \frac{8400}{100} = 84$$

Hence, 84 - 75 = 9 more checks would be required.


Once these checks had been completed, a final calculation should be done to ensure
that the number required had not changed.
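The calculation in the worked example can be repeated in code. The Python sketch below evaluates N = 4P(100 - P)/L² and then recomputes it once a new estimate of P is available, as in the second step of the example. The function names are hypothetical. Rounding N up to a whole observation is an assumption of this sketch; the figures in the example happen to be whole numbers.

```python
import math


def required_observations(p_percent: float, limit_percent: float) -> int:
    """N = 4P(100 - P) / L^2, the 95%-confidence formula from the text.

    p_percent: approximate occurrence P of the activity, in percent.
    limit_percent: acceptable accuracy L, in percentage points.
    The result is rounded up so the accuracy target is not missed
    (the rounding rule is an assumption of this sketch).
    """
    if not 0 < p_percent < 100:
        raise ValueError("P must lie strictly between 0 and 100 percent")
    if limit_percent <= 0:
        raise ValueError("L must be positive")
    return math.ceil(4 * p_percent * (100 - p_percent) / limit_percent ** 2)


def extra_checks(observed_p: float, limit_percent: float, checks_done: int) -> int:
    """Recompute N with the latest P and return how many more checks are needed."""
    return max(0, required_observations(observed_p, limit_percent) - checks_done)


print(required_observations(75, 10))  # 75
print(required_observations(70, 10))  # 84
print(extra_checks(70, 10, 75))       # 9
```

Calling `extra_checks` again after the additional observations corresponds to the final check recommended in the text, which confirms that the required number has not changed.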

1.1.5 PROCESS ANALYSIS

Process analysis is a step-by-step breakdown of the phases of a process, used to convey the inputs, outputs, and operations that take place during each phase. A process analysis can be used to improve understanding of how the process operates and to determine potential targets for process improvement through removing waste and increasing efficiency.

Process analysis generally involves the following tasks:

• Define the process boundaries that mark the entry points of the process inputs and the exit points of the process outputs.
• Construct a process flow diagram that illustrates the various process activities and their interrelationships.
• Determine the capacity of each step in the process. Calculate other measures of interest.
• Identify the bottleneck, that is, the step having the lowest capacity.
• Evaluate further limitations in order to quantify the impact of the bottleneck.
• Use the analysis to make operating decisions and to improve the process.
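Determining the capacity of each step and identifying the bottleneck reduce to simple arithmetic when the steps run in series. The Python sketch below assumes a purely serial process with known step capacities; the step names and figures are hypothetical. It also shows one possible way to quantify the impact of the bottleneck: the share of each step's capacity that is actually used when the whole line is held to the bottleneck rate.

```python
def find_bottleneck(step_capacities):
    """Return (step, capacity) for the step with the lowest capacity.

    For steps arranged in series, this is also the capacity of the
    whole process.
    """
    step = min(step_capacities, key=step_capacities.get)
    return step, step_capacities[step]


def utilization_at_bottleneck_rate(step_capacities):
    """Fraction of each step's capacity used when the process runs at the
    bottleneck's rate; values below 1.0 indicate idle capacity."""
    _, rate = find_bottleneck(step_capacities)
    return {step: rate / cap for step, cap in step_capacities.items()}


# Hypothetical circuit-board assembly line (capacities in boards per hour).
line = {"pick-and-place": 40, "reflow soldering": 25, "inspection": 30, "packing": 50}
print(find_bottleneck(line))  # ('reflow soldering', 25)
for step, share in utilization_at_bottleneck_rate(line).items():
    print(f"{step}: {share:.1%}")
```

Steps with low utilization at the bottleneck rate are where spare capacity sits; raising the bottleneck's capacity is what would increase the output of the whole line.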
Process Analysis Tools

When you want to understand a work process or some part of a process, these tools can help:

• Flowchart: A picture of the separate steps of a process in sequential order, including materials or services entering or leaving the process (inputs and outputs), decisions that must be made, people who become involved, time involved at each step and/or process measurements.
• Failure Mode and Effects Analysis (FMEA): A step-by-step approach for identifying all possible failures in a design, a manufacturing or assembly process, or a product or service; studying the consequences, or effects, of those failures; and eliminating or reducing failures, starting with the highest-priority ones.
• Mistake-proofing: The use of any automatic device or method that either makes it impossible for an error to occur or makes the error immediately obvious once it has occurred.
• Spaghetti Diagram: A spaghetti diagram is a visual representation using a continuous flow line tracing the path of an item or activity through a process. The continuous flow line enables process teams to identify redundancies in the workflow and opportunities to expedite process flow.

Process Performance Measures

Operations managers are interested in process aspects such as cost, quality, flexibility, and speed. Some of the process performance measures that communicate these aspects include:

• Process capacity - the maximum output rate of the process, measured in units produced per unit of time. The capacity of a series of tasks is determined by the lowest-capacity task in the string. The capacity of parallel strings of tasks is the sum of the capacities of the strings, except when the strings have different outputs that are combined; in such cases, the capacity of the parallel strings is that of the lowest-capacity string.
• Capacity utilization - the percentage of the process capacity that is actually being used.
• Throughput rate (also known as flow rate) - the average rate at which units flow past a specific point in the process. The maximum throughput rate is the process capacity.
• Flow time (also known as throughput time or lead time) - the average time that a unit requires to flow through the process from the entry point to the exit point. The flow time is the length of the longest path through the process. Flow time includes both processing time and any time the unit spends between steps.
• Cycle time - the time between successive units as they are output from the process. Cycle time for the process is equal to the inverse of the throughput rate. Cycle time can be thought of as the time required for a task to repeat itself. Each series task in a process must have a cycle time less than or equal to the cycle time for the process. Put another way, the cycle time of the process is equal to the longest task cycle time. The process is said to be in balance if the cycle times are equal for each activity in the process. Such balance is rarely achieved.
• Process time - the average time that a unit is worked on. Process time is flow time less idle time.
• Idle time - time when no activity is being performed, for example, when an activity is waiting for work to arrive from the previous activity. The term can be used to describe both machine idle time and worker idle time.
• Work in process - the amount of inventory in the process.
• Set-up time - the time required to prepare the equipment to perform an activity on a batch of units. Set-up time usually does not depend strongly on the batch size and therefore can be reduced on a per-unit basis by increasing the batch size.
• Direct labor content - the amount of labor (in units of time) actually contained in the product. It excludes idle time when workers are not working directly on the product, as well as time spent maintaining machines, transporting materials, etc.
• Direct labor utilization - the fraction of labor capacity that is actually utilized as direct labor.
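Several of these definitions are formulas in disguise. The Python sketch below encodes the capacity rules for series and parallel strings of tasks, capacity utilization, and cycle time as the inverse of the throughput rate. The function names, the two-string line and its figures are hypothetical. Rates are in units per hour, so cycle time comes out in hours unless converted.

```python
def series_capacity(task_capacities):
    """Tasks in series: the lowest-capacity task limits the string."""
    return min(task_capacities)


def parallel_capacity(string_capacities, outputs_combined=False):
    """Parallel strings of tasks.

    Independent strings add their capacities; if the strings make
    different outputs that are later combined, the lowest-capacity
    string limits the process.
    """
    if outputs_combined:
        return min(string_capacities)
    return sum(string_capacities)


def capacity_utilization(throughput_rate, capacity):
    """Fraction of process capacity actually being used."""
    return throughput_rate / capacity


def cycle_time(throughput_rate):
    """Time between successive output units: the inverse of throughput rate."""
    return 1.0 / throughput_rate


# Hypothetical line: two identical strings, each with three tasks in series
# (capacities in units per hour).
string = [20, 12, 15]
capacity = parallel_capacity([series_capacity(string), series_capacity(string)])
throughput = 18  # hypothetical observed output rate, units per hour

print(capacity)                                    # 24
print(capacity_utilization(throughput, capacity))  # 0.75
print(60 * cycle_time(throughput))                 # minutes between output units
```

If the two strings instead made different parts that were assembled together, `parallel_capacity([...], outputs_combined=True)` would return the smaller string capacity, matching the exception stated under process capacity.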

1.1.6 LINK ANALYSIS

Link analysis is a data analysis technique used in network theory to evaluate the relationships or connections between network nodes. These relationships can be between various types of objects (nodes), including people, organizations and even transactions. Link analysis is essentially a kind of knowledge discovery that can be used to visualize data to allow for better analysis, especially in the context of links, whether Web links or relationship links between people or between different entities.
Link analysis is literally about analyzing the links between objects, whether they are physical, digital or relational. This requires diligent data gathering. For example, when all of the links and backlinks present on a website must be analyzed, a tool has to sift through all of the HTML code and the various scripts in each page and then follow all the links it finds in order to determine what sort of links are present and whether they are active or dead.
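The website example can be sketched without any network access. The Python sketch below assumes the links on each page have already been extracted; a real tool would fetch and parse the HTML, as the text describes. A link is treated as dead when its target is not among the known pages, which simplifies checking whether a link is still active. The sketch also collects backlinks and flags pages that no other page links to, one example of the kind of anomaly link analysis looks for. The page names are hypothetical.

```python
from collections import defaultdict


def analyze_links(site):
    """site maps each known page to the list of pages it links to.

    Returns (backlinks, dead_links, orphans):
      backlinks  - page -> set of pages linking to it
      dead_links - (source, target) pairs whose target is not a known page
      orphans    - known pages that no other page links to
    """
    backlinks = defaultdict(set)
    dead_links = []
    for source, targets in site.items():
        for target in targets:
            if target in site:
                backlinks[target].add(source)
            else:
                dead_links.append((source, target))
    # .get() avoids creating empty entries in the defaultdict.
    orphans = sorted(page for page in site if not backlinks.get(page))
    return dict(backlinks), dead_links, orphans


# Hypothetical four-page site.
site = {
    "index.html": ["about.html", "products.html"],
    "about.html": ["index.html", "team.html"],  # team.html does not exist
    "products.html": ["index.html"],
    "archive.html": ["index.html"],             # no page links here
}
backlinks, dead_links, orphans = analyze_links(site)
print(dead_links)  # [('about.html', 'team.html')]
print(orphans)     # ['archive.html']
```

The same structure (nodes plus directed links) applies to the networking case, where the links would be physical or virtual connections and the data passing through them replaces the page lists.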

In networking, link analysis may involve determining the integrity of the
connection between each network node by analyzing the data that passes through
the physical or virtual links. With the data, analysts can find bottlenecks and possible
fault areas and are able to patch them up more quickly or even help with network
optimization.
Link analysis has three primary purposes:

• Find matches for known patterns of interest between linked objects.
• Find anomalies by detecting violated known patterns.
• Find new patterns of interest (for example, in social networking, marketing and business intelligence).

1.1.7 EXPERIMENTAL METHOD

The prime method of inquiry in science is the experiment. The key features
are control over variables, careful measurement, and establishing cause and effect
relationships. An experiment is an investigation in which a hypothesis is scientifically
tested. In an experiment, an independent variable (the cause) is manipulated and the
dependent variable (the effect) is measured; any extraneous variables are controlled.
An advantage is that experiments should be objective: the views and opinions of the researcher should not affect the results of a study. This makes the data more valid and less biased.

There are three types of experiments:


1. Laboratory / Controlled Experiments: This type of experiment is conducted in
a well-controlled environment – not necessarily a laboratory – and therefore
accurate measurements are possible.

Strength: It is easier to replicate (i.e., copy) a laboratory experiment, because a standardized procedure is used. Laboratory experiments allow precise control of extraneous and independent variables, which allows a cause-and-effect relationship to be established.

Limitation: The artificiality of the setting may produce unnatural behavior that does not reflect real life, i.e., low ecological validity. This means it would not be possible to generalize the findings to a real-life setting. Demand characteristics or experimenter effects may bias the results and become confounding variables.

2. Field Experiments: Field experiments are done in the everyday (i.e. real life)
environment of the participants. The experimenter still manipulates the
independent variable, but in a real-life setting (so cannot really control
extraneous variables).

Strength: Behavior in a field experiment is more likely to reflect real life because of its natural setting, i.e., higher ecological validity than a lab experiment. There is less likelihood of demand characteristics affecting the results, as participants may not know they are being studied. This occurs when the study is covert.

Limitation: There is less control over extraneous variables that might bias the
results. This makes it difficult for another researcher to replicate the study in exactly
the same way.

3. Natural Experiments: Natural experiments are conducted in the everyday (i.e., real-life) environment of the participants, but here the experimenter has no control over the independent variable (IV), as it occurs naturally in real life.

Strength: Behavior in a natural experiment is more likely to reflect real life because of its natural setting, i.e., very high ecological validity. There is less likelihood of demand characteristics affecting the results, as participants may not know they are being studied. Natural experiments can be used in situations in which it would be ethically unacceptable to manipulate the independent variable, e.g., researching stress.

Limitation: They may be more expensive and time consuming than lab experiments.
There is no control over extraneous variables that might bias the results. This makes
it difficult for another researcher to replicate the study in exactly the same way.

1.1.8 STATISTICAL METHODS

Statistical methods are the methods of collecting, summarizing, analyzing,
and interpreting variable(s) in numerical data. Statistical methods can be contrasted
with deterministic methods, which are appropriate where observations are exactly
reproducible or are assumed to be so. Data collection involves deciding what to
observe in order to obtain information relevant to the questions that need to be
answered, and then making the observations.
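The contrast between statistical and deterministic methods can be seen in a small sketch. The readings below are hypothetical repeated measurements of one resistor: a deterministic view would expect identical values every time, while a statistical summary reports the centre and the spread of the scatter. Only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical repeated readings (in ohms) of one nominal 100-ohm resistor.
# A deterministic model would predict the same value every time; real
# measurements scatter, so we summarize them statistically.
readings = [99.8, 100.3, 100.1, 99.6, 100.2, 100.0]

mean = statistics.mean(readings)          # central tendency
spread = statistics.stdev(readings)       # sample standard deviation
value_range = max(readings) - min(readings)

print(f"n = {len(readings)}")
print(f"mean = {mean:.2f} ohm")
print(f"sample std dev = {spread:.3f} ohm")
print(f"range = {value_range:.2f} ohm")
```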

Statistical analysis relates observed statistical data to theoretical models, such as
probability distributions or models used in regression analysis. By estimating
parameters in the proposed model and testing hypotheses about rival models, one
can assess the value of the information collected and the extent to which the
information can be applied to similar situations. Statistical prediction is the
application of the model thought to be most appropriate, using the estimated values
of the parameters. More recently, less formal methods of looking at data have been
proposed, including exploratory data analysis.
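As an illustration of estimating parameters and then predicting, the sketch below fits the simple linear regression model y = b0 + b1*x + error by least squares and then uses the estimates to predict a new value. The voltage and current values are invented for the example, and least squares is only one of several ways to estimate the parameters.

```python
# Hypothetical data: supply voltage (V) vs measured current (mA) in a test circuit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(x, y):
    """Least-squares estimates of b0 and b1 in the model y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

b0, b1 = fit_line(x, y)

# Statistical prediction: plug the estimated parameters back into the model.
x_new = 6.0
y_pred = b0 + b1 * x_new
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, prediction at x = {x_new}: {y_pred:.2f}")
```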

1.2 METHODS OF SECONDARY DATA COLLECTION

Secondary data are data that were originally collected by someone else, often for a
different purpose, and that can be reused in the current research study. Collecting
secondary data usually takes considerably less time than collecting primary data,
where you would have to gather all of the information from scratch. It is thus often
possible to gather more data this way.

Secondary data can be obtained from two different research strands:

• Quantitative: census, housing, social security and electoral statistics, and
other related databases.
• Qualitative: semi-structured and structured interviews, focus group
transcripts, field notes, observation records, and other personal,
research-related documents.

Sources of secondary data:

• Published Printed Sources: There is a wide variety of published printed sources.
Their credibility depends on many factors, for example the writer, the
publishing company, and the date of publication. Newer sources are generally
preferred over older ones, because new technology and research bring new
facts to light.
   - Books: Books are available today on almost any topic that you want to research.
   The use of books starts even before you have selected a topic. After selecting a
   topic, books show how much work has already been done on it and help you
   prepare your literature review. Books are a secondary source, but among the
   most authentic of secondary sources.
   - Journals/periodicals: Journals and periodicals are becoming more important
   for data collection, because journals provide up-to-date information that
   books at times cannot, and they can give information on the very specific
   topic you are researching rather than on more general topics.
   - Magazines/Newspapers: Magazines are also useful but not very reliable.
   Newspapers, on the other hand, are more reliable, and in some cases, such as
   some political studies, the information can only be obtained from newspapers.
• Published Electronic Sources: As the internet becomes faster and more
accessible to the public, much information that is not available in printed
form can be found online. In the past the credibility of internet sources was
questionable, because journals and books were seldom published online; today
most journals and many books are available online, some free and others for a
fee.
   - e-journals: e-journals are more commonly available than printed journals.
   The latest issues are difficult to retrieve without a subscription, but if your
   university has an e-library you can view and print its journals and order
   those that are not available.
   - General Websites: Websites in general do not contain very reliable
   information, so their content should be checked for reliability before quoting
   from them.
   - Weblogs: Weblogs (blogs) are also becoming common. They are essentially
   diaries written by different people and are only as reliable as personal
   written diaries.
• Unpublished Personal Records: Unpublished data may also be useful in some
cases.
   - Diaries: Diaries are personal records and are rarely available, but if you are
   conducting descriptive research they can be very useful. Anne Frank's diary is
   the most famous example; it is a first-hand record of life in hiding during the
   Nazi occupation of the Netherlands.
   - Letters: Letters, like diaries, are a rich source but should be checked for
   reliability before use.
• Government Records: Government records are very important for marketing,
management, humanities, and social science research.
   - Census data/population statistics, health records, educational institutions'
   records, etc.
• Non-Government Records: NGOs' survey data; other private companies' records.

Summary for Chapter 1

Engineering data come in different forms and kinds; therefore, different
methods of collection are required depending on the data desired. We have also
learned that the accuracy of the analysis depends on the amount of data (the number
of observations) collected. It is therefore imperative to plan how much data to collect
before carrying out a data analysis.
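The link between the amount of data and the accuracy of the analysis can be made concrete with the standard error of a sample mean, sigma/sqrt(n). The sketch below assumes n independent measurements that share the same standard deviation sigma (set to a hypothetical 2.0 units); it shows that quadrupling n only halves the standard error, which is why the number of observations should be planned in advance.

```python
import math

# Assumed standard deviation of a single measurement (hypothetical value).
sigma = 2.0

def standard_error(sigma, n):
    """Standard error of the sample mean for n independent measurements."""
    return sigma / math.sqrt(n)

for n in (4, 16, 64, 256):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.3f}")
```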

Formative Assessment

Hi! I’m SAQ
(Self-Assessment Questions)!
Let’s see if you understand
our lessons.
Answer the questions below.

1. Differentiate the primary data from secondary data.


______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
2. In the field of electronics and communication engineering, what are the
common methods of data collection?
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
3. In terms of data collection, why is it that secondary data is easier to
collect than primary data?
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________

CHAPTER 2: Introduction to Design of Experiments
An experiment is a type of research method in which you manipulate one or
more independent variables and measure their effect on one or more dependent
variables. Experimental design means creating a set of procedures to test a
hypothesis.
A good experimental design requires a strong understanding of the system
you are studying. By first considering the variables and how they are related (Step 1),
you can make predictions that are specific and testable (Step 2). How widely and
finely you vary your independent variable (Step 3) will determine the level of detail
and the external validity of your results. Your decisions about randomization,
experimental controls, and independent vs repeated-measures designs (Step 4) will
determine the internal validity of your experiment.

Step 1 Define your research question and variables

You should begin with a specific research question in mind. You may need to spend
time reading about your field of study to identify knowledge gaps and to find
questions that interest you.

Example:

We will work with two research questions, one from electronics and health and one
from sensors and ecology:

Example question 1: Phone use and sleep

You want to know how phone use before bedtime affects sleep patterns. Specifically,
you ask how the number of minutes a person uses their phone before sleep affects
the number of hours they sleep.

Example question 2: Temperature and sensors

You want to know which type of temperature sensor can accurately detect changes
in temperature.

To translate your research question into an experimental hypothesis, you need to
define the main variables and make predictions about how they are related.

Start by simply listing the independent and dependent variables.

Research question       | Independent variable                  | Dependent variable
Phone use and sleep     | Minutes of phone use before sleep     | Hours of sleep per night
Temperature and sensor  | Different types of temperature sensor | Air temperature reading

Then you need to think about possible confounding variables and consider how you
might control for them in your experiment.

Phone use and sleep
  Confounding variable: Natural variation in sleep patterns among individuals.
  How to control for it: Statistically, by measuring the average difference between
  sleep with phone use and sleep without phone use, rather than the average amount
  of sleep per treatment group.

Temperature and sensor
  Confounding variable: Temperature readings are affected by environmental factors.
  How to control for it: Experimentally, by measuring air temperature in the same
  environment.
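The statistical control described for the phone-use study can be sketched as a paired comparison: each subject's sleep with phone use is compared with the same subject's sleep without it. The five subjects and their hours of sleep are hypothetical. Because each person serves as his or her own baseline, the large person-to-person variation largely cancels out of the differences.

```python
import statistics

# Hypothetical hours of sleep for the same five subjects on a night
# without phone use and on a night with phone use before bed.
without_phone = [8.0, 6.5, 7.5, 5.5, 7.0]
with_phone    = [7.5, 6.0, 6.5, 5.0, 6.5]

# Natural variation between people is large (5.5 h to 8.0 h), so compare
# each subject with themselves instead of comparing group averages only.
differences = [w - wo for w, wo in zip(with_phone, without_phone)]

print("per-subject differences:", differences)
print("mean difference (h):", statistics.mean(differences))
print("spread between subjects (h):", statistics.stdev(without_phone))
print("spread of differences (h):", statistics.stdev(differences))
```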

Finally, put these variables together into a diagram. Use arrows to show the possible
relationships between variables and include signs to show the expected direction of
the relationships. (This is now your hypothesis).

[Diagram: Minutes of phone use before sleep → (-) Hours of sleep;
Natural variation in sleep → (?) Hours of sleep]

Figure 1. The diagram shows that we predict increasing phone use to be negatively
correlated with hours of sleep, and natural variation in sleep to have an unknown
influence on hours of sleep.

[Diagram: Temperature sensors → (+) Temperature readings;
Environmental factors → (-) Temperature readings]

Figure 2. The diagram shows that we predict a positive correlation between
temperature sensors and temperature readings, and a negative correlation between
environmental factors and temperature readings.

Step 2 Write your hypothesis

Now that you have a strong conceptual understanding of the system you are
studying, you should be able to write a specific, testable hypothesis that addresses
your research question.

Phone use and sleep
  Null hypothesis (H0): Phone use before sleep does not correlate with the amount
  of sleep a person gets.
  Alternative hypothesis (Ha): Increasing phone use before sleep leads to a decrease
  in sleep.

Temperature and sensor
  Null hypothesis (H0): Temperature reading accuracy does not correlate with
  temperature sensor type.
  Alternative hypothesis (Ha): Temperature reading accuracy correlates with
  temperature sensor type.
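One common way to test the phone-use null hypothesis is to compute the sample Pearson correlation coefficient r between minutes of phone use and hours of sleep, and then the statistic t = r*sqrt(n - 2)/sqrt(1 - r^2), which follows a t distribution with n - 2 degrees of freedom when H0 (zero correlation) is true. This is a sketch with hypothetical data; the module itself does not prescribe this particular test.

```python
import math

# Hypothetical data: minutes of phone use before sleep and hours of sleep.
minutes = [0, 15, 30, 45, 60, 90]
hours   = [8.0, 7.6, 7.4, 7.0, 6.9, 6.2]

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(minutes, hours)
n = len(minutes)
# Test statistic for H0: the population correlation is zero.
# Under H0 it follows a t distribution with n - 2 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.3f}, t = {t:.2f}, df = {n - 2}")
```

With these made-up values r is close to -1 and |t| is far larger than 2.776, the two-sided 5% critical value of t with 4 degrees of freedom, so H0 would be rejected. Correlation alone does not show cause and effect; it is the controlled, randomized design described in the following steps that supports a causal conclusion.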

The next steps will describe how to design a controlled experiment. In a controlled
experiment, you must be able to:

• Systematically and precisely manipulate the independent variable(s).
• Precisely measure the dependent variable(s).
• Control any potential confounding variables.

If your study system doesn’t match these criteria, there are other types of
research you can use to answer your research question.

Step 3 Design your experimental treatments

How you manipulate the independent variable can affect the experiment’s external
validity – that is, the extent to which the results can be generalized and applied to the
broader world.

First, you may need to decide how widely to vary your independent variable.

Temperature sensor experiment:

You can choose to increase air temperature:

• Just slightly above the natural range for your study region.
• Over a wider range of temperatures to mimic future warming.
• Over an extreme range that is beyond any possible natural variation.

Second, you may need to choose how finely to vary your independent variable.
Sometimes this choice is made for you by your experimental system, but often you
will need to decide, and this will affect how much you can infer from your results.

Phone-use experiment:

You can choose to treat phone use as:

• A categorical variable: either as binary (yes/no) or as levels of a factor (no
phone use, low phone use, high phone use).
• A continuous variable (minutes of phone use measured every night).

Step 4 Assign your subjects to treatment groups

How you apply your experimental treatments to your test subjects is
crucial for obtaining valid and reliable results.

First, you need to consider the study size: how many individuals will be included in the
experiment? In general, the more subjects you include, the greater your
experiment's statistical power (the probability of detecting a real effect if one
exists), and the more confidence you can have in your results.
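To make the link between study size and power concrete, here is a sketch using the standard normal-approximation formula for comparing two group means, n = 2*(z_(1-alpha/2) + z_(1-beta))^2 * sigma^2 / delta^2 subjects per group, where 1 - beta is the desired power. It assumes normally distributed responses with a known common standard deviation sigma; the 0.5-hour difference and sigma = 1 hour are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison
    of means, assuming normal data with a known common standard deviation."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                 # round up to whole subjects

# Hypothetical: detect a 0.5 h difference in sleep when sigma = 1.0 h.
for power in (0.70, 0.80, 0.90):
    print(f"power {power:.0%}: {n_per_group(0.5, 1.0, power=power)} subjects per group")
```

Under these assumptions about 63 subjects per group give 80% power, and raising the target to 90% increases the requirement to about 85 per group.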

Then you need to randomly assign your subjects to treatment groups. Each group
receives a different level of the treatment (e.g. no phone use, low phone use, high
phone use).

You should also include a control group, which receives no treatment. The control
group tells us what would have happened to your test subjects without any
experimental intervention.

When assigning your subjects to groups, there are two main choices you need to
make:

1. A completely randomized design vs a randomized block design.
2. An independent measures design vs a repeated measures design.

Randomization
An experiment can be completely randomized or randomized within blocks:

• In a completely randomized design, every subject is assigned to a treatment
group at random.
• In a randomized block design (also known as a stratified random design),
subjects are first grouped according to a characteristic they share, and then
randomly assigned to treatments within those groups.

Phone use and sleep
  Completely randomized design: Subjects are all randomly assigned a level of phone
  use using a random number generator.
  Randomized block design: Subjects are first grouped by age, and then phone use
  treatments are randomly assigned within these groups.

Temperature and sensor
  Completely randomized design: Temperature sensors are assigned to soil plots at
  random, using a number generator to generate map coordinates within the study
  area.
  Randomized block design: Soil plots are first grouped by average temperature,
  and then the sensor types are randomly assigned to plots within these groups.
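The two randomization schemes can be sketched with Python's `random` module. The subject labels, age groups, and treatment names are hypothetical, and shuffling and then dealing treatments out in turn is one simple way (not the only one) to keep the treatment groups equal in size. Passing a `seed` makes the assignment reproducible for record keeping.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects together, then deal treatments out in turn."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(pool)}

def randomized_block(subjects, block_of, treatments, seed=None):
    """Group subjects by a shared characteristic (the block), then
    randomize treatments separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of[s], []).append(s)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for i, s in enumerate(members):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects and age groups (the blocking characteristic).
treatments = ["no phone use", "low phone use", "high phone use"]
age_group = {"S1": "18-25", "S2": "18-25", "S3": "18-25",
             "S4": "40-60", "S5": "40-60", "S6": "40-60"}
subjects = list(age_group)

print(completely_randomized(subjects, treatments, seed=1))
print(randomized_block(subjects, age_group, treatments, seed=1))
```

In the block version each age group receives every treatment exactly once, so one age group cannot end up concentrated in a single treatment group by chance.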

Sometimes randomization isn’t practical or ethical, so researchers create partially
random or even non-random designs. An experimental design where treatments
aren’t randomly assigned is called a quasi-experimental design.

Independent vs. repeated measures

In an independent measures design (also known as between-subjects design or
classic analysis of variance (ANOVA) design), individuals receive only one of the
possible levels of an experimental treatment.

• In medical or social research, you might also use matched pairs within your
independent measures design to make sure that each treatment group
contains the same variety of test subjects in the same proportions.
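Because an independent measures design is the classic one-way ANOVA layout, the sketch below computes the ANOVA F statistic for the sensor study by hand. The three sensor types and their absolute reading errors are hypothetical, and each plot is assumed to receive only one sensor type.

```python
# Hypothetical absolute reading errors (deg C) for three sensor types, with
# each plot receiving only one sensor type (independent measures design).
groups = {
    "thermistor":   [0.4, 0.6, 0.5, 0.5],
    "thermocouple": [1.1, 0.9, 1.0, 1.2],
    "RTD":          [0.2, 0.3, 0.1, 0.2],
}

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
                     for vals in groups.values())
    # Variation of observations around their own group mean.
    ss_within = sum((v - sum(vals) / len(vals)) ** 2
                    for vals in groups.values() for v in vals)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df1, df2 = one_way_anova_f(groups)
print(f"F = {f:.1f} with ({df1}, {df2}) degrees of freedom")
```

For these made-up numbers F is about 74 with (2, 9) degrees of freedom, well above the tabulated 5% critical value of about 4.26, so H0 (reading accuracy does not depend on sensor type) would be rejected.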

In a repeated measures design (also known as within-subjects design or
repeated-measures analysis of variance (ANOVA) design), every individual receives
each of the experimental treatments consecutively, and their responses to each
treatment are measured.

• Repeated measures can also refer to an experimental design where an effect
emerges over time, and individual responses are measured over time in order
to measure this effect as it emerges.

Counterbalancing (randomizing or reversing the order of treatments among subjects)
is often used in repeated-measures design to ensure that the order of treatment
application doesn’t influence the results of the experiment.
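Counterbalancing can be sketched as follows: list every possible order of the treatments (full counterbalancing) and give each subject one order, cycling through the orders so that each is used equally often. The 12 subjects and the seed value are hypothetical; with more treatments the number of orders grows quickly (4 treatments give 24), and a Latin square is a common smaller alternative.

```python
import itertools
import random

treatments = ["low", "medium", "high"]

# Full counterbalancing: every possible order of the three treatments.
orders = list(itertools.permutations(treatments))  # 3! = 6 orders

def assign_orders(subjects, orders, seed=None):
    """Give each subject one treatment order, cycling through all orders so
    each is used about equally often, after shuffling the subject list."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    return {s: orders[i % len(orders)] for i, s in enumerate(pool)}

subjects = [f"S{i}" for i in range(1, 13)]  # 12 hypothetical subjects
schedule = assign_orders(subjects, orders, seed=7)
for subject in subjects:
    print(subject, " -> ".join(schedule[subject]))
```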

Phone use and sleep
  Independent measures design: Subjects are randomly assigned a level of phone use
  (low, medium, or high) and follow that level of phone use throughout the
  experiment.
  Repeated measures design: Subjects are assigned consecutively to low, medium, and
  high levels of phone use throughout the experiment, and the order in which they
  follow these treatments is randomized.

Temperature and sensor
  Independent measures design: Temperature sensors are assigned to soil plots at
  random, and each plot keeps the same sensor type throughout the experiment.
  Repeated measures design: Every plot is assigned each of the different temperature
  sensors consecutively over the course of the experiment, and the order in which
  the plots receive these treatments is randomized.

Experiments are always context-dependent, and a good experimental design will
take into account all of the unique considerations of your study system to produce
information that is both valid and relevant to your research question.

Summary for Chapter 2

In this chapter, designing an experiment was introduced as a step-by-step
process. You have learned the process of designing an experiment as well as the
considerations involved. It is important that you can identify the dependent and
independent variables of the datasets. After identifying the variables, you need to
state the research hypothesis, that is, your intelligent guess about the relationships
between the variables. After stating the hypothesis, you need to design the
treatments and assign them appropriately to the subjects. A good experiment depends
on choosing a proper experimental design.

Formative Assessment

Hi! I’m SAQ
(Self-Assessment Questions)!
Let’s see if you understand
our lessons.
Answer the questions below.

Write a hypothesis for each statement and identify the variables and the control
and experimental groups.
1. Cigarette smoking increases the risk of lung cancer.
Hypothesis: If _______________________then _______________________________
Independent Variable: _______________Dependent Variable:___________________
Control Group: _____________________Experimental Group:___________________
2. Internet traffic increases during daytime due to many users.
Hypothesis: If _____________________________then__________________________
Independent Variable: _______________Dependent Variable:___________________
Control Group: _____________________Experimental Group:___________________
3. Students who study perform better in school.
Hypothesis: If _____________________________then__________________________
Independent Variable: _______________Dependent Variable:___________________
Control Group: _____________________Experimental Group:___________________
4. Read the following situation and answer the questions:
An audiologist wanted to study the effect of gender on the response time to a certain
sound frequency. He suspected that men were better at detecting this type of sound
than women. He took a random sample of 20 male and 20 female subjects for this
experiment. Each subject was given a button to press when he/she heard the
sound. The audiologist then measured the response time, that is, the time between
when the sound was emitted and when the button was pressed.
Hypothesis: If _____________________________then__________________________
Independent Variable: _______________Dependent Variable:___________________
Control Group: _____________________Experimental Group:___________________
Constants: _____________________________________________________________

References:
A guide to experimental design. Retrieved from www.scribbr.com/methodology/
Analysis of variance. Retrieved from
http://www.cimt.org.uk/projects/mepres/alevel/fstats_ch7.pdf
Analysis of variance. Retrieved from
https://sites.calvin.edu/scofield/courses/m143/materials/handouts/anova1And2.pdf
Gomez and Gomez. (1983). Statistical Procedures for Agricultural Research. Wiley
Inter-Science Publication.
Hypothesis testing and ANOVA. Retrieved from
https://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture7.pd
Hypothesis testing. Retrieved from
http://math.ucdenver.edu/~ssantori/MATH2830SP13/Math2830-Chapter-08.pdf
Kabir, Syed Muhammad. (2016). Methods of Data Collection. pp. 201-275.
Retrieved from https://www.researchgate.net/publication/325846997
Point and interval estimates. Retrieved from
http://mba.teipir.gr/files/4rth_lecture.pdf
Sawyer, Steven. (2009). Analysis of Variance: The Fundamental Concepts. Journal
of Manual & Manipulative Therapy, 17, 27E-38E. doi:10.1179/jmt.2009.17.2.27E
Simple regression and correlation. Retrieved from
http://pba.ucdavis.edu/files/45007.pdf
The basic concepts of probability. Retrieved from
http://www.ams.sunysb.edu/~linli/teaching/ams-310/lecture-notes-2
Two-way ANOVA. Retrieved from
https://www.statstutor.ac.uk/resources/uploaded/coventrytwowayanova.pdf

Answer Key
1.
Hypothesis: If ____you smoke______then ___your risk of lung cancer increases___
Independent Variable: ___smoking_____Dependent Variable:_____cancer risk__
Control Group: ___no smoking______Experimental Group:____smoking______
2.
Hypothesis: If ____many users______then ___internet traffic is high____
Independent Variable: ___number of users__Dependent Variable:_internet traffic
Control Group: ___no users______Experimental Group:_with users______
3.
Hypothesis: If _students study______then ___they perform better in school_
Independent Variable: _students study__Dependent Variable:school performance
Control Group: ___no study______Experimental Group:_study______
4.
Hypothesis: If _the subject is male____then ___he responds faster to the sound___
Independent Variable: _gender__Dependent Variable: response time to the sound
Control Group: ___none______Experimental Group:_gender______

Constants: ____Sounds, button to press____
