
Engineering Data Analysis Reviewer

The document provides an overview of Engineering Data Analysis, detailing methods and techniques for analyzing data, including statistical analysis, diagnostic analysis, and predictive analysis. It covers types of data, experimental design, sampling methods, and the importance of statistics in research. Additionally, it discusses graphical data representation, measures of location and variability, and the concept of probability.

Uploaded by

Justine Nicole
Copyright
© © All Rights Reserved


EDA Reviewer - just some notes

Engineering Data Analysis (Cavite State University)



Downloaded by Justine Nicole B. Bustillos ([email protected])

ENGINEERING DATA ANALYSIS

CHAPTER 1: INTRODUCTION TO ENGINEERING DATA ANALYSIS
What is Data Analysis?
● Turning raw data into useful information
● Its purpose is to provide answers to the research questions being asked
● Even the greatest amount and best quality of data mean nothing if not properly analyzed, or if not analyzed at all
[Figure: Engineering and Research Process]
Types of Data Analysis: Techniques and Methods
Major Data Analysis Methods
● Statistical Analysis - shows "what happened?" by using past data
● Diagnostic Analysis - shows "why did it happen?" by finding the cause of the event
● Predictive Analysis - shows "what is likely to happen?" by using previous data
Two Broad Categories of Statistical Analysis
Descriptive Statistics - branch of statistics that involves organizing, displaying, and describing data
Inferential Statistics - branch of statistics that involves drawing conclusions about a population based on information contained in a sample taken from that population
Types of Data
Qualitative data - measurements for which there is no natural numerical scale, but which consist of attributes, labels, or other non-numerical characteristics.
e.g. low=1, med=2, high=3 (still qualitative)
e.g. ID (34B, 67AA, 19G, …)
e.g. Education level (HS, 2-yr, 4-yr, MS, PhD)
Quantitative data - numerical measurements that arise from a natural numerical scale.
e.g. weight
e.g. number of students in a class
Some Applications of Data Analysis
Applications of Data Analysis in Agriculture
Weather and climate change - How to protect small farmers against droughts?
Plant and Animal research - How to recognize animal diseases?
Soil - How to classify and profile soil?
Land - How to classify land use and land cover changes, as well as map crops?
Biodiversity - How to evaluate fish and wildlife population viability under land management alternatives?
Remote sensing - How to enhance agricultural monitoring and crop production estimations using satellite observations?
Introduction to Basic Statistics
Research
● A systematic study or investigation of something for the purpose of answering questions in a scientific manner.
● Finding a solution to a gap in existing knowledge
Statistics - scientific methods of collecting, organizing,
summarizing, presenting and analyzing data, as well
as drawing conclusions and making reasonable
decisions on the basis of such data.
Role of Statistics in Research: The role of statistics in research is to function as a tool in designing research, analyzing its data, and drawing conclusions therefrom.
In scientific research, statistics deals with:
● Designing experiments and surveys
● Collecting and summarizing data
● Describing the data and the variables
● Estimating population parameters
● Testing the hypotheses about the population
● Studying relationships among variables
Experimental Design
● The set of rules, plans and course of action taken in the conduct of an experiment.
● Design of an experiment is needed in order to:
- Ensure cost-effective collection of appropriate data
- Provide an appropriate and valid analysis of data
- Provide reliable conclusions leading to reliable inferences.
Basic Terms
Population - any specific collection of objects of interest
Sample - any subset or subcollection of the population
Measurement - a number or attribute computed for each member of a population or of a sample
Sample data - the measurements of sample elements
Parameter - a number that summarizes some aspect of the population as a whole
Ex. Population of interest: Adults in the Philippines
Parameter: % that are married, average age, how many are senior citizens
Statistic - a number computed from the sample data; a description of a sample of the population
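The difference between a parameter and a statistic can be illustrated with a short sketch. The population below is made up (random ages for 1,000 hypothetical adults), and the sample size of 50 is an arbitrary choice, so the numbers only serve to show which quantity is computed from which set.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 adults (made-up values for illustration).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(18, 90) for _ in range(1000)]

# Parameter: a number that summarizes the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, here 50 members drawn without replacement.
sample = rng.sample(population, 50)

# Statistic: a number computed from the sample data only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean age): {population_mean:.2f}")
print(f"Statistic (sample mean age):     {sample_mean:.2f}")
```

The two printed means will usually be close but not identical, which is why a statistic is treated as an estimate of the corresponding parameter.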


[Figure: Split Plot Design]
Statistical Investigation - study of a population using information from a sample. Any such investigation involves the following steps:
[Figure: steps of a statistical investigation]
Definition of Common Terms
Treatments - are procedures or conditions whose effects are to be measured and compared.
Examples: Kinds of fertilizer, rate of irrigation, age of harvest, different locations, food formulations, marketing strategies, teaching methods, drying temperature, speed of rotation, feeding rate, slope.
Experimental Unit - is a group of experimental materials or individuals to which a single treatment is applied once.
Examples: A plot of land, a plant, a portion of a leaf, a class of students, a marketing day, a cage of birds, etc.

Response variable - is a characteristic used to measure the treatment effects.
Examples: Yield of crop, height of plants, degree of infestation, biomass production, gain in weight, volume of sales, etc.
Dependent versus Independent Variables
- The variable of primary interest in an investigation is the dependent variable.
- Other variables which are believed to affect the measurements obtained on the dependent variable are called independent variables.
- In this context we say that the dependent variable is determined or influenced by the independent variable.
Parameter - an attribute that describes some property of a population. Statistics may be used to make inferences about the parameters of the population from which the sample is taken.
Example:
Area of Interest: CvSU Students
Parameter: Number of Engineering Students, Male/Female/LGBTQ+, Height, Weight, GPA

Sampling unit - is a portion of the experimental unit on which the response variable is observed and measured.
Examples: A tiller in a hill, a plant in a pot, a seedling in a seedbed, a tree in a group of trees, a branch of a tree, a farmer in a barangay, etc.
Experimental error - is the measure of variations among experimental units treated alike.
Some sources of experimental error:
● Inherent variability of the experimental units (e.g. variability in the field, different starting weight of animals, etc.)
● Errors in experimentation due to:
- lack of uniformity in the conduct of the experiment
- failure to standardize the experimental technique
● Errors in observation and measurements
Sampling error - is the measure of variations among sampling units within an experimental unit.
● Sampling errors are statistical errors that arise when a sample does not represent the whole population.
● Sampling errors occur when numerical parameters of an entire population are derived from a sample of the entire population. Since the whole population is not included in the sample, the parameters derived from the sample differ from those of the actual population.
Precision - denotes repeatability of measurements. It is measured by the variance. Some ways of increasing precision are:
● Increase the number of samples or replications
● Skillful grouping of experimental materials
● Proper selection of treatments
Accuracy - denotes unbiasedness or the closeness of the average values of the measurements to the true value.
Layout - refers to the final arrangement of treatments over the whole experimental area.
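To make treatments, experimental units, and layout concrete, here is a minimal sketch that assigns treatments to plots at random and prints the resulting arrangement. The three fertilizer names, the four plots per treatment, the 3 x 4 grid, and the use of a simple shuffle are all assumptions made for illustration; they are not taken from the notes.

```python
import random

# Hypothetical treatments (kinds of fertilizer), each applied to 4 plots.
treatments = ["Fertilizer A", "Fertilizer B", "Fertilizer C"]
plots_per_treatment = 4

# Each experimental unit (a plot of land) receives exactly one treatment.
assignments = treatments * plots_per_treatment  # 12 plots in total

# A chance device (a seeded shuffle here) decides which plot gets which treatment.
rng = random.Random(2024)
rng.shuffle(assignments)

# Layout: the final arrangement of treatments over the experimental area,
# shown here as a grid with 4 plots per row.
plots_per_row = 4
layout = [assignments[i:i + plots_per_row]
          for i in range(0, len(assignments), plots_per_row)]

for row in layout:
    print(" | ".join(row))
```

Changing the seed gives a different but equally valid layout; the count of plots per treatment stays fixed.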


Replication - is the repetition of the application of treatments on a number of experimental units.
Functions of replication:
● To provide an estimate of the experimental error
● To increase the precision of estimates
● To increase the scope of the experiment
Randomization - is the allocation of treatments to the experimental units by means of a chance device such that every treatment has an equal chance of being assigned to any experimental unit.
Functions of randomization:
● To provide a random sample of observations
● To satisfy the assumption of independence of observations
● To eliminate systematic bias in assigning the treatments
Simple Random Sample - sample that is drawn from a population in such a way that at each stage of the sampling, each remaining element in the population frame has an equal chance of being chosen.
Local control or error control - is any process or technique used to minimize the experimental error.

CHAPTER 2A: OBTAINING AND ORGANIZATION OF DATA

Data
Statistics is a tool for converting data into information.

Obtaining Data: Techniques and Methods
Methods of Collecting Data
● Direct Observations
● Experiments
● Surveys
Sampling - process in which a predetermined number of observations are taken from a larger population
Sampling Methods:
Simple Random Sampling (SRS) - every possible sample of the same size is equally likely to be chosen
Stratified Random Sampling - separates the population into strata and draws a random sample from each stratum
Cluster Sampling - a simple random sample of groups (clusters)
Common Errors in Data Acquisition
● incorrect measurements being taken because of faulty equipment
● mistakes made during transcription from primary sources
● inaccurate recording of data due to misinterpretation of terms
● inaccurate responses to questions concerning sensitive issues

Graphical Organization & Summarization of Data
Frequency Table/Distribution - a systematic arrangement of values grouped into class intervals. Frequency tables are used to summarize data so that the frequency of each interval is clearly displayed and the relative frequency of each interval can be easily computed.
Class Interval - range of numbers defined arbitrarily by the highest and lowest numbers in the class.
Frequency - the number of times a particular value or phenomenon occurs
Midpoint - average of the upper and lower boundary of a class
Relative Frequency - the proportion of all given values that fall within the interval. Usually expressed in percent.
Cumulative frequency - is the sum of the frequency for that class and all the previous classes

Constructing a Frequency Distribution
The following data represent the ages of 30 students in a statistics class. Construct a frequency distribution that has five classes.
[Table: ages of the 30 students]


1. The number of classes (5) is stated in the problem.
2. The minimum data entry is 18 and the maximum entry is 54, so the range is 36. Divide the range by the number of classes to find the class width: 36 / 5 = 7.2, which is rounded up to 8.
3. The minimum data entry of 18 may be used for the lower limit of the first class. To find the lower class limits of the remaining classes, add the width (8) to each lower limit. The lower class limits are 18, 26, 34, 42, and 50. The upper class limits are 25, 33, 41, 49, and 57.
4. Make a tally mark for each data entry in the appropriate class.
5. The number of tally marks for a class is the frequency for that class.

Histogram
● A graphical representation of a frequency table; it displays quantitative data.
● The class intervals are marked off on the horizontal axis; frequencies or relative frequencies are marked off on the vertical axis.

Frequency Polygon
● The geometric shape obtained by connecting, with straight lines, the midpoints of adjacent class intervals of a histogram.
● Presenting data in pictorial or graphical form makes it much easier to interpret.
● Frequency polygons give an idea about the shape of the data and the trends that a particular data set follows.
● This can be very useful in comparing different sets of data by superimposing one on the other.
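The five steps for constructing a frequency distribution, together with the class midpoints that a frequency polygon connects, can be sketched as follows. The original list of 30 ages is not reproduced in these notes, so the `ages` list below is hypothetical; it was only chosen to have the same minimum (18), maximum (54), and size (30) as the worked problem.

```python
from itertools import accumulate

# Hypothetical ages of 30 students (min 18, max 54); not the original data.
ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19,
        34, 19, 24, 29, 18, 37, 38, 22, 30, 39,
        32, 44, 33, 46, 54, 49, 18, 51, 21, 21]

num_classes = 5                                   # step 1: given in the problem
data_range = max(ages) - min(ages)                # step 2: 54 - 18 = 36
# Round the width up to the next whole number (7.2 -> 8); floor + 1 also
# widens the classes when the division happens to be exact.
class_width = data_range // num_classes + 1

# Step 3: lower limits start at the minimum and advance by the class width.
lower_limits = [min(ages) + i * class_width for i in range(num_classes)]
upper_limits = [lo + class_width - 1 for lo in lower_limits]

# Steps 4 and 5: tally each entry into its class; the tally is the frequency.
frequencies = [sum(lo <= age <= hi for age in ages)
               for lo, hi in zip(lower_limits, upper_limits)]

midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]
relative = [f / len(ages) for f in frequencies]
cumulative = list(accumulate(frequencies))

print("Class   Freq  Midpoint  Rel.freq  Cum.freq")
for lo, hi, f, m, r, c in zip(lower_limits, upper_limits, frequencies,
                              midpoints, relative, cumulative):
    print(f"{lo}-{hi}  {f:>5}  {m:>8.1f}  {r:>8.1%}  {c:>8}")
```

The class limits produced by this sketch match those in step 3 (18-25, 26-33, 34-41, 42-49, 50-57); the frequencies depend on the made-up data.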

Bar Graph
● A graphical representation of a frequency table for qualitative data.
● On one axis of the graph, the frequencies or the relative frequencies are represented.
● The various classes of data are labeled on the other axis.

Line Graph
● A graphical display of information that changes continuously over time.
● A visual comparison of how two variables, shown on the x- and y-axes, are related or vary with each other.


Curve Smoothing - process of smoothing the corners of a frequency polygon so that we obtain a smooth curve, suggesting the basic shape of the distribution of numbers.
The aim of smoothing is to give a general idea of relatively slow changes of value with little attention paid to the close matching of data values, while curve fitting concentrates on achieving as close a match as possible.

Scatter Plot
● A graphic display of data points in a two-dimensional plane.
● Each data point represents a single unit of observation on which two measurements, X and Y, have been made.
● The values of each of the measurements are scaled on the X and Y axes, respectively.
● Each data point is located in the plane at the intersection of its associated X and Y values.

Measures of Location (Mean, Median and Mode)
Measure of location - a number that represents the central or most representative measurement in a set.
Mean (μ) - the arithmetic average of a set of measurements.
Median (Md) - the middle number in an ordered set of measurements.
Mode (Mo) - number that occurs most frequently in a set of measurements. It is possible for a set of measurements to have more than one mode.
Outlier - is a value that is very different from the other data in your data set. This can skew your results.

Measures of Variability
A measure of variability is a single number that represents the spread or amount of dispersion in a set of data.
Range - measures the total spread of a set of data and is computed from only two numbers.
Range = largest measurement - smallest measurement

Variance
● Variance is a numerical value that describes the variability of observations from their arithmetic mean.
● Variance measures how far the outcomes vary from the mean.
● The variance equals the average of the squared deviations of all measurements in the population from the mean.


● A deviation is the distance from any single measurement of a set to the mean of that set.
● It indicates how far the individuals or observations in a group are spread out.
● Statisticians use variance to see how individual numbers relate to each other within a data set, rather than using broader mathematical techniques such as arranging numbers into quartiles.
● The advantage of variance is that it treats all deviations from the mean as the same regardless of their direction.

Standard Deviation, Sd
● The square root of the variance.
● This measurement is very useful for describing the spread or dispersion of a set of data around the mean.
● Measures how far observations typically lie from the expected value (the mean).
● Indicates how much the observations or individuals of a data set differ from the mean.

Coefficient of Variation (CV)
● This indicates the degree of precision with which the treatments are compared and is a good index of the reliability of the experiment.
● It expresses the experimental error as a percentage of the mean; thus the higher the CV value, the lower the reliability of the experiment.
● As a rule of thumb, CV < 10% is very good, 10-20% is good, 20-30% is acceptable, and CV > 30% is not acceptable.
● For field experiments a CV of 30% is tolerable, and for laboratory/clinical experiments 5% is the limit.
● The acceptable CV depends on different factors: experimental design, number and size of replications, experimental materials, parameters, etc.
Example:
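A minimal sketch of the measures of location and variability described above, using Python's standard `statistics` module. The eight measurements are made up for illustration. Variance is computed here as the population variance (the average squared deviation from the mean), and CV is taken in its general form, standard deviation as a percentage of the mean; in an actual experiment the CV would normally be based on the experimental error from the analysis, which is not shown.

```python
import statistics

# Hypothetical measurements (made up for illustration).
data = [12.0, 15.0, 11.0, 15.0, 14.0, 13.0, 15.0, 17.0]

# Measures of location
mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle of the ordered values
modes = statistics.multimode(data)    # list, since there may be more than one mode

# Measures of variability
data_range = max(data) - min(data)    # largest minus smallest measurement
variance = statistics.pvariance(data) # average squared deviation from the mean
std_dev = statistics.pstdev(data)     # square root of the variance
cv_percent = std_dev / mean * 100     # standard deviation as a % of the mean

print(f"Mean     = {mean}")
print(f"Median   = {median}")
print(f"Mode(s)  = {modes}")
print(f"Range    = {data_range}")
print(f"Variance = {variance:.2f}")
print(f"Sd       = {std_dev:.4f}")
print(f"CV       = {cv_percent:.2f}%")
```

When the data are a sample rather than the whole population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) are the usual choice instead.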

CHAPTER 2B: PROBABILITY

Probability
● Study of random or nondeterministic
experiments
● We often frame probability in terms of a
random process giving rise to an outcome
● Probability is defined as a proportion, and it
always takes values between 0 and 1 (or 0%
and 100%, in percentage)

We use probability to build tools to describe and understand apparent randomness


Random Experiments

Introductory Examples: Rolling a die
● What is the chance of getting 1 when rolling a die?
● What is the chance of getting a 1 or 2?
● What is the chance of getting either 1, 2, 3, 4, 5, or 6?
● What is the chance of not getting a 2?
1. P = 1/6 = 0.1667 = 16.67%
2. P = 2/6 = 0.3333 = 33.33%
3. P = 6/6 = 1 = 100%
4. P = 5/6 = 0.8333 = 83.33%
Sample Space - set of all possible outcomes of an experiment
Event - a collection or set of one or more simple events in a sample space

Event - An event is a subset of a sample space
S = { 1, 2, 3, 4, 5 }
● Event A = { 1, 3, 5 }
● Event B = { 2, 4 }
● Event C = { 5 }
Simple Event - outcome from a sample space with one characteristic
e.g. A Red Card from a deck of cards.
Joint or Compound Event - involves 2 outcomes simultaneously
e.g. An Ace and a Red Card from a deck of cards.
An Ace or a Red Card from a deck of cards.
[Figures: Simple Events, Joint Events, Compound Events, Special Event, Dependent or Independent Events]

Suppose that 3 items are selected at random from a manufacturing process. Each item is inspected and classified defective D, or nondefective N. What is the sample space?
Sample spaces with a large or infinite number of sample points are best described by a statement or rule.
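The three-item inspection question and the die examples above can be checked by enumerating the sample space directly. This is a minimal sketch; the helper name `probability` is made up for illustration, and it assumes every sample point is equally likely (a fair die).

```python
from fractions import Fraction
from itertools import product

# Sample space for inspecting 3 items, each defective (D) or nondefective (N).
sample_space = ["".join(outcome) for outcome in product("DN", repeat=3)]
print(sample_space)
print(len(sample_space), "sample points")  # 2 * 2 * 2 outcomes

# Sample space for rolling one fair die.
die = {1, 2, 3, 4, 5, 6}

def probability(event, space):
    """P(event) = favorable outcomes / all outcomes, for equally likely outcomes."""
    return Fraction(len(event & space), len(space))

print(probability({1}, die))         # getting a 1
print(probability({1, 2}, die))      # getting a 1 or a 2
print(probability(die, die))         # getting any face
print(probability(die - {2}, die))   # not getting a 2
```

Listing every point works for small spaces such as these; for large or infinite spaces a rule describing the points is more practical, as noted above.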


Visualizing Events

Counting Sample Points

Operations with Events


Answer: Permutation, because the order of the PIN digits is important in unlocking a cell phone.
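To see why order matters in the PIN answer, the counts for ordered and unordered selections can be compared. The question behind the answer is not reproduced in these notes, so the sketch assumes a hypothetical 4-digit PIN built from the digits 0-9 with no digit repeated.

```python
import math

digits = 10      # digits 0 through 9
pin_length = 4   # hypothetical PIN length, assumed for illustration

# Permutations: ordered arrangements of 4 distinct digits (1-2-3-4 differs from 4-3-2-1).
ordered = math.perm(digits, pin_length)

# Combinations: unordered selections of 4 distinct digits ({1,2,3,4} counted once).
unordered = math.comb(digits, pin_length)

print(f"Permutations P(10, 4) = {ordered}")
print(f"Combinations C(10, 4) = {unordered}")

# Each unordered selection can be arranged in 4! different orders.
print(f"Ratio = {ordered // unordered} = 4! = {math.factorial(pin_length)}")

# If digits may repeat, as on most phones, every position has 10 choices.
with_repetition = digits ** pin_length
print(f"With repetition allowed = {with_repetition}")
```

Because a phone accepts only one specific ordering of the digits, the permutation count (not the combination count) describes the number of distinct PINs under these assumptions.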

CHAPTER 2B.1: PROBABILITY (Continuation)

Computing Probability


Joint Probability Using Contingency Table

Joint Probability


Conditional Probability

Compound Probability

Multiplicative Rule
