Design and Analysis of Experiments, UNIT 1: Experimental Design Fundamentals
2. Experimental Strategies
a) Completely Randomized Design (CRD): In this strategy, treatments are
assigned to the experimental units completely at random, with no restriction on
the randomization. CRD is suitable when the experimental units are homogeneous,
so that no blocking or grouping is needed.
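To make the randomization concrete, here is a minimal sketch in Python that shuffles a balanced list of treatments over the experimental units; the three treatment labels and four replications per treatment are illustrative assumptions, not values from this unit.

```python
# Random assignment of treatments to experimental units for a CRD (sketch).
# The treatment labels and replication count below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(seed=1)

treatments = ["A", "B", "C"]   # three hypothetical treatments
replications = 4               # each treatment is applied to 4 units
units = np.arange(len(treatments) * replications)   # 12 experimental units

# Build a balanced treatment list and shuffle it so every unit is equally
# likely to receive any treatment (no restriction on the randomization).
assignment = np.repeat(treatments, replications)
rng.shuffle(assignment)

for unit, trt in zip(units, assignment):
    print(f"unit {unit:2d} -> treatment {trt}")
```

Fixing the seed makes the assignment reproducible for record-keeping; in practice any source of genuine randomization serves the same purpose.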
4. Terminologies
a) Experimental unit: The experimental unit is the smallest entity to which a
treatment can be applied and from which data can be collected. It can be an
individual, a group, an object, or any other defined entity.
b) Treatment: A treatment refers to a specific condition or intervention being applied
to the experimental units. It represents a level or value of a factor being studied.
c) Factor: A factor is the variable or attribute being manipulated in the experiment. It
represents a potential source of variation that may affect the response variable.
d) Level: A level is a specific value or setting of a factor. Factors can have multiple
levels, and each level represents a unique condition or value of the factor.
e) Response variable: The response variable is the outcome or variable of interest
being measured in the experiment. It represents the effect or response being studied.
f) Control group: The control group is a group that does not receive the treatment or
intervention being studied. It serves as a baseline for comparison and helps assess
the effects of the treatment.
5. ANOVA
a) Analysis of Variance (ANOVA) is a statistical technique used to analyze the
differences between group means and assess the significance of these differences.
b) ANOVA determines how much of the variation in the response variable can be
attributed to the different factors or treatments. It decomposes the total variation
into components associated with the factors and with random error (residual variation).
c) ANOVA helps identify significant factors and understand their impact on the
response variable. It provides statistical evidence to support conclusions about
treatment effects.
d) ANOVA can be extended to more complex designs, such as factorial designs, where
multiple factors and their interactions are considered.
ANOVA, which stands for Analysis of Variance, is a statistical technique used to
analyze the differences between group means and assess the significance of these
differences. ANOVA is particularly useful when comparing means across three or more
groups or treatments.
Here are the key aspects and principles of ANOVA:
a) Variation and partitioning: ANOVA involves partitioning the total variation
observed in the data into components associated with the treatment effects and
with random error (residual variation). This partitioning allows for a quantitative
assessment of the sources of variation and their contributions to the overall
variability in the data.
b) Null hypothesis and alternative hypothesis: In ANOVA, the null hypothesis
assumes that there is no significant difference between the means of the groups or
treatments being compared. The alternative hypothesis, on the other hand, suggests
that at least one of the means is significantly different from the others.
c) F-statistic: ANOVA uses the F-statistic to test the null hypothesis. The F-statistic
compares the variation between the group means (explained variation) with the
variation within the groups (unexplained variation). If the variation between the
group means is large relative to the variation within the groups, the F-statistic is
large and the null hypothesis is rejected.
d) Sum of Squares: ANOVA calculates Sums of Squares to quantify the variation
in the data. The Total Sum of Squares (SST) represents the total variation in the
data, the Treatment Sum of Squares (SSTreatment) represents the variation between
the group means, and the Error Sum of Squares (SSError) represents the
unexplained variation within the groups, so that SST = SSTreatment + SSError
(a worked numerical sketch follows this discussion).
e) Degrees of Freedom: Degrees of Freedom (df) indicate the number of independent
pieces of information available for estimating the variation. In a one-way ANOVA
with k treatments and N total observations, the degrees of freedom associated with
the treatment are dfTreatment = k - 1 and those associated with the error are
dfError = N - k.
f) Mean Squares: Mean Squares are obtained by dividing the Sum of Squares by the
corresponding degrees of freedom. Mean Square Treatment (MSTreatment)
represents the average variation between the group means, and Mean Square Error
(MSError) represents the average unexplained variation within the groups.
g) F-distribution and p-value: Under the null hypothesis, the F-statistic follows the
F-distribution. By comparing the observed F-statistic with the critical value from
the F-distribution, researchers can determine the statistical significance of the
results. The p-value quantifies the probability of observing an F-statistic at least
as large as the one obtained, assuming the null hypothesis is true.
h) Multiple comparisons: When ANOVA reveals a significant difference among the
group means, additional post hoc tests, such as Tukey's test or Bonferroni
correction, can be performed to identify which specific groups differ significantly
from each other.
i) Assumptions: ANOVA assumes that the data within each group or treatment are
independent and identically distributed, and that the residuals (unexplained
variation) are normally distributed with constant variance. Violations of these
assumptions may affect the validity of the ANOVA results.
ANOVA is a powerful tool for analyzing the differences between multiple groups or
treatments and determining whether these differences are statistically significant. It
provides a structured approach to comparing means and helps researchers draw
meaningful conclusions from their data.
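The quantities described above (SST, SSTreatment, SSError, the mean squares, the F-statistic, and the p-value) can be computed directly. The following is a minimal one-way ANOVA sketch in Python; the three sample groups are made-up illustrative data, not results from this unit.

```python
# One-way ANOVA computed by hand: sums of squares, degrees of freedom,
# mean squares, F-statistic, and p-value. The data below are illustrative.
import numpy as np
from scipy import stats

groups = [
    np.array([23.1, 24.5, 22.8, 25.0]),   # treatment 1
    np.array([26.2, 27.0, 25.5, 26.8]),   # treatment 2
    np.array([24.0, 23.5, 24.8, 24.2]),   # treatment 3
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k = len(groups)      # number of treatments
N = all_obs.size     # total number of observations

# Sums of squares (SST = SSTreatment + SSError)
ss_treatment = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_obs - grand_mean) ** 2).sum()

# Degrees of freedom and mean squares
df_treatment, df_error = k - 1, N - k
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error

# F-statistic and p-value from the F-distribution
F = ms_treatment / ms_error
p_value = stats.f.sf(F, df_treatment, df_error)

print(f"SST={ss_total:.3f}  SSTreatment={ss_treatment:.3f}  SSError={ss_error:.3f}")
print(f"F({df_treatment},{df_error}) = {F:.3f},  p = {p_value:.4f}")
```

The same F and p-value can also be obtained from scipy.stats.f_oneway(*groups); when the F-test is significant, post hoc procedures such as Tukey's test (for example, pairwise_tukeyhsd in statsmodels) can identify which specific group means differ.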
6. Steps in Experimentation
a) Formulate research question and objectives: Clearly define the research question
and the specific objectives of the experiment.
b) Design the experiment: Determine the factors to be studied, their levels, and the
appropriate experimental strategy. Decide on the number of replications and the
randomization scheme.
c) Randomly assign treatments: Randomly assign treatments to the experimental units
according to the chosen experimental design.
d) Collect data: Measure the response variable for each experimental unit and record
the data.
e) Analyze the data: Apply appropriate statistical methods, such as ANOVA or
regression analysis, to analyze the data and test hypotheses.
f) Draw conclusions and make inferences: Interpret the results of the analysis in the
context of the research question and objectives. Draw conclusions and make
inferences about the effects of the factors on the response variable.
g) Communicate findings: Present the findings of the experiment through written
reports, presentations, or other appropriate means. Clearly communicate the results,
conclusions, and implications of the study.
7. Sample Size
a) Sample size refers to the number of experimental units or observations included
in the study. It is crucial for obtaining reliable and statistically valid results.
b) Determining an appropriate sample size depends on various factors, including
the desired level of precision, the expected effect size, the variability in the data,
and the desired statistical power.
c) A larger sample size generally leads to more precise estimates and higher
statistical power. However, larger sample sizes may also require more resources
and increase the cost and complexity of the study.
Sample size refers to the number of individuals or observations included in a study or
experiment. Determining an appropriate sample size is crucial for obtaining reliable and
statistically valid results. The sample size should be carefully chosen to ensure that the
study has adequate statistical power to detect meaningful effects and provide precise
estimates. Here are some key considerations when determining sample size:
a) Desired level of precision: The sample size should be sufficient to provide a
desired level of precision in estimating population parameters. A larger sample
size generally leads to more precise estimates with narrower confidence
intervals.
b) Expected effect size: The expected effect size refers to the magnitude of the
difference or relationship that the study aims to detect. A larger effect size
typically requires a smaller sample size to detect it with sufficient power.
c) Variability of the data: The variability or dispersion of the data also affects the
required sample size. Higher variability generally requires a larger sample size
to achieve a desired level of precision.
d) Statistical power: Statistical power is the probability of correctly rejecting the
null hypothesis when it is false. A larger sample size increases the statistical
power, improving the chance of detecting true effects. Researchers often
aim for a power of 80% or higher (see the calculation sketch after this list).
e) Significance level: The significance level (alpha) is the probability of
incorrectly rejecting the null hypothesis when it is true. Commonly used values
for alpha are 0.05 (5%) or 0.01 (1%). The sample size calculation should
consider the desired significance level.
f) Study design and analysis methods: The sample size calculation may depend on
the study design and the analysis methods employed. Different study designs,
such as cross-sectional studies, case-control studies, or randomized controlled
trials, may require different sample size considerations.
g) Resources and feasibility: Practical considerations, such as available resources,
time constraints, and feasibility, may influence the determination of sample
size. It is important to strike a balance between obtaining a sample size that is
statistically valid and feasible within the limitations of the study.
h) Population characteristics: The characteristics of the target population may also
influence the sample size calculation. For example, if the population is highly
heterogeneous, a larger sample size may be needed to capture this variability.
i) Sampling technique: The sampling technique used may affect the required
sample size. If the sampling technique is stratified or clustered, adjustments may
be needed in the sample size calculation to account for the design effect.
j) Consultation with a statistician: It is often beneficial to consult with a statistician
during the planning stage to determine an appropriate sample size. Statisticians
can help perform power calculations or provide guidance based on the specific
study objectives and design.
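As a rough illustration of how these inputs combine, the sketch below computes the sample size per group needed to compare two group means, using the standard normal-approximation formula; the effect size, significance level, and power are illustrative choices.

```python
# Sample-size sketch for comparing two group means with the normal approximation:
#   n per group = 2 * ((z_(1 - alpha/2) + z_(1 - beta)) / d)^2
# The effect size, alpha, and power below are illustrative assumptions.
import math
from scipy.stats import norm

effect_size = 0.5   # expected standardized difference between means (Cohen's d)
alpha = 0.05        # two-sided significance level
power = 0.80        # desired statistical power (1 - beta)

z_alpha = norm.ppf(1 - alpha / 2)   # roughly 1.96
z_beta = norm.ppf(power)            # roughly 0.84

n_per_group = 2 * ((z_alpha + z_beta) / effect_size) ** 2
print(f"Required sample size per group: {math.ceil(n_per_group)}")   # about 63
```

Power routines based on the t-distribution (for example, TTestIndPower in statsmodels) give a slightly larger answer, around 64 per group, for the same inputs; larger effect sizes or lower power targets reduce the required sample size, as described in items b) and d) above.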
The basic idea behind linear regression is to find the best-fitting line (or hyperplane)
that minimizes the difference between the predicted values from the model and the
actual observed values. This line is represented by a linear equation of the form:

y = b0 + b1x1 + b2x2 + ... + bnxn + ε

where y is the dependent (response) variable, x1, x2, ..., xn are the independent
variables, b0 is the intercept, b1, ..., bn are the regression coefficients, and ε is the
random error term.
The goal of linear regression is to estimate the values of the coefficients (b0, b1, ..., bn)
that minimize the sum of squared differences between the predicted values and the
actual observed values. This is typically done using a method called ordinary least
squares (OLS).
Once the model is trained and the coefficients are estimated, you can use it to make
predictions on new data by plugging in the values of the independent variables into the
equation.
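A minimal sketch of fitting such a model by ordinary least squares with NumPy is shown below; the synthetic data, the "true" coefficients used to generate it, and the new observation are all illustrative assumptions.

```python
# Ordinary least squares (OLS) on synthetic data (sketch).
# The generating coefficients and sample values below are illustrative.
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data: y = 2.0 + 3.0*x1 - 1.5*x2 + noise
n = 100
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 3.0 * x1 - 1.5 * x2 + rng.normal(0, 1.0, n)

# Design matrix with a column of ones for the intercept b0
X = np.column_stack([np.ones(n), x1, x2])

# Estimate (b0, b1, b2) by minimizing the sum of squared residuals
coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs
print(f"estimated b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")

# Predict the response for a new observation by plugging it into the equation
x_new = np.array([1.0, 4.0, 2.0])   # [intercept term, x1, x2]
print(f"predicted y for x1=4, x2=2: {x_new @ coeffs:.2f}")
```

With enough data and well-behaved noise, the estimated coefficients land close to the values used to generate the data, which is the sense in which OLS recovers the underlying linear relationship.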
Linear regression relies on the following key assumptions (simple checks for several
of them are sketched after the list):
1. Linearity: The relationship between the dependent variable and the independent
variables is linear.
2. Independence: The observations are independent of each other.
3. Homoscedasticity: The variability of the errors is constant across all levels of the
independent variables.
4. Normality: The errors are normally distributed.
5. No multicollinearity: The independent variables are not highly correlated with each
other.
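These assumptions can be checked with simple numerical diagnostics. The sketch below defines a hypothetical helper, check_assumptions, that takes the design matrix, response, and fitted coefficients (for instance, the X, y, and coeffs arrays from the OLS sketch above) and prints a few quick checks.

```python
# Quick diagnostic checks for some regression assumptions (sketch).
# The function name and its inputs are illustrative; pass arrays shaped like
# the X (with intercept column), y, and coeffs from the OLS example.
import numpy as np
from scipy import stats

def check_assumptions(X: np.ndarray, y: np.ndarray, coeffs: np.ndarray) -> None:
    fitted = X @ coeffs
    residuals = y - fitted

    # Normality of errors: Shapiro-Wilk test on the residuals
    _, p_norm = stats.shapiro(residuals)
    print(f"Shapiro-Wilk p-value for residual normality: {p_norm:.3f}")

    # Homoscedasticity (rough check): correlation of |residuals| with fitted values
    print(f"corr(|residuals|, fitted) = {np.corrcoef(np.abs(residuals), fitted)[0, 1]:.3f}")

    # Multicollinearity: pairwise correlations among predictors (excluding the intercept)
    print("predictor correlation matrix:")
    print(np.corrcoef(X[:, 1:], rowvar=False))
```

Calling check_assumptions(X, y, coeffs) after fitting prints the three diagnostics; residual plots or variance inflation factors can be used for a more thorough assessment.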
Linear regression is widely used for tasks such as predicting house prices, analyzing
the impact of advertising on sales, and studying the relationship between variables in
scientific research. It serves as a fundamental building block for more complex
regression models and machine learning algorithms.