Stat Features XL May 2003

This document discusses data analysis using Microsoft Excel and provides examples of analyzing data using analysis of variance (ANOVA) and descriptive statistics in Excel. It includes: 1) An overview of spreadsheets and their capabilities for calculations, statistical analysis, and data visualization. 2) Two examples of one-way ANOVA to test for differences in soil moisture levels and milk yield between groups. 3) An example of two-way ANOVA without replication to analyze soybean yield data in a randomized block design. 4) Descriptive statistics analysis of height data from 100 sorghum plants, including calculations of mean, standard deviation, range, and other metrics.

Uploaded by

nirmal kumar
Copyright
© Attribution Non-Commercial (BY-NC)

Data analysis using Microsoft Excel

M.N.Reddy, K.V.S.Rao and K.V.Kumar

Microsoft Excel is an electronic spreadsheet program. Spreadsheet programs have evolved
dramatically over the past decade and now form one of the most widely used categories of
software products. Among them, Excel is especially versatile because it shares many user
interface features with other Microsoft Office applications such as MS Word, MS PowerPoint
and MS Access.

A spreadsheet is software that can replace paper worksheets in all types of applications.
It displays data in the form of rows and columns. The intersection of a row and a column
is known as a cell. Data and formulae are entered in cells. A spreadsheet allows the user
to do the following:

• Perform all types of arithmetic calculations
• Perform many types of statistical analysis, from simple to advanced
• Represent data in various graphical forms and hold graphical objects such as
pictures and images

I. EXERCISES ON ANALYSIS OF VARIANCE (ANOVA)

A. Single Factor: Completely Randomized Design (CRD):

The simplest of all designs having a random arrangement is the "Completely Randomized
Design". It assumes complete homogeneity of the experimental material.

Examples: Laboratory studies, greenhouse and pot culture experiments, animal feeding
experiments, soil moisture and related studies, etc.

Example 1:

The percentage moisture content is determined for ten samples from each of four different
soils. Calculate the analysis of variance to obtain an estimate of the sampling variation
within a soil, and hence calculate the standard error of the mean moisture content of a soil
and the standard error of a difference between two such means. Test the hypothesis that
there is no variation of soil moisture content between different soils and summarize briefly
the conclusions to be drawn from these data.

(Two-sided table value of t for 36 df: 2.028 at the 5% level of significance and 2.719 at
the 1% level of significance.)

Soil A    Soil B    Soil C    Soil D
 12.8       8.1       9.8      16.4
 13.4      10.3      10.6       8.2
 11.2       4.2       9.1      15.1
 11.6       7.8       4.3      10.4
  9.4       5.6      11.2       7.8
 10.3       8.1      11.6       9.2
 14.1      12.7       8.3      12.6
 11.9       6.8       8.9      11.0
 10.5       6.9       9.2       8.0
 10.4       6.4       6.4       9.8

Hints:
a) Analysis of Variance
- Enter the data in cells A1 to D11 (the first row contains the headings for the soil
types)
- Click Tools > Data Analysis > Anova: Single Factor, then click OK
- Type A1:D11 in the Input Range box and tick the Labels in First Row check box

The dialog box looks as follows:

- Click the OK button

To see the contents of the first column clearly:

- Select column A by clicking its header, then click Format > Column > AutoFit Selection

The hypothesis that "there is no variation of soil moisture content between different soils"
can be tested by comparing the F value with the F critical value in the ANOVA table. The
hypothesis is rejected when the F value exceeds the F critical value.

Standard error of a mean = SE(Mean) = s / √r = s / √10

where
s = square root of the error mean square and r = number of replications

Standard error of the difference between two treatment means = SE(Difference) =
√2 x SE(Mean)

Coefficient of variation % = CV(%) = (s / mean) x 100

Critical difference at the 5% level = CD(5%) = (t at the 5% level for 36 df) x SE(Difference)
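
The same quantities can be cross-checked outside Excel. The following is a minimal Python sketch, not part of the original exercise, that applies the formulas above to the Example 1 data. The variable names are illustrative, and the t value of 2.028 is the standard two-sided 5% table value for 36 df, supplied here as an assumption.

```python
from math import sqrt

# Moisture content (%) of the four soils in Example 1, one list per soil.
soils = {
    "A": [12.8, 13.4, 11.2, 11.6, 9.4, 10.3, 14.1, 11.9, 10.5, 10.4],
    "B": [8.1, 10.3, 4.2, 7.8, 5.6, 8.1, 12.7, 6.8, 6.9, 6.4],
    "C": [9.8, 10.6, 9.1, 4.3, 11.2, 11.6, 8.3, 8.9, 9.2, 6.4],
    "D": [16.4, 8.2, 15.1, 10.4, 7.8, 9.2, 12.6, 11.0, 8.0, 9.8],
}

r = 10                      # replications (samples) per soil
k = len(soils)              # number of soils (treatments)
values = [y for ys in soils.values() for y in ys]
grand_mean = sum(values) / len(values)
means = {name: sum(ys) / r for name, ys in soils.items()}

# Sums of squares for the one-way (single factor) ANOVA.
ss_between = sum(r * (m - grand_mean) ** 2 for m in means.values())
ss_within = sum((y - means[name]) ** 2 for name, ys in soils.items() for y in ys)
ss_total = sum((y - grand_mean) ** 2 for y in values)

df_between, df_within = k - 1, k * (r - 1)
ms_between = ss_between / df_between
ms_within = ss_within / df_within      # error mean square
f_value = ms_between / ms_within       # compare with F crit from the Excel output

s = sqrt(ms_within)                    # root of the error mean square
se_mean = s / sqrt(r)
se_diff = sqrt(2) * se_mean
cv_percent = s / grand_mean * 100
t_5pct = 2.028                         # assumed table value: two-sided 5%, 36 df
cd_5pct = t_5pct * se_diff

print(f"F = {f_value:.3f} on {df_between} and {df_within} df")
print(f"SE(Mean) = {se_mean:.3f}, SE(Difference) = {se_diff:.3f}")
print(f"CV(%) = {cv_percent:.2f}, CD(5%) = {cd_5pct:.3f}")
```

The F value and the error mean square printed here should agree with the "Between Groups" and "Within Groups" rows of the Excel ANOVA table.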

Example 2

Four feeds, including one control, were tried on 32 Friesian cows, 8 cows per treatment. In
order to reduce extraneous variation, the cows were selected from the same herd. The
experiment was conducted for 40 days, commencing from a date when the majority of cows
had just passed peak lactation.

The data of the experiment are given below:

Treatment A: Control
          B: Supplement of 150 g of urea per cow per day
          C: Supplement of 3 kg of sorghum grain per cow per day
          D: Supplement of 3 kg of maize grain per cow per day

Variate y: milk yield in 40 days (10 kg units)

Y: Milk yield (10 kg units)


Sample   Treatment A   Treatment B   Treatment C   Treatment D
1        35            40            44            54
2        44            42            45            49
3        45            28            52            48
4        37            38            53            46
5        45            36            51            44
6        55            54            44            42
7        42            44            51            52
8        39            49            38            50

Test the hypothesis H0: there is no difference in milk yield between treatments, and
summarize the results briefly.
(Table value of t for 28 df at the 5% level of significance = 2.05)
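
Once the ANOVA has been run, treatment means are usually compared in pairs against the critical difference. The sketch below is one hedged way to do that in Python for the Example 2 data, using the t value of 2.05 quoted above. The names are illustrative, and such pairwise comparisons are normally interpreted only when the overall F test is significant.

```python
from itertools import combinations
from math import sqrt

# Milk yield (10 kg units) in 40 days, 8 cows per treatment (Example 2).
milk = {
    "A": [35, 44, 45, 37, 45, 55, 42, 39],
    "B": [40, 42, 28, 38, 36, 54, 44, 49],
    "C": [44, 45, 52, 53, 51, 44, 51, 38],
    "D": [54, 49, 48, 46, 44, 42, 52, 50],
}

r = 8                                         # cows per treatment
means = {name: sum(ys) / r for name, ys in milk.items()}

# Error (within treatments) sum of squares and mean square.
ss_error = sum((y - means[name]) ** 2 for name, ys in milk.items() for y in ys)
df_error = len(milk) * (r - 1)                # 4 x 7 = 28
ms_error = ss_error / df_error

t_5pct = 2.05                                 # table value given in the exercise
cd = t_5pct * sqrt(2 * ms_error / r)          # critical difference at 5%

print(f"CD(5%) = {cd:.2f}")
for a, b in combinations(milk, 2):
    diff = abs(means[a] - means[b])
    verdict = "exceeds CD" if diff > cd else "within CD"
    print(f"{a} vs {b}: difference = {diff:.3f} ({verdict})")
```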

B. Two Factor without Replication: Randomized Block Design (RBD):

Example 3

The following data represent the yield of soybean plants for five treatments grown in six
randomized complete blocks. The experiment was conducted in the greenhouse. The five
treatments are 20, 40, 60, 80 and 100 ppm of nitrogen.

The yield of soybean in grams per plot is given in the following table.

                    Replications
Treatment   R1     R2     R3     R4     R5     R6
T1           8.8   12.9   11.7   31.2   22.0    9.9
T2          23.5   26.5   21.6   15.6   24.4   23.3
T3          41.2   22.5   21.8   46.3   15.6   22.6
T4          28.4   48.4   16.4   44.5   38.8   43.6
T5          67.4   33.2   59.5   49.8   57.1   36.6

Analyze the data and summarize the results.

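Hints
- Enter the data in cells A1 to G6 (the first row and the first column contain the headings)
- Click Tools > Data Analysis > Anova: Two-Factor Without Replication > OK
- Type A1:G6 in the Input Range box and tick the Labels check box
- Click OK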
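
For readers who want to verify the Excel output, the two-factor analysis without replication can be sketched in Python as follows. This is an illustrative calculation on the Example 3 data, not the Excel procedure itself. In the Excel output, "Rows" correspond to treatments and "Columns" to replications (blocks) for the layout entered above.

```python
# Soybean yield (g per plot): rows are treatments, columns are blocks R1..R6.
yields = {
    "T1": [8.8, 12.9, 11.7, 31.2, 22.0, 9.9],
    "T2": [23.5, 26.5, 21.6, 15.6, 24.4, 23.3],
    "T3": [41.2, 22.5, 21.8, 46.3, 15.6, 22.6],
    "T4": [28.4, 48.4, 16.4, 44.5, 38.8, 43.6],
    "T5": [67.4, 33.2, 59.5, 49.8, 57.1, 36.6],
}

t = len(yields)                 # number of treatments
b = len(yields["T1"])           # number of blocks
values = [y for row in yields.values() for y in row]
grand_mean = sum(values) / (t * b)

treat_means = {name: sum(row) / b for name, row in yields.items()}
block_means = [sum(row[j] for row in yields.values()) / t for j in range(b)]

# Partition the total sum of squares into treatments, blocks and error.
ss_total = sum((y - grand_mean) ** 2 for y in values)
ss_treat = b * sum((m - grand_mean) ** 2 for m in treat_means.values())
ss_block = t * sum((m - grand_mean) ** 2 for m in block_means)
ss_error = ss_total - ss_treat - ss_block

df_treat, df_block = t - 1, b - 1
df_error = df_treat * df_block
ms_error = ss_error / df_error
f_treat = (ss_treat / df_treat) / ms_error
f_block = (ss_block / df_block) / ms_error

print(f"Treatments: F = {f_treat:.3f} on {df_treat} and {df_error} df")
print(f"Blocks:     F = {f_block:.3f} on {df_block} and {df_error} df")
```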

Example 4 :

An experiment was conducted to test the effect of 3 types of protein supplement on the
average milk yield of cows. The cows were arranged in 6 blocks of 3 cows each, according to
similar productivity (milk yield) during the pre-experimental period. The treatments were
applied such that no treatment is repeated within a block. The data are given below:

Average daily milk yield during the experiment (kg)


                    Blocks
Treatment   B1     B2     B3     B4     B5     B6
A           10.4   10.5    5.9    6.7    8.0    6.7
B           12.6   12.5   11.2    8.8    9.5   12.0
C            9.5    9.7   12.6    9.1    8.7   10.5

Test the hypothesis of no differences of milk yield and summarize the results.

II. DESCRIPTIVE STATISTICS ANALYSIS

Example 5

The following are the data on total height (cm) of 100 plants of sorghum.
Calculate the following statistics.

1. Mean                  2. Standard Error   3. Median      4. Mode
5. Standard Deviation    6. Variance         7. Range       8. Skewness
9. Kurtosis              10. Maximum         11. Minimum    12. Sum
13. Count                14. Largest(1)      15. Smallest(1)
16. Confidence Limits    17. Frequency Distribution with Class Interval Ten
18. Histogram

90   109    69   100   115
76    82    80    68    69
79    84    84   108    83
45    59    60    63    79
72    68    80    81    84
70    67   100   103    69
79    78    83    92    93
77    76    88    89    94
92    91    76    79    73
89    85    93    90    79

Analysing the Data

• Enter the above data in cells B2 to B51 in a separate worksheet of the
same workbook, as before. Since these are univariate data, they have to
be entered in a single column, that is, in column B
• Enter the title "Height" in cell B1
• On the main menu, click the Tools menu to see its options
• Click the Data Analysis option to get the different options of the Analysis
ToolPak, as shown in the previous exercise
• Click the Descriptive Statistics option among the displayed Analysis ToolPak
options
• Click OK to get the Descriptive Statistics dialog window
• Fill in the dialog window as shown below


In the dialog window:

• Input Range is B1:B51
• Select Columns under Grouped By
• Tick the Labels in First Row check box
• Confidence Level for Mean: Select this if you want to include a row in the
output table for the confidence level of the mean. In the box, enter the
desired confidence level. For example, a value of 95% calculates
the confidence level of the mean at a significance of 5%.
• Kth Largest: Select this if you want to include a row in the output table for the
kth largest value for each range of data. In the box, enter the number to use
for k. If k is 1, this row contains the maximum of the data set.
• Kth Smallest: Select this if you want to include a row in the output table for the
kth smallest value for each range of data. In the box, enter the number to use
for k. If k is 1, this row contains the minimum of the data set.
• Summary Statistics: Select this if you want Microsoft Excel to produce one field
for each of the following statistics in the output table: Mean, Standard Error
(of the mean), Median, Mode, Standard Deviation, Variance, Kurtosis,
Skewness, Range, Minimum, Maximum, Sum, Count, Largest, Smallest and
Confidence Level.
• Output Range starts at D2
• Click OK to get the output on the screen
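
As a cross-check on the Excel output, most of the summary statistics can be reproduced with Python's standard library. The sketch below uses the 50 height values printed in the table above. The skewness and kurtosis lines follow the bias-adjusted sample formulas that Excel's SKEW and KURT functions are documented to use, which is an assumption about how the Descriptive Statistics tool computes them. The confidence level row is omitted because it needs a t table value.

```python
import statistics
from math import sqrt

# The 50 sorghum plant heights (cm) printed in the table, read row by row.
heights = [
    90, 109, 69, 100, 115, 76, 82, 80, 68, 69,
    79, 84, 84, 108, 83, 45, 59, 60, 63, 79,
    72, 68, 80, 81, 84, 70, 67, 100, 103, 69,
    79, 78, 83, 92, 93, 77, 76, 88, 89, 94,
    92, 91, 76, 79, 73, 89, 85, 93, 90, 79,
]

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)            # sample standard deviation (n - 1)
z = [(h - mean) / sd for h in heights]    # standardized values

# Bias-adjusted sample skewness and excess kurtosis (assumed to match Excel).
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

summary = {
    "Mean": mean,
    "Standard Error": sd / sqrt(n),
    "Median": statistics.median(heights),
    "Mode": statistics.mode(heights),
    "Standard Deviation": sd,
    "Sample Variance": statistics.variance(heights),
    "Kurtosis": kurtosis,
    "Skewness": skewness,
    "Range": max(heights) - min(heights),
    "Minimum": min(heights),
    "Maximum": max(heights),
    "Sum": sum(heights),
    "Count": n,
}

for name, value in summary.items():
    print(f"{name:<20}{value:>12.4f}")
```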

Example 6

Frequency Distribution and Histogram

The FREQUENCY function calculates how often values occur within a range of values and
returns a vertical array of numbers. For example, use FREQUENCY to count the number of
plants that fall within ranges of fixed height. Because FREQUENCY returns an array (one
count for each value in the bin range), it must be entered as an array formula.

Frequency Distribution and Histogram for Plant Height

1. Copy the entire data in the DATA worksheet to a new worksheet, as
before, and rename the new worksheet as Frequency
2. From the Summary worksheet, observe that the minimum and
maximum values for plant height are 45 and 115 respectively
3. Position the mouse pointer in cell D1 and type the heading
Frequency Distribution for Plant Height
4. Position the mouse pointer in cell D2 and type BIN RANGE
5. Type the value 40 in cell D3 and the value 50 in cell D4
6. Select (block) the cells D3 and D4
7. Drag the fill handle of this block down to cell D11 to fill these cells
automatically in increments of 10 (40, 50, ..., 120)
8. Select the DATA ANALYSIS option from the Tools menu
9. Select the HISTOGRAM option in the displayed DATA ANALYSIS
dialog window
10. Select the INPUT RANGE as B1:B51
11. Select the BIN RANGE as D2:D11 and tick the LABELS check box, since
both ranges include a heading
12. Type the cell address F2 as the OUTPUT RANGE
13. Tick the CHART OUTPUT option to get a histogram for the selected
parameter
14. Click OK
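
The frequency table that the Histogram tool produces can be checked with a short Python sketch. It assumes the convention used by Excel's FREQUENCY function, in which each bin counts the values greater than the previous bin value and less than or equal to its own, with a final "More" category for anything above the last bin. The variable names are illustrative.

```python
from bisect import bisect_left

# The 50 sorghum plant heights (cm) from Example 5, read row by row.
heights = [
    90, 109, 69, 100, 115, 76, 82, 80, 68, 69,
    79, 84, 84, 108, 83, 45, 59, 60, 63, 79,
    72, 68, 80, 81, 84, 70, 67, 100, 103, 69,
    79, 78, 83, 92, 93, 77, 76, 88, 89, 94,
    92, 91, 76, 79, 73, 89, 85, 93, 90, 79,
]

bins = list(range(40, 121, 10))        # upper limits 40, 50, ..., 120

# counts[i] holds the values in (bins[i-1], bins[i]]; the extra last slot
# is the "More" category for values above the largest bin.
counts = [0] * (len(bins) + 1)
for h in heights:
    counts[bisect_left(bins, h)] += 1

print("Bin   Frequency")
for upper, count in zip(bins, counts):
    print(f"{upper:>4}  {count:>3}  {'*' * count}")
print(f"More  {counts[-1]:>3}")
```

The row of asterisks is only a rough text stand-in for the chart that the CHART OUTPUT option draws.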

III. REGRESSION ANALYSIS

The Regression tool performs linear regression analysis by using the "least squares"
method to fit a line through a set of observations. You can use this tool to analyze how
a single dependent variable is affected by the values of one or more independent variables.

Example 7

The following table gives, for 25 progenies of cotton, the mean fiber length of each
progeny, the corresponding parent plant value and the mean value of the plot in which
the parent was grown. It is found that both the parental value and the plot mean
bear some relationship with the progeny mean. Express this relation in the form of a
partial regression equation with progeny mean as the dependent variate.

Progeny   Progeny mean   Parent plant   Parental plot
number    (mm)           value          mean
          Y              X1             X2
1         24.3           26             25.5
2         24.48          28.8           25.5
3         23.41          25.2           25.5
4         21.6           23.4           25
5         22.49          26.6           25
6         23.62          25.4           24.6
7         22.75          23.4           24.6
8         24.4           27.6           23.6
9         22.6           24.4           23.6
10        25.36          24             24.42
11        23.21          24.2           24.42
12        24.76          26             24.42
13        21.53          22.8           22.56
14        21.32          20.8           22.56
15        22.81          24.8           22.56
16        25.41          26.2           24.9
17        24.3           27.2           24.9
18        23.65          26.6           24.91
19        24.31          25             24.91
20        21.88          23.4           24.05
21        24.1           25.6           24.05
22        21.91          23             24.05
23        22.24          25.4           24.57
24        23.45          23.4           24.57
25        22.1           24.2           24.57

Analysing the Data

• Enter the above data in a separate worksheet of the same workbook, as
before.
• On the main menu, click the Tools menu to see its options.
• Click the Data Analysis option to get the different options of the Analysis
ToolPak, as shown in the previous exercise.
• Click the Regression option among the displayed Analysis ToolPak options.
• Click OK to get the Regression dialog window as shown below

• Input Y Range: Enter the range of the dependent variable's data, that is, B2:B26.
The range must consist of a single column.
• Input X Range: Enter the range of the independent variables' data, that is, C2:D26.
Up to 16 independent variables can be included.
• Tick the Labels check box only if the first row of the input ranges contains
labels (for example, when the ranges are entered as B1:B26 and C1:D26).
• Tick the Confidence Level check box to include an additional level in the
summary output table. In the box, enter the desired confidence level in addition
to the default 95% level.
• Tick the Standardized Residuals check box to include standardized
residuals in the residuals output table.

Examine the standardized residuals and delete the observations whose standardized residuals
are greater than 2 in absolute value. Compare the results, particularly the R Square values
and the F statistic, before and after removing the outliers and draw your conclusions.
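
The fit and the residual screening described above can also be sketched in plain Python. The example below solves the least squares normal equations for the Example 7 data with a small Gaussian elimination helper (`solve` is an illustrative name, not an Excel or library function). Residuals are scaled here by the residual standard error, so the flagged observations may differ slightly from those marked by Excel's own standardized residuals.

```python
from math import sqrt

# (Y, X1, X2) = (progeny mean, parent plant value, parental plot mean).
data = [
    (24.3, 26, 25.5), (24.48, 28.8, 25.5), (23.41, 25.2, 25.5),
    (21.6, 23.4, 25), (22.49, 26.6, 25), (23.62, 25.4, 24.6),
    (22.75, 23.4, 24.6), (24.4, 27.6, 23.6), (22.6, 24.4, 23.6),
    (25.36, 24, 24.42), (23.21, 24.2, 24.42), (24.76, 26, 24.42),
    (21.53, 22.8, 22.56), (21.32, 20.8, 22.56), (22.81, 24.8, 22.56),
    (25.41, 26.2, 24.9), (24.3, 27.2, 24.9), (23.65, 26.6, 24.91),
    (24.31, 25, 24.91), (21.88, 23.4, 24.05), (24.1, 25.6, 24.05),
    (21.91, 23, 24.05), (22.24, 25.4, 24.57), (23.45, 23.4, 24.57),
    (22.1, 24.2, 24.57),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

y = [row[0] for row in data]
design = [(1.0, x1, x2) for _, x1, x2 in data]      # intercept, X1, X2

# Normal equations (X'X) b = X'y.
xtx = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

fitted = [b0 + b1 * x1 + b2 * x2 for _, x1, x2 in data]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
y_mean = sum(y) / len(y)
ss_res = sum(e * e for e in residuals)
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r_square = 1 - ss_res / ss_tot

s = sqrt(ss_res / (len(y) - 3))                     # residual standard error
flagged = [i + 1 for i, e in enumerate(residuals) if abs(e / s) > 2]

print(f"Y = {b0:.4f} + {b1:.4f} X1 + {b2:.4f} X2, R Square = {r_square:.4f}")
print("Progenies with |scaled residual| > 2:", flagged)
```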

Example 8

The following table contains data from a study of the influence of good grazing of ewes
prior to lambing on the birth weight of male lambs.

S.No   Birth weight of   No. of days of
       lamb (kg)         good grazing
       y                 x
1      3.8               1
2      4.0               10
3      4.3               16
4      4.2               22
5      4.4               24
6      4.4               26
7      4.0               29
8      4.0               34
9      4.1               36
10     3.5               38
11     3.3               41
12     4.0               47
13     4.2               50
14     3.0               54
15     2.8               62

Fit a suitable (quadratic) relation between y and x and estimate the optimum number of
days of good grazing that gives the maximum birth weight of lamb.

Hints

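1. Data entry

• Enter the values of S.No, Y and X in cells A1 to C16 (headings in the first row)
• Type X2 in cell D1 and enter the formula =C2^2 in cell D2.
Copy this formula down to D16
• Convert the formulae in column D into values:
• Select D2 to D16
• Click Edit > Copy, then Edit > Paste Special > Values

2. Analysis

• Click Tools > Data Analysis > Regression
• Fill in the dialog window with Input Y Range B1:B16 and Input X Range C1:D16,
and tick the Labels check box
• Examine the standardized residuals, delete outliers and arrive at the best
fitted equation
• For a fitted equation y = b0 + b1 x + b2 x^2 with b2 negative, the maximum
occurs at x = -b1 / (2 b2)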
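
As an independent check on the Excel fit, the quadratic can be fitted by least squares in a few lines of Python. This is a sketch on the full Example 8 data (no outliers removed) that solves the 3 x 3 normal equations by Cramer's rule. The helper `det3` and the variable names are illustrative. The optimum is taken at the vertex of the fitted parabola, x = -b1 / (2 b2), which is a maximum when b2 is negative.

```python
# Days of good grazing (x) and birth weight of lamb in kg (y), Example 8.
days = [1, 10, 16, 22, 24, 26, 29, 34, 36, 38, 41, 47, 50, 54, 62]
weight = [3.8, 4.0, 4.3, 4.2, 4.4, 4.4, 4.0, 4.0, 4.1, 3.5, 3.3, 4.0, 4.2, 3.0, 2.8]

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

n = len(days)
sx = [sum(x ** p for x in days) for p in range(5)]                    # sums of x^p
sxy = [sum((x ** p) * y for x, y in zip(days, weight)) for p in range(3)]

# Normal equations for y = b0 + b1*x + b2*x^2.
a = [[sx[0], sx[1], sx[2]],
     [sx[1], sx[2], sx[3]],
     [sx[2], sx[3], sx[4]]]
d = det3(a)

coeffs = []
for col in range(3):
    m = [row[:] for row in a]
    for i in range(3):
        m[i][col] = sxy[i]          # replace one column by the right-hand side
    coeffs.append(det3(m) / d)
b0, b1, b2 = coeffs

x_opt = -b1 / (2 * b2)              # vertex of the fitted parabola
y_opt = b0 + b1 * x_opt + b2 * x_opt ** 2

print(f"y = {b0:.4f} + {b1:.5f} x + {b2:.6f} x^2")
print(f"Optimum days of good grazing = {x_opt:.1f}, fitted weight = {y_opt:.2f} kg")
```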

IV. t-TEST

Example 9

The following table gives the rainfall at two places, A and B, near each other in the same
rainfall tract for 24 seasons. Assuming that the 24 seasons constitute a representative
sample of the rainfall at the two places, can we consider the two places to have the same
mean annual rainfall?

Rainfall in inches
Year Place A Place B
1960 39.59 39.48
1961 19.93 17.81
1962 23.91 24.47
1963 29.38 24.32
1964 43.09 41.18
1965 25.34 23.41
1966 49.35 45.13
1967 39.62 42.83
1968 42.9 46.94
1969 53.35 51.51
1970 57.66 57.50
1971 37.05 34.35
1972 34.14 34.29
1973 38.01 38.65
1974 52.40 50.32
1975 32.2 29.94
1976 47.81 45.24
1977 33.98 34.13
1978 39.46 40.68
1979 37.78 35.54
1980 63.24 57.24
1981 39.04 42.05
1982 60.51 55.33
1983 38.08 37.45

Analyzing the data

To analyse the above data, first enter them in a separate worksheet of the
same workbook, as in the earlier exercises.

• On the main menu, click the Tools option
• Click the Data Analysis option in the displayed menu
• In the displayed Analysis ToolPak window, click t-Test: Two-Sample
Assuming Equal Variances to get the following screen
• Fill in the dialog box as shown below

Do the analysis by applying
1. t-Test: Two-Sample Assuming Equal Variances and
2. t-Test: Paired Two Sample for Means
Compare the two sets of results, interpret them and justify your conclusions.
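
To see why the two tests can lead to different conclusions, the two t statistics can be computed side by side. The following Python sketch uses the Example 9 rainfall data and only the standard library. The names are illustrative, and the resulting t values still have to be compared with table values for 46 df (pooled test) and 23 df (paired test).

```python
from math import sqrt
from statistics import mean, stdev, variance

# Rainfall in inches for 1960 to 1983 at places A and B (Example 9).
place_a = [39.59, 19.93, 23.91, 29.38, 43.09, 25.34, 49.35, 39.62,
           42.9, 53.35, 57.66, 37.05, 34.14, 38.01, 52.40, 32.2,
           47.81, 33.98, 39.46, 37.78, 63.24, 39.04, 60.51, 38.08]
place_b = [39.48, 17.81, 24.47, 24.32, 41.18, 23.41, 45.13, 42.83,
           46.94, 51.51, 57.50, 34.35, 34.29, 38.65, 50.32, 29.94,
           45.24, 34.13, 40.68, 35.54, 57.24, 42.05, 55.33, 37.45]

n = len(place_a)

# 1. Two-sample t test assuming equal variances (pooled variance).
pooled_var = ((n - 1) * variance(place_a) + (n - 1) * variance(place_b)) / (2 * n - 2)
t_pooled = (mean(place_a) - mean(place_b)) / sqrt(pooled_var * (2 / n))
df_pooled = 2 * n - 2

# 2. Paired t test on the year-by-year differences A - B.
diffs = [a - b for a, b in zip(place_a, place_b)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))
df_paired = n - 1

print(f"Pooled t = {t_pooled:.3f} on {df_pooled} df")
print(f"Paired t = {t_paired:.3f} on {df_paired} df")
```

The paired test removes the large season-to-season variation that both places share, which is why its t statistic is larger than the pooled one for these data.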
