
Stats-with-R-project

The document presents an analysis of agricultural crops in India, focusing on various statistical representations such as boxplots, pie charts, and scatter plots for crops like rice, wheat, barley, and cotton. It includes data summaries and visualizations that highlight production areas, yields, and correlations among different crops across various states. The analysis emphasizes the significant concentration of rice production and the geographical factors influencing crop yields in different regions.

Uploaded by

Vedant Baiswar
Copyright © All Rights Reserved

Analysis of Agricultural Crops in India

Vedant, Dev and Sanidhya

Acknowledgements

We wish to express our sincere gratitude to Rajni Ma’am and Uday Sir, our project guides, and to our Principal,
Dr. John Varghese, for their guidance and support in completing our project on “Analysis of Agricultural
Crops in India”. Their cooperation and contributions were crucial to the project’s success.

Contents
• Boxplot for rice production
• Pie chart for depicting Rice area
• Scatter plot on barley yield
• Sugarcane production line graph
• Stacked bar chart for cotton yield
• Scatter plot on Wheat Yield
• Scatter plot on correlation between Rice Production and Area
• Barplot on oilseed production in Rajasthan
• Line graph for area under cotton
• Boxplot on Maize Yield
• Analysing relations between different crop yields using pairs plot
• Maize Production pie chart
• Oilseeds boxplot (facet-wrapped by state)
• Area under production for wheat
• Histogram and frequency polygon

data

library(dplyr)

##
## Attaching package: ’dplyr’
## The following objects are masked from ’package:stats’:
##
##     filter, lag
## The following objects are masked from ’package:base’:
##
##     intersect, setdiff, setequal, union

library(ggplot2)
library(readxl)
library(GGally)

## Registered S3 method overwritten by ’GGally’:
##   method from
##   +.gg   ggplot2

getwd()

## [1] "/Users/vdb/Downloads"

C = read_xlsx("ICRISAT-District Level Data (1).xlsx")  # district-level crop data, 2014-2017
#read.csv("ICRISAT-District Level Data (1).csv")
B = as.data.frame(C)  # plain data.frame copy of the tibble C

Dev

summary(B)

## Dist Code Year State Code State Name


## Min. : 7.0 Min. :2014 Min. : 2.000 Length:640
## 1st Qu.:123.8 1st Qu.:2015 1st Qu.: 6.000 Class :character
## Median :163.5 Median :2016 Median :10.000 Mode :character
## Mean :252.4 Mean :2016 Mean : 8.331
## 3rd Qu.:222.2 3rd Qu.:2016 3rd Qu.:12.000
## Max. :912.0 Max. :2017 Max. :13.000
## Dist Name RICE AREA (1000 ha) RICE PRODUCTION (1000 tons)
## Length:640 Min. : 0.0 Min. : 0.000
## Class :character 1st Qu.: 4.0 1st Qu.: 5.322
## Mode :character Median : 68.7 Median : 146.310
## Mean : 119.2 Mean : 302.700
## 3rd Qu.: 178.6 3rd Qu.: 437.375
## Max. :1154.2 Max. :3215.010
## RICE YIELD (Kg per ha) WHEAT AREA (1000 ha) WHEAT PRODUCTION (1000 tons)
## Min. : 0 Min. : 0.0 Min. : 0.0
## 1st Qu.:1268 1st Qu.: 75.4 1st Qu.: 180.5
## Median :2195 Median :149.1 Median : 425.5
## Mean :1919 Mean :156.6 Mean : 503.4
## 3rd Qu.:2626 3rd Qu.:216.7 3rd Qu.: 692.7
## Max. :5160 Max. :879.5 Max. :4169.4
## WHEAT YIELD (Kg per ha) MAIZE AREA (1000 ha) MAIZE PRODUCTION (1000 tons)
## Min. : 0 Min. : 0.000 Min. : 0.00
## 1st Qu.:2352 1st Qu.: 0.500 1st Qu.: 0.93
## Median :2975 Median : 6.825 Median : 12.00
## Mean :2961 Mean : 25.850 Mean : 69.30
## 3rd Qu.:3578 3rd Qu.: 33.233 3rd Qu.: 71.91
## Max. :5166 Max. :267.890 Max. :1510.95
## MAIZE YIELD (Kg per ha) BARLEY AREA (1000 ha) BARLEY PRODUCTION (1000 tons)
## Min. : 0 Min. : 0.000 Min. : 0.000
## 1st Qu.: 1381 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 1920 Median : 0.590 Median : 1.080
## Mean : 2058 Mean : 3.558 Mean : 10.087
## 3rd Qu.: 2514 3rd Qu.: 3.775 3rd Qu.: 8.707
## Max. :21429 Max. :101.510 Max. :328.710
## BARLEY YIELD (Kg per ha) SUNFLOWER AREA (1000 ha)
## Min. : 0 Min. :0.0000

## 1st Qu.: 0 1st Qu.:0.0000
## Median :1836 Median :0.0000
## Mean :1747 Mean :0.1375
## 3rd Qu.:2823 3rd Qu.:0.0100
## Max. :5000 Max. :5.2700
## SUNFLOWER PRODUCTION (1000 tons) SUNFLOWER YIELD (Kg per ha)
## Min. :0.0000 Min. : 0.0
## 1st Qu.:0.0000 1st Qu.: 0.0
## Median :0.0000 Median : 0.0
## Mean :0.1882 Mean : 433.2
## 3rd Qu.:0.0200 3rd Qu.:1103.9
## Max. :6.4500 Max. :2142.9
## SOYABEAN AREA (1000 ha) SOYABEAN PRODUCTION (1000 tons)
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00
## Mean : 41.38 Mean : 42.42
## 3rd Qu.: 17.12 3rd Qu.: 14.25
## Max. :468.78 Max. :673.00
## SOYABEAN YIELD (Kg per ha) OILSEEDS AREA (1000 ha)
## Min. : 0.0 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.00
## Median : 0.0 Median : 0.00
## Mean : 435.1 Mean : 48.72
## 3rd Qu.: 854.3 3rd Qu.: 45.00
## Max. :2063.8 Max. :636.70
## OILSEEDS PRODUCTION (1000 tons) OILSEEDS YIELD (Kg per ha)
## Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.0
## Median : 0.00 Median : 0.0
## Mean : 56.16 Mean : 403.5
## 3rd Qu.: 41.09 3rd Qu.: 837.4
## Max. :1101.11 Max. :3553.2
## SUGARCANE AREA (1000 ha) SUGARCANE PRODUCTION (1000 tons)
## Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.010 1st Qu.: 0.03
## Median : 0.730 Median : 3.37
## Mean : 17.871 Mean : 122.37
## 3rd Qu.: 6.732 3rd Qu.: 34.10
## Max. :277.300 Max. :2326.51
## SUGARCANE YIELD (Kg per ha) COTTON AREA (1000 ha)
## Min. : 0.0 Min. : 0.00
## 1st Qu.: 506.3 1st Qu.: 0.00
## Median : 5864.0 Median : 0.00
## Mean : 4785.4 Mean : 27.41
## 3rd Qu.: 6970.6 3rd Qu.: 5.00
## Max. :17988.0 Max. :492.87
## COTTON PRODUCTION (1000 tons) COTTON YIELD (Kg per ha)
## Min. : 0.0000 Min. : 0.0
## 1st Qu.: 0.0000 1st Qu.: 0.0
## Median : 0.0000 Median : 0.0
## Mean : 12.4371 Mean : 126.9
## 3rd Qu.: 0.6425 3rd Qu.: 108.0
## Max. :348.7300 Max. :1009.5

## FRUITS AND VEGETABLES AREA (1000 ha)
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 7.08
## Mean : 20.95
## 3rd Qu.: 23.91
## Max. :213.69

Boxplot for rice production

C %>% ggplot(aes(`RICE PRODUCTION (1000 tons)`)) +
  geom_boxplot(colour = "red") +
  facet_wrap(~Year) +
  labs(title = "CONCENTRATION OF RICE PROD. IN FOUR YEARS THROUGH BOXPLOTS")

[Figure: CONCENTRATION OF RICE PROD. IN FOUR YEARS THROUGH BOXPLOTS; one boxplot panel per year (2014, 2015, 2016, 2017); x-axis: RICE PRODUCTION (1000 tons), 0 to 3000.]

The graph makes it evident that, in every year, most districts produce less than 1,000,000 tons (1,000 thousand
tons) of rice. The most extreme outliers appear in 2014 and 2015, above 3,000,000 tons. In every year except
2015 the median district production is around 300,000-400,000 tons; in 2015 production is concentrated below
500,000 tons and the median is lower still, at about 100,000 tons. The milder outliers stay below 3,000,000 tons.
Overall, most districts produce around 500,000 tons of rice, while a few districts with exceptional geography
and agricultural background produce more than 3,000,000 tons a year.
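The medians above are read off the boxplots. As a hedged sketch, they can be checked numerically in base R by splitting the production column by year and taking quartiles. The helper name `yearly_quartiles` is hypothetical and the toy values are invented for illustration, not taken from the ICRISAT file.

```r
# Hypothetical helper: 25th, 50th and 75th percentiles of x for each year.
# Intended use: yearly_quartiles(B[["RICE PRODUCTION (1000 tons)"]], B$Year)
yearly_quartiles <- function(x, year) {
  sapply(split(x, year), quantile,
         probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
}

# Toy data (not from the ICRISAT file): two years, four districts each
x    <- c(10, 20, 30, 40, 100, 200, 300, 400)
year <- c(2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015)
q <- yearly_quartiles(x, year)
print(q)  # rows: 25%, 50%, 75%; columns: 2014, 2015
```

Each column is one year, so the "50%" row gives the per-year medians that the boxplots draw as the middle line.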

Pie chart for depicting Rice area

E = filter(B, Year == 2015)
AREA = E %>% group_by(`State Name`) %>%
  summarise(across(`RICE AREA (1000 ha)`, ~ sum(.x, na.rm = TRUE)))  # total rice area per state in 2015

F=AREA[1:7,2]
G=as.list(F)
H=c(3232.34 ,774.22,1353.10,2026.00 ,182.92 ,5876.26,5523.96 )

pie(H,labels = c("Bihar(61.34 %)","Gujarat(14.69%)","Haryana(25.67%)","MadhyaPradesh(38.45%)","Rajasthan

[Figure: DISTRIBUTION OF TOTAL RICE AREA AMONG TOP PRODUCING STATES; pie slices labelled Bihar(61.34 %), Gujarat(14.69%), Haryana(25.67%), MadhyaPradesh(38.45%), Rajasthan(6.07%), Uttar Pradesh(111.522%), West Bengal(104.83%).]

The chart shows that Uttar Pradesh (5,876 thousand ha) and West Bengal (5,524 thousand ha) have by far the
largest areas under rice: rice is the staple crop of West Bengal, and Uttar Pradesh is another leading producer.
Bihar comes third among these states. This can be attributed to the vast network of the Ganges and its
tributaries as well as the alluvial soil. Rajasthan has the smallest area under rice because it is an arid region,
whereas rice is an irrigation-intensive crop.
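The percentages in the pie labels do not sum to 100% (Uttar Pradesh alone is labelled 111.522%), so they cannot be shares of the total. A minimal sketch that derives the labels from the vector `H` instead, assuming `H` holds the 2015 rice area (1000 ha) of the seven states in alphabetical order, as the label order suggests:

```r
# 2015 rice area (1000 ha) per state, copied from the vector H above;
# assumed to be in alphabetical state order, matching the pie labels
H <- c(3232.34, 774.22, 1353.10, 2026.00, 182.92, 5876.26, 5523.96)
states <- c("Bihar", "Gujarat", "Haryana", "Madhya Pradesh",
            "Rajasthan", "Uttar Pradesh", "West Bengal")

# Share of each state in the combined area, in percent (sums to 100)
share <- round(100 * H / sum(H), 1)
labels <- sprintf("%s (%.1f%%)", states, share)
print(labels)

# pie(H, labels = labels,
#     main = "DISTRIBUTION OF TOTAL RICE AREA AMONG TOP PRODUCING STATES")
```

Computed this way, Uttar Pradesh has the largest share (about 31%), followed by West Bengal (about 29%) and Bihar (about 17%).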

Scatter plot on barley yield

I = B %>% filter(`State Code` == 10)  # State Code 10: the districts of Rajasthan
RAJAREA = I %>% group_by(`Dist Name`) %>%
  summarise(across(`BARLEY YIELD (Kg per ha)`, ~ sum(.x, na.rm = TRUE)))  # adds up each district's yearly yields

ggplot(RAJAREA) +
  geom_point(aes(`Dist Name`, `BARLEY YIELD (Kg per ha)`), colour = "blue", size = 2) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "BARLEY YIELD IN THE DISTRICTS OF RAJASTHAN")  # labs() must be added with + to reach the plot

[Figure: scatter plot of BARLEY YIELD (Kg per ha) against Dist Name for the districts of Rajasthan (Ajmer to Udaipur), with district names rotated 90°.]

This plot shows that Alwar has the highest barley yield, followed by Ganganagar and Sikar, while Bikaner has
the lowest. A likely reason is the better agricultural practices and equipment available to farmers around
Alwar, Ganganagar and Hanumangarh. These areas probably also benefit from the Green Revolution, as they
are close to Punjab and Haryana. Moreover, the Indira Gandhi Canal, which flows through the heart of
Rajasthan, has made it feasible for the state to become one of the largest producers of barley.
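Because `RAJAREA` adds up each district's yearly barley yields, its values are four-year totals of kg-per-hectare figures, which is why they exceed the 5,000 kg/ha maximum in the summary table. A minimal base-R sketch of averaging instead of summing; `mean_by_group` is a hypothetical helper and the toy numbers are invented:

```r
# Hypothetical helper: mean of `value` within each level of `group`,
# returned as a plain named numeric vector.
# Intended use: mean_by_group(I[["BARLEY YIELD (Kg per ha)"]], I[["Dist Name"]])
mean_by_group <- function(value, group) {
  res <- tapply(value, group, mean, na.rm = TRUE)
  setNames(as.vector(res), names(res))
}

# Toy data (not from the ICRISAT file): two districts, two years each
dist  <- c("Alwar", "Alwar", "Bikaner", "Bikaner")
yield <- c(3000, 3400, 1000, 1200)

sum_yield  <- tapply(yield, dist, sum)    # what the sum() in RAJAREA produces
mean_yield <- mean_by_group(yield, dist)  # average yield per year
print(sum_yield)
print(mean_yield)
```

The ranking of districts is the same when every district has the same number of years, but only the mean stays on the kg-per-hectare scale.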

Sugarcane production line graph

SUGPROD = B %>% group_by(Year) %>%
  summarise(across(`SUGARCANE PRODUCTION (1000 tons)`, ~ sum(.x, na.rm = TRUE))); SUGPROD

## # A tibble: 4 x 2
## Year ‘SUGARCANE PRODUCTION (1000 tons)‘
## <dbl> <dbl>
## 1 2014 19436.
## 2 2015 17929.
## 3 2016 19955.
## 4 2017 20995

O=as.matrix(SUGPROD)
plot(O,type="l",cex=4,col="black",main="Production of SUGARCANE over the years")

[Figure: Production of SUGARCANE over the years; line plot of SUGARCANE PRODUCTION (1000 tons) against Year, 2014 to 2017.]
The graph shows that sugarcane production fell by about 8% in 2015, from about 19,436 to 17,929 thousand tons,
and then rose steadily to 20,995 thousand tons in 2017. The dip may be attributed to unfavourable, erratic
weather in 2015. In 2016 and 2017, government measures such as pricing policy may also have encouraged
production.
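To put numbers on this description, a small sketch that computes the year-on-year percentage change from the `SUGPROD` totals as printed above (rounded to whole thousands of tons). The changes come out at about -7.8%, +11.3% and +5.2%, so the recovery after 2015 is steady rather than accelerating.

```r
# Yearly totals as printed in the SUGPROD tibble (1000 tons, rounded as displayed)
sug <- c(`2014` = 19436, `2015` = 17929, `2016` = 19955, `2017` = 20995)

# Percentage change from each year to the next; names are the later years
pct_change <- round(100 * diff(sug) / head(sug, -1), 1)
print(pct_change)
```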

Stacked bar chart for cotton yield

P = B %>% filter(`State Name` %in% c("Haryana", "Gujarat", "Madhya Pradesh"))  # %in% keeps every row of the three states; == recycled the names row by row

COTY = P %>% group_by(`State Name`, Year) %>%
  summarise(across(`COTTON YIELD (Kg per ha)`, ~ sum(.x, na.rm = TRUE)))  # summed district yields per state and year

Y1=as.matrix(COTY)
Z=Y1[1:12,3]
Z1=as.vector(Z)
Z2=matrix(Z1,4,3,dimnames =list( c("2014","2015",'2016','2017'),c("Gujarat","Haryana","Madhya Pradesh"))
Z2

## Gujarat Haryana Madhya Pradesh


## 2014 "2951.12" "1374.29" " 728.10"
## 2015 "3236.01" " 423.83" " 522.12"
## 2016 "2675.83" " 541.35" " 289.09"
## 2017 "2999.47" "1433.48" " 851.53"

barplot(Z2,legend=T,col=c(1:12),ylim=c(0,15000))

[Figure: stacked bar chart of cotton yield for Gujarat, Haryana and Madhya Pradesh, with bars stacked by year (2014 to 2017); y-axis 0 to 12000.]

The chart shows that Gujarat has by far the highest cotton yield of the three cotton-producing states selected,
with similar values in all four years (the values are yields summed over each state's districts). Haryana comes
second, with its highest value in 2017 and its lowest in 2015. Madhya Pradesh is the lowest of the three, with
its highest value in 2017 and its lowest in 2016. Gujarat lies in the black-soil belt of the Deccan Plateau, which
likely explains its consistently high yields, while Madhya Pradesh, with only thin patches of black soil, has the
lowest yields. Haryana does not lie in the black-soil belt, but heavy investment and modernised agriculture
allow it to compete with the cotton-belt states. Both Haryana and Madhya Pradesh recorded their highest
yields in 2017.
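`Z2` above is built by taking rows 1 to 12 of a character matrix and relying on the row order of `COTY`, which is why its printed values are quoted. A hedged alternative is base R `tapply()`, which returns a numeric year-by-state matrix indexed by name. `year_state_matrix` is a hypothetical helper and the data are invented:

```r
# Hypothetical helper: year x state matrix of summed values, keyed by name
# rather than by row position.
# Intended use: year_state_matrix(P[["COTTON YIELD (Kg per ha)"]], P$Year, P[["State Name"]])
year_state_matrix <- function(value, year, state) {
  tapply(value, list(year, state), sum, na.rm = TRUE)
}

# Toy data (not from the ICRISAT file)
year  <- c(2014, 2014, 2015, 2015, 2014)
state <- c("Gujarat", "Haryana", "Gujarat", "Haryana", "Gujarat")
yield <- c(1000, 400, 1200, 300, 500)

m <- year_state_matrix(yield, year, state)
print(m)
# barplot(m, legend.text = TRUE) would stack the years within each state
```

Rows are years and columns are states, the same orientation as `Z2`, so the result can go straight into `barplot()`.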

getwd()

## [1] "/Users/vdb/Downloads"

C = read.csv("agricultural produce.csv")  # importing dataset; read.csv() already returns a data.frame
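From here on the columns are written as `WHEAT.YIELD..Kg.per.ha.` and so on rather than with backquoted names. By default `read.csv()` uses `check.names = TRUE`, which passes each header through `make.names()` and replaces spaces and brackets with dots. A quick check of that mapping, assuming the CSV carries the same headers as the Excel file:

```r
# Excel-style headers and the syntactic names read.csv() turns them into
headers <- c("WHEAT YIELD (Kg per ha)", "RICE PRODUCTION (1000 tons)",
             "State Name")
csv_names <- make.names(headers)
print(csv_names)
# gives "WHEAT.YIELD..Kg.per.ha.", "RICE.PRODUCTION..1000.tons.", "State.Name"

# read.csv("agricultural produce.csv", check.names = FALSE) would keep the
# original headers, which then need backquotes as in the Excel-based code
```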

Vedant

Scatter plot on Wheat Yield

C %>% ggplot(aes(Year, WHEAT.YIELD..Kg.per.ha., col = WHEAT.YIELD..Kg.per.ha. > 3000)) +
  geom_point() +
  facet_wrap(~State.Name) +
  # FALSE (<= 3000) sorts before TRUE (> 3000), so labels and colours follow that order
  scale_color_manual(labels = c("<=3000", ">3000"), values = c("springgreen3", "red")) +
  theme_minimal() +
  theme(legend.position = "top",  # "topright" is a base-graphics legend keyword, not a ggplot2 position
        legend.title = element_blank(),
        axis.text = element_text(size = 10),
        axis.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 16, hjust = 0.5, face = "bold")) +
  labs(title = "Wheat Yield: 2014-2017", y = "Yield")

[Figure: Wheat Yield: 2014-2017; one panel per state (Bihar, Gujarat, Haryana, Madhya Pradesh, Rajasthan, Uttar Pradesh, West Bengal); x-axis: Year; y-axis: Yield, 0 to 5000.]

This plot shows wheat yield in the districts of each state. Red points mark yields above 3,000 kg per ha.
Uttar Pradesh, the highest producer of wheat, has many district yields above 3,000 kg per ha. This is because
it combines fertile alluvial soil with a dry, cool winter, which is ideal for growing wheat. Madhya Pradesh and
Punjab are the other two states with a substantial share of wheat production. West Bengal, the lowest wheat
producer among these states, has a tropical climate strongly influenced by the Bay of Bengal. Its mild winters
and high temperatures do not suit a winter crop such as wheat.
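To back the reading of the red points with numbers, a small sketch that computes, for each state, the fraction of district-years with a yield above 3,000 kg per ha. `share_above` is a hypothetical helper and the toy values are invented:

```r
# Hypothetical helper: fraction of observations in each state whose yield
# exceeds a threshold (the red points in the plot above).
# Intended use: share_above(C$WHEAT.YIELD..Kg.per.ha., C$State.Name)
share_above <- function(yield, state, threshold = 3000) {
  res <- tapply(yield > threshold, state, mean, na.rm = TRUE)
  setNames(as.vector(res), names(res))
}

# Toy data (not from the CSV file)
state <- c("Uttar Pradesh", "Uttar Pradesh", "Uttar Pradesh",
           "West Bengal", "West Bengal")
yield <- c(3200, 3500, 2800, 2500, 2700)
s <- share_above(yield, state)
print(round(s, 2))
```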

Scatter plot on correlation between Rice Production and Area

C %>% ggplot(aes(RICE.PRODUCTION..1000.tons., RICE.AREA..1000.ha.)) +
  geom_point(col = "violetred3") +
  geom_smooth(method = lm) +  # least-squares line with its confidence band
  labs(
    title = "Correlation between Rice Production and Area",
    x = "Rice Production (1000 tons)",
    y = "Rice Area (1000 ha)") +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
        axis.title = element_text(size = 12, face = "bold"),
        axis.text = element_text(size = 10),
        axis.text.x = element_text(angle = 45, hjust = 1))

## ‘geom_smooth()‘ using formula = ’y ~ x’

[Figure: scatter plot of Rice Area (1000 ha), 0 to 1200, against Rice Production (1000 tons), 0 to 3000, with a blue linear fit.]


This scatter plot shows the relationship between the area under rice and rice production. The two variables
are highly correlated: the points stay close to the blue line of best fit, which geom_smooth() draws with the
linear-model (lm) method. The highest district production, about 32 lakh tons (3,215 thousand tons), comes
from an area of roughly 11 lakh hectares. In some situations, however, crop area and production could be
negatively correlated, for example where the availability of land is limited.
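The paragraph judges the correlation by eye. A hedged sketch that puts a number on it: `cor()` gives Pearson's r, and `lm()` fits the same least-squares line that `geom_smooth(method = lm)` draws, with area regressed on production as on the plot's axes. `rice_fit` is a hypothetical helper and the toy data are invented:

```r
# Hypothetical helper: Pearson correlation plus the intercept and slope of
# the least-squares line area ~ production.
# Intended use: rice_fit(C$RICE.PRODUCTION..1000.tons., C$RICE.AREA..1000.ha.)
rice_fit <- function(production, area) {
  fit <- lm(area ~ production)
  c(r = cor(production, area, use = "complete.obs"),
    intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]))
}

# Toy data (not from the CSV file): area is exactly production / 2.5
production <- c(100, 500, 1000, 2000, 3000)
area <- production / 2.5
out <- rice_fit(production, area)
print(out)
```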

Barplot on oilseed production in Rajasthan

a = C %>% select(OILSEEDS.PRODUCTION..1000.tons., State.Name, Dist.Name, Year) %>%
  filter(State.Name == "Rajasthan" & Year == 2015) %>%
  arrange(desc(OILSEEDS.PRODUCTION..1000.tons.))  # largest producers first
# production (1000 tons) of the five largest producers, in the order plotted below
e = c(584.19, 541.04, 473.17, 432.08, 398.6)
barplot(e,col=colors()[31:42],names=c("Ganganagar","Kota","Bikaner","Jodhpur","Bharatpur"),xlab="Distric

[Figure: Top 5 Oilseed producing districts in Rajasthan (2015); bars for Ganganagar, Kota, Bikaner, Jodhpur and Bharatpur; x-axis: Districts; y-axis: OILSEEDS PRODUCTION.]

This barplot shows the top five oilseed-producing districts of Rajasthan in 2015. Ganganagar (584.19 thousand
tons) and Kota (541.04 thousand tons) are the largest producers, each exceeding 5 lakh tons. The average
across the five districts is about 4.9 lakh tons (485.8 thousand tons), and the smallest producer among them,
Bharatpur, produces just under 4 lakh tons (398.6 thousand tons). Rajasthan's arid to semi-arid climate can
suit certain oilseed crops such as mustard, groundnut and soybean.
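The average quoted above is simple arithmetic on the five plotted values. A quick check, with the district names attached as a named vector and using 1 lakh tons = 100 thousand tons:

```r
# Production (1000 tons) of the five districts in the barplot above
e <- c(Ganganagar = 584.19, Kota = 541.04, Bikaner = 473.17,
       Jodhpur = 432.08, Bharatpur = 398.6)

avg <- mean(e)          # average of the five districts, in 1000 tons
avg_lakh <- avg / 100   # the same average in lakh tons
print(c(avg = avg, avg_lakh = avg_lakh))
# avg is 485.816, i.e. about 4.86 lakh tons
```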

Line graph for area under cotton

ve=C%>%filter(State.Name!="West Bengal"&Year=="2014")%>%select(COTTON.AREA..1000.ha.)%>%filter(COTTON.AR
v1=C%>%filter(State.Name!="West Bengal"&Year=="2015")%>%select(COTTON.AREA..1000.ha.)%>%filter(COTTON.AR
v2=C%>%filter(State.Name!="West Bengal"&Year=="2016")%>%select(COTTON.AREA..1000.ha.)%>%filter(COTTON.AR
v3=C%>%filter(State.Name!="West Bengal"&Year=="2017")%>%select(COTTON.AREA..1000.ha.)%>%filter(COTTON.AR
sum(ve)

## [1] 4660.64

sum(v1)

## [1] 4354.13

sum(v2)

## [1] 4038.95

sum(v3)

## [1] 4486.56

vb = cbind(Year = 2014:2017,
           "Cotton area" = c(sum(ve), sum(v1), sum(v2), sum(v3)))  # numeric matrix of the yearly totals computed above
par(bg="seashell")
plot(vb,type = "b", col = "darkgreen", lwd = 2,
xlab = "Years", ylab = "Cotton Area (1000 ha)",
main = "Total area under cotton in India")
legend("topright", legend = c("Cotton area"), col = c("darkgreen", "red"), lty = c(1, 2), lwd = 2, cex =

[Figure: Total area under cotton in India; Cotton Area (1000 ha) against Years, 2014 to 2017, with a "Cotton area" legend.]

This line chart shows the total area under cotton over four years, summed over the states in the data except
West Bengal. The area was highest in 2014, at about 46.6 lakh hectares (4,660.64 thousand ha). It then fell
steeply for two years, to about 40.4 lakh hectares in 2016, most likely because of deficient rainfall. After 2016
the area recovered to about 44.9 lakh hectares in 2017. New genetically modified and hybrid cotton seed
varieties may be a major reason for this increase.
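The four copy-pasted pipelines (`ve`, `v1`, `v2`, `v3`) can be collapsed into one grouped sum. A hedged base-R sketch follows. The extra `filter()` at the end of the original lines is cut off in this copy, so it is not reproduced here. `cotton_area_by_year` is a hypothetical helper and the toy data are invented:

```r
# Hypothetical helper: total cotton area per year, excluding West Bengal,
# in one call. Intended use:
# cotton_area_by_year(C$COTTON.AREA..1000.ha., C$Year, C$State.Name)
cotton_area_by_year <- function(area, year, state) {
  keep <- state != "West Bengal"
  res <- tapply(area[keep], year[keep], sum, na.rm = TRUE)
  setNames(as.vector(res), names(res))
}

# Toy data (not from the CSV file)
area  <- c(100, 50, 20, 120, 40, 30)
year  <- c(2014, 2014, 2014, 2015, 2015, 2015)
state <- c("Gujarat", "Haryana", "West Bengal",
           "Gujarat", "Haryana", "West Bengal")
tot <- cotton_area_by_year(area, year, state)
print(tot)
```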

Boxplot on Maize Yield

C%>%filter(MAIZE.YIELD..Kg.per.ha.<15000&MAIZE.YIELD..Kg.per.ha.>0)%>%
ggplot(aes(x=reorder(Year,MAIZE.YIELD..Kg.per.ha.),y=MAIZE.YIELD..Kg.per.ha.))+
geom_boxplot(col="darkgreen",alpha=0.4,outlier.size=3,outlier.colour="red")+
coord_flip()+
labs(
title = "Maize Yield Distribution Over Years",
x = "Year",
y = "Maize Yield (Kg per ha)") +
stat_summary(fun=mean,geom="point",color="blue")

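[Figure: Maize Yield Distribution Over Years; horizontal boxplots for 2015, 2014, 2016 and 2017 (ordered by mean yield, Year on the vertical axis), red outliers and blue mean points; x-axis: Maize Yield (Kg per ha), 0 to 7500.]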
The boxplots show maize yield over four years. The median is close to 2,200 kg per ha. In 2016 yield crossed
the 8,000 kg/ha mark: a district of Bihar recorded 8,772 kg/ha, an extreme outlier. From 2016 to 2017 yields
rose substantially, with values crossing the 6,000 kg/ha mark. Interestingly, although 2017 had the highest
yields, it also had yields close to zero, which is why its mean (the blue points added with stat_summary()) is
not substantially higher than in the other years.
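The discussion compares the medians in the boxes with the means added by `stat_summary()`. A sketch that prints both per year after the same `0 < yield < 15000` filter used for the boxplots; `yearly_centre` is a hypothetical helper and the toy data are invented:

```r
# Hypothetical helper: median and mean yield per year after the boxplot filter.
# Intended use: yearly_centre(C$MAIZE.YIELD..Kg.per.ha., C$Year)
yearly_centre <- function(yield, year) {
  keep <- !is.na(yield) & yield > 0 & yield < 15000
  sapply(split(yield[keep], year[keep]),
         function(v) c(median = median(v), mean = mean(v)))
}

# Toy data (not from the CSV file); the 0 in 2016 is dropped by the filter
yield <- c(2000, 2200, 2400, 0, 150, 2600, 2800, 6200)
year  <- c(2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017)
r <- yearly_centre(yield, year)
print(r)  # rows: median, mean; columns: years
```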

Sanidhya

Analysing relations between different crop yields using pairs plot

P1 = B %>% filter(`State Name` %in% c("Madhya Pradesh", "Uttar Pradesh"))  # %in% keeps every row of both states


b=select(P1,contains("YIELD") & starts_with(c("RICE","WHEAT","SUGARCANE","oilseeds")))
pairs(b, # data
main ="Analysing relations between different crop yields", # title
col="red", # colour
pch=1) # plotting character

[Figure: Analysing relations between different crop yields; pairs plot of RICE YIELD, WHEAT YIELD, OILSEEDS YIELD and SUGARCANE YIELD (Kg per ha), drawn as red open circles.]

attach(b)
cor(`RICE YIELD (Kg per ha)`,`SUGARCANE YIELD (Kg per ha)`) # moderate degree of positive correlation

## [1] 0.4872056

cor(`WHEAT YIELD (Kg per ha)`,`SUGARCANE YIELD (Kg per ha)`)# negligible positive correlation

## [1] 0.1308622

cor(`WHEAT YIELD (Kg per ha)`,`RICE YIELD (Kg per ha)`) #low degree of positive correlation

## [1] 0.2611049

cor(`RICE YIELD (Kg per ha)`,`OILSEEDS YIELD (Kg per ha)`) # low degree of negative correlation

## [1] -0.3240037

cor(`WHEAT YIELD (Kg per ha)`,`OILSEEDS YIELD (Kg per ha)`)# negligible degree of negative correlation

## [1] -0.1466258

cor(`SUGARCANE YIELD (Kg per ha)`,`OILSEEDS YIELD (Kg per ha)`) # low degree of negative correlation

## [1] -0.3127461

detach(b)

Here we used pairs plots and the cor() function to check whether the yields of different crops in Madhya
Pradesh and Uttar Pradesh are related, and the comments above describe the strength of each linear
correlation. These correlations can guide a farmer who practises mixed cropping in these states: crops whose
yields are positively correlated, such as rice and sugarcane (r ≈ 0.49), tend to do well in the same districts and
are therefore better candidates to grow together than negatively correlated pairs such as rice and oilseeds
(r ≈ -0.32).
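The comments on the `cor()` calls describe each coefficient as negligible, low or moderate. A sketch that attaches those words automatically. The cut-offs 0.2, 0.4 and 0.7 are an assumption, chosen because they reproduce the labels used above; they are not a standard taken from the source.

```r
# Hypothetical helper: label |r| with assumed cut-offs 0.2 / 0.4 / 0.7
strength <- function(r) {
  as.character(cut(abs(r), breaks = c(0, 0.2, 0.4, 0.7, 1),
                   labels = c("negligible", "low", "moderate", "high"),
                   include.lowest = TRUE))
}

# The six coefficients printed above
r <- c(rice_sugarcane = 0.4872056, wheat_sugarcane = 0.1308622,
       wheat_rice = 0.2611049, rice_oilseeds = -0.3240037,
       wheat_oilseeds = -0.1466258, sugarcane_oilseeds = -0.3127461)
print(data.frame(r = round(r, 2), strength = strength(r)))

# All pairwise coefficients at once for the data frame b used above:
# round(cor(b, use = "pairwise.complete.obs"), 2)
```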

Maize Production pie chart

E=filter(B,Year=="2015")
PRODUCTION=E%>%group_by(`State Name`) %>%
summarize_each(list(~sum(.,na.rm=T)),`MAIZE PRODUCTION (1000 tons)` ) # using group by and summarize co

R1=(PRODUCTION[2]) # this data is extracted in the list format


Vec1=unlist(R1)
Vec1# using unlist to create numeric vector from the list

## MAIZE PRODUCTION (1000 tons)1 MAIZE PRODUCTION (1000 tons)2


## 2517.16 608.51
## MAIZE PRODUCTION (1000 tons)3 MAIZE PRODUCTION (1000 tons)4
## 18.00 2909.00
## MAIZE PRODUCTION (1000 tons)5 MAIZE PRODUCTION (1000 tons)6
## 1156.71 1304.66
## MAIZE PRODUCTION (1000 tons)7
## 662.44

pie(Vec1,
labels = c("Bihar(27.6%)","Gujarat(6.2%)","Haryana(0.3%)","Madhya Pradesh(31.9%)","Rajasthan(12.8%)"
col =c(3:10), # slice co
main="Contribution of each state in maize production", # title
radius=0.8, # radius
border="blue",lty=7 # border and line
)

[Figure: Contribution of each state in maize production; pie slices labelled Bihar(27.6%), Gujarat(6.2%), Haryana(0.3%), Madhya Pradesh(31.9%), Rajasthan(12.8%), Uttar Pradesh(14.1%), West Bengal(7.1%).]

Bihar and Madhya Pradesh contribute the most to maize production among these seven states. Rajasthan and
Uttar Pradesh each contribute roughly one-eighth of the total. Gujarat and West Bengal contribute less, and
Haryana's production is negligible compared with the total. These differences likely reflect the geographical
conditions of these states, such as the presence of well-drained loamy soil and moderate temperatures.

Oilseeds boxplot (facet-wrapped by state)

B %>% ggplot(aes(`OILSEEDS PRODUCTION (1000 tons)`)) +  # data
  geom_boxplot(col = "red", fill = "black",          # boxplot
               outlier.colour = "darkgreen",         # outlier colour
               outlier.shape = 6,
               outlier.size = 1.2,
               outlier.stroke = 1.1) +
  facet_wrap(~`State Name`) +
  labs(title = "Boxplots for oilseed production in different states") +
  theme_grey()

[Figure: Boxplots for oilseed production in different states; one panel per state (Bihar, Gujarat, Haryana, Madhya Pradesh, Rajasthan, Uttar Pradesh, West Bengal); x-axis: OILSEEDS PRODUCTION (1000 tons), 0 to 900.]
As the boxplots show, the median oilseed production in these states is 0. This is plausible because oilseeds
need specific geographical conditions that are not widely available. Most states have many outliers, especially
Rajasthan and Gujarat, which are among the major oilseed producers in India. At least 75% of the districts of
West Bengal, Uttar Pradesh and Bihar do not produce oilseeds at all. Haryana differs from the other states in
that many of its districts contribute to oilseed production. The boxplots of Madhya Pradesh and Rajasthan are
nearly identical, as the geographical conditions in these two states are similar apart from Rajasthan's
Thar region.
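The claim that at least 75% of districts in some states produce no oilseeds can be checked directly by counting zero values per state. A hedged sketch; `share_zero` is a hypothetical helper and the toy data are invented:

```r
# Hypothetical helper: fraction of district-years with zero production per state.
# Intended use: share_zero(B[["OILSEEDS PRODUCTION (1000 tons)"]], B[["State Name"]])
share_zero <- function(production, state) {
  res <- tapply(production == 0, state, mean, na.rm = TRUE)
  setNames(as.vector(res), names(res))
}

# Toy data (not from the ICRISAT file)
production <- c(0, 0, 0, 12.5, 300, 0, 450, 80)
state <- c("Bihar", "Bihar", "Bihar", "Bihar",
           "Rajasthan", "Rajasthan", "Rajasthan", "Rajasthan")
z <- share_zero(production, state)
print(z)
```

A value of 0.75 or more for a state supports the statement for that state.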

Area under production for wheat

Ar1 = B %>% group_by(Year) %>%
  summarise(across(`WHEAT AREA (1000 ha)`, ~ sum(.x, na.rm = TRUE)))  # total wheat area per year

f1 = as.matrix(Ar1)
plot(f1, type = "b",
     cex = 4,
     col = "orange",
     main = "Area under wheat production",
     sub = "Variation in area under production",
     xlab = "year",
     ylab = "area",
     lwd = 1.5,
     lty = 2,
     pch = 2)  # pch takes an integer symbol code; 2 is an open triangle

[Figure: Area under wheat production (subtitle: Variation in area under production); dashed orange line of area against year, 2014 to 2017.]
This line graph shows how the area under wheat changed over the years. The area fell in 2015, rose back to its
previous level in 2016 and declined again in 2017, so the data show no clear upward or downward trend. One
likely reason is that very little additional land is left for cultivation, and what remains is often not fertile
enough to farm economically, so the year-to-year changes stay small.
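Whether a series has a trend can also be summarised by one number: the least-squares slope of the yearly totals against year. A hedged sketch; `trend_per_year` is a hypothetical helper and the toy totals are invented:

```r
# Hypothetical helper: slope (change per year) of a straight-line fit.
# Intended use: trend_per_year(Ar1$Year, Ar1[["WHEAT AREA (1000 ha)"]])
trend_per_year <- function(year, total) {
  unname(coef(lm(total ~ year))[2])
}

# Toy totals (not from the ICRISAT file): a dip and recovery, no net trend
year  <- 2014:2017
total <- c(100, 90, 90, 100)
print(trend_per_year(year, total))
```

A slope that is small relative to the yearly totals is consistent with the "no clear trend" reading; with only four years, though, the estimate is rough.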

Histogram and frequency polygon

f2=filter(B,Year=="2017"& `State Name`=="Uttar Pradesh")


ggplot(f2,aes(`RICE PRODUCTION (1000 tons)`))+
geom_histogram(binwidth=100,col="black",fill="yellow")+
geom_freqpoly(binwidth=100)+
theme_gray()

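[Figure: histogram (yellow bars, binwidth 100) with a frequency polygon of RICE PRODUCTION (1000 tons) for the districts of Uttar Pradesh in 2017; y-axis: count.]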

This chart shows how the districts of Uttar Pradesh are distributed by the amount of rice they produced in
2017. Each bin is 100 units wide, i.e. 100 thousand tons (1 lakh tons) of rice, and the height of each bin is the
number of districts whose production falls in that interval. The histogram has two peaks, one at 550-650
thousand tons and one between 50 and 250 thousand tons. The data also show that almost all districts of
Uttar Pradesh cultivate rice.
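With `binwidth = 100` and neither `center` nor `boundary` given, ggplot2's `stat_bin()` (as we understand its defaults) centres the bins on multiples of 100 and closes them on the right, i.e. (50, 150], (150, 250] and so on. This is why the peaks read as 550-650 and 50-250. A hedged base-R sketch that reproduces this binning to count districts per bin; `bin_centre` and `bin_counts` are hypothetical helpers, the toy values are invented, and edge handling at the very first bin may differ slightly from ggplot2.

```r
# Hypothetical helper: centre of the right-closed bin
# (k*width - width/2, k*width + width/2] that contains each value
bin_centre <- function(x, width = 100) {
  ceiling((x - width / 2) / width) * width
}

# Number of values in each bin, labelled by bin centre
bin_counts <- function(x, width = 100) table(bin_centre(x, width))

# Toy data (not from the ICRISAT file), in 1000 tons
x <- c(120, 180, 230, 560, 600, 640)
counts <- bin_counts(x)
print(counts)
```

Intended use: `bin_counts(f2[["RICE PRODUCTION (1000 tons)"]])`.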

