Stats-with-R-project
Acknowledgements
We wish to express our sincere gratitude to Rajni Ma’am and Uday Sir, our project guides, and to our Principal,
Dr. John Varghese, for their guidance and support in completing our project on “Analysis of Agricultural
Crops in India”. Their cooperation and contributions were crucial to the project’s success.
Contents
• Boxplot for rice production
• Pie chart for depicting Rice area
• Scatter plot on barley yield
• Sugarcane production line graph
• Stacked bar chart for cotton yield
• Scatter plot on Wheat Yield
• Scatter plot on the correlation between Rice Production and Area
• Barplot on oilseed production in Rajasthan
• Line graph for area under cotton
• Boxplot on Maize Yield
• Analysing relations between different crop yields using a pairs plot
• Maize Production pie chart
• Oilseeds boxplot (facet-wrapped by state)
• Area under production for wheat
• Histogram and frequency polygon
Data
library(dplyr)
library(ggplot2)
library(readxl)
library(GGally)
getwd()
## [1] "/Users/vdb/Downloads"
Dev
summary(B)
## 1st Qu.: 0 1st Qu.:0.0000
## Median :1836 Median :0.0000
## Mean :1747 Mean :0.1375
## 3rd Qu.:2823 3rd Qu.:0.0100
## Max. :5000 Max. :5.2700
## SUNFLOWER PRODUCTION (1000 tons) SUNFLOWER YIELD (Kg per ha)
## Min. :0.0000 Min. : 0.0
## 1st Qu.:0.0000 1st Qu.: 0.0
## Median :0.0000 Median : 0.0
## Mean :0.1882 Mean : 433.2
## 3rd Qu.:0.0200 3rd Qu.:1103.9
## Max. :6.4500 Max. :2142.9
## SOYABEAN AREA (1000 ha) SOYABEAN PRODUCTION (1000 tons)
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00
## Mean : 41.38 Mean : 42.42
## 3rd Qu.: 17.12 3rd Qu.: 14.25
## Max. :468.78 Max. :673.00
## SOYABEAN YIELD (Kg per ha) OILSEEDS AREA (1000 ha)
## Min. : 0.0 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.00
## Median : 0.0 Median : 0.00
## Mean : 435.1 Mean : 48.72
## 3rd Qu.: 854.3 3rd Qu.: 45.00
## Max. :2063.8 Max. :636.70
## OILSEEDS PRODUCTION (1000 tons) OILSEEDS YIELD (Kg per ha)
## Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.0
## Median : 0.00 Median : 0.0
## Mean : 56.16 Mean : 403.5
## 3rd Qu.: 41.09 3rd Qu.: 837.4
## Max. :1101.11 Max. :3553.2
## SUGARCANE AREA (1000 ha) SUGARCANE PRODUCTION (1000 tons)
## Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.010 1st Qu.: 0.03
## Median : 0.730 Median : 3.37
## Mean : 17.871 Mean : 122.37
## 3rd Qu.: 6.732 3rd Qu.: 34.10
## Max. :277.300 Max. :2326.51
## SUGARCANE YIELD (Kg per ha) COTTON AREA (1000 ha)
## Min. : 0.0 Min. : 0.00
## 1st Qu.: 506.3 1st Qu.: 0.00
## Median : 5864.0 Median : 0.00
## Mean : 4785.4 Mean : 27.41
## 3rd Qu.: 6970.6 3rd Qu.: 5.00
## Max. :17988.0 Max. :492.87
## COTTON PRODUCTION (1000 tons) COTTON YIELD (Kg per ha)
## Min. : 0.0000 Min. : 0.0
## 1st Qu.: 0.0000 1st Qu.: 0.0
## Median : 0.0000 Median : 0.0
## Mean : 12.4371 Mean : 126.9
## 3rd Qu.: 0.6425 3rd Qu.: 108.0
## Max. :348.7300 Max. :1009.5
## FRUITS AND VEGETABLES AREA (1000 ha)
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 7.08
## Mean : 20.95
## 3rd Qu.: 23.91
## Max. :213.69
[Figure: Boxplot for rice production, one panel per year (panels for 2016 and 2017 visible); x-axis: RICE PRODUCTION (1000 tons)]
The graph shows that most districts produce less than 1,000,000 tonnes of rice every year. The most extreme
outliers appear in 2014 and 2015, at more than 3,000,000 tonnes of production. In every year except 2015, the
median production of the districts in these states is around 300,000 to 400,000 tonnes; in 2015 most districts
lie below 500,000 tonnes and the median is even lower, at about 100,000 tonnes. The mild outliers lie below
3,000,000 tonnes. Overall, most districts produce around 500,000 tonnes of rice, while a few districts with
exceptional geographical conditions and agricultural background produce more than 3,000,000 tonnes per year.
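The distinction between mild and extreme outliers can be made precise with Tukey's fences: values more than 1.5 × IQR beyond the quartiles are outliers (the points a boxplot draws individually), and values more than 3 × IQR beyond them are usually called extreme. A minimal base-R sketch, assuming a hypothetical vector `prod` of district production values in 1000 tons rather than the project data:

```r
# Hypothetical district rice production for one year (1000 tons); not project data
prod <- c(120, 250, 310, 340, 380, 420, 560, 900, 1800, 3200)

q1  <- unname(quantile(prod, 0.25))  # first quartile (default type 7, as ggplot2 boxplots use)
q3  <- unname(quantile(prod, 0.75))  # third quartile
iqr <- q3 - q1                       # interquartile range

# Inner fences (1.5 * IQR) and outer fences (3 * IQR)
outside_inner <- prod < q1 - 1.5 * iqr | prod > q3 + 1.5 * iqr
outside_outer <- prod < q1 - 3 * iqr   | prod > q3 + 3 * iqr

mild    <- prod[outside_inner & !outside_outer]  # outliers that are not extreme
extreme <- prod[outside_outer]                   # extreme outliers

cat("median:", median(prod), "\n")
cat("mild outliers:", mild, "\n")
cat("extreme outliers:", extreme, "\n")
```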
Pie chart for depicting Rice area
E=filter(B,Year==2015)
# total rice area of each state in 2015
AREA=E%>%group_by(`State Name`) %>%
summarise(`RICE AREA (1000 ha)`=sum(`RICE AREA (1000 ha)`,na.rm=TRUE))
F=AREA[1:7,2]
G=as.list(F)
# state totals (1000 ha) for the pie chart
H=c(3232.34,774.22,1353.10,2026.00,182.92,5876.26,5523.96)
[Figure: DISTRIBUTION OF TOTAL RICE AREA AMONG TOP PRODUCING STATES, pie chart of rice area in 2015 for Haryana, Gujarat, Madhya Pradesh, Rajasthan, Bihar, Uttar Pradesh and West Bengal]
The chart shows that Uttar Pradesh and West Bengal, where rice is a staple crop, have the largest areas under
rice, well ahead of the other states. Bihar comes third among these states. We can attribute this to the vast
network of the Ganges and its tributaries as well as the alluvial soil. Rajasthan has the smallest area under
rice because it is an arid region, whereas rice is an irrigation-intensive crop.
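The percentages on the pie labels in the original figure add up to well over 100%, so they cannot be shares of the total area. A share is each state's area divided by the combined area of all seven states. A minimal sketch using the values in `H`, under the assumption (not confirmed in the source) that `H` follows the alphabetical state order returned by `group_by()`:

```r
# Rice area (1000 ha) in 2015, copied from H above.
# Assumption: H follows the alphabetical state order produced by group_by().
rice_area <- c(Bihar = 3232.34, Gujarat = 774.22, Haryana = 1353.10,
               "Madhya Pradesh" = 2026.00, Rajasthan = 182.92,
               "Uttar Pradesh" = 5876.26, "West Bengal" = 5523.96)

# Each state's share of the combined area, in percent
share <- round(100 * rice_area / sum(rice_area), 2)
labels <- paste0(names(rice_area), " (", share, "%)")

pie(rice_area, labels = labels,
    main = "DISTRIBUTION OF TOTAL RICE AREA AMONG TOP PRODUCING STATES")
```

On these values Uttar Pradesh and West Bengal together account for about 60% of the combined area.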
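# State Code 10 selects the districts of Rajasthan
I=B%>%filter(`State Code`=="10")
# barley yield of each district, summed over the years in the data
RAJAREA=I%>%group_by(`Dist Name`) %>%
summarise(`BARLEY YIELD (Kg per ha)`=sum(`BARLEY YIELD (Kg per ha)`,na.rm=TRUE))
ggplot(RAJAREA)+
geom_point(aes(`Dist Name`,`BARLEY YIELD (Kg per ha)`),colour="blue",size=2)+
theme(axis.text.x=element_text(angle=90))+
labs(title="BARLEY YIELD IN THE DISTRICTS OF RAJASTHAN")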
[Figure: scatter plot of BARLEY YIELD (Kg per ha) against Dist Name for the districts of Rajasthan]
This plot shows that Alwar has the highest barley yield, followed by Ganganagar and Sikar, while Bikaner has
the lowest. A likely reason is the better agricultural practices and equipment available to farmers in Alwar,
Ganganagar and Hanumangarh. These areas have probably also been influenced by the Green Revolution, as they
lie close to Punjab and Haryana. Moreover, the Indira Gandhi Canal, which flows through the heart of
Rajasthan, has helped make the state one of the largest producers of barley.
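The code above sums `BARLEY YIELD (Kg per ha)` over the years, which is why the axis runs well past any single-year yield. Because yield is already a rate, averaging over the years keeps it in kg per ha and does not penalise a district with a missing year. A base-R sketch with made-up numbers for three districts (the column names and values are hypothetical, not the project data):

```r
# Hypothetical long-format data: one row per district per year (kg per ha)
barley <- data.frame(
  district = rep(c("Alwar", "Ganganagar", "Bikaner"), each = 4),
  year     = rep(2014:2017, times = 3),
  yield    = c(3500, 3600, 3400, 3700,   # Alwar
               3300, 3200, 3400, 3300,   # Ganganagar
               1800, 1700, 2000, 1900)   # Bikaner
)

# Average yield per district over the years (NA years are skipped, not counted as 0)
mean_yield <- tapply(barley$yield, barley$district, mean, na.rm = TRUE)
sort(mean_yield, decreasing = TRUE)
```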
SUGPROD=B%>%group_by(`Year`) %>%
summarise(`SUGARCANE PRODUCTION (1000 tons)`=sum(`SUGARCANE PRODUCTION (1000 tons)`,na.rm=TRUE));SUGPROD
## # A tibble: 4 x 2
## Year `SUGARCANE PRODUCTION (1000 tons)`
## <dbl> <dbl>
## 1 2014 19436.
## 2 2015 17929.
## 3 2016 19955.
## 4 2017 20995
O=as.matrix(SUGPROD)
plot(O,type="l",cex=4,col="black",main="Production of SUGARCANE over the years")
[Figure: Production of SUGARCANE over the years, line graph of total sugarcane production (1000 tons) against Year, 2014 to 2017]
The graph shows that sugarcane production was high in 2014, fell by about 8% in 2015 and then rose steadily
in 2016 and 2017, reaching its highest level in 2017. The dip in 2015 can be attributed to unfavourable weather
that year, when conditions conducive to sugarcane growth were erratic. In 2016 and 2017, government measures
such as pricing policy may also have encouraged production.
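The size of the dip and of the recovery can be checked with year-on-year percentage changes. A minimal sketch using the yearly totals printed by `SUGPROD` (rounded as printed, so the last digit may differ slightly from the unrounded data):

```r
# Total sugarcane production (1000 tons) per year, as printed by SUGPROD
sugarcane <- c("2014" = 19436, "2015" = 17929, "2016" = 19955, "2017" = 20995)

# Percentage change from the previous year; names come from the later year
pct_change <- round(100 * diff(sugarcane) / head(sugarcane, -1), 1)
pct_change
```

On these totals production falls by about 7.8% in 2015 and then grows by about 11.3% and 5.2%; a shrinking growth rate points to a recovery rather than exponential growth.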
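# COTY is assumed to hold cotton yields by state and year
Y1=as.matrix(COTY)
Z=Y1[1:12,3]        # third column, first 12 rows: 3 states x 4 years
Z1=as.numeric(Z)    # guards against as.matrix() having turned the values into text
Z2=matrix(Z1,4,3,dimnames=list(c("2014","2015","2016","2017"),c("Gujarat","Haryana","Madhya Pradesh")))
Z2
barplot(Z2,legend.text=TRUE,col=1:4,ylim=c(0,15000))   # one colour per year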
[Figure: stacked bar chart of cotton yield for Gujarat, Haryana and Madhya Pradesh, stacked by year (2014 to 2017)]
The chart shows that Gujarat has the highest cotton yield among the three top cotton-producing states, with
almost equal yields in all four years. Haryana ranks second, with its maximum yield in 2017 and its minimum in
2016. Madhya Pradesh, the lowest of the three, also had its maximum yield in 2017 and its minimum in 2016.
Gujarat lies in the black soil belt of the Deccan Plateau and has excellent yields in all four years, while
Madhya Pradesh, with only thin patches of black soil, has the lowest yield. Haryana does not lie in any black
soil belt, but heavy investment and modernised agriculture allow it to yield on par with the cotton-belt
states. In 2017, all three states recorded their highest cotton yields.
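Selecting `Y1[1:12, 3]` and refilling a 4 × 3 matrix only works if the rows of `COTY` happen to be sorted by state and then by year. Cross-tabulating by the year and state labels removes that dependence. A base-R sketch with hypothetical data (the column names `state`, `year` and `yield` and all values are illustrative, not from the project):

```r
# Hypothetical long-format cotton yields; illustrative values only
coty <- data.frame(
  state = rep(c("Gujarat", "Haryana", "Madhya Pradesh"), each = 4),
  year  = rep(c("2014", "2015", "2016", "2017"), times = 3),
  yield = c(3100, 3000, 3050, 3150,   # Gujarat
            2200, 2100, 1900, 2500,   # Haryana
            1100, 1000,  900, 1300)   # Madhya Pradesh
)

# xtabs() places each value by its year and state labels, so row order does not matter
Z2 <- xtabs(yield ~ year + state, data = coty)

# Stacked bars: one bar per state, one segment per year
barplot(Z2, legend.text = TRUE, col = 1:4, ylim = c(0, 15000))
```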
Vedant
# labels and values are assigned to the levels of the colour variable in order
scale_color_manual(labels = c(">3000", "<3000"), values = c("springgreen3", "red")) +
theme_minimal() +
theme(legend.position = "top",
legend.justification = "right",
legend.title = element_blank(),
axis.text = element_text(size = 10),
axis.title = element_text(size = 12, face = "bold"),
plot.title = element_text(size = 16, hjust = 0.5, face = "bold"))+
labs(title="Wheat Yield: 2014-2017",y="Yield")
[Figure: Wheat Yield: 2014-2017, scatter plot of Yield against Year (2014 to 2017) with one panel per state, including West Bengal]
This chart shows wheat yield in different states of India. The red points mark yields above 3000 kg per ha.
Uttar Pradesh, the highest producer of wheat among these states, has yields exceeding 3000 kg per ha, shown by
the red points. This is because it has fertile alluvial soil and a dry, cool winter season, which is most
suitable for growing wheat. Madhya Pradesh and Punjab are the other two states with a substantial share in
wheat production. West Bengal, the lowest producer of wheat among these states, has a tropical climate
strongly influenced by the Bay of Bengal. The state experiences mild winters and high temperatures, which are
not conducive to a winter crop such as wheat.
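`scale_color_manual(labels = ..., values = ...)` pairs labels and colours with the levels of the colour variable by position, so which group is drawn red depends on how that variable was built (its definition is not shown above). Naming the groups explicitly removes the ambiguity. A minimal base-R sketch with hypothetical yields; the ggplot2 line in the final comment shows how a named `values` vector keys colours to those level names:

```r
# Hypothetical district wheat yields (kg per ha); illustrative only
yield <- c(3250, 2900, 3600, 1800, 3050, 2400)

# Label each point by its group and fix the level order explicitly
band <- factor(ifelse(yield > 3000, ">3000", "<=3000"),
               levels = c(">3000", "<=3000"))
table(band)

# With ggplot2, a named values vector matches colours to these labels by name, e.g.
# scale_color_manual(values = c(">3000" = "red", "<=3000" = "springgreen3"))
```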
C%>%ggplot(aes(RICE.PRODUCTION..1000.tons.,RICE.AREA..1000.ha.))+
geom_point(col="violetred3")+
geom_smooth(method="lm")+   # ordinary least-squares line with a confidence band
labs(
title = "Correlation between Rice Production and Area",
x = "Rice Production (1000 tons)",
y = "Rice Area (1000 ha)")+
theme_minimal() +
theme(plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
axis.title = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 10),
axis.text.x = element_text(angle = 45, hjust = 1))
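[Figure: Correlation between Rice Production and Area, scatter plot of Rice Area (1000 ha) against Rice Production (1000 tons) with a fitted regression line]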
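`geom_smooth(method = "lm")` draws the ordinary least-squares line of rice area on rice production. A minimal sketch with hypothetical values (not project data) showing how that line relates to the Pearson correlation: in simple linear regression the slope equals r × sd(y) / sd(x).

```r
# Hypothetical district values; the project uses the columns of C instead
production <- c(150, 420, 800, 1300, 2100, 2900)   # 1000 tons
area       <- c( 70, 160, 290,  480,  700,  950)   # 1000 ha

r   <- cor(production, area)    # Pearson correlation
fit <- lm(area ~ production)    # the straight line geom_smooth(method = "lm") fits
slope <- unname(coef(fit)[2])

# For simple linear regression the slope equals r * sd(y) / sd(x)
all.equal(slope, r * sd(area) / sd(production))
```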
a=C%>%select(OILSEEDS.PRODUCTION..1000.tons.,State.Name,Dist.Name,Year)%>%
filter(State.Name=="Rajasthan"&Year=="2015")%>%
arrange(desc(OILSEEDS.PRODUCTION..1000.tons.))   # districts sorted by production, highest first
head(a,5)
# top five districts (1000 tons), taken from the sorted table
e=c(Ganganagar=584.19,Kota=541.04,Bikaner=473.17,Jodhpur=432.08,Bharatpur=398.6)
barplot(e,col=colors()[31:35],xlab="Districts")
[Figure: barplot of oilseed production (1000 tons) for the top five districts of Rajasthan in 2015; x-axis: Districts]
This barplot shows the top five oilseed-producing districts of Rajasthan in 2015. Ganganagar and Kota are the
highest oilseed-producing districts, each producing more than 5 lakh tonnes. The average production of the five
districts is about 4.9 lakh tonnes, and the lowest producer among them is Bharatpur, with just under 4 lakh
tonnes. Rajasthan has an arid to semi-arid climate, which can be suitable for certain oilseed crops such as
mustard, groundnut and soybean.
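The average for the five districts can be checked directly from the values in `d`. A minimal sketch; the conversion assumes the data are in 1000 tons, as the column name `OILSEEDS.PRODUCTION..1000.tons.` indicates, and uses 1 lakh = 100,000:

```r
# Top five oilseed-producing districts of Rajasthan in 2015 (1000 tons), from d above
top5 <- c(Ganganagar = 584.19, Kota = 541.04, Bikaner = 473.17,
          Jodhpur = 432.08, Bharatpur = 398.6)

avg_thousand_t <- mean(top5)          # average in 1000 tons
avg_lakh_t <- avg_thousand_t / 100    # 100 thousand tonnes = 1 lakh tonnes
round(avg_lakh_t, 2)
```

This gives about 4.86 lakh tonnes.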
ve=C%>%filter(State.Name!="West Bengal"&Year=="2014")%>%select(COTTON.AREA..1000.ha.)%>%filter(COTTON.AR
v1=C%>%filter(State.Name!="West Bengal"&Year=="2015")%>%select(COTTON.AREA..1000.ha.)%>%filter(COTTON.AR
v2=C%>%filter(State.Name!="West Bengal"&Year=="2016")%>%select(COTTON.AREA..1000.ha.)%>%filter(COTTON.AR
v3=C%>%filter(State.Name!="West Bengal"&Year=="2017")%>%select(COTTON.AREA..1000.ha.)%>%filter(COTTON.AR
sum(ve)
## [1] 4660.64
sum(v1)
## [1] 4354.13
sum(v2)
## [1] 4038.95
sum(v3)
## [1] 4486.56
# yearly totals from sum(ve), sum(v1), sum(v2) and sum(v3) above
years=2014:2017
cotton_area=c(4660.64,4354.13,4038.95,4486.56)
par(bg="seashell")
plot(years,cotton_area,type = "b", col = "darkgreen", lwd = 2,
xlab = "Years", ylab = "Cotton Area (1000 ha)",
main = "Total area under cotton in India")
legend("topright", legend = "Cotton area", col = "darkgreen", lty = 1, lwd = 2)
[Figure: Total area under cotton in India, line chart of Cotton Area (1000 ha) against Years, with a "Cotton area" legend]
This line chart shows the total area under cotton in India (summed over the states in the dataset, excluding
West Bengal) across four years. The year 2014 had the largest area under cotton, about 46.6 lakh hectares
(4660.64 thousand ha). The area then fell for two years, to about 40.4 lakh hectares in 2016, mainly because of
deficient rainfall. In 2017 it recovered to about 44.9 lakh hectares. New varieties of genetically modified or
hybrid cotton seed are likely a major reason for this increase.
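The four `filter()` calls above differ only in the year. A single grouped sum computes every yearly total at once and cannot mix up a year. A base-R sketch with a hypothetical long-format table (the column names `state`, `year`, `area` and all values are illustrative):

```r
# Hypothetical long-format rows: state, year and cotton area (1000 ha)
cotton <- data.frame(
  state = c("Gujarat", "Haryana", "West Bengal", "Gujarat", "Haryana", "West Bengal"),
  year  = c(2014, 2014, 2014, 2015, 2015, 2015),
  area  = c(2800, 600, 5, 2700, 570, NA)
)

keep <- cotton$state != "West Bengal"   # same exclusion as the filter() calls above

# One total per year, ignoring missing values
totals <- tapply(cotton$area[keep], cotton$year[keep], sum, na.rm = TRUE)
totals
```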
Boxplot on Maize Yield
C%>%filter(MAIZE.YIELD..Kg.per.ha.<15000&MAIZE.YIELD..Kg.per.ha.>0)%>%
ggplot(aes(x=reorder(Year,MAIZE.YIELD..Kg.per.ha.),y=MAIZE.YIELD..Kg.per.ha.))+
geom_boxplot(col="darkgreen",alpha=0.4,outlier.size=3,outlier.colour="red")+
coord_flip()+
labs(
title = "Maize Yield Distribution Over Years",
x = "Year",
y = "Maize Yield (Kg per ha)") +
stat_summary(fun=mean,geom="point",color="blue")
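[Figure: Maize Yield Distribution Over Years, horizontal boxplots of Maize Yield (Kg per ha) for 2014 to 2017, with the yearly mean marked in blue]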
Sanidhya
[Figure: pairs plot of crop yields, with panels including RICE YIELD (Kg per ha) and OILSEEDS YIELD (Kg per ha)]
attach(b)
cor(`RICE YIELD (Kg per ha)`,`SUGARCANE YIELD (Kg per ha)`) # moderate degree of positive correlation
## [1] 0.4872056
cor(`WHEAT YIELD (Kg per ha)`,`SUGARCANE YIELD (Kg per ha)`)# negligible positive correlation
## [1] 0.1308622
cor(`WHEAT YIELD (Kg per ha)`,`RICE YIELD (Kg per ha)`) #low degree of positive correlation
## [1] 0.2611049
cor(`RICE YIELD (Kg per ha)`,`OILSEEDS YIELD (Kg per ha)`) # low degree of negative correlation
## [1] -0.3240037
cor(`WHEAT YIELD (Kg per ha)`,`OILSEEDS YIELD (Kg per ha)`)# negligible degree of negative correlation
## [1] -0.1466258
cor(`SUGARCANE YIELD (Kg per ha)`,`OILSEEDS YIELD (Kg per ha)`) # low degree of negative correlation
## [1] -0.3127461
detach(b)
Here we have analysed whether the yields of different crops in Madhya Pradesh and Uttar Pradesh are related,
using a pairs plot and the cor() function. The correlation coefficients above give the strength and direction
of the linear relationship between each pair of yields. These correlations can suggest which crop combinations
would suit a farmer who practises mixed cropping in these states: crops whose yields are positively correlated,
such as rice and sugarcane (r ≈ 0.49, the strongest positive pair here), are better candidates to grow together
than crops whose yields are negatively correlated, such as rice and oilseeds (r ≈ -0.32).
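Rather than calling `cor()` once per pair, `cor()` on a data frame returns the whole correlation matrix, from which the most positively correlated pair can be read off. A base-R sketch on a small hypothetical table (columns `rice`, `sugarcane` and `oilseeds`; the values are made up):

```r
# Hypothetical district yields (kg per ha); illustrative values only
yields <- data.frame(
  rice      = c(1800, 2200, 2500, 2900, 3100),
  sugarcane = c(5200, 5800, 6100, 6600, 7000),
  oilseeds  = c(1200, 1100, 1000,  950,  800)
)

r <- cor(yields)                      # Pearson correlation matrix
r[lower.tri(r, diag = TRUE)] <- NA    # keep each pair once (upper triangle)

best <- which(r == max(r, na.rm = TRUE), arr.ind = TRUE)
cat("most positively correlated pair:",
    rownames(r)[best[1, "row"]], "and", colnames(r)[best[1, "col"]], "\n")
```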
E=filter(B,Year==2015)
# total maize production of each state in 2015
PRODUCTION=E%>%group_by(`State Name`) %>%
summarise(`MAIZE PRODUCTION (1000 tons)`=sum(`MAIZE PRODUCTION (1000 tons)`,na.rm=TRUE))
# Vec1 is assumed to hold the state totals from PRODUCTION, in alphabetical order
pie(Vec1,
labels = c("Bihar(27.6%)","Gujarat(6.2%)","Haryana(0.3%)","Madhya Pradesh(31.9%)","Rajasthan(12.8%)",
"Uttar Pradesh(14.1%)","West Bengal(7.1%)"),
col =c(3:10), # slice colours
main="Contribution of each state in maize production", # title
radius=0.8, # radius
border="blue",lty=7 # border and line type
)
[Figure: Contribution of each state in maize production, pie chart: Bihar 27.6%, Gujarat 6.2%, Haryana 0.3%, Madhya Pradesh 31.9%, Rajasthan 12.8%, Uttar Pradesh 14.1%, West Bengal 7.1%]
Madhya Pradesh (31.9%) and Bihar (27.6%) contribute the most to maize production among these seven states.
Uttar Pradesh (14.1%) and Rajasthan (12.8%) each contribute roughly one-seventh to one-eighth of the total.
Gujarat and West Bengal contribute less, and Haryana's share is negligible. These differences reflect the
geographical conditions of the states, in particular the presence of well-drained loamy soil and moderate
temperatures.
[Figure: Boxplots for oilseed production in different states, one panel each for Bihar, Gujarat, Haryana, Madhya Pradesh, Rajasthan, Uttar Pradesh and West Bengal; x-axis: OILSEEDS PRODUCTION (1000 tons)]
As the boxplots show, the median district-level production of oilseeds is 0 in most states. This is probably
because oilseeds require specific geographical conditions that are not easily found. Most states have many
outliers, especially Rajasthan and Gujarat, which produce most of the oilseeds in India. At least 75% of the
districts of West Bengal, Uttar Pradesh and Bihar do not produce oilseeds at all. Haryana differs from the
other states in that many of its districts contribute to oilseed production. The boxplots of Madhya Pradesh and
Rajasthan are nearly identical, as the geographical conditions in these two states are similar apart from
Rajasthan's Thar region.
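Since production cannot be negative, a third quartile of 0 implies that at least 75% of a state's districts produce nothing. Both the share of zero-producing districts and the third quartile can be computed per state with `tapply()`. A sketch on hypothetical data (the column names `state` and `prod` and all values are illustrative):

```r
# Hypothetical district oilseed production (1000 tons); illustrative only
oil <- data.frame(
  state = c(rep("Bihar", 5), rep("Rajasthan", 4)),
  prod  = c(0, 0, 0, 0, 2.5,   0, 120, 310, 640)
)

# Fraction of districts in each state that produce no oilseeds
zero_share <- tapply(oil$prod == 0, oil$state, mean)

# Third quartile per state: a value of 0 implies at least 75% of districts produce nothing
q3 <- tapply(oil$prod, oil$state, quantile, probs = 0.75)

zero_share
q3
```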
Ar1=B%>%group_by(`Year`) %>%
summarise(`WHEAT AREA (1000 ha)`=sum(`WHEAT AREA (1000 ha)`,na.rm=TRUE))
f1=as.matrix(Ar1)
plot(f1,type="b",
cex=4,
col="orange",
main="Area under wheat production",
sub="Variation in area under production",
xlab="year",
ylab="area",
lwd=1.5,
lty=2,
pch=2)   # pch takes an integer symbol code; 2 is an open triangle
[Figure: Area under wheat production (subtitle: Variation in area under production), line graph of area (1000 ha) against year]
This line graph shows how the area under wheat has changed over the years. The area fell in 2015 but rose
back to its earlier level in 2016, and declined again in 2017. The data show no clear upward or downward trend.
One possible reason is that very little additional land is available for cultivation, and much of what remains
is not fertile enough to be cultivated economically, so the area under wheat changes only slightly from year
to year.
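[Figure: Histogram and frequency polygon of rice production in the districts of Uttar Pradesh; y-axis: count]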
This chart shows the number of districts of Uttar Pradesh in each range of rice production. Each bin has a
width of 100, i.e. 100 thousand tonnes of rice (the production data are recorded in units of 1000 tons), and the
height of a bin gives the number of districts in that interval. The histogram has two peaks: one between 50 and
250 and another between 550 and 650. The data show that almost all districts of Uttar Pradesh cultivate rice.
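A histogram with bins of width 100 and a frequency polygon joining the bin midpoints can be drawn in base R as follows. This is a minimal sketch assuming a hypothetical vector `rice_up` of district production values in 1000 tons; the original chart's code is not shown and may have used ggplot2 instead:

```r
# Hypothetical district rice production for Uttar Pradesh (1000 tons); illustrative only
rice_up <- c(60, 90, 120, 150, 180, 210, 240, 320, 450, 560, 590, 600, 620, 640)

# Bins of width 100 (thousand tons); by default each bin includes its upper edge
h <- hist(rice_up, breaks = seq(0, 700, by = 100),
          main = "Rice production in Uttar Pradesh districts",
          xlab = "Rice production (1000 tons)", ylab = "count")

# Frequency polygon: join the bin midpoints at the height of each bin
lines(h$mids, h$counts, type = "b", col = "red")
```

Because bins are right-closed by default (`right = TRUE`), a district producing exactly 600 falls in the 500 to 600 bin.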