A course work on R programming for
basics to advance statistics and GIS
SEEMAB AKHTAR
PREFACE
R has been around since 1995 and has become one of the most popular programming languages
among data scientists around the world. It includes many packages and functions, which
make it an attractive programming language for data scientists. R provides an open-source
platform for data analysis, data wrangling, data visualization and machine learning. This
course covers traditional to advanced statistics and GIS applications of R, such as
models, graphs, descriptive statistics, mathematical trend modeling and spatial plots. R is
designed to be a tool that helps scientists analyze data, and it has many excellent
functions that make plots and fit models to data. Because of this, many statisticians learn to
use R as if it were a piece of software; they discover which functions accomplish what they
need and ignore the rest.
Speaker & Instructor
SEEMAB AKHTAR
M.Tech (Mineral Exploration)
IIT (ISM) Dhanbad
M.Sc. (Applied Geology)
University of Allahabad
Email: akhtariitdhn@gmail.com
Social site: https://www.linkedin.com/in/seemab-akhtar-3b7856139/
YouTube: https://www.youtube.com/c/KnowledgeEducationHub
Specialization: Geostatistics, GIS & Groundwater resource management
Experience: Six years’ experience in the field of Geostatistics, GIS and Groundwater resource
management
A course work on R programming for basics to advance statistics and GIS

Serial No. | Contents | Time

1 | R and RStudio installation, packages | 10:00 AM to 10:30 AM

2 | Part 1 Basic statistics by R programming | 10:30 AM to 12:00 PM
    1. The Fundamentals of Descriptive Statistics
    2. Box Plot, Bar Plot, Histogram Plot, QQ Plot
    3. Measures of Central Tendency (Mean, Median & Mode)
    4.1 Skewed Plot
    4.2 Normal Distribution
    4.3 Standard Normal Distribution
    4.4 Central Limit Theorem
    4.5 Different Statistical Errors (Introduction)
    5. P value (Introduction)
    6. Regression Analysis

3 | Part 2 Advanced Statistics by R Programming | 12:30 PM to 3:30 PM
    1.1 Mathematical Polynomial Trend Surface Identification
    1.2 Trend Removal
    2.1 Mann Kendall Test
    2.3 Sen's Slope

4 | Part 3 GIS with R Programming | 4:00 PM to 5:30 PM
    1. Polygonal Shape File, Line Shape File, Point Shape File
    2. Clipping and Mask
    3. Spatial Plot, Level Plot
    4. Contouring
    5. Image Stacking and Regression, Pixel-Pixel, Box Plots over Raster Surfaces
R and R Studio Installation
First install R (version 4.1.0 or above) from https://www.r-project.org/, then install RStudio
from https://www.rstudio.com/products/rstudio/download/.
Code for package installation:
install.packages("package name")
Alternatively, open RStudio, click Install in the Packages tab and type the package name (Figure 1).
(Figure 1: IDE interface of RStudio)
Preliminary requirements for R programming
• Laptop or PC (4 GB RAM)
• A good internet connection (for example, Jio 4G VoLTE)
• Code (this will be sent to all participants before the course starts)
• Make a folder named R on your desktop
• After installing R and RStudio, open the RStudio (Integrated Development
Environment) interface.
• Set the working directory to the path of your folder (R):
setwd("C:/Users/dell/Desktop/R")
Install the following packages
• raster
• rasterVis
• zoo
• xts
• ppcor
• rts
• rgdal
• spatialEco
• Kendall
• readr
• readxl
• gstat
• sp
• lattice
• ggplot2
• rgeos
• spacetime
• RColorBrewer
• latticeExtra
• map
if(!require(psych)){install.packages("psych")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(Rmisc)){install.packages("Rmisc")}
if(!require(FSA)){install.packages("FSA")}
if(!require(plyr)){install.packages("plyr")}
if(!require(boot)){install.packages("boot")}
Part 1 Basics statistics by R programming
Descriptive Statistics
(Source: Sandeep Kumar, https://www.udemy.com/user/sandeepkumar1/)
Kurtosis
Kurtosis is a measure of the tailedness of a distribution. Tailedness is how
often outliers occur. Excess kurtosis is the tailedness of a distribution relative to a normal
distribution, that is, kurtosis minus 3.
• Distributions with medium kurtosis (medium tails) are mesokurtic.
• Distributions with low kurtosis (thin tails) are platykurtic.
• Distributions with high kurtosis (fat tails) are leptokurtic.
Tails are the tapering ends on either side of a distribution. They represent the probability or
frequency of values that are extremely high or low compared to the mean. In other words, tails
represent how often outliers occur.
(Source: https://www.scribbr.com/statistics/kurtosis/)
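The three kurtosis classes can be illustrated with a few lines of base R. This is a minimal sketch, not part of the original course code: the helper name excess_kurtosis and the simulated samples are assumptions made for illustration, and the moment-based formula used here (fourth central moment divided by the squared variance, minus 3) is one of several common estimators.

```r
# Moment-based excess kurtosis (hypothetical helper, base R only)
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)   # second central moment
  m4 <- mean((x - mean(x))^4)   # fourth central moment
  m4 / m2^2 - 3                 # subtract 3 so a normal distribution scores about 0
}

set.seed(42)
normal_sample  <- rnorm(10000)        # mesokurtic: excess kurtosis near 0
uniform_sample <- runif(10000)        # platykurtic: negative excess kurtosis
heavy_sample   <- rt(10000, df = 5)   # leptokurtic: positive excess kurtosis

k_normal  <- excess_kurtosis(normal_sample)
k_uniform <- excess_kurtosis(uniform_sample)
k_heavy   <- excess_kurtosis(heavy_sample)
print(c(normal = k_normal, uniform = k_uniform, heavy = k_heavy))
```

Packages from the installation list, such as psych, offer their own kurtosis functions; their estimators may differ slightly from this one.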
Box and Whisker Plots
A box and whisker plot shows the five-number summary of a data set: minimum, first quartile,
median, third quartile and maximum.
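As a small illustration, the five-number summary and the corresponding box plot can be produced with base R. The data vector below is made up for this sketch.

```r
# Made-up sample with one large value (30) that shows up as an outlier
x <- c(2, 4, 4, 5, 7, 8, 9, 11, 12, 30)

five <- fivenum(x)   # minimum, lower hinge, median, upper hinge, maximum
print(five)

# Draw the box and whisker plot; points beyond the whiskers are plotted individually
boxplot(x, horizontal = TRUE, main = "Box and whisker plot", xlab = "value")
```

Note that fivenum() uses hinges, which can differ slightly from the quartiles returned by quantile() for some sample sizes.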
QQ Plot
Q-Q (quantile-quantile) plots are used in statistics to graphically compare two
probability distributions by plotting their quantiles against each other. If the two distributions
under consideration are exactly equal, the points on the Q-Q plot lie on the straight
line y = x. As a data scientist or statistician, you often need to know whether a distribution
is normal before applying various statistical measures to the data, and the Q-Q plot gives a
visual check that is easy to interpret. Q-Q plots can be used to assess whether a random
variable follows a given type of distribution, such as a Gaussian (normal), uniform or
exponential distribution. (Source:
https://towardsdatascience.com/q-q-plots-explained-5aa8495426c0)
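A normal Q-Q plot can be drawn in base R with qqnorm() and qqline(). The sketch below uses simulated data (an assumption for illustration); with real data, replace x by your own numeric vector.

```r
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)   # simulated, roughly normal sample

# qqnorm() plots sample quantiles against theoretical normal quantiles
qq <- qqnorm(x, main = "Normal Q-Q plot")
# qqline() adds a reference line through the first and third quartiles
qqline(x, col = "red")

# Points close to the line suggest that the normal model is reasonable
qq_correlation <- cor(qq$x, qq$y)
print(qq_correlation)
```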
Normal Distribution
The normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the mean are more frequent
in occurrence than data far from the mean. In graphical form, the normal distribution appears as a
"bell curve". (Source: https://www.investopedia.com/terms/n/normaldistribution.asp)
Normality Test in R: A data set is consistent with a normal distribution if its skewness is close to
zero and its kurtosis is close to 3. The following methods can be used in R to test the normality
of a data set:
1. (Visual Method) Create a histogram.
• If the histogram is roughly "bell-shaped", then the data is assumed to be normally
distributed.
2. (Visual Method) Create a Q-Q plot.
• If the points in the plot roughly fall along a straight diagonal line, then the data is assumed
to be normally distributed.
3. (Formal Statistical Test) Perform a Shapiro-Wilk Test.
• If the p-value of the test is greater than α = 0.05, normality is not rejected and the data is
assumed to be normally distributed.
4. (Formal Statistical Test) Perform a Kolmogorov-Smirnov Test.
• If the p-value of the test is greater than α = 0.05, normality is not rejected and the data is
assumed to be normally distributed.
5. (Formal Statistical Test) Perform a Jarque-Bera Normality Test.
• If the p-value of the test is greater than α = 0.05, normality is not rejected and the data is
assumed to be normally distributed.
6. (Formal Statistical Test) Perform an Anderson-Darling Test.
• An Anderson-Darling Test is a goodness-of-fit test that measures how well your data fit a
specified distribution. The null hypothesis for the A-D test is that the data does follow a
normal distribution. Thus, if the p-value for the test is below the significance level
(common choices are 0.10, 0.05, and 0.01), then we can reject the null hypothesis and
conclude that we have sufficient evidence to say the data does not follow a normal
distribution.
7. (Formal Statistical Test) Perform a Chi-Square Goodness of Fit Test.
• The Chi-Square Test for Normality allows us to check whether or not the data follows an
approximately normal distribution. To apply the Chi-Square Test for Normality to any
data set, let your null hypothesis be that your data is sampled from a normal distribution
and apply the Chi-Square Goodness of Fit Test. Given your mean and standard deviation,
calculate the expected frequencies E_i under the normal distribution for each class
interval and compare them with the observed frequencies O_i. Then use the formula
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
to find the chi-square statistic. Compare this to the critical chi-square value from a
chi-square table, given your degrees of freedom and desired alpha level. If your chi-square
statistic is larger than the table value, you may conclude your data is not normal.
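Several of these checks are available in base R, so they can be tried without extra packages. The following is a minimal sketch on simulated data (the two samples and the choice of ten equal-probability classes are assumptions for illustration). The Jarque-Bera and Anderson-Darling tests are not in base R and need add-on packages, so they are left out here.

```r
set.seed(123)
normal_x <- rnorm(100, mean = 10, sd = 2)   # simulated normal sample
skewed_x <- rexp(100, rate = 1)             # simulated right-skewed sample

# 1-2. Visual checks
hist(normal_x, main = "Histogram", xlab = "value")
qqnorm(skewed_x); qqline(skewed_x)

# 3. Shapiro-Wilk test: a small p-value is evidence against normality
sw_normal <- shapiro.test(normal_x)
sw_skewed <- shapiro.test(skewed_x)

# 4. Kolmogorov-Smirnov test against a normal with the sample mean and sd.
#    Estimating the parameters from the same data makes the p-value approximate.
ks_normal <- ks.test(normal_x, "pnorm", mean = mean(normal_x), sd = sd(normal_x))

# 7. Chi-square goodness of fit with 10 equal-probability classes
breaks   <- qnorm(seq(0, 1, by = 0.1), mean = mean(normal_x), sd = sd(normal_x))
observed <- table(cut(normal_x, breaks))
expected <- rep(length(normal_x) / 10, 10)
chi_sq   <- sum((observed - expected)^2 / expected)
df       <- 10 - 1 - 2    # classes minus 1, minus 2 estimated parameters
chi_p    <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(shapiro_normal = sw_normal$p.value, shapiro_skewed = sw_skewed$p.value,
        ks_normal = ks_normal$p.value, chi_square = chi_p))
```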
Reasons for the Non Normal Distribution
1. Outliers can cause your data to become skewed. The mean is especially sensitive to outliers.
Try removing any extreme high or low values and testing your data again.
2. Multiple distributions may be combined in your data, giving the appearance of
a bimodal or multimodal distribution. For example, combining two sets of normally distributed
test results can give the appearance of bimodal data.
3. Insufficient data can cause a normal distribution to look completely scattered.
Dealing with Non Normal Distributions
You have several options for handling non-normal data. Many tests, including the one-sample
Z test, t test and ANOVA, assume normality. You may still be able to run these tests if
your sample size is large enough (usually over 20 items). You can also choose to transform the data
with a function, forcing it to fit a normal model. However, if you have a very small sample, a sample
that is skewed or one that naturally fits another distribution type, you may want to run a
non-parametric test. A non-parametric test is one that doesn't assume the data fits a specific
distribution type. Non-parametric tests include the Wilcoxon signed rank test, the Mann-Whitney U
test and the Kruskal-Wallis test.
(Source: https://www.statisticshowto.com/probability-and-statistics/non-normal-distributions/)
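The non-parametric tests named above are part of base R. The sketch below runs them on small made-up groups (the numbers are assumptions for illustration only) and also shows a log transformation as one possible way to reduce right skew.

```r
# Made-up measurements for three groups
group_a <- c(0.2, 0.5, 0.9, 1.4, 2.1, 3.8)
group_b <- c(4.5, 5.1, 6.3, 7.7, 9.2, 12.4)
group_c <- c(1.0, 2.2, 3.1, 5.0, 6.6, 8.9)

# Mann-Whitney U test (called the Wilcoxon rank-sum test in R) for two independent groups
mw <- wilcox.test(group_a, group_b)

# Wilcoxon signed rank test for paired observations
sr <- wilcox.test(group_a, group_b, paired = TRUE)

# Kruskal-Wallis test for three or more independent groups
kw <- kruskal.test(list(group_a, group_b, group_c))

# A log transformation is one option for right-skewed, strictly positive data
log_b <- log(group_b)

print(c(mann_whitney = mw$p.value, signed_rank = sr$p.value, kruskal_wallis = kw$p.value))
```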
Standard Normal Distribution
The standard normal distribution, also called the z-distribution, is a special normal
distribution where the mean is 0 and the standard deviation is 1. Any normal distribution can be
standardized by converting its values into z-scores. Z-scores tell you how many standard
deviations from the mean each value lies.
(Source: https://www.scribbr.com/statistics/standard-normaldistribution/)
The probability density function for the normal distribution having mean μ and standard
deviation σ is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
If we let the mean μ = 0 and the standard deviation σ = 1 in this probability density function,
we get the probability density function for the standard normal distribution:
$$f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x^2}{2}}$$
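Standardization and the normal density can both be checked numerically in R. In this minimal sketch the scores vector and the helper name normal_pdf are assumptions for illustration; dnorm() is the built-in normal density used for comparison.

```r
# Made-up exam scores
scores <- c(52, 60, 68, 75, 85)

# z-score: distance from the mean measured in standard deviations
z <- (scores - mean(scores)) / sd(scores)
print(round(z, 3))

# Normal density written out from its definition (hypothetical helper)
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# With mu = 0 and sigma = 1 this is the standard normal density
print(normal_pdf(0))
print(dnorm(0))   # built-in equivalent
```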
68%-95%-99.7% Rule
The 68%-95%-99.7% rule is a rule of thumb that allows practitioners of statistics to estimate
the probability that a randomly selected number from the standard normal distribution falls
within 1, 2, and 3 standard deviations of the mean at zero: approximately 68%, 95% and 99.7%
respectively.
(Source: https://mse.redwoods.edu/darnold/math15/UsingRInStatistics/StandardNormal.php)
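The rule can be verified with pnorm(), the standard normal cumulative distribution function in base R. A minimal sketch:

```r
k <- 1:3   # number of standard deviations from the mean

# P(-k < Z < k) for a standard normal variable Z
prob_within <- pnorm(k) - pnorm(-k)

print(round(100 * prob_within, 1))   # 68.3 95.4 99.7
```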
Central Limit Theorem
The Central Limit Theorem tells us that the distribution of sample means x̄, of samples of
size n taken from any given population with finite variance:
1. Becomes more "normal" in shape as n increases;
2. Has a mean that agrees with the population mean, μ; and
3. Has a standard deviation equal to σ/√n, where σ is the standard deviation of the population.
(Source: http://mathcenter.oxford.emory.edu/site/home/futurePages/rProjectCentralLimitTheorem)
In this project, we will construct a population, and then approximate the distributions of sample
means for various sample sizes through repeated sampling, so that we can "see" this theorem in
action through a sequence of histograms.
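One way to carry out such an experiment is sketched below. The exponential population, the sample sizes and the number of repeats are assumptions chosen for illustration, not the settings of the original project.

```r
set.seed(2024)

# A clearly non-normal (right-skewed) population
population   <- rexp(10000, rate = 1)
sample_sizes <- c(2, 10, 50)
n_repeats    <- 2000

# For each sample size, draw many samples and record their means
sample_means <- lapply(sample_sizes, function(n) {
  replicate(n_repeats, mean(sample(population, n)))
})
names(sample_means) <- paste0("n = ", sample_sizes)

# Histograms of the sample means: the shape becomes more bell-like as n grows
op <- par(mfrow = c(1, 3))
for (nm in names(sample_means)) {
  hist(sample_means[[nm]], main = nm, xlab = "sample mean")
}
par(op)

# Compare the observed spread of the means with sigma / sqrt(n)
observed_sd <- sapply(sample_means, sd)
expected_sd <- sd(population) / sqrt(sample_sizes)
print(rbind(observed_sd, expected_sd))
```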
Types of Errors in Statistics
There are two types of error in statistics: type I and type II. In a statistical test,
a type I error is the rejection of a true null hypothesis. In contrast, a type II error is the
failure to reject a false null hypothesis. Much of statistical methodology revolves around
reducing one or both kinds of error, although the complete elimination of either of them is
impossible. By choosing a low threshold value and adjusting the alpha level, the quality of
the hypothesis test can be improved. Knowledge of type I and type II errors is used in
biometrics, medical science, and computer science.
(https://statanalytica.com/blog/types-of-error-in-statistics/).
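Both error rates can be estimated by simulation. The sketch below is an illustration under assumed settings (one-sample t test, samples of size 20, a true mean of 0.5 for the false-null case); none of these values come from the course material.

```r
set.seed(99)
alpha <- 0.05
n_sim <- 2000

# Null hypothesis TRUE (true mean is 0): every rejection is a type I error
p_null <- replicate(n_sim, t.test(rnorm(20, mean = 0), mu = 0)$p.value)
type1_rate <- mean(p_null < alpha)

# Null hypothesis FALSE (true mean is 0.5): every non-rejection is a type II error
p_alt <- replicate(n_sim, t.test(rnorm(20, mean = 0.5), mu = 0)$p.value)
type2_rate <- mean(p_alt >= alpha)

print(c(type_I = type1_rate, type_II = type2_rate))
```

The estimated type I rate should be close to alpha, while the type II rate depends on the sample size and on how far the true mean is from the hypothesized one.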
Part 2 Advanced Statistics by R Programming
Trend Surface Analysis
Trend surface analysis attempts to decompose each observation on a spatially
distributed variable into a component associated with any regional trends present in the data and a
component associated with purely local effects. This separation into two components is
accomplished by fitting a best-fit surface using standard regression techniques. The values
predicted by this trend surface are assigned to the regional effects, whereas the local departures of
the observed data from it, or residuals, are assigned to the local effects. In order to plot values on
a map the geographer needs three pieces of information: the x, y spatial co-ordinates of each point
together with the height above some datum, the z co-ordinate. The z values might relate to
any variable, but the whole operation defines a spatial series in which the z observations are ordered
with respect to the two spatial co-ordinates, x and y. The map would be completed by drawing
lines of equal z value (contours or isolines) through the points. The resulting contour-type map
defines a complex surface which in most cases will reveal a spatial structure or form. A trend
surface analysis assumes that each mapped value can be decomposed into two components that
arise from two scales of process:
A) The regional trend. According to Krumbein and Graybill (1965) this trend is 'associated with
"large scale" systematic changes that extend from one map edge to the other'. Similarly, Grant
(1961) defines trend as '. . . that part of the data that varies smoothly. In other words, it is the
function that behaves predictably'.
B) The combined result of two processes that operate over an area substantially smaller than the
study area: random fluctuations and errors of measurement. This forms an assumed error, local
component, or residual, defined by Krumbein and Graybill as '. . . apparently non-systematic
fluctuations that are superimposed on the large scale patterns'. It is important to notice that at the
scale of observation these residuals appear to be spatially random; they may prove to be
systematically related to a spatial process, but at this scale they do not vary systematically over the
mapped area.
Mathematically:
observed value of the surface at a point = trend component at that point + residual at that point
If component (A), the trend, varies smoothly over space, its value (height) at any particular point
can be expressed in terms of the spatial co-ordinates of that point, so that the basic equation of any
trend analysis becomes
$$Z_i = f(X_i, Y_i) + U_i$$
Z_i = the observed value of the surface at the ith data point.
X_i = the co-ordinate on the x-axis (easting) of the ith data point.
Y_i = the co-ordinate on the y-axis (northing) of the ith data point.
U_i = the residual at the ith data point.
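The decomposition into a trend component and residuals can be sketched with ordinary least squares in base R. The simulated co-ordinates, the true surface 5 + 0.3x - 0.2y and the noise level are assumptions for illustration; with real data, x, y and z would come from your own table of observations.

```r
set.seed(10)
n <- 50
x <- runif(n, 0, 100)                            # easting of each data point
y <- runif(n, 0, 100)                            # northing of each data point
z <- 5 + 0.3 * x - 0.2 * y + rnorm(n, sd = 2)    # observed surface values

# First-order (planar) trend surface: z = b0 + b1*x + b2*y
trend_fit <- lm(z ~ x + y)

trend_component <- fitted(trend_fit)      # regional trend f(X_i, Y_i)
local_residual  <- residuals(trend_fit)   # local component U_i (the detrended data)

# Second-order trend surface adds the quadratic and cross terms
trend_fit2 <- lm(z ~ x + y + I(x^2) + I(x * y) + I(y^2))

print(coef(trend_fit))
print(anova(trend_fit, trend_fit2))   # does the second order improve the fit?
```

Subtracting the fitted trend, that is, keeping local_residual, is the trend removal step listed in the course contents.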
f denotes "some function", and thus the term f(X_i, Y_i) indicates the trend component. By function
we simply mean that if we know the location of any point i as a pair of spatial co-ordinates X_i and
Y_i, then the height of the trend component at this point can be found by simply plugging X_i
and Y_i into a known equation (or function). It follows that we can calculate a trend component
for any combination of X and Y, so that f(X, Y) denotes a complete surface of trend components,
called the trend surface. Trend is a fundamental concept in geostatistics when dealing
mathematically with spatial data that exhibit areas of large values and other areas of smaller
values. The concept of a geographical trend is tied to the position of the spatial data, because
geostatistical estimation assumes stationarity, and away from the data the estimate reverts to the
global mean (simple kriging) (S R Vieira et al., 2009). The order of the stationarity hypothesis
depends on the order of the statistical moments that are required to be stationary. Thus, when
second order stationarity is needed, at least the first two moments (mean and variance) should be
stationary. A set of n values Z(X_i) is intrinsic if the expected value E{Z(X_i)} exists and does
not depend on the geographic location X_i, that is, if it follows equations 1 and 2:
$$E\{Z(X_i)\} = m \quad (1)$$
The variance of the increment [Z(X_i) - Z(X_i + h)] is finite and does not depend on the geographic
location X_i. This condition can be written in the form of an equation as:
$$\mathrm{Var}[Z(X_i) - Z(X_i + h)] = E\{[Z(X_i) - Z(X_i + h)]^2\} \quad (2)$$
where m is the mean and h is the increment in position X_i.
Order of the model
The order of the polynomial model is kept as low as possible. Some transformations can be used
to keep the model first order. If this is not satisfactory, then a second-order
polynomial is tried. Arbitrary fitting of higher-order polynomials can be a serious abuse of
regression analysis. A model which is consistent with the knowledge of the data and its environment
should be chosen. It is always possible for a polynomial of order n - 1 to pass through
n points, so a polynomial of sufficiently high degree can always be found that provides a
"good" fit to the data. Such models neither enhance the understanding of the unknown function
nor serve as good predictors.
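The warning about high-order polynomials can be demonstrated with five made-up points (the values below are assumptions for illustration): a polynomial of order 4 reproduces all five points exactly, which says nothing about how well it would predict new data.

```r
# Five made-up observations
x <- c(1, 2, 3, 4, 5)
y <- c(2.0, 0.5, 3.1, 1.2, 4.0)
n <- length(x)

# First-order model: a straight line, with non-zero residuals
fit_line <- lm(y ~ x)

# Polynomial of order n - 1 = 4: passes through every point
fit_full <- lm(y ~ poly(x, degree = n - 1, raw = TRUE))

max_resid_line <- max(abs(residuals(fit_line)))
max_resid_full <- max(abs(residuals(fit_full)))
print(c(line = max_resid_line, order_4 = max_resid_full))
```

The order-4 residuals are zero up to rounding error because the model has as many coefficients as there are data points, not because the curve describes the underlying process.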
First Degree Polynomial Function
First degree polynomials have terms with a maximum degree of 1. In other words, you
wouldn't usually find any exponents in the terms of a first degree polynomial. For example,
the following are first degree polynomials:
• 2x + 1,
• x + y + z + 50,
• 10a + 4b + 20.
Second Degree Polynomial Function
Second degree polynomials have at least one second degree term in the expression
(e.g. $2x^2$, $a^2$, $xy$). There are no higher degree terms (like $x^3$ or $abc^5$). The
quadratic function $f(x) = ax^2 + bx + c$ is an example of a second degree polynomial.
General Form of Nth Degree Polynomial
The general form used to represent an nth degree polynomial is
$$P(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x + a_0$$
Here, $a_0, a_1, a_2, \dots, a_n$ are the coefficients, which take numerical values,
x is the variable, and n is the degree of the polynomial, which is a whole number.
Mathematical Polynomial Surface Trend Identification
(Source: David J. Unwin, 1978)
Third Degree Polynomial Function
A cubic function (or third-degree polynomial) can be written as:
$$f(x) = ax^3 + bx^2 + cx + d$$
where a, b, c, and d are constants, and a is nonzero.
Mann Kendall Test on Time Series Data
The Mann-Kendall trend test (sometimes called the M-K test) is used to analyze data
collected over time for consistently increasing or decreasing (monotonic) trends in Y values. It is
a non-parametric test, which means it works for all distributions (i.e. your data doesn't have to
meet the assumption of normality), but your data should have no serial correlation. Sen's slope is
a method for estimating the slope of the trend in a sample of N pairs of data. The trend is assumed
to follow a linear model Y(t) = Mt + C, where M is the slope and C is a constant. The slopes of
all data pairs are calculated as:
$$M_i = \frac{X_j - X_k}{j - k}$$
where i = 1, 2, 3, ..., N and j > k.
The N values of M_i are ranked from the smallest to the largest, and their median is
called Sen's slope. The confidence interval about the slope (Gilbert 1987) can be calculated as:
$$C_\alpha = Z_{1-\alpha/2} \sqrt{\mathrm{Var}(S)}$$
where Var(S) is the variance of the Mann-Kendall statistic S and $Z_{1-\alpha/2}$ is obtained from
the standard normal distribution table.
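The test statistic and Sen's slope can be computed directly in base R. The sketch below is a simplified illustration, not the implementation used in the course: the function name mk_sen and the example series are assumptions, and the variance formula n(n - 1)(2n + 5)/18 is the standard one for a series without tied values. The Kendall package from the installation list provides a complete Mann-Kendall test that also handles ties.

```r
# Mann-Kendall statistic and Sen's slope for an evenly spaced series (no ties assumed)
mk_sen <- function(x) {
  n     <- length(x)
  pairs <- combn(n, 2)          # all index pairs (k, j) with j > k
  k     <- pairs[1, ]
  j     <- pairs[2, ]
  diffs <- x[j] - x[k]

  S     <- sum(sign(diffs))                    # Mann-Kendall statistic
  var_s <- n * (n - 1) * (2 * n + 5) / 18      # variance of S when there are no ties
  z     <- if (S > 0) (S - 1) / sqrt(var_s) else if (S < 0) (S + 1) / sqrt(var_s) else 0
  p     <- 2 * pnorm(-abs(z))                  # two-sided p-value

  slopes <- diffs / (j - k)                    # slope of every data pair
  list(S = S, var_s = var_s, z = z, p.value = p, sen_slope = median(slopes))
}

# Example: a perfectly increasing series with slope 2
series <- 2 * (1:10) + 1
result <- mk_sen(series)
print(unlist(result))
```

For this series every pair is increasing, so S equals the number of pairs and Sen's slope equals the true slope of 2.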
Part 3 GIS with R Programming
Polygonal Shape file, Line Shape File, Point Shape File & Contouring
In this section, we will open and plot point, line, contour and polygon vector data stored
in shapefile format in R.
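Because no shapefile is bundled with these notes, the sketch below builds small point, line and polygon objects in memory with the sp package and plots them; all co-ordinates and IDs are made up for illustration. With real data, a reader function such as raster::shapefile() with the path to your own .shp file would typically return objects of these same classes. The block only runs if sp is installed.

```r
if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)

  # Point layer: three made-up sample locations
  pts <- SpatialPoints(cbind(c(1, 3, 5), c(2, 4, 1)))

  # Line layer: one made-up line feature
  line_coords <- cbind(c(0, 2, 4, 6), c(0, 3, 2, 5))
  lns <- SpatialLines(list(Lines(list(Line(line_coords)), ID = "line_1")))

  # Polygon layer: a rectangular study area (first and last vertex are identical)
  poly_coords <- cbind(c(0, 6, 6, 0, 0), c(0, 0, 5, 5, 0))
  pol <- SpatialPolygons(list(Polygons(list(Polygon(poly_coords)), ID = "area_1")))

  # Plot the polygon first, then overlay the line and the points
  plot(pol, border = "black", axes = TRUE)
  plot(lns, add = TRUE, col = "blue")
  plot(pts, add = TRUE, pch = 19, col = "red")
}
```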
Clipping and Crop
This tutorial will guide you step by step through masking and cropping a raster with a
shapefile in R.
(Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/clip.htm)
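The two operations differ: cropping cuts the raster to the bounding box of the shape, while masking sets cells outside the shape to NA. The sketch below shows both with the raster and sp packages on a small synthetic raster and a made-up triangular polygon (both are assumptions for illustration; in practice the polygon would come from a shapefile). The block only runs if both packages are installed.

```r
if (requireNamespace("raster", quietly = TRUE) && requireNamespace("sp", quietly = TRUE)) {
  # Synthetic 10 x 10 raster covering the square from (0, 0) to (10, 10)
  r <- raster::raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
  r <- raster::setValues(r, 1:100)

  # Made-up triangular clipping polygon
  tri_coords <- cbind(c(2, 8, 5, 2), c(2, 2, 8, 2))
  tri <- sp::SpatialPolygons(list(sp::Polygons(list(sp::Polygon(tri_coords)), ID = "clip")))

  # Crop: keep only the cells inside the bounding box of the polygon
  r_crop <- raster::crop(r, raster::extent(tri))

  # Mask: set cells whose centres fall outside the polygon to NA
  r_mask <- raster::mask(r_crop, tri)

  raster::plot(r_mask, main = "Cropped and masked raster")
  print(dim(r_crop))
}
```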
Spatial Plot, Level plot
In this part we give an introduction to analyzing spatial data in R, specifically through
map-making with R's 'base' graphics and various dedicated map-making packages. It teaches the
basics of using R as a fast, user-friendly and extremely powerful command-line Geographic
Information System (GIS).
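As a first taste of such plots, the sketch below draws a contour map with base graphics and a level plot with the lattice package, using volcano, an elevation matrix that ships with R. Using this built-in data set instead of the course rasters is an assumption made so that the example is self-contained; the level plot part only runs if lattice is installed.

```r
# Built-in elevation grid (Maunga Whau volcano), stored as a numeric matrix
data(volcano)

# Contour map with base graphics
contour(volcano, nlevels = 10, main = "Contour map of volcano elevation")

# Level plot with the lattice package
if (requireNamespace("lattice", quietly = TRUE)) {
  lp <- lattice::levelplot(volcano, xlab = "row", ylab = "column",
                           main = "Level plot of volcano elevation")
  print(lp)   # lattice plots must be printed explicitly inside scripts
}
```

For raster objects, the rasterVis package from the installation list provides a levelplot method with a similar interface.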
Image Stacking and Regression pixel to pixel and box plots over Raster
In this part we will learn about image stacking of different raster bands, pixel-to-pixel
regression and box plots of raster surfaces.
Image stacking of raster Surfaces
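The workflow can be sketched with two synthetic bands; the band values, the linear relationship between them and the object names are assumptions for illustration, and real bands would be read from image files. Pixel-to-pixel regression here means that each cell contributes one (band1, band2) pair to an ordinary linear regression. The block only runs if the raster package is installed.

```r
if (requireNamespace("raster", quietly = TRUE)) {
  set.seed(5)

  # Two synthetic 20 x 20 bands on the same grid
  template <- raster::raster(nrows = 20, ncols = 20, xmn = 0, xmx = 20, ymn = 0, ymx = 20)
  band1 <- raster::setValues(template, runif(400, 0, 1))
  band2 <- raster::setValues(template, 2 * raster::getValues(band1) + rnorm(400, sd = 0.1))

  # Stack the bands into one multi-layer object
  stk <- raster::stack(band1, band2)

  # Pixel-to-pixel regression: one row per cell, one column per band
  px <- data.frame(band1 = raster::getValues(band1),
                   band2 = raster::getValues(band2))
  pixel_fit <- lm(band2 ~ band1, data = px)
  print(coef(pixel_fit))

  # Box plots of the cell values of each band
  boxplot(px, main = "Cell values per band", ylab = "value")
}
```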
References
Drapela K. & Drapelova I. 2011. “Application of Mann-Kendall test and Sen’s slope estimates for
trend detection in deposition data from Biky Kriz, Mendelova Univerzita V Brne, Beskydy.” Vol
4(2), pp 133-146.
Gilbert, R. O. 1988. “Statistical Methods for Environmental Pollution Monitoring.” Biometrics 44(1): 319.
https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/clip.htm
https://statanalytica.com/blog/types-of-error-in-statistics/
http://mathcenter.oxford.emory.edu/site/home/futurePages/rProjectCentralLimitTheorem
https://towardsdatascience.com/q-q-plots-explained-5aa8495426c0
https://www.investopedia.com/terms/n/normaldistribution.asp
https://www.r-project.org/
https://www.rstudio.com/products/rstudio/download/
https://www.udemy.com/user/sandeepkumar1/
Kampata, J. M., B. P. Parida, and D. B. Moalafhi. 2008. “Trend Analysis of Rainfall in the
Headstreams of the Zambezi River Basin in Zambia.” Physics and Chemistry of the Earth 33(8–
13): 621–25.
Pyrcz, M. J., & Deutsch, C. V. (2014). Geostatistical reservoir modeling. Oxford university press.
Silva, Richarde Marques et al. 2015. “Rainfall and River Flow Trends Using Mann–Kendall and
Sen’s Slope Estimator Statistical Tests in the Cobres River Basin.” Natural Hazards 77(2): 1205–
21.
Vekaria, R. M., Shirley, D. G., Sévigny, J., & Unwin, R. J. (2006). Immunolocalization of
ectonucleotidases along the rat nephron. American Journal of Physiology-Renal
Physiology, 290(2), F550-F560.
Vieira, Sidney Rosa, José Ruy Porto de Carvalho, Marcos Bacis Ceddia, and Antonio Paz
González. 2010. “Detrending Non Stationary Data for Geostatistical Applications.” Bragantia
69(suppl): 01–08.
Wu, Chunfa et al. 2011. “Spatial Interpolation of Severely Skewed Data with Several Peak Values
by the Approach Integrating Kriging and Triangular Irregular Network Interpolation.”
Environmental Earth Sciences 63(5): 1093–1103.
Ad

More Related Content

Similar to A course work on R programming for basics to advance statistics and GIS.pdf (20)

Analyzing Performance Test Data
Analyzing Performance Test DataAnalyzing Performance Test Data
Analyzing Performance Test Data
Optimus Information Inc.
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9
Roger Barga
 
Essay on-data-analysis
Essay on-data-analysisEssay on-data-analysis
Essay on-data-analysis
Raman Kannan
 
Regression kriging
Regression krigingRegression kriging
Regression kriging
FAO
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
Shesha R
 
Engineering Data Analysis OEL Presentation.pptx
Engineering Data Analysis OEL Presentation.pptxEngineering Data Analysis OEL Presentation.pptx
Engineering Data Analysis OEL Presentation.pptx
ramzanshafiq524
 
Seasonal Decomposition of Time Series Data
Seasonal Decomposition of Time Series DataSeasonal Decomposition of Time Series Data
Seasonal Decomposition of Time Series Data
Programming Homework Help
 
Curvefitting
CurvefittingCurvefitting
Curvefitting
Philberto Saroni
 
FAA Flight Landing Distance Forecasting and Analysis
FAA Flight Landing Distance Forecasting and AnalysisFAA Flight Landing Distance Forecasting and Analysis
FAA Flight Landing Distance Forecasting and Analysis
Quynh Tran
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 Internship
Taylor Martell
 
Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Data Science Meetup: DGLARS and Homotopy LASSO for Regression ModelsData Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Colleen Farrelly
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Seval Çapraz
 
difference between dynamic programming and divide and conquer
difference between dynamic programming and divide and conquerdifference between dynamic programming and divide and conquer
difference between dynamic programming and divide and conquer
SRISHTISRIVASTAVA212
 
A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...
RAHUL WAGAJ
 
Edu 8006 8 assignment edu8006
Edu 8006 8 assignment edu8006Edu 8006 8 assignment edu8006
Edu 8006 8 assignment edu8006
arnitaetsitty
 
Human Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataHuman Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerData
IRJET Journal
 
icpr_2012
icpr_2012icpr_2012
icpr_2012
Andrew Gilbert
 
Pca seminar final report
Pca seminar final reportPca seminar final report
Pca seminar final report
Institute Of Technical Education And Research
 
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczxPCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
JuanManuelNasralaAlv1
 
Accelerating the Random Forest algorithm for commodity parallel- Mark Seligman
Accelerating the Random Forest algorithm for commodity parallel- Mark SeligmanAccelerating the Random Forest algorithm for commodity parallel- Mark Seligman
Accelerating the Random Forest algorithm for commodity parallel- Mark Seligman
PyData
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9
Roger Barga
 
Essay on-data-analysis
Essay on-data-analysisEssay on-data-analysis
Essay on-data-analysis
Raman Kannan
 
Regression kriging
Regression krigingRegression kriging
Regression kriging
FAO
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
Shesha R
 
Engineering Data Analysis OEL Presentation.pptx
Engineering Data Analysis OEL Presentation.pptxEngineering Data Analysis OEL Presentation.pptx
Engineering Data Analysis OEL Presentation.pptx
ramzanshafiq524
 
FAA Flight Landing Distance Forecasting and Analysis
FAA Flight Landing Distance Forecasting and AnalysisFAA Flight Landing Distance Forecasting and Analysis
FAA Flight Landing Distance Forecasting and Analysis
Quynh Tran
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 Internship
Taylor Martell
 
Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Data Science Meetup: DGLARS and Homotopy LASSO for Regression ModelsData Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models
Colleen Farrelly
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Seval Çapraz
 
difference between dynamic programming and divide and conquer
difference between dynamic programming and divide and conquerdifference between dynamic programming and divide and conquer
difference between dynamic programming and divide and conquer
SRISHTISRIVASTAVA212
 
A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...
RAHUL WAGAJ
 
Edu 8006 8 assignment edu8006
Edu 8006 8 assignment edu8006Edu 8006 8 assignment edu8006
Edu 8006 8 assignment edu8006
arnitaetsitty
 
Human Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataHuman Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerData
IRJET Journal
 
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczxPCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
JuanManuelNasralaAlv1
 
Accelerating the Random Forest algorithm for commodity parallel- Mark Seligman
Accelerating the Random Forest algorithm for commodity parallel- Mark SeligmanAccelerating the Random Forest algorithm for commodity parallel- Mark Seligman
Accelerating the Random Forest algorithm for commodity parallel- Mark Seligman
PyData
 

A course work on R programming for basics to advance statistics and GIS.pdf

  • 4. R and RStudio Installation
    First install R (version 4.1.0 or above) from https://www.r-project.org/, then install RStudio from https://www.rstudio.com/products/rstudio/download/.
    Code for package installation: install.packages("package name")
    Alternatively, open RStudio, click Install in the Packages pane, and type the package name (Figure 1: IDE interface of RStudio).
  • 5. Preliminary requirements for R programming
    - Laptop or PC (4 GB RAM)
    - Good internet connection (for example, Jio 4G VoLTE)
    - Code (this will be sent to all participants before the course starts)
    - Make a folder on your desktop and name it R.
    - After installing R and RStudio, open the RStudio (Integrated Development Environment) interface.
    - Set the working directory to the path of your folder (R), writing the path with forward slashes:
      setwd("C:/Users/dell/Desktop/R")
    Install the following packages:
    raster, rasterVis, zoo, xts, ppcor, rts, rgdal, spatialEco, Kendall, readr, readxl, gstat, sp, lattice, ggplot2, rgeos, spacetime, RColorBrewer, latticeExtra, map
    if(!require(psych)){install.packages("psych")}
    if(!require(DescTools)){install.packages("DescTools")}
    if(!require(Rmisc)){install.packages("Rmisc")}
    if(!require(FSA)){install.packages("FSA")}
    if(!require(plyr)){install.packages("plyr")}
    if(!require(boot)){install.packages("boot")}
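    The installation step above can be sketched as a small helper that reports which packages are still missing before anything is installed. The function name missing_packages and the shortened package vector are illustrative assumptions, not part of the course code.

```r
# A subset of the packages named on the slide (shortened for illustration)
course_pkgs <- c("raster", "zoo", "Kendall", "gstat", "sp", "lattice", "ggplot2")

# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(course_pkgs)
print(to_install)

# Uncomment to install whatever is missing (needs an internet connection):
# if (length(to_install) > 0) install.packages(to_install)
```

    Checking first avoids reinstalling packages on every run, which is the same idea as the if(!require(...)) lines above.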
  • 6. Part 1: Basic statistics by R programming
    Descriptive Statistics
    [Two figures. Source: Sandeep Kumar, https://www.udemy.com/user/sandeepkumar1/]
  • 7. [Two figures. Source: Sandeep Kumar, https://www.udemy.com/user/sandeepkumar1/]
  • 8. [Figure. Source: Sandeep Kumar, https://www.udemy.com/user/sandeepkumar1/]
  • 9. Kurtosis
    Kurtosis is a measure of the tailedness of a distribution. Tailedness is how often outliers occur. Excess kurtosis is the tailedness of a distribution relative to a normal distribution.
    - Distributions with medium kurtosis (medium tails) are mesokurtic.
    - Distributions with low kurtosis (thin tails) are platykurtic.
    - Distributions with high kurtosis (fat tails) are leptokurtic.
    Tails are the tapering ends on either side of a distribution. They represent the probability or frequency of values that are extremely high or low compared to the mean. In other words, tails represent how often outliers occur. (Source: https://www.scribbr.com/statistics/kurtosis/)
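    As a minimal sketch of the definition above, excess kurtosis can be computed in base R from the second and fourth central moments. The function names and the simulated samples are assumptions made for illustration; packages such as psych or DescTools offer their own estimators, which may apply small-sample corrections.

```r
# Moment-based kurtosis: fourth central moment divided by the squared variance
kurtosis_moment <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m4 <- mean((x - mean(x))^4)
  m4 / m2^2
}

# Excess kurtosis is measured relative to the normal distribution (kurtosis 3)
excess_kurtosis <- function(x) kurtosis_moment(x) - 3

set.seed(42)
normal_sample  <- rnorm(10000)        # mesokurtic: excess kurtosis near 0
heavy_sample   <- rt(10000, df = 5)   # leptokurtic: fat tails
uniform_sample <- runif(10000)        # platykurtic: thin tails

print(c(normal  = excess_kurtosis(normal_sample),
        heavy   = excess_kurtosis(heavy_sample),
        uniform = excess_kurtosis(uniform_sample)))
```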
  • 10. Box and Whisker Plots
    A box and whisker plot shows the five-number summary of a data set: minimum, first quartile, median, third quartile and maximum.
    QQ Plot
    Q-Q (quantile-quantile) plots are used in statistics to graphically analyze and compare two probability distributions by plotting their quantiles against each other. If the two distributions under consideration are exactly equal, the points on the Q-Q plot lie on the straight line y = x. A data scientist or statistician often needs to know whether data are normally distributed before applying various statistical measures and interpreting the results, and the Q-Q plot gives a visual check for this. Q-Q plots can also be used to judge which type of distribution a random variable follows, such as a normal (Gaussian), uniform or exponential distribution. (Source: https://towardsdatascience.com/q-q-plots-explained-5aa8495426c0)
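    The two ideas above can be tried with base R alone. This is a sketch on a simulated sample (the data are an assumption); it computes the five-number summary behind a box plot and the quantile pairs behind a normal Q-Q plot, and only draws the plots in an interactive session.

```r
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)   # hypothetical sample

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# Q-Q data without drawing: theoretical normal quantiles (qq$x) vs sample quantiles (qq$y)
qq <- qqnorm(x, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)   # close to 1 when the points lie near a straight line
print(qq_cor)

if (interactive()) {
  boxplot(x, horizontal = TRUE, main = "Box and whisker plot")
  qqnorm(x)
  qqline(x)   # reference line through the first and third quartiles
}
```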
  • 12. Normal Distribution
    The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graphical form, the normal distribution appears as a "bell curve". (Source: https://www.investopedia.com/terms/n/normaldistribution.asp)
    Normality Test in R: A data set is consistent with a normal distribution if its skewness is close to zero and its kurtosis is close to 3. There are several methods in R for testing the normality of a data set:
    1. (Visual Method) Create a histogram. If the histogram is roughly "bell-shaped", then the data is assumed to be normally distributed.
    2. (Visual Method) Create a Q-Q plot. If the points in the plot roughly fall along a straight diagonal line, then the data is assumed to be normally distributed.
    3. (Formal Statistical Test) Perform a Shapiro-Wilk Test. If the p-value of the test is greater than α = 0.05, the null hypothesis of normality is not rejected and the data is assumed to be normally distributed.
    4. (Formal Statistical Test) Perform a Kolmogorov-Smirnov Test. If the p-value of the test is greater than α = 0.05, then the data is assumed to be normally distributed.
    5. (Formal Statistical Test) Perform a Jarque-Bera Normality Test. If the p-value of the test is greater than α = 0.05, then the data is assumed to be normally distributed.
  • 13. 6. (Formal Statistical Test) Perform an Anderson-Darling Test. An Anderson-Darling Test is a goodness of fit test that measures how well your data fit a specified distribution. The null hypothesis for the A-D test is that the data follow a normal distribution. Thus, if the p-value for the test is below the significance level (common choices are 0.10, 0.05, and 0.01), we reject the null hypothesis and conclude that there is sufficient evidence that the data do not follow a normal distribution.
    7. (Formal Statistical Test) Perform a Chi-Square Goodness of Fit Test. The Chi-Square Test for Normality allows us to check whether or not the data follow an approximately normal distribution. Let the null hypothesis be that the data are sampled from a normal distribution. Given the mean and standard deviation, calculate the expected frequencies under the normal distribution for each class interval. Then use the formula $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed and $E_i$ the expected frequency, to find the chi-square statistic. Compare this to the critical chi-square value from a chi-square table, given your degrees of freedom and desired alpha level. If the chi-square statistic is larger than the table value, you may conclude the data is not normal.
    Reasons for the Non Normal Distribution
    1. Outliers can cause your data to become skewed. The mean is especially sensitive to outliers. Try removing any extreme high or low values and testing your data again.
    2. Multiple distributions may be combined in your data, giving the appearance of a bimodal or multimodal distribution. For example, two sets of normally distributed test results can combine to give the appearance of bimodal data. [Figure: two combined normal distributions]
    3. Insufficient data can cause a normal distribution to look completely scattered.
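    Two of the formal tests listed above are available in base R, so they can be sketched without extra packages. The samples here are simulated assumptions. Note that passing the sample mean and standard deviation to ks.test() makes its p-value only approximate.

```r
set.seed(2)
normal_x <- rnorm(100, mean = 10, sd = 2)   # hypothetical normal sample
skewed_x <- rexp(100, rate = 1)             # hypothetical right-skewed sample

# Shapiro-Wilk: null hypothesis is that the sample comes from a normal distribution
sw_normal <- shapiro.test(normal_x)
sw_skewed <- shapiro.test(skewed_x)

# Kolmogorov-Smirnov against a normal distribution with the sample's mean and sd
ks_normal <- ks.test(normal_x, "pnorm", mean(normal_x), sd(normal_x))

print(c(shapiro_normal = sw_normal$p.value,
        shapiro_skewed = sw_skewed$p.value,
        ks_normal      = ks_normal$p.value))
```

    The Jarque-Bera and Anderson-Darling tests are not in base R; to the best of my knowledge they are provided by tseries::jarque.bera.test() and nortest::ad.test(), which would need to be installed separately.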
  • 14. Dealing with Non Normal Distributions
    You have several options for handling non normal data. Many tests, including the one sample Z test, T test and ANOVA, assume normality. You may still be able to run these tests if your sample size is large enough (usually over 20 items). You can also choose to transform the data with a function, forcing it to fit a normal model. However, if you have a very small sample, a sample that is skewed or one that naturally fits another distribution type, you may want to run a non-parametric test. A non-parametric test is one that doesn't assume the data fits a specific distribution type. Non-parametric tests include the Wilcoxon signed rank test, the Mann-Whitney U Test and the Kruskal-Wallis test. (Source: https://www.statisticshowto.com/probability-and-statistics/non-normal-distributions/)
    Standard Normal Distribution
    The standard normal distribution, also called the z-distribution, is a special normal distribution where the mean is 0 and the standard deviation is 1. Any normal distribution can be standardized by converting its values into z-scores. Z-scores tell you how many standard deviations from the mean each value lies. (Source: https://www.scribbr.com/statistics/standard-normal-distribution/)
    The probability density function for the normal distribution having mean μ and standard deviation σ is given by
    $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
    If we let the mean μ = 0 and the standard deviation σ = 1 in this probability density function, we get the probability density function for the standard normal distribution:
    $\varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$
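    A short sketch of standardization and of the normal density, assuming a small made-up sample: the hand-written density follows the standard formula and is compared with R's built-in dnorm().

```r
# Normal density written out from the formula, for comparison with dnorm()
normal_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

# Hypothetical sample
heights <- c(150, 160, 165, 170, 175, 180, 190)

# z-scores: how many standard deviations each value lies from the mean
z <- (heights - mean(heights)) / sd(heights)
print(round(z, 2))

# The hand-written density agrees with the built-in one
print(all.equal(normal_pdf(1.5, mu = 2, sigma = 3), dnorm(1.5, mean = 2, sd = 3)))
```

    After standardization the z-scores have mean 0 and standard deviation 1, whatever the original units were.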
  • 15. 68%-95%-99.7% Rule
    The 68%-95%-99.7% rule is a rule of thumb that allows practitioners of statistics to estimate the probability that a randomly selected number from the standard normal distribution lies within 1, 2, and 3 standard deviations of the mean at zero. (Source: https://mse.redwoods.edu/darnold/math15/UsingRInStatistics/StandardNormal.php)
    Central Limit Theorem
    The Central Limit Theorem tells us that the distribution of sample means $\bar{x}$, for samples of size n taken from any given population with finite variance:
    1. becomes more "normal" in shape as n increases;
    2. has a mean that agrees with the population mean, μ; and
    3. has a standard deviation equal to $\sigma/\sqrt{n}$, where σ is the standard deviation of the population.
    (Source: http://mathcenter.oxford.emory.edu/site/home/futurePages/rProjectCentralLimitTheorem)
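    Both statements above can be checked numerically. The sketch below uses pnorm() for the rule of thumb and a simulated, deliberately skewed population for the Central Limit Theorem. The population, sample sizes and number of repetitions are illustrative assumptions.

```r
# 68-95-99.7 rule: probability within k standard deviations of the mean
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
print(round(within_k, 4))

# Central Limit Theorem on a skewed (exponential) population
set.seed(1)
population <- rexp(100000, rate = 1)

# Draw `reps` samples of size n and return their means
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(sample(population, n)))
}

means_50 <- sample_means(50)
print(c(mean_of_means   = mean(means_50),
        population_mean = mean(population),
        sd_of_means     = sd(means_50),
        sigma_over_root = sd(population) / sqrt(50)))

if (interactive()) hist(means_50, main = "Sample means, n = 50")
```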
  • 16. In this project, we will construct a population, and then approximate the distributions of sample means for various sample sizes through repeated sampling, so that we can "see" this theorem in action through a sequence of histograms. [Figure: histograms of sample means for increasing sample size]
    Types of Errors in Statistics
    There are two types of error in statistics: type I and type II. In a statistical test, a type I error is the rejection of a true null hypothesis. In contrast, a type II error is the failure to reject a false null hypothesis. Much of statistical methodology revolves around reducing one or both kinds of error, although eliminating either of them completely is impossible. By choosing a low threshold (the significance level, alpha), the performance of the hypothesis test can be improved. Type I and type II errors are used in biometrics, medical science, and computer science. (Source: https://statanalytica.com/blog/types-of-error-in-statistics/)
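    The two error types can be illustrated by simulation. This is a sketch under assumed settings (one-sample t-test, n = 30, alpha = 0.05, a true mean of 0.3 for the alternative), none of which come from the course material.

```r
set.seed(123)
alpha <- 0.05
reps  <- 2000

# Null hypothesis true (mean really is 0): rejections are type I errors
p_null <- replicate(reps, t.test(rnorm(30, mean = 0))$p.value)
type1_rate <- mean(p_null < alpha)

# Null hypothesis false (mean is 0.3): failures to reject are type II errors
p_alt <- replicate(reps, t.test(rnorm(30, mean = 0.3))$p.value)
type2_rate <- mean(p_alt >= alpha)

print(c(type1 = type1_rate, type2 = type2_rate))
```

    The type I rate should sit close to alpha by construction, while the type II rate depends on the sample size and on how far the true mean is from the hypothesized one.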
  • 17. [Figure. Source: Sandeep Kumar, https://www.udemy.com/user/sandeepkumar1/]
  • 18. [Two figures. Sources: Sandeep Kumar, https://www.udemy.com/user/sandeepkumar1/; https://towardsdatascience.com/q-q-plots-explained-5aa8495426c0]
  • 19. Part 2: Advanced Statistics by R Programming
    Trend Surface Analysis
    Trend surface analysis attempts to decompose each observation of a spatially distributed variable into a component associated with any regional trends present in the data and a component associated with purely local effects. This separation into two components is accomplished by fitting a best-fit surface using standard regression techniques. The values predicted by this trend surface are assigned to the regional effects, whereas the local departures of the observed data from it, the residuals, are assigned to the local effects.
    In order to plot values on a map the geographer needs three pieces of information: the x, y spatial co-ordinates of each point together with the height above some datum, the z co-ordinate. The z values might relate to any variable, but the whole operation defines a spatial series in which the z observations are ordered with respect to the two spatial co-ordinates, x and y. The map would be completed by drawing lines of equal z value (contours or isolines) through the points. The resulting contour-type map defines a complex surface which in most cases will reveal a spatial structure or form.
    A trend surface analysis assumes that each mapped value can be decomposed into two components that arise from two scales of process:
    A) The regional trend. According to Krumbein and Graybill (1965) this trend is "associated with 'large scale' systematic changes that extend from one map edge to the other". Similarly, Grant (1961) defines trend as "... that part of the data that varies smoothly. In other words, it is the function that behaves predictably".
    B) The combined result of two processes that operate over an area substantially smaller than the study area: random fluctuations and errors of measurement. This forms an assumed error, local component, or residual, defined by Krumbein and Graybill as "... apparently non-systematic fluctuations that are superimposed on the large scale patterns". It is important to notice that at the scale of observation these residuals appear to be spatially random; they may prove to be systematically related to a spatial process, but at this scale they do not vary systematically over the mapped area.
  • 20. Mathematically:
    observed value of surface at that point = trend component + residual
    If component (A), the trend, varies smoothly over space, its value (height) at any particular point can be expressed in terms of the spatial co-ordinates of that point, so that the basic equation of any trend analysis becomes
    $Z_i = f(X_i, Y_i) + U_i$
    $Z_i$ = the observed value of the surface at the ith point.
    $X_i$ = the co-ordinate on the x-axis (easting) of the ith data point.
    $Y_i$ = the co-ordinate on the y-axis (northing) of the ith data point.
    $U_i$ = the residual at the ith data point.
    f denotes "some function", and thus the term $f(X_i, Y_i)$ indicates the trend component. By function we simply mean that if we know the location of any point i as a pair of spatial co-ordinates $X_i$ and $Y_i$, then the height of the trend component at this point can be found by plugging these $X_i$ and $Y_i$ into a known equation (or function). It follows that we can calculate a trend component for any combination of X and Y, so that $f(X, Y)$ denotes a complete surface of trend components called the trend surface.
    Trend is a fundamental concept in geostatistics, because spatial data commonly exhibit areas of high values and other areas of low values. The notion of a geographical trend, related to the position of the spatial data, matters because geostatistical estimation assumes stationarity, and away from the data the estimate reverts to the global mean (simple kriging) (S R Vieira et al., 2009). The order of the stationarity hypothesis depends on the order of the statistical moments required to be stationary. Thus, when second-order stationarity is needed, at least the first two moments (mean and variance) should be stationary.
    A random function $Z(X_i)$ is intrinsic if it satisfies equations (1) and (2). First, the expected value $E\{Z(X_i)\}$ exists and does not depend on the geographic location $X_i$:
    $E\{Z(X_i)\} = m$ (1)
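    The decomposition into trend and residual can be sketched with an ordinary least-squares fit. The example below fits a first-order (planar) trend surface to simulated points. The co-ordinates, coefficients and noise level are assumptions chosen for illustration, not course data.

```r
set.seed(10)
n <- 200
x <- runif(n, 0, 100)   # easting (hypothetical units)
y <- runif(n, 0, 100)   # northing (hypothetical units)

# Observed surface = planar trend + local residual
z <- 50 + 0.3 * x - 0.2 * y + rnorm(n, sd = 3)
pts <- data.frame(x, y, z)

# First-order trend surface: z = b0 + b1 * x + b2 * y
trend_fit <- lm(z ~ x + y, data = pts)

pts$trend    <- fitted(trend_fit)      # regional component f(X, Y)
pts$residual <- residuals(trend_fit)   # local component U

print(coef(trend_fit))
print(summary(pts$residual))
```

    Subtracting the fitted trend and keeping the residual column is the trend removal step: the residuals are what a later geostatistical analysis would treat as the stationary part.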
  • 21. Second, the increment $[Z(X_i) - Z(X_i + h)]$ has a finite variance that does not depend on the geographic location $X_i$. This condition can be written as:
    $\mathrm{VAR}[Z(X_i) - Z(X_i + h)] = E\{[Z(X_i) - Z(X_i + h)]^2\}$ (2)
    where m is the mean and h is a small increment in position $X_i$.
    Order of the model
    The order of the polynomial model is kept as low as possible. Some transformations can be used to keep the model first order. If this is not satisfactory, then a second-order polynomial is tried. Arbitrary fitting of higher-order polynomials can be a serious abuse of regression analysis. A model which is consistent with the knowledge of the data and its environment should be preferred. It is always possible for a polynomial of order n − 1 to pass through n points, so a polynomial of sufficiently high degree can always be found that provides a "good" fit to the data. Such models neither enhance the understanding of the unknown function nor serve as good predictors.
    First Degree Polynomial Function
    First degree polynomials have terms with a maximum degree of 1. In other words, you wouldn't usually find any exponents in the terms of a first degree polynomial. For example, the following are first degree polynomials:
    - 2x + 1,
    - 10a + 4b + 20.
    Second Degree Polynomial Function
    Second degree polynomials have at least one second degree term in the expression (e.g. $2x^2$, $a^2$, $xy$). There are no higher terms (like $x^3$ or $abc^5$). The quadratic function $f(x) = ax^2 + bx + c$ is an example of a second degree polynomial.
  • 22. General Form of Nth Degree Polynomial
    The general form used to represent an nth degree polynomial is
    $P(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x + a_0$
    Here, $a_0, a_1, a_2, \dots, a_n$ are the numerical coefficients, x is the variable, and n is the degree of the polynomial, which is a whole number.
    Mathematical Polynomial Surface Trend Identification
    [Figure. Source: David J. Unwin, 1978]
    Third Degree Polynomial Function
    A cubic function (or third-degree polynomial) can be written as
    $f(x) = ax^3 + bx^2 + cx + d$
    where a, b, c, and d are constants, and a is nonzero.
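    One way to choose the order of a trend surface is to fit several orders and compare them. The sketch below uses simulated data with a known second-order trend (an assumption) and compares first, second and third order surfaces by R-squared and AIC. AIC is my choice of criterion here, not one named in the course.

```r
set.seed(20)
n <- 150
x <- runif(n, -1, 1)
y <- runif(n, -1, 1)

# Simulated surface with a genuinely second-order trend plus noise
z <- 2 + x - y + 1.5 * x^2 + 0.5 * x * y + rnorm(n, sd = 0.2)

# Fit trend surfaces of order 1, 2 and 3 in the two co-ordinates
fits <- lapply(1:3, function(d) lm(z ~ poly(x, y, degree = d)))

comparison <- data.frame(
  order     = 1:3,
  n_coef    = sapply(fits, function(f) length(coef(f))),
  r_squared = sapply(fits, function(f) summary(f)$r.squared),
  aic       = sapply(fits, AIC)
)
print(comparison)
```

    R-squared can only go up as the order increases, which is why it cannot be used alone. A criterion that penalizes extra coefficients, or a comparison against what is known about the data, guards against the arbitrary high-order fitting warned about above.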
  • 24. Mann Kendall Test on Time Series Data
    The Mann Kendall Trend Test (sometimes called the M-K test) is used to analyze data collected over time for consistently increasing or decreasing (monotonic) trends in Y values. It is a non-parametric test, which means it works for all distributions (i.e. your data doesn't have to meet the assumption of normality), but your data should have no serial correlation.
    Sen's slope is a procedure for estimating the slope of the trend in a sample of N pairs of data. It assumes a linear model Y(t) = Mt + C, where M is the slope and C is a constant. The slopes of all data pairs are calculated as
    $M_i = \frac{X_j - X_k}{j - k}$
    where i = 1, 2, 3, ..., N and j > k. The N values of $M_i$ are ranked from the smallest to the largest, and their median is Sen's slope. The confidence interval about the slope (Gilbert 1987) can be calculated as
    $C_\alpha = Z_{1-\alpha/2} \sqrt{\mathrm{Var}(S)}$
    Var(S) is defined in equation (3) and $Z_{1-\alpha/2}$ is obtained from the standard normal distribution table.
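    The pairwise slopes and their median can be computed directly in base R. The function below is a hand-written sketch, not the course code: its name is hypothetical, it uses the standard no-ties variance n(n − 1)(2n + 5)/18 for the Mann-Kendall statistic S, and it ignores tie and autocorrelation corrections.

```r
# Hypothetical helper: Mann-Kendall S statistic and Sen's slope for an
# evenly spaced series x (no correction for ties)
mann_kendall_sen <- function(x) {
  n   <- length(x)
  idx <- combn(n, 2)              # each column is a pair (k, j) with k < j
  k   <- idx[1, ]
  j   <- idx[2, ]
  diffs <- x[j] - x[k]

  S     <- sum(sign(diffs))
  var_S <- n * (n - 1) * (2 * n + 5) / 18
  Z <- if (S > 0) (S - 1) / sqrt(var_S) else if (S < 0) (S + 1) / sqrt(var_S) else 0

  list(S = S, var_S = var_S, Z = Z,
       p_value   = 2 * pnorm(-abs(Z)),
       sen_slope = median(diffs / (j - k)))
}

# Made-up annual series with an upward trend of about 2 units per step
set.seed(5)
rain <- 100 + 2 * (1:20) + rnorm(20, sd = 3)
result <- mann_kendall_sen(rain)
print(result)
```

    For real analyses the Kendall package listed earlier provides MannKendall(), which can be used to cross-check the S statistic and p-value from this sketch.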
  • 25. Part 3 GIS with R Programming

    Polygon Shapefile, Line Shapefile, Point Shapefile & Contouring

    In this section, we will open and plot point, line, contour and polygon vector data stored in shapefile format in R.

    Clipping and Crop

    This tutorial will guide you step by step through masking and cropping a raster with a shapefile in R.

    (Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/clip.htm)
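A crop-and-mask step might look like the sketch below. It assumes the `terra` package is installed; the source does not say which package the course uses. To stay self-contained, the raster and the study-area polygon are built in memory with made-up values; in a real workflow the polygon would be read from a shapefile with `vect()` and a file path.

```r
library(terra)  # assumption: terra is installed; the course may use another package

# Hypothetical 10 x 10 raster covering x and y from 0 to 10
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10)
values(r) <- 1:100

# Hypothetical triangular study area given as WKT
# (a shapefile would be read with vect() and its file path instead)
aoi <- vect("POLYGON ((2 2, 8 2, 5 8, 2 2))")
crs(aoi) <- crs(r)            # give the polygon the raster's coordinate system

r_crop <- crop(r, aoi)        # cut the raster to the polygon's bounding box
r_mask <- mask(r_crop, aoi)   # set cells outside the polygon to NA

plot(r_mask)
lines(aoi)
```

Cropping only trims the extent to a rectangle, while masking removes the cells outside the polygon itself, so the two are usually applied together.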
  • 26. Spatial Plot, Level Plot

    This part introduces the analysis of spatial data in R, specifically map-making with R's 'base' graphics and various dedicated map-making packages. It teaches the basics of using R as a fast, user-friendly and extremely powerful command-line Geographic Information System (GIS).

    Image Stacking, Pixel-to-Pixel Regression and Box Plots over Rasters

    In this part we will learn about image stacking of different raster bands, and about regression and box plots of raster surfaces.

    [Figure: Image stacking of raster surfaces]
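Stacking, pixel-to-pixel regression and box plots can be sketched as below, again assuming the `terra` package is installed. The two bands are synthetic, with band 2 constructed as a noisy linear function of band 1, so every value here is hypothetical.

```r
library(terra)  # assumption: terra is installed

set.seed(42)

# Two hypothetical raster bands on the same 20 x 20 grid
v1 <- runif(400, min = 0, max = 100)
v2 <- 10 + 0.5 * v1 + rnorm(400, sd = 2)

band1 <- rast(nrows = 20, ncols = 20, xmin = 0, xmax = 20, ymin = 0, ymax = 20)
values(band1) <- v1
band2 <- rast(band1)          # empty raster with the same geometry
values(band2) <- v2

# Stack the bands into one multi-layer raster
s <- c(band1, band2)
names(s) <- c("band1", "band2")

# Pixel-to-pixel regression: one row per pixel, one column per band
px <- as.data.frame(s)
fit <- lm(band2 ~ band1, data = px)
print(coef(fit))

# Box plots of the pixel values in each band, and a map of the stack
boxplot(px, ylab = "Pixel value")
plot(s)
```

Converting the stack to a data frame keeps the regression in ordinary `lm()` form, at the cost of holding every pixel in memory.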
  • 27. References

    Drapela, K., and I. Drapelova. 2011. "Application of Mann-Kendall Test and Sen's Slope Estimates for Trend Detection in Deposition Data from Bily Kriz." Mendelova Univerzita v Brne, Beskydy 4(2): 133–146.

    Gilbert, R. O. 1988. "Statistical Methods for Environmental Pollution Monitoring." Biometrics 44(1): 319.

    https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/clip.htm

    https://statanalytica.com/blog/types-of-error-in-statistics/

    http://mathcenter.oxford.emory.edu/site/home/futurePages/rProjectCentralLimitTheorem

    https://towardsdatascience.com/q-q-plots-explained-5aa8495426c0

    https://www.investopedia.com/terms/n/normaldistribution.asp

    https://www.r-project.org/

    https://www.rstudio.com/products/rstudio/download/

    https://www.udemy.com/user/sandeepkumar1/

    Kampata, J. M., B. P. Parida, and D. B. Moalafhi. 2008. "Trend Analysis of Rainfall in the Headstreams of the Zambezi River Basin in Zambia." Physics and Chemistry of the Earth 33(8–13): 621–25.

    Pyrcz, M. J., and C. V. Deutsch. 2014. Geostatistical Reservoir Modeling. Oxford University Press.

    Silva, Richarde Marques et al. 2015. "Rainfall and River Flow Trends Using Mann–Kendall and Sen's Slope Estimator Statistical Tests in the Cobres River Basin." Natural Hazards 77(2): 1205–21.

    Vekaria, R. M., D. G. Shirley, J. Sévigny, and R. J. Unwin. 2006. "Immunolocalization of Ectonucleotidases along the Rat Nephron." American Journal of Physiology-Renal Physiology 290(2): F550–F560.

    Vieira, Sidney Rosa, José Ruy Porto de Carvalho, Marcos Bacis Ceddia, and Antonio Paz González. 2010. "Detrending Non Stationary Data for Geostatistical Applications." Bragantia 69(suppl): 01–08.

    Wu, Chunfa et al. 2011. "Spatial Interpolation of Severely Skewed Data with Several Peak Values by the Approach Integrating Kriging and Triangular Irregular Network Interpolation." Environmental Earth Sciences 63(5): 1093–1103.