4-distribution and probability

The document discusses data analysis and probability in relation to geographical climatology, focusing on parameters that affect energy and water balance, and the scales of observation from global to microscale. It explains statistical concepts such as population parameters, sample statistics, frequency distributions, and the normal distribution, including methods for calculating probabilities and analyzing extreme values. Additionally, it covers measures of central tendency, dispersion, and skewness in data distributions.

Data Analysis and Probability
Terms:

Geographical climatology parameters: especially the physical processes in the lowest atmosphere and upper soil layers, which are the climatic determinants for the local or regional biosphere.

Parameters that pertain to the energy and water balance, such as precipitation, humidity, temperature, solar radiation and air motion; CO2 and SO2; dissolved and suspended matter in precipitation; and soil temperature, moisture and salinity.

Scale of observations
Observations are required on the macro-, meso- and microscales. The mesoscale is defined as 3 km to 100 km, the toposcale or local scale as 100 m to 3 km, and the microscale as less than 100 m.
Phenomena are analyzed at a variety of scales:
The global scale: includes events that affect large areas of the globe and last for weeks or even months at a time. For example, the polar jet stream circles the globe and influences the polar and middle latitudes to a varying extent throughout the year.
The synoptic scale: events can span thousands of kilometers and last for many days. Mid-latitude cyclones, hurricanes and fronts are examples of synoptic weather events. This scale is used for making weather forecasts beyond 1 day out.

Mesoscale: events typically last from an hour to a day and influence tens to hundreds of kilometers. Examples include thunderstorms and differential heating boundaries (e.g. sea breeze).
Microscale: events typically last from minutes up to an hour and cover small distances, such as less than 10 kilometers. Examples include tornadoes, rainbows, convective updrafts and downdrafts.
Population parameters and sample statistics:
The two population characteristics, mean (µ or m) and standard deviation (σ or s), are called parameters of the population/distribution.
Each of the sample characteristics, such as the sample mean x̄ and the sample standard deviation s, is called a sample statistic.
Any one of the statistics mean, median, mode and mid-interquartile range would seem to be suitable for use as an estimator of the population mean µ.
Point estimator: a sample statistic used to provide an estimate of a corresponding population parameter, such as the population mean µ.
To choose the best estimator of a parameter from a set of estimators, three desirable properties are important: unbiasedness, efficiency and consistency.
1: Frequency distributions:

- Used with large sets of data.
- Arrange the observations (obs) in classes.
- The number of obs in each class is called the frequency.
- The number of classes should not exceed five times the logarithm (base 10) of the number of observations, e.g. 100 obs should have a maximum of 10 classes.

- n: number of obs
- k: number of classes = 5 * log10(n)
- i: lowest class interval = (max obs - min obs) / k
- Mid-mark xi of a class = (upper limit + lower limit of the class) / 2
- Frequency fi = number of obs in that class
- Probability = relative frequency % = (frequency / n) * 100%
Climatological series of annual rainfall (mm) for Mbabane, Swaziland, southern Africa (1930–1979)

n = 50
log10(50) ≈ 1.7, and 5 * 1.7 ≈ 8.5, rounded up to 9 classes
Max obs = 2080, min obs = 912
i = (2080 - 912) / 9 = 129.8 ≈ 130 is the lowest class interval
A larger interval of 149 is chosen.
Frequency distribution of annual precipitation for Mbabane:
The probability of getting between 1 480 mm and 1 620 mm of rain in Mbabane is frequency / n = 10/50 = 20 per cent.
The probability of getting less than 1 779 mm of rain in Mbabane (up to and including class six) is 0.94, which is arrived at by dividing the cumulative frequency up to this point by 50, the total number of observations: 47/50 = 94%.
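The steps above can be sketched in Python. This is a minimal illustration, not the method used to build the Mbabane table: the function name `frequency_table` is hypothetical, classes are given equal width starting at the minimum observation, and the class count is rounded up from 5 * log10(n), which is an assumption chosen to match the 9 classes used in the example.

```python
import math

def frequency_table(obs, k=None):
    """Group observations into k equal-width classes.

    Returns one tuple per class:
    (lower, upper, mid_mark, frequency, relative_pct, cumulative_pct).
    """
    n = len(obs)
    lo, hi = min(obs), max(obs)
    if hi == lo:
        raise ValueError("all observations are equal; no classes to form")
    if k is None:
        # Assumption: round 5 * log10(n) up to a whole number of classes
        k = max(1, math.ceil(5 * math.log10(n)))
    width = (hi - lo) / k
    counts = [0] * k
    for x in obs:
        # The maximum observation is placed in the last class
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    rows, cumulative = [], 0
    for j, f in enumerate(counts):
        lower = lo + j * width
        upper = lower + width
        cumulative += f
        rows.append((lower, upper, (lower + upper) / 2,
                     f, 100 * f / n, 100 * cumulative / n))
    return rows

if __name__ == "__main__":
    # Made-up data, for illustration only (not the Mbabane series)
    for row in frequency_table(list(range(1, 101))):
        print(row)
```

The relative frequency column corresponds to the class probability, and the cumulative column to the probability of falling at or below the class.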
Doing probability based on normal distributions:
The normal curve is symmetric about its centre, having a horizontal axis that runs indefinitely both to the left and to the right, with the tails of the curve approaching the axis in both directions. The centre point is the mean. The vertical axis is chosen in such a way that the total area under the curve is exactly 1 (one square unit).
Mean µ = ∑ x / n
Variance = ∑(x - x̄)² / (n - 1)
Standard deviation (σ) = √variance
Standard error of x̄ = σ / √n
If one is not sure that the parent population is normal, the analysis should be restricted to samples of size ≥ 30.
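As a minimal sketch of the formulas for the mean, variance, standard deviation and standard error, the following Python function computes all four from a sample. The name `sample_summary` is hypothetical, and the standard error is estimated from the sample standard deviation.

```python
import math

def sample_summary(xs):
    """Return (mean, variance, standard deviation, standard error of the mean)."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(xs) / n
    # Sample variance with the (n - 1) divisor
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    sd = math.sqrt(variance)
    # Standard error of the mean: standard deviation / sqrt(n)
    se = sd / math.sqrt(n)
    return mean, variance, sd, se

if __name__ == "__main__":
    # Made-up values, for illustration only
    print(sample_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```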

A comparison among different distributions with different means and different standard deviations requires a transformation.

This means centring each distribution about the same mean by subtracting the mean from every value (xi - x̄) in each population. This moves each of the distributions along the scale until they are centred about zero, which is the mean of all transformed distributions.
Each distribution will still maintain a different bell shape, however.
The z-score

This is called standardization, and the variable Z equals:

z = (x - µ) / σ

The z-score tells how many standard deviations the original x-score is removed from the mean, to the right or to the left.
Examples:
With µ = 80 and σ = 4, the X-score 85 is converted into a z-score as z = (85 - 80) / 4 = 1.25, so the X-score lies 1.25 standard deviations to the right of the mean.
If the z-score equivalent of X = 74 is computed, you obtain z = (74 - 80) / 4 = -1.5, so the X-score lies 1.5 standard deviations to the left of the mean.
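The two conversions can be checked with a short Python sketch. The function name `z_score` is hypothetical; it applies the standardization z = (x - µ) / σ to the values from the examples.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations by which x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

if __name__ == "__main__":
    print(z_score(85, 80, 4))  # positive: to the right of the mean
    print(z_score(74, 80, 4))  # negative: to the left of the mean
```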
Worked example: suppose a population of pumpkins is known to have a normal distribution with a mean and standard deviation of its length equal to 14.2 cm and 4.7 cm, respectively. What is the probability of finding, by chance, a specimen shorter than 3 cm?
The probability that X will lie in some range between a and b is equal to the area under the normal curve between a and b.
z = (3 - 14.2) / 4.7 ≈ -2.4
From the table, the area to the left of z = 2.4 is 0.9918, so the probability is 1 - 0.9918 = 0.0082 = 0.82%, a very small probability.
The Excel function NORMDIST can be used to calculate the standard normal cumulative distribution.

For the area between two points a and b, calculate z for each point, find the cumulative probability for each z, and take the difference between the two probabilities to obtain the probability between a and b.
Example: the heights of all the rice stalks in a farm are thought to be normally distributed with mean X = 38 cm and standard deviation s = 4.5 cm. Find the probability that the height of a stalk taken at random will be between 35 and 40 cm.
z1 = (35 - 38) / 4.5 ≈ -0.67 and z2 = (40 - 38) / 4.5 ≈ 0.44
From the table, P(Z < 0.44) = 0.6700 and P(Z < -0.67) = 0.2514, so the probability is 0.6700 - 0.2514 = 0.4186.
In other words, one would expect 41.86 per cent of the paddy field's rice stalks to have heights in that range.
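Both normal-distribution examples can be reproduced without a printed table. The sketch below is one possible approach, not the method of the slides: it computes the normal cumulative distribution from the error function in Python's standard library, and the names `normal_cdf` and `prob_between` are hypothetical. Because z is not rounded to two decimals here, the results differ slightly from the table-based figures (roughly 0.0086 for the pumpkins and 0.419 for the rice stalks).

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(a, b, mu, sigma):
    """P(a < X < b): difference of the two cumulative probabilities."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Pumpkin shorter than 3 cm, with mean 14.2 cm and std dev 4.7 cm
p_pumpkin = normal_cdf(3.0, 14.2, 4.7)

# Rice stalk between 35 and 40 cm, with mean 38 cm and std dev 4.5 cm
p_rice = prob_between(35.0, 40.0, 38.0, 4.5)

if __name__ == "__main__":
    print(f"P(pumpkin < 3 cm) = {p_pumpkin:.4f}")
    print(f"P(35 < stalk < 40) = {p_rice:.4f}")
```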

Extreme value distributions:


Certain crops may be exposed to lethal conditions (frost, excessive heat or cold,
drought, high winds, and so on).
Extreme value analysis typically involves the collection and analysis of annual
maxima of parameters that are observed daily, such as temperature, precipitation
and wind speed.
The Gumbel double exponential distribution is the one most used for describing
extreme values.

An event that has occurred m times in a long series of n independent trials, one per year say, has an estimated probability p = m / n.
The average interval between recurrences of the event during a long period is the return period T = 1 / p.
For example, if there is a 5% chance that an event will occur in any one year, its probability of occurrence is 0.05, and T = 1/0.05 = 20 yrs. Such an event occurs on average five times in 100 years, or once in 20 years.
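The relation between the estimated probability and the return period can be written as a short Python sketch. The function names are hypothetical, and the numbers are those of the example (five occurrences in 100 years).

```python
def estimated_probability(m, n):
    """Estimated probability of an event seen m times in n independent trials."""
    if n <= 0 or m < 0 or m > n:
        raise ValueError("require 0 <= m <= n and n > 0")
    return m / n

def return_period(p):
    """Average interval between recurrences, T = 1 / p (in trials, e.g. years)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p

if __name__ == "__main__":
    p = estimated_probability(5, 100)
    print(p, return_period(p))
```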
The mode
The mode is the most frequent value in any array.
The median
The median is obtained by selecting the middle value in an odd-numbered series of
variates or taking the average of the two middle values of an
even-numbered series.
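A minimal Python sketch of the mode and the median, using the standard-library `statistics` module. The data values are made up for illustration; `multimode` (Python 3.8 or later) is used so that ties for the most frequent value are all reported.

```python
import statistics

# Made-up values, for illustration only
odd_series = [12, 7, 9, 15, 9]
even_series = [12, 7, 9, 15]

# Median: middle value of an odd-numbered series,
# average of the two middle values of an even-numbered series
median_odd = statistics.median(odd_series)
median_even = statistics.median(even_series)

# Mode: most frequent value(s); returns a list in case of ties
modes = statistics.multimode(odd_series)

if __name__ == "__main__":
    print(median_odd, median_even, modes)
```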

Fractiles
Fractiles such as quartiles, quintiles and deciles are obtained by:
1. ranking the data in ascending order, and then
2. counting an appropriate fraction of the integers in the series (n + 1): for quartiles, n + 1 is divided by four; for deciles, n + 1 is divided by 10; and for percentiles, n + 1 is divided by a hundred.
Thus if n = 50, the first decile is the (1/10)(n + 1)th, or 5.1th, observation in ascending order, and the 7th decile is the (7/10)(n + 1)th, or 35.7th, observation.

Interpolation, that is, finding a value between two points on a line or curve, is required between observations.
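The (n + 1) rank rule with interpolation can be sketched as follows. The function name `fractile` is hypothetical, and linear interpolation between the two neighbouring ranked observations is an assumption, since the slides do not state which interpolation is meant.

```python
def fractile(data, fraction):
    """Value at rank fraction * (n + 1) in the ascending series.

    Uses linear interpolation between neighbouring observations
    (an assumption) when the rank is not a whole number.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    pos = fraction * (n + 1)            # 1-based rank, possibly fractional
    pos = min(max(pos, 1.0), float(n))  # keep the rank inside the series
    lower = int(pos)                    # whole part of the rank
    part = pos - lower                  # fractional part of the rank
    if lower >= n:
        return xs[-1]
    return xs[lower - 1] + part * (xs[lower] - xs[lower - 1])

if __name__ == "__main__":
    # Made-up series 1..50, so the value at each rank equals the rank itself
    series = list(range(1, 51))
    print(fractile(series, 0.1))  # first decile, rank 5.1
    print(fractile(series, 0.7))  # seventh decile, rank 35.7
```

With the series 1 to 50, the interpolated values coincide with the ranks 5.1 and 35.7 from the text, which makes the rule easy to verify by eye.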
Measuring dispersion

Measures of dispersion give information about the spread of the measurements about the average.

1. The range
This is the difference between the largest and the smallest values, e.g. the annual range of mean temperature is the difference between the mean daily temperatures of the hottest and coldest months.
2. Variance and the standard deviation

3. Skewness
Skewness represents a tendency of a data distribution to show a pronounced tail to one side or another. A distribution is asymmetrical when its left and right sides are not mirror images. A distribution can have right (or positive), left (or negative), or zero skewness.
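The slides do not give a formula for skewness. As one common choice, and purely as an assumption for illustration, the sketch below computes the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5). The function name `skewness` is hypothetical. The sign of the result indicates the direction of the tail.

```python
def skewness(xs):
    """Moment coefficient of skewness: m3 / m2 ** 1.5.

    Positive for a right tail, negative for a left tail,
    zero for a symmetric sample.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

if __name__ == "__main__":
    # Made-up samples, for illustration only
    print(skewness([1, 2, 3, 4, 5]))        # symmetric
    print(skewness([1, 1, 1, 2, 10]))       # tail to the right
    print(skewness([-10, -2, -1, -1, -1]))  # tail to the left
```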
