
Notes For R Tool

R is open source statistical software, created over 20 years ago, that combines the S language with lexical scoping semantics inspired by Scheme. Because the source code is open, users can access, modify, and improve it. R is commonly used by researchers for statistical analysis and machine learning. RStudio, a separate integrated development environment for R, provides editor, console, workspace, and plot windows for writing and running code. Users can install additional R packages from CRAN to expand functionality.

Uploaded by

Sakshi Sharda
Copyright
© All Rights Reserved

Introduction to R

• R is an open source software created over 20 years ago by Ihaka and Gentleman at the University of Auckland, New Zealand. However, its history is even longer, as its lineage goes back to the S programming language created by John Chambers at Bell Labs in the 1970s. R is actually a combination of S with lexical scoping semantics inspired by Scheme (Morandat and Hill 2012).
Slides used for Educational Purpose only
Flexibility
• Another benefit of open source is that anybody can access the
source code, modify and improve it. As a result, many
excellent programmers contribute to improving existing R
code and developing new capabilities.

• Many researchers in academic institutions are using and developing R code to develop the latest techniques in statistics and machine learning.


The Basics
• Installing R and RStudio
• First, you need to download and install R, a free software
environment for statistical computing and graphics from
CRAN , the Comprehensive R Archive Network. It is highly
recommended to install a precompiled binary distribution
for your operating system; follow these instructions:
• 1. Go to https://cran.r-project.org/
• 2. Click “Download R for Mac/Windows”
• 3. Download the appropriate file:
• (a) Windows users click Base, and download the installer for the latest R version
• (b) Mac users select the file R-3.X.X.pkg that aligns with your OS version
• 4. Follow the instructions of the installer


• Next, you can download RStudio's IDE (integrated development environment), a powerful user interface for R. RStudio includes a text editor, so you do not have to install another stand-alone editor. Follow these instructions:
• 1. Go to RStudio for desktop https://www.rstudio.com/products/rstudio/download/
• 2. Select the install file for your OS
• 3. Follow the instructions of the installer.


RStudio Console



Script Editor
• The top left window is where your script files will display. There are multiple forms of script files, but the basic one to start with is the .R file. To create a new file, use the File → New File menu. To open an existing file, use either the File → Open File… menu or the Recent Files menu to select from recently opened files.
• RStudio's script editor includes a variety of productivity-enhancing features, including syntax highlighting, code completion, multiple-file editing, and find/replace.
Workspace Environment

Four fundamental windows of the RStudio console


• # returns path for the current working directory
• getwd()
• # set the working directory to a specified directory
• setwd(directory_name)
• # list all objects
• ls()
• # identify if an R object with a given name is present
• exists("object_name")
• # remove defined object from the environment
• rm("object_name")
• # you can remove multiple objects by passing a character vector to the list argument
• rm(list = c("object1", "object2"))
• # basically removes everything in the working environment -- use with caution!
• rm(list = ls())


• You can also save and load your workspaces. Saving your workspace will save all objects within your workspace to a .RData file in your working directory, and loading a workspace will load the objects stored in the .RData file you specify.
• # save all items in workspace to a .RData file
• save.image()
• # save specified objects to a .RData file
• save(object1, object2, file = "myfile.RData")
• # load workspace into current session
• load("myfile.RData")
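The save and load commands above can be tried end to end with a small round trip. This is only a sketch: the object values are made up, and a temporary file (rdata_path, a hypothetical name) is used so that nothing in the working directory is touched.

```r
# Write to a temporary file instead of the working directory (illustrative choice).
rdata_path <- tempfile(fileext = ".RData")

# Two example objects; the names follow the slide, the values are made up.
object1 <- c(1, 3, 4)
object2 <- "hello"

# Save only the specified objects.
save(object1, object2, file = rdata_path)

# Remove them from the environment, as rm(list = ...) does on the earlier slide.
rm(list = c("object1", "object2"))

# load() restores the objects and returns their names as a character vector.
restored <- load(rdata_path)
restored
object1
```

Because load() returns the names of the objects it restored, capturing that value is a convenient way to see what a .RData file contained.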


Console
• The bottom left window contains the console. You can code directly in this window, but it will not save your code. It is best to use this window when you are simply performing calculator-type functions. This is also where your outputs will be presented when you run code in your script.


Installing Packages
• Your primary source to obtain packages will likely be CRAN. To install packages from CRAN:
• # install packages from CRAN
• install.packages("packagename")


Loading Packages
• Once the package is downloaded to your computer you can access the functions and resources provided by the package in two different ways:
• # load the package to use in the current R session
• library(packagename)
• # use a particular function from a package without loading the package
• packagename::functionname


Useful Packages
• There are thousands of helpful R packages for
you to use, but navigating them all can be a
challenge. To help you out, RStudio compiled a
guide to some of the best packages for loading,
manipulating, visualizing, analyzing, and
reporting data. In addition, their list captures
packages that specialize in spatial data, time
series and financial data, increasing speed and
performance, and developing your own R
packages.



Assignment and Evaluation
• # assignment
• x <- 3
• # evaluation
• x
• ## [1] 3
• Interestingly, R actually allows for five assignment operators:
• # leftward assignment
• x <- value
• x = value
• x <<- value
• # rightward assignment
• value -> x
• value ->> x


R as a Calculator
• At its most basic function R can be used as a calculator. When applying basic arithmetic, the PEMDAS order of operations applies: parentheses first, followed by exponentiation, multiplication and division, and finally addition and subtraction.
• 8+9/5^2
• ## [1] 8.36
• 8 + 9 / (5 ^ 2)
• ## [1] 8.36
• 8 + (9 / 5) ^ 2
• ## [1] 11.24
• (8 + 9) / 5 ^ 2
• ## [1] 0.68
• By default R will display seven digits, but this can be changed using options().
• 1/7
• ## [1] 0.1428571
• options(digits = 3)
• 1/7
• ## [1] 0.143
Vectorization
• A key difference between R and many other
languages is a topic known as vectorization.
• What does this mean? It means that many
functions that are to be applied individually to
each element in a vector of numbers require a
loop assessment to evaluate; however, in R many
of these functions have been coded in C to
perform much faster than a for loop would
perform. For example, let's say you want to add
the elements of two separate vectors of numbers (x and y).


• x <- c(1, 3, 4)
• y <- c(1, 2, 4)
• x
• ## [1] 1 3 4
• y
• ## [1] 1 2 4
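To make the vectorization point concrete, the sketch below adds the two vectors x and y from the slide in both ways: with an explicit for loop and with the vectorized + operator. The names z_loop and z_vec are illustrative only.

```r
x <- c(1, 3, 4)
y <- c(1, 2, 4)

# Loop version: add the vectors one element at a time.
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}

# Vectorized version: + operates on whole vectors at once.
z_vec <- x + y

z_loop
## [1] 2 5 8
identical(z_loop, z_vec)
## [1] TRUE
```

Both versions give the same result; the vectorized form is shorter and delegates the element-by-element work to compiled code.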


Dealing with Numbers
• Integer vs. Double
• The two most common numeric classes used in R
are integer and double (for double precision
floating point numbers). R automatically converts
between these two classes when needed for
mathematical purposes. As a result, it’s feasible to
use R and perform analyses for years without
specifying these differences.
• To check whether a pre-existing vector is made up
of integer or double values you can use typeof(x)
which will tell you if the vector is a double,
integer, logical, or character type.
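A short sketch of typeof() on each kind of vector mentioned above. The variable names are illustrative, and the last line shows the automatic integer-to-double conversion in arithmetic.

```r
dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)

typeof(dbl_var)
## [1] "double"
typeof(int_var)
## [1] "integer"
typeof(TRUE)
## [1] "logical"
typeof("a")
## [1] "character"

# Mixing an integer vector with a double promotes the result to double.
mixed <- int_var + 0.5
typeof(mixed)
## [1] "double"
```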



Creating Integer and Double Vectors
• By default, when you create a numeric vector using the c() function it will produce a vector of double precision numeric values. To create a vector of integers using c() you must specify this explicitly by placing an L directly after each number.
• # create a vector of double-precision values
• dbl_var <- c(1, 2.5, 4.5)
• dbl_var
• ## [1] 1.0 2.5 4.5
• # placing an L after the values creates a vector of integers
• int_var <- c(1L, 6L, 10L)
• int_var
• ## [1] 1 6 10
Converting Between Integer and Double Values
• By default, if you read in data that has no decimal points or you create
numeric values using the x <- 1:10 method the numeric values will be
coded as integer.
• If you want to change a double to an integer or vice versa you can specify
one of the following:
• # converts integers to double-precision values
• as.double (int_var)
• ## [1] 1 6 10
• # identical to as.double ()
• as.numeric (int_var)
• ## [1] 1 6 10
• # converts doubles to integers
• as.integer (dbl_var)
• ## [1] 1 2 4



Generating Sequence of Non-random
Numbers
• There are a few R operators and functions that are especially useful for creating vectors of non-random numbers. These functions provide multiple ways for generating sequences of numbers.
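The slide does not name the functions it has in mind. As an assumption, the sketch below uses two base R functions that are commonly used for this purpose, seq() and rep(); the variable names are illustrative.

```r
# Odd numbers from 1 to 21, using a fixed step size.
odds <- seq(from = 1, to = 21, by = 2)
odds
## [1]  1  3  5  7  9 11 13 15 17 19 21

# Five evenly spaced values between 0 and 1.
grid <- seq(0, 1, length.out = 5)
grid
## [1] 0.00 0.25 0.50 0.75 1.00

# Repeat the whole sequence twice.
rep_times <- rep(1:3, times = 2)
rep_times
## [1] 1 2 3 1 2 3

# Repeat each element twice.
rep_each <- rep(1:3, each = 2)
rep_each
## [1] 1 1 2 2 3 3
```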


Specifying Numbers Within a Sequence
• To explicitly specify numbers in a sequence you can use the colon : operator to specify all integers between two specified numbers or the combine c() function to explicitly specify all numbers in the sequence.
• # create a vector of integers between 1 and 10
• 1:10
• ## [1] 1 2 3 4 5 6 7 8 9 10
• # create a vector consisting of 1, 5, and 10
• c (1, 5, 10)
• ## [1] 1 5 10
• # save the vector of integers between 1 and 10 as object x
• x <- 1:10
• x
• ## [1] 1 2 3 4 5 6 7 8 9 10



Normal Distribution Numbers
• The normal (or Gaussian) distribution is the most common and well known distribution. Within R, the normal distribution functions share the norm suffix, with a prefix (r, p, q, or d) selecting the operation.
• # generate n random numbers from a normal distribution with given
• # mean and standard deviation
• rnorm(n, mean = 0, sd = 1)
• # generate CDF probabilities for value(s) in vector q
• pnorm(q, mean = 0, sd = 1)
• # generate quantile for probabilities in vector p
• qnorm(p, mean = 0, sd = 1)
• # generate density function values for value(s) in vector x
• dnorm(x, mean = 0, sd = 1)
• For example, to generate 25 random numbers from a normal distribution with mean = 100 and standard deviation = 15:
• x <- rnorm(25, mean = 100, sd = 15)
• x
• summary(x)
• You can also pass a vector of values. For instance, say you want to know the CDF probabilities for each value in the vector x created above:
• pnorm(x, mean = 100, sd = 15)
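As a sketch of how these functions relate to one another, the example below draws the same kind of sample as the slide, then checks that qnorm() undoes pnorm(). The seed value is arbitrary and is only there to make the random draw repeatable.

```r
set.seed(123)  # arbitrary seed so the random draw is reproducible

x <- rnorm(25, mean = 100, sd = 15)

# CDF probability for every value in x (vectorized call).
p <- pnorm(x, mean = 100, sd = 15)

# qnorm() is the inverse of pnorm(): it maps the probabilities back to x.
round_trip_ok <- isTRUE(all.equal(qnorm(p, mean = 100, sd = 15), x))
round_trip_ok
## [1] TRUE

# Half of the distribution lies below the mean.
pnorm(100, mean = 100, sd = 15)
## [1] 0.5

# Probability of falling within one standard deviation of the mean.
within_one_sd <- pnorm(115, mean = 100, sd = 15) - pnorm(85, mean = 100, sd = 15)
within_one_sd
## [1] 0.6826895
```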


Binomial Distribution Numbers
• The binomial distribution is conventionally interpreted as the number of successes in size = x trials, each with prob = p probability of success:
• # generate a vector of length n displaying the number of successes
• # from a trial size = 100 with a probability of success = 0.5
• rbinom(n, size = 100, prob = 0.5)
• # generate CDF probabilities for value(s) in vector q
• pbinom(q, size = 100, prob = 0.5)
• # generate quantile for probabilities in vector p
• qbinom(p, size = 100, prob = 0.5)
• # generate probability mass function values for value(s) in vector x
• dbinom(x, size = 100, prob = 0.5)


Poisson Distribution Numbers

• The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.
• # generate a vector of length n displaying the random number of
• # events occurring when lambda (mean rate) equals 4.
• rpois(n, lambda = 4)
• # generate CDF probabilities for value(s) in vector q when lambda
• # (mean rate) equals 4.
• ppois(q, lambda = 4)
• # generate quantile for probabilities in vector p when lambda
• # (mean rate) equals 4.
• qpois(p, lambda = 4)
• # generate probability mass function values for value(s) in vector x
• # when lambda (mean rate) equals 4.
• dpois(x, lambda = 4)


Statistical Methods for Decision Making
Probability Distributions



Introduction

• Managers will have to cope with uncertainty in many decision situations. Concepts of probability will help you measure uncertainty and perform the associated analyses that are essential in making effective business decisions.


Probability – Meaning and Concepts

• Probability refers to the chance or likelihood of a particular event taking place.

• An event is an outcome of an experiment.

• An experiment is a process that is performed to understand and observe possible outcomes.

• The set of all outcomes of an experiment is called the sample space.


Example

• In a manufacturing unit, three parts from the assembly are selected. You are observing whether they are defective or non-defective. Determine
a) the sample space.
b) the event of getting at least two defective parts.


Solution

a) Let S = sample space, with D = defective and G = non-defective. Listing every combination for the three parts gives
S = {DDD, DDG, DGD, GDD, DGG, GDG, GGD, GGG}.

b) Let E denote the event of getting at least two defective parts. This implies that E will contain the outcomes with two defectives and with three defectives. Looking at the sample space above, E = {GDD, DGD, DDG, DDD}. It is easy to see that E is a part of S, commonly called a subset of S. Hence an event is always a subset of the sample space.
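The sample space and the event E can also be enumerated in R. The sketch below uses expand.grid() to list every defective/non-defective combination for three parts; the column names part1 to part3 are illustrative. The final ratio assumes, purely for illustration, that all eight outcomes are equally likely, which the slide does not state.

```r
# Every combination of D (defective) and G (non-defective) for three parts.
outcomes <- expand.grid(
  part1 = c("D", "G"),
  part2 = c("D", "G"),
  part3 = c("D", "G"),
  stringsAsFactors = FALSE
)

# Sample space as three-letter labels such as "DDG".
S <- paste0(outcomes$part1, outcomes$part2, outcomes$part3)
length(S)
## [1] 8

# Event E: at least two defective parts.
n_defective <- rowSums(outcomes == "D")
E <- S[n_defective >= 2]
sort(E)
## [1] "DDD" "DDG" "DGD" "GDD"

# Only under the illustrative assumption of equally likely outcomes:
p_E <- length(E) / length(S)
p_E
## [1] 0.5
```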


Definition of probability

• Probability of an event A is defined as the ratio of two numbers m and n. In symbols,
P(A) = m / n
where m = number of ways that are favourable to the occurrence of A and n = the total number of outcomes of the experiment (all possible outcomes).
Please note that P(A) is always >= 0 and always <= 1.
P(A) is a pure number.


Probability values

• Probability 0: Impossible event

• Probability 1: Certain event

• Probability is always in the range of 0 to 1


Assessing Probability

• A priori classical probability: Based on knowledge of the process

• Empirical probability: Based on data

• Subjective probability: Based on experience, analysis of situation and expert opinion


Mutually exclusive events
• Two events A and B are said to be mutually exclusive if the occurrence of A precludes the occurrence of B. For example, suppose you pick one card at random from a well-shuffled pack of cards and would like to know whether it is a King or a Queen. The selected card cannot be both a King and a Queen. If a King occurs, a Queen will not occur, and if a Queen occurs, a King will not occur.


Independent events

• Two events A and B are said to be independent if the occurrence of A is in no way influenced by the occurrence of B. Likewise, the occurrence of B is in no way influenced by the occurrence of A.


Rules for computing probability

• 1) Addition Rule – Mutually exclusive events
P(A U B) = P(A) + P(B)
This rule says that the probability of the union of A and B is determined by adding the probability of the events A and B.

Here the symbol A U B is called A union B, meaning A occurs, or B occurs, or both A and B simultaneously occur. When A and B are mutually exclusive, A and B cannot simultaneously occur.


Rules for computing probability

• 2) Addition Rule – Events are not mutually exclusive
P(A U B) = P(A) + P(B) – P(A ∩ B)
This rule says that the probability of the union of A and B is determined by adding the probabilities of the events A and B and then subtracting the probability of the intersection of the events A and B.

The symbol A ∩ B is called A intersection B, meaning both A and B simultaneously occur.
Example of Addition Rules:

• From a pack of well-shuffled cards, a card is picked up at random.
1) What is the probability that the selected card is a King or a Queen?
2) What is the probability that the selected card is a King or a Diamond?


Solution to part 1)

• Let A = getting a King
• Let B = getting a Queen

There are 4 Kings and 4 Queens. The events are clearly mutually exclusive. Apply the formula.
P(A U B) = P(A) + P(B) = 4/52 + 4/52 = 8/52 = 2/13


Solution to part 2

• Look at the diagram:


There are 52 cards in total in the pack, out of which 4 are Kings and 13 are Diamonds. Let K = getting a King and D = getting a Diamond. The two events here are not mutually exclusive because one card, the King of Diamonds, is both a King and a Diamond.
P(K U D) = P(K) + P(D) – P(K ∩ D)
= 4/52 + 13/52 – 1/52 = 16/52 = 4/13
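Both card answers can be checked by building a 52-card deck in R and counting. This is a sketch; the rank and suit labels, and the variable names, are illustrative.

```r
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("Spades", "Hearts", "Diamonds", "Clubs")

# One row per card: 13 ranks x 4 suits = 52 cards.
deck <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

is_king    <- deck$rank == "K"
is_queen   <- deck$rank == "Q"
is_diamond <- deck$suit == "Diamonds"

# Part 1: mutually exclusive events, so the probabilities simply add.
p_king_or_queen <- mean(is_king | is_queen)

# Part 2: not mutually exclusive, so the overlap must be subtracted.
p_king_or_diamond <- mean(is_king | is_diamond)
by_formula <- mean(is_king) + mean(is_diamond) - mean(is_king & is_diamond)

all.equal(p_king_or_queen, 2 / 13)
## [1] TRUE
all.equal(p_king_or_diamond, 4 / 13)
## [1] TRUE
all.equal(p_king_or_diamond, by_formula)
## [1] TRUE
```

Taking mean() of a logical vector gives the proportion of cards for which the condition holds, which is exactly m / n for a randomly drawn card.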


Multiplication rule

• Independent events
P(A ∩ B) = P(A).P(B)
This rule says when the two events A and B are
independent the probability of the simultaneous
occurrence of A and B (also known as probability
of intersection A and B) equals the product of
the probability of A and the probability of B.
Of course this rule can be extended to more
than two events.
Multiplication rule

Independent Events-Example
• Example:
The probability that you will get an A grade in Quantitative Methods is
0.7. The probability that you will get an A grade in Marketing is 0.5.
Assuming these two courses are independent, compute the probability
that you will get an A grade in both these subjects.
• Solution:
Let A = getting A grade in Quantitative Methods
Let B =getting A grade in Marketing
It is given that A and B are independent.
P(A ∩ B) = P(A).P(B) = 0.7 × 0.5 = 0.35


Multiplication Rule

• Events are not independent
P(A ∩ B) = P(A).P(B|A)
• This rule says that the probability of the intersection of the events A and B equals the product of the probability of A and the probability of B given that A has happened or is known to you. This is symbolized in the second term of the above expression as P(B|A). P(B|A) is called the conditional probability of B given the fact that A has happened.

• We can also write P(A ∩ B) = P(B).P(A|B) if B has already happened.


Multiplication rule
Events are not independent-Example
From a pack of cards, 2 cards are drawn in succession one after the other.
After every draw, the selected card is not replaced. What is the
probability that in both the draws you will get Spades?
Solution:
Let A = getting spade in the first draw
Let B = getting spade in the second draw.
The cards are not replaced.
This situation requires the use of conditional probability.
• P(A) = 13/52 (There are 13 Spades and 52 cards in a pack)
• P(B|A) = 12/51 (There are 12 Spades and 51 cards left because the first card selected was a Spade and is not replaced)
• P(A ∩ B) = P(A).P(B|A) = (13/52).(12/51) = 156/2652 = 1/17.
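The two multiplication-rule examples reduce to a few lines of arithmetic in R. The sketch below also cross-checks the two-spades answer by counting pairs with choose(); the variable names are illustrative.

```r
# Independent events: A grade in both subjects.
p_both_grades <- 0.7 * 0.5
p_both_grades
## [1] 0.35

# Dependent events: two spades drawn without replacement.
p_a <- 13 / 52          # spade on the first draw
p_b_given_a <- 12 / 51  # spade on the second draw, given a spade on the first
p_two_spades <- p_a * p_b_given_a
p_two_spades
## [1] 0.05882353

# Cross-check by counting: pairs of spades over all possible pairs of cards.
p_by_counting <- choose(13, 2) / choose(52, 2)
all.equal(p_two_spades, p_by_counting)
## [1] TRUE
all.equal(p_two_spades, 1 / 17)
## [1] TRUE
```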


Marginal Probability
• Contingency table consists of rows and
columns of two attributes at different levels
with frequencies or numbers in each of the
cells. It is a matrix of frequencies assigned to
rows and columns.

• The term marginal is used to indicate that the probabilities are calculated from the row and column totals in the margins of the contingency table (also called joint probability table).


Marginal Probability-Example
• A survey involving 200 families was conducted. Information regarding
family income per year and whether the family buys a car are given in
the following table.

• a) What is the probability that a randomly selected family is a buyer of the car?
• b) What is the probability that a randomly selected family is both a buyer of a car and belongs to the income group of Rs. 10 lakhs and above?
• c) A family selected at random is found to belong to the income group of Rs. 10 lakhs and above. What is the probability that this family is a buyer of a car?


Solution
• a) What is the probability that a randomly selected family is a buyer of the car?
80/200 = 0.40.

• b) What is the probability that a randomly selected family is both a buyer of a car and belongs to the income group of Rs. 10 lakhs and above?
42/200 = 0.21

• c) A family selected at random is found to belong to the income group of Rs. 10 lakhs and above. What is the probability that this family is a buyer of a car?
42/80 = 0.525. Note this is a case of conditional probability of buyer given income is Rs. 10 lakhs and above, so the denominator counts only the families in that income group.


Some interesting problems
for discussion

• What is the probability that a car has a CD player, given that it has AC?
– i.e., we want to find P(CD | AC)


Bayes’ Theorem

• Bayes’ Theorem is used to revise previously calculated probabilities based on new information.

• Developed by Thomas Bayes in the 18th century.

• It is an extension of conditional probability.


Bayes’ Theorem

P(Bi|A) = P(A|Bi) P(Bi) / [P(A|B1) P(B1) + P(A|B2) P(B2) + … + P(A|BK) P(BK)]

• Where:
• Bi = ith event of K mutually exclusive and collectively exhaustive events.
• A = new event that might impact P(Bi)


Bayes’ Theorem

P(B|A) = P(A|B) P(B) / [P(A|B) P(B) + P(A|B') P(B')]

• Where:
• B' = Complement of B
• A = new event that might impact P(B)
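The two-event form of Bayes' Theorem above translates directly into a small R function. This is a sketch: the function name bayes_posterior and the input probabilities are hypothetical, chosen only to show the calculation.

```r
# Posterior P(B|A) from the prior P(B) and the two conditional probabilities.
# P(B') is obtained as 1 - P(B) because B and B' are complements.
bayes_posterior <- function(p_b, p_a_given_b, p_a_given_not_b) {
  numerator <- p_a_given_b * p_b
  denominator <- numerator + p_a_given_not_b * (1 - p_b)
  numerator / denominator
}

# Hypothetical inputs: P(B) = 0.01, P(A|B) = 0.9, P(A|B') = 0.05
posterior <- bayes_posterior(p_b = 0.01, p_a_given_b = 0.9, p_a_given_not_b = 0.05)
posterior
## [1] 0.1538462
```

With these made-up numbers the prior of 0.01 is revised upward to about 0.15 once A is observed, which is the kind of revision the theorem describes.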


What is Probability Distribution?
• In precise terms, a probability distribution is a total listing of the various
values the random variable can take along with the corresponding
probability of each value. A real life example could be the pattern of
distribution of the machine breakdowns in a manufacturing unit.
• The random variable in this example would be the various values the
machine breakdowns could assume.
• The probability corresponding to each value of the breakdown is the
relative frequency of occurrence of the breakdown.
• The probability distribution for this example is constructed by the actual
breakdown pattern observed over a period of time. Statisticians use the
term “observed distribution” of breakdowns.



Binomial Distribution

• The Binomial Distribution is a widely used probability distribution of a discrete random variable.

• It plays a major role in the quality control and quality assurance function. Manufacturing units use the binomial distribution for the analysis of defectives.

• Reducing the number of defectives using the proportion defective control chart (p chart) is an accepted practice in manufacturing organizations.

• The binomial distribution is also used in service organizations like banks and insurance corporations to get an idea of the proportion of customers who are satisfied with the service quality.


Binomial Probability Function

• The Binomial Distribution gives the probability of getting x successes out of n trials. The Binomial Probability Function is given by the following expression:
P(x) = C(n, x) p^x (1 − p)^(n − x)
x can take values 0, 1, 2, …, n
C(n, x) = n! / (x! (n − x)!) is the number of ways in which x successes can take place out of n trials

• Where P(x) is the probability of getting x successes in n trials
• p is the probability of success, which is the same throughout the n trials.
• p is the parameter of the Binomial distribution (for a given number of trials n)


Example for Binomial Distribution

• A bank issues credit cards to customers under the scheme of Master Card. Based on past data, the bank has found that 60% of all accounts pay on time following the bill. If a sample of 7 accounts is selected at random from the current database, construct the Binomial Probability Distribution of accounts paying on time.


Solution
• This problem can be structured as a Bernoulli Process where an account paying on time is taken as a success and an account not paying on time is taken as a failure. The random variable x here represents the number of accounts paying on time, which can take values 0, 1, 2, 3, 4, 5, 6, 7. You need to prepare a table containing x and P(x) for all the values of x. Performing the calculations by hand using the Binomial Probability Function for x = 0, 1, 2, 3, 4, 5, 6, 7 is very tedious.

• The best option is to use Microsoft Excel to calculate the Binomial Probabilities, both for individual values and for the cumulative position. This facility is available under the option "Paste Function". The form of the function is =BINOM.DIST(x, n, p, 0 or 1), where x is the number of successes, n is the number of trials, and p is the probability of success in each trial. The last term, 0 or 1, is a logical flag. If you enter 0, Excel returns the individual probability value; if 1 is entered, Excel gives the cumulative probability value.
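The same table can be built in R with the dbinom() and pbinom() functions introduced earlier, as an alternative to the spreadsheet route. In this sketch dbinom() plays the role of the individual probability and pbinom() the cumulative one; the data frame and column names are illustrative.

```r
n <- 7     # accounts sampled
p <- 0.6   # probability that an account pays on time

x <- 0:n
binom_table <- data.frame(
  x          = x,
  p_x        = dbinom(x, size = n, prob = p),  # P(X = x)
  cumulative = pbinom(x, size = n, prob = p)   # P(X <= x)
)

# Display the table rounded to four decimals.
round(binom_table, 4)

# The individual probabilities must add up to 1.
total_probability <- sum(binom_table$p_x)
```

For example, the row for x = 4 gives P(X = 4) = C(7, 4) × 0.6^4 × 0.4^3 = 0.290304.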


Spreadsheet Showing the Solution



Poisson Distribution

• Poisson Distribution is another discrete distribution which also plays a major role
in quality control in the context of reducing the number of defects per standard
unit.

• Examples include number of defects per item, number of defects per transformer produced, number of defects per 100 m² of cloth, etc.

• Other real life examples would include 1) the number of cars arriving at a highway check post per hour; 2) the number of customers visiting a bank per hour during peak business period.


Poisson Process

• The probability of getting an event in a continuous interval such as length, area, time and the like is constant.

• The probability of an event occurring in any one interval is independent of the probability of an event occurring in any other interval.

• The probability of getting more than one event in an interval approaches 0 as the interval becomes smaller.


Poisson Probability Function

Poisson Distribution Formula

P(x) = e^(−λ) λ^x / x!

where
P(x) = Probability of x events in an interval, given λ
λ = Average number of events per unit
e = 2.71828 (base of the natural logarithm)
x = events per unit, which can take values 0, 1, 2, 3, …
λ is the parameter of the Poisson Distribution.


Example

If, on average, 6 customers arrive every two minutes at a bank during busy working hours, a) what is the probability that exactly four customers arrive in a given minute? b) What is the probability that more than three customers will arrive in a given minute?

6 customers arrive every two minutes. Therefore, on average 3 customers arrive every minute, which implies λ = 3.
a) P(X = 4) = ?
b) P(X > 3) = 1 − P(X <= 3) = ?


Spreadsheet showing Solution

Poisson Distribution

Question | Solution (formula) | Solution
Find P(X = 4) | =POISSON.DIST(4,3,0) | 0.168031356
Find P(X > 3), i.e. 1 - P(X <= 3) | =1-POISSON.DIST(3,3,1) | 0.352768111

The above spreadsheet shows the formula and solution.
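The same two answers can be reproduced in R with dpois() and ppois(), the functions introduced in the earlier Poisson slides. This is a sketch with illustrative variable names.

```r
lambda <- 3  # average arrivals per minute (6 customers every two minutes)

# a) Exactly four customers in a given minute.
p_exactly_4 <- dpois(4, lambda = lambda)
p_exactly_4
## [1] 0.1680314

# b) More than three customers: 1 - P(X <= 3).
p_more_than_3 <- 1 - ppois(3, lambda = lambda)
p_more_than_3
## [1] 0.3527681

# Equivalent form using the upper tail directly.
p_upper_tail <- ppois(3, lambda = lambda, lower.tail = FALSE)
```

Here dpois() corresponds to the spreadsheet formula with the last argument 0, and ppois() to the formula with the last argument 1.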


Normal Distribution

• The Normal Distribution is the most widely used continuous distribution. It occupies a unique place in the field of statistics. In fact, the entire quality control function, which employs statistical techniques heavily, would come to a grinding halt without the use of the normal distribution. The control charts for reducing and stabilizing variation rely on the normal distribution. Process capability studies to meet customer specifications need the normal distribution. The whole subject matter of inferential statistics is based on the normal distribution. In all management functions, including the human side, the observed frequency distributions encountered are all fairly close to the normal distribution when the sample size is reasonably large.
Properties of Normal Distribution

• The normal distribution is a continuous distribution looking like a bell. Statisticians use the expression “Bell Shaped Distribution”.
• It is a beautiful distribution in which the mean, the median, and the mode are all equal to one another.
• It is symmetrical about its mean.
• If the tails of the normal distribution are extended, they will approach the horizontal axis without actually touching it (asymptotic to the x-axis).
• The normal distribution has two parameters, namely the mean µ and the standard deviation σ.


Normal Probability Density Function

In the usual notation, the probability density function of the normal distribution is given below:

f(x) = (1 / (σ √(2π))) e^(−(x − µ)² / (2σ²))

x is a continuous normal random variable with -∞ < x < ∞.


Standard Normal Distribution

• The Standard Normal Variable is defined as follows:
Z = (X − µ) / σ

• Please note that Z is a pure number independent of the unit of measurement. The random variable Z follows a normal distribution with mean = 0 and standard deviation = 1.


Example Problem

The mean weight of a morning breakfast cereal pack is 0.295 kg with a standard deviation of 0.025 kg. The random variable weight of the pack follows a normal distribution.

a) What is the probability that the pack weighs less than 0.280 kg?

b) What is the probability that the pack weighs more than 0.350 kg?

c) What is the probability that the pack weighs between 0.260 kg and 0.340 kg?


Solution-a)

z = (x − µ)/σ = (0.280 − 0.295)/0.025 = −0.6. Click “Paste Function” in Microsoft Excel, then click the “statistical” option, then click the standard normal distribution option and enter the z value. Excel accepts both negative and positive values of z directly and always returns the cumulative probability, that is, the probability of a value less than z. For a “less than” question the answer is therefore direct; for a “more than” question the answer is 1 − the probability value returned by Excel. The answer for part a) of the question = 0.2743 (direct from Excel, since part a) asks for a “less than” probability). This says that 27.43% of the packs weigh less than 0.280 kg.


Solution-b)

z = (x − µ)/σ = (0.350 − 0.295)/0.025 = 2.2. Excel returns a value of 0.9861. Since the question asks for the probability of weighing more than 0.350 kg, the required probability is 1 − 0.9861 = 0.0139. This means that 1.39% of the packs weigh more than 0.350 kg.


Solution-c)

For this part, you have to first get the cumulative probability up to 0.340 kg and then subtract the cumulative probability up to 0.260 kg.
z = (x − µ)/σ = (0.340 − 0.295)/0.025 = 1.8 (up to 0.340)
z = (x − µ)/σ = (0.260 − 0.295)/0.025 = −1.4 (up to 0.260)
These two probabilities from Excel are 0.9641 and 0.0808 respectively. The answer is 0.9641 − 0.0808 = 0.8833. This means that 88.33% of the packs weigh between 0.260 kg and 0.340 kg.
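All three parts of the cereal-pack problem can be reproduced in R with pnorm(), which accepts the mean and standard deviation directly so the z conversion is not needed. This is a sketch with illustrative variable names.

```r
mu <- 0.295     # mean pack weight in kg
sigma <- 0.025  # standard deviation in kg

# a) P(weight < 0.280): a "less than" probability, so the CDF is used directly.
p_a <- pnorm(0.280, mean = mu, sd = sigma)

# b) P(weight > 0.350): a "more than" probability, so subtract the CDF from 1.
p_b <- 1 - pnorm(0.350, mean = mu, sd = sigma)

# c) P(0.260 < weight < 0.340): difference of two cumulative probabilities.
p_c <- pnorm(0.340, mean = mu, sd = sigma) - pnorm(0.260, mean = mu, sd = sigma)

round(c(p_a, p_b, p_c), 4)
## [1] 0.2743 0.0139 0.8833
```

The same values follow from the standardized form, for example pnorm(-0.6) for part a).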


Example 2

A company produces a bolt of length 10 mm for its customers. The bolts produced are normally distributed with an average length of 10.01 mm and a standard deviation of 0.06 mm.
(a) What is the probability that a bolt produced will be longer than 10.2 mm?

(b) The sales team is negotiating with a new customer who has more stringent quality requirements. The new customer requires that bolts be between 9.9 and 10.15 mm. What is the probability that a bolt produced by the current process will be acceptable to the new customer?

(c) What is the length below which 99% of the bolts produced will fall?
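The slides give no worked solution for this example. One possible approach, sketched below with illustrative variable names, uses pnorm() for parts (a) and (b) and qnorm() for part (c), following the same pattern as the cereal-pack problem.

```r
mu <- 10.01    # mean bolt length in mm
sigma <- 0.06  # standard deviation in mm

# (a) P(length > 10.2): upper-tail probability.
p_longer <- pnorm(10.2, mean = mu, sd = sigma, lower.tail = FALSE)

# (b) P(9.9 < length < 10.15): difference of two cumulative probabilities.
p_acceptable <- pnorm(10.15, mean = mu, sd = sigma) - pnorm(9.9, mean = mu, sd = sigma)

# (c) Length below which 99% of bolts fall: the 0.99 quantile.
length_99 <- qnorm(0.99, mean = mu, sd = sigma)

round(p_longer, 4)
round(p_acceptable, 4)
round(length_99, 2)
```

Under these assumptions the results come out at roughly 0.0008 for (a), about 0.957 for (b), and about 10.15 mm for (c).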
