Chapter 1
Using Materials
• In LMS
• In Teams/Files
Using R and the Devore7 package (see the video clips and pay attention to the following):
1. Why do we need R?
2. Install R.
3. Install RStudio.
And then:
4. Run RStudio and open the script file GettingStart.
5. Install packages (Devore7, prob, Maples, ...) and get a first taste of R through discussion.
Basic concepts
1. Population-Sample-Variable
• Population: the set of all objects under investigation
• Sample: a subset of the population
• Variable: any characteristic whose value may change from one object to another in the
population
Examples:
x = brand of calculator owned by a student
y = number of visits to a particular website during a specified period
z = braking distance of an automobile under specified conditions
2. Types of data sets: univariate, bivariate, multivariate
• A univariate data set consists of observations made on a single variable.
• Bivariate: observations made on two variables.
• Multivariate: observations made on more than one variable (so bivariate data is a special case).
Branches of Statistics
Descriptive Statistics and Inferential Statistics
• Descriptive statistics provides methods to summarize and describe important features of the
data. The main descriptive methods are a) graphical tools such as histograms, boxplots and
scatter plots, ... and b) numerical summary measures such as the mean, median, mode, ...
• Inferential statistics provides techniques for using sample information to draw conclusions
about the population.
Collecting Data
Remarks on collecting data
• Statistics deals not only with the organization and analysis of data once it has been collected
but also with the development of techniques for collecting the data.
• If data is not properly collected, an investigator may not be able to answer the questions
under consideration with a reasonable degree of confidence.
Common Problem
• One common problem is that the target population—the one about which conclusions are
to be drawn—may be different from the population actually sampled.
Example
• Advertisers would like various kinds of information about the television-viewing habits
of potential customers. The most systematic information of this sort comes from placing
monitoring devices in a small number of homes across the United States.
• It has been conjectured that placement of such devices in and of itself alters viewing
behavior, so that characteristics of the sample may be different from those of the target
population.
Simple Random Sample
A sample is a simple random sample if any particular subset of the specified size (e.g., a
sample of size 100) has the same chance of being selected.
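In R, a simple random sample can be drawn with sample(). The sketch below assumes a hypothetical population of 1000 units labelled 1 to 1000; the data are illustrative only. Sampling without replacement with equal probabilities makes every subset of size 100 equally likely, which is exactly the definition above.

```r
# Hypothetical population of 1000 units labelled 1, ..., 1000 (illustrative only)
population <- 1:1000

set.seed(1)  # makes the random draw reproducible
# Sampling without replacement with equal probabilities: every subset of
# size 100 has the same chance of being the one selected
srs <- sample(population, size = 100, replace = FALSE)

length(srs)          # 100 units in the sample
anyDuplicated(srs)   # 0: no unit appears twice
```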
Sampling methods (self-study)
• Focus on stratified sampling.
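Stratified sampling separates the population units into nonoverlapping groups (strata) and takes a sample from each one. A minimal R sketch, assuming a hypothetical population of 1000 students split into three faculties and a proportional 10% simple random sample within each stratum (proportional allocation is one common choice, not the only one):

```r
set.seed(2)
# Hypothetical population: 1000 students in three faculties (the strata);
# faculty names and sizes are illustrative assumptions
students <- data.frame(
  id      = 1:1000,
  faculty = rep(c("Science", "Economics", "Arts"), times = c(500, 300, 200))
)

# Split into strata, then take a simple random sample of 10% within each one
strata <- split(students, students$faculty)
stratified <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), size = round(0.1 * nrow(s))), ]
}))

table(stratified$faculty)  # Arts 20, Economics 30, Science 50
```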
Note
• A display based on between 5 and 20 stems is recommended.
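Base R draws stem-and-leaf displays with stem(). The sketch uses hypothetical exam scores (not textbook data); the scale argument stretches or shrinks the display, which is one way to keep the number of stems within the recommended 5 to 20.

```r
# Hypothetical exam scores (illustrative data, not from the textbook)
scores <- c(52, 58, 61, 64, 67, 69, 70, 72, 73, 75, 77, 78, 81, 83, 84, 88, 91, 95)

# With the tens digit as the stem there are 5 stems: 5, 6, 7, 8, 9
length(unique(floor(scores / 10)))

stem(scores)             # default display
stem(scores, scale = 2)  # roughly doubles the number of stems
```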
Examples 1.1 and 1.5
Example 1.5
Dotplots
When:
• the data set is small, or
• there are relatively few distinct data values
How
• Each observation is represented by a dot above the corresponding location on a horizontal
measurement scale.
• When a value occurs more than once, there is a dot for each occurrence, and these dots are
stacked vertically.
What does it tell you?
As with a stem-and-leaf display, a dotplot gives information about location, spread, extremes,
and gaps.
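One way to draw a dotplot in base R is stripchart() with method = "stack", which stacks one dot per occurrence of a repeated value. The data below are hypothetical; the missing value 8 shows up as a gap.

```r
# Hypothetical small data set with repeated values (illustrative only)
x <- c(3, 4, 4, 5, 5, 5, 6, 7, 7, 9)

table(x)  # stack heights: the value 5 occurs 3 times; 8 never occurs (a gap)

# method = "stack" piles repeated values vertically, one dot per occurrence
stripchart(x, method = "stack", pch = 19, offset = 0.5, xlab = "Measurement")
```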
Example 1.7
Histograms
Number of classes
Discrete Numerical Variables
Definition
Construction
Example 1.8
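For a discrete variable, the usual construction is: tabulate the frequency of each value, divide by n to get relative frequencies, and draw a rectangle over each value with height equal to its relative frequency. A sketch in R with hypothetical counts (number of website visits by 20 users, invented for illustration):

```r
# Hypothetical discrete data: number of visits by 20 users (illustrative only)
visits <- c(0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6)

freq     <- table(visits)           # frequency of each value
rel_freq <- freq / length(visits)   # relative frequency = frequency / n
print(rel_freq)

# One rectangle per value, height = relative frequency, no gaps between bars
barplot(rel_freq, space = 0, xlab = "Number of visits", ylab = "Relative frequency")
```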
Continuous Numerical Variables
Number of classes
Equal Class Widths
Example 1.9
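A sketch of an equal-width histogram for continuous data in R. The slide's rule for the number of classes did not survive extraction, so the common rule of thumb "number of classes ≈ √n" is assumed here. The data are simulated, not Example 1.9's data. With equal widths, the rectangle heights can simply be the relative frequencies.

```r
set.seed(3)
# Hypothetical continuous data: 50 simulated measurements (illustrative only)
x <- rnorm(50, mean = 100, sd = 15)

n <- length(x)
k <- round(sqrt(n))  # assumed square-root rule of thumb: 7 classes for n = 50

# k classes of equal width spanning the data (endpoints rounded outward)
breaks <- seq(floor(min(x)), ceiling(max(x)), length.out = k + 1)
h <- hist(x, breaks = breaks, plot = FALSE)

rel_freq <- h$counts / n  # relative frequency of each class
print(round(rel_freq, 2))

# Plot relative frequencies instead of counts as the rectangle heights
h_rel <- h
h_rel$counts <- rel_freq
plot(h_rel, freq = TRUE, ylab = "Relative frequency", main = "Equal class widths")
```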
Unequal Class Widths
Example 1.10
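With unequal class widths, heights must be densities, density = (relative frequency) / (class width), so that each rectangle's area equals the class's relative frequency. The sketch below uses simulated right-skewed data and hypothetical break points (not Example 1.10's) and checks the hand computation against hist().

```r
set.seed(4)
# Hypothetical right-skewed data (simulated, illustrative only)
x <- rexp(100, rate = 0.2)

# Illustrative unequal classes: narrow where data are dense, wide in the tail;
# the last break is at least 40 and always above max(x)
breaks <- c(0, 2, 4, 6, 8, 12, 20, max(40, ceiling(max(x))))
h <- hist(x, breaks = breaks, plot = FALSE)

# density = relative frequency / class width
dens <- (h$counts / length(x)) / diff(h$breaks)
all.equal(dens, h$density)         # TRUE: matches hist()'s densities
sum(h$density * diff(h$breaks))    # total area of the rectangles is 1

plot(h, freq = FALSE, ylab = "Density", main = "Unequal class widths")
```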
Histogram Shapes
Qualitative Data
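For qualitative (categorical) data, a frequency table plus a bar chart plays the role of the histogram. A sketch with a hypothetical variable, the calculator brand owned by 12 students (invented values, echoing the variable x from the basic concepts):

```r
# Hypothetical categorical data: calculator brand owned by 12 students
brand <- c("Casio", "TI", "Casio", "HP", "Casio", "TI",
           "Sharp", "Casio", "TI", "HP", "Casio", "TI")

counts   <- table(brand)        # frequency of each category
rel_freq <- prop.table(counts)  # relative frequency of each category
print(rel_freq)

# Bar chart: one bar per category
barplot(rel_freq, xlab = "Brand", ylab = "Relative frequency")
```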
Exercise:
Work out examples 25 and 26 in Sec. 1.2 (7th ed.) with R.
Example 1.12
Dotplot
Example 1.13. Compute the sample mean and median in R, and display the data with graphical
tools (dotplot, stem-and-leaf display, bar chart, histogram, ...).
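A minimal R sketch of the computation, using a hypothetical sample (not the Example 1.13 data) that contains one outlier, so the different sensitivity of the mean and the median is visible. Replace x with the example's data.

```r
# Hypothetical sample with one outlier (illustrative only)
x <- c(12, 15, 9, 20, 14, 11, 95)

mean(x)        # 25.14286: pulled upward by the outlier 95
median(x)      # 14: middle value of 9, 11, 12, 14, 15, 20, 95
mean(x[-7])    # 13.5 once 95 is removed
median(x[-7])  # 13: barely changes

# Dotplot with the mean (solid line) and median (dashed line) marked
stripchart(x, method = "stack", pch = 19, xlab = "x")
abline(v = c(mean(x), median(x)), lty = c(1, 2))
```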
Remarks
• The sample median, unlike the sample mean, is insensitive to outliers.
• The middle value in the population, the population median, is denoted by $\tilde{\mu}$.
• The mean and the median both describe where the data are centered, but in general they are not
equal.
• Likewise, the population mean $\mu$ and the population median $\tilde{\mu}$ will not generally be identical.
3. Similarly, a data set (sample or population) can be even more finely divided using
percentiles; the 99th percentile separates the highest 1% from the bottom 99%, and so on.
4. A trimmed mean is a compromise between mean and median. A 10% trimmed mean, for
example, would be computed by eliminating the smallest 10% and the largest 10% of the
sample and then averaging what is left over.
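Trimmed means and percentiles are both available in base R. The sketch uses hypothetical data with n = 20, so a 10% trim removes exactly 2 values from each end. Note that R's quantile() implements several percentile definitions through its type argument; the default (type = 7) may differ slightly from a textbook's hand method.

```r
# Hypothetical data: n = 20 with one large value (illustrative only)
x <- c(1:19, 100)

mean(x)              # 14.5: the single large value pulls the mean up
median(x)            # 10.5
mean(x, trim = 0.1)  # 10.5: drops the 2 smallest and 2 largest values
mean(sort(x)[3:18])  # the same 10% trimmed mean computed by hand

quantile(x, 0.99)    # 99th percentile under R's default definition (type = 7)
```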
Example 1.14
Our primary measures of variability involve the deviations from the mean: $x_1-\bar{x},\ x_2-\bar{x},\ \ldots,\ x_n-\bar{x}$.
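The slide's formulas did not survive extraction; for reference, the standard definitions built on these deviations (Devore's notation) are as follows. The deviations always sum to zero, so their squares are used instead, and the last expression for $S_{xx}$ is the computational (shortcut) form.

$$
\sum_{i=1}^{n}(x_i-\bar{x}) = 0,\qquad
S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2=\sum_{i=1}^{n}x_i^2-\frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n},
$$

$$
s^2=\frac{S_{xx}}{n-1},\qquad s=\sqrt{s^2}.
$$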
Example 1.15
Using R
• Load the data of Example 1.15.
• Compute the mean directly with the commands sum() and length(), then check it with mean().
• Compute Sxx, the variance and the standard deviation directly from their formulas, then check
them with var() and sd().
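A sketch of these steps in R, assuming a hypothetical data vector x in place of the Example 1.15 data (load the example's data from Devore7 and substitute it for x). Each hand computation is checked against the built-in function with all.equal(), which tolerates tiny rounding differences.

```r
# Hypothetical data (illustrative only); replace with the Example 1.15 data
x <- c(5.2, 6.1, 4.8, 7.3, 5.9, 6.6)

n    <- length(x)
xbar <- sum(x) / n          # mean computed directly
Sxx  <- sum((x - xbar)^2)   # sum of squared deviations from the mean
s2   <- Sxx / (n - 1)       # sample variance
s    <- sqrt(s2)            # sample standard deviation

c(mean = xbar, Sxx = Sxx, variance = s2, sd = s)

all.equal(xbar, mean(x))  # TRUE
all.equal(s2, var(x))     # TRUE
all.equal(s, sd(x))       # TRUE
sum(x - xbar)             # deviations sum to 0 (up to rounding error)
```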
Example 1.16
Using R
• Load the data of Example 1.16.
• Compute Sxx, the variance and the standard deviation directly from their formulas, then check
them with var() and sd().
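As a further check, Sxx can also be computed with the computational (shortcut) formula, Sxx = Σx² − (Σx)²/n, which gives the same value as the definition. The data below are hypothetical; substitute the Example 1.16 data.

```r
# Hypothetical data (illustrative only); replace with the Example 1.16 data
x <- c(12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9)
n <- length(x)

Sxx_def      <- sum((x - mean(x))^2)     # definition: squared deviations
Sxx_shortcut <- sum(x^2) - sum(x)^2 / n  # computational (shortcut) formula
all.equal(Sxx_def, Sxx_shortcut)         # TRUE: the two formulas agree

s2 <- Sxx_shortcut / (n - 1)
all.equal(s2, var(x))        # TRUE
all.equal(sqrt(s2), sd(x))   # TRUE
```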
Boxplots
• Stem-and-leaf displays, dotplots and histograms convey rather general impressions about a data
set.
• A single number such as the mean, median or standard deviation focuses on just one aspect of the data.
• A pictorial summary called a boxplot has been used successfully to describe several of a data
set’s most prominent features.
What does a boxplot tell you?
• It displays:
(1) center,
(2) spread,
(3) the extent and nature of any departure from symmetry, and
(4) identification of “outliers,” observations that lie unusually far from the main body of the
data.
• It uses the median and a measure of variability called the fourth spread.
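The fourths and the fourth spread fs = (upper fourth) − (lower fourth) can be obtained from R's fivenum(), which returns Tukey's hinges. When n is odd the hinges include the median in both halves, which we believe matches the textbook definition of the fourths; treat this as an assumption and compare with the slide's definition. The data are hypothetical.

```r
# Hypothetical data with n = 11 (odd), illustrative only
x <- c(2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 30)

f <- fivenum(x)   # minimum, lower hinge, median, upper hinge, maximum
lower_fourth <- f[2]                  # 6: median of 2, 4, 5, 7, 8, 9
upper_fourth <- f[4]                  # 12.5: median of 9, 10, 12, 13, 15, 30
fs <- upper_fourth - lower_fourth     # fourth spread: 6.5
median(x)                             # 9
```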
Boxplot pictures and definition
Example 1.17
Boxplots that show outliers
Definition
Examples 1.18 and 1.19
Using R
• Use bwplot(C1) (from the lattice package) or boxplot(C1, horizontal = TRUE)
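A sketch of outlier detection under the usual boxplot rule: an observation farther than 1.5 fs from the nearer fourth is an outlier, and farther than 3 fs it is an extreme outlier (the slide's own definition was lost, so this rule is stated as an assumption). Base R's boxplot.stats() uses the same 1.5-times-hinge-spread rule by default. The data are hypothetical.

```r
# Hypothetical data (illustrative only)
x <- c(2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 30)

f  <- fivenum(x)
fs <- f[4] - f[2]   # fourth spread = 6.5

# Distance beyond the nearer fourth (0 for points inside the box)
beyond <- pmax(f[2] - x, x - f[4], 0)
outliers <- x[beyond > 1.5 * fs]   # 30
extreme  <- x[beyond > 3 * fs]     # none, so 30 is a mild outlier

boxplot.stats(x)$out               # 30: same result from base R
boxplot(x, horizontal = TRUE)      # the outlier is drawn as a separate point
```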
Example 1.19
Example (Exercise 1.15)
R practice
1. After installing R and RStudio (read the file GettingStart), open the file lab1_DesStat.R (in Teams/Files).
2. Choose a data set with at least 100 observations, from the internet (such as titanic), from the
Devore7/MAS libraries, or from any real-life source. Use the tools of descriptive statistics to
analyse and present your data.
Note: see the following video clips:
-https://www.youtube.com/watch?v=49fADBfcDD4
-https://www.youtube.com/watch?v=xYXif1UCs-g