Lecture Note (Chapter-I and II) PDF
FACULTY of INFORMATICS
DEPARTMENT of COMPUTER SCIENCE
BY:
Ayele Gebeyehu
Assistant Professor, Department of Statistics, Wolkite University
Email: [email protected] or [email protected]
In any research, meaningful conclusions can be drawn only from data collected with a valid
scientific design and analyzed using appropriate statistical methods. Therefore, selecting an
appropriate study design is important for an unbiased and scientific evaluation of the
research questions. Each design rests on a particular rationale and is applicable in particular
experimental situations.
4. Presentation of the data: The organized data can now be presented in the form of
tables and diagrams. At this stage, large amounts of data are presented in tables in a
summarized and condensed form. The main purpose of data presentation is to
facilitate statistical analysis. Graphs and diagrams may also be used to make the data
easier to understand and the presentation more attractive.
6. Inference of data: This is the stage where valid conclusions are drawn from the results
obtained through data analysis. Interpretation means drawing conclusions from the data
that form the basis for decision making. Interpreting data is a difficult task and
requires a high degree of skill and experience. If data that have been analyzed are not
properly interpreted, the whole purpose of the investigation may be defeated and
fallacious conclusions may be drawn. Therefore, great care is needed when interpreting data.
h. Variable: It is an item of interest that can take on many different numerical values.
1.4 Applications, Uses and Limitations of Statistics
Applications of statistics:
Statistics can be applied in any field of study which seeks quantitative evidence. For
instance, engineering, economics, natural science, etc.
Engineering: Statistics has wide application in engineering, for example:
To compare the strength of two types of materials.
To determine the reliability of a product.
To control the quality of products in a given production process.
To compare the improvement in yield due to certain additives such as fertilizers,
herbicides, etc.
Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena. The
following are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
2. Ordinal Scales
Ordinal Scales are measurement systems that possess the property of order.
Level of measurement, which classifies data into categories that can be ranked.
Differences between ranks are not meaningful (the size of the gap between two ranks cannot be measured).
Arithmetic operations are not applicable but relational operations are applicable.
Note: Ordering is the sole property of ordinal scale.
Examples: Letter grades (A, B, C, D, F).
Rating scales (Excellent, Very good, Good, Fair, Poor).
Military rank
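To make the point about relational versus arithmetic operations concrete, here is a minimal Python sketch using the rating scale above. The rank numbers (0 to 4) and the helper names `is_higher`, `sort_ratings`, and `median_rating` are hypothetical, introduced only for illustration. The ranks encode order and nothing else, so the code compares and sorts ratings but never adds or averages them.

```python
# Ordinal categories listed from lowest to highest. The position in the list
# encodes order only; the gaps between positions carry no meaning.
RATING_ORDER = ["Poor", "Fair", "Good", "Very good", "Excellent"]
RANK = {label: position for position, label in enumerate(RATING_ORDER)}


def is_higher(a: str, b: str) -> bool:
    """Relational comparison: valid for ordinal data."""
    return RANK[a] > RANK[b]


def sort_ratings(ratings: list[str]) -> list[str]:
    """Arrange ratings from lowest to highest using their rank only."""
    return sorted(ratings, key=lambda label: RANK[label])


def median_rating(ratings: list[str]) -> str:
    """Middle category of an odd-length list (an order-based summary, no arithmetic on values)."""
    ordered = sort_ratings(ratings)
    return ordered[len(ordered) // 2]


responses = ["Good", "Poor", "Excellent", "Fair", "Good"]
print(is_higher("Very good", "Fair"))   # True
print(sort_ratings(responses))          # ['Poor', 'Fair', 'Good', 'Good', 'Excellent']
print(median_rating(responses))         # Good
```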
3. Interval Scales
Interval scales are measurement systems that possess the properties of order and
distance, but not the property of a fixed zero (absolute zero).
Level of measurement that classifies data that can be ranked and whose differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
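A small sketch, assuming temperature in degrees Celsius as the interval-scale variable (a standard textbook example, not one named in this note): converting to Fahrenheit changes the ratio between two temperatures, while comparisons of differences are unchanged. The helper `celsius_to_fahrenheit` and the values 10, 20, 30 are illustrative.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius temperature to Fahrenheit (F = 9/5 * C + 32)."""
    return c * 9 / 5 + 32


low_c, mid_c, high_c = 10.0, 20.0, 30.0
low_f, mid_f, high_f = (celsius_to_fahrenheit(t) for t in (low_c, mid_c, high_c))

# Ratios depend on the arbitrary zero point, so they change with the unit.
ratio_c = mid_c / low_c          # 20 / 10 = 2.0
ratio_f = mid_f / low_f          # 68 / 50 = 1.36

# Comparisons of differences survive the change of unit.
diff_ratio_c = (high_c - mid_c) / (mid_c - low_c)   # 10 / 10 = 1.0
diff_ratio_f = (high_f - mid_f) / (mid_f - low_f)   # 18 / 18 = 1.0

print(ratio_c, ratio_f)
print(diff_ratio_c, diff_ratio_f)
```

So "20 °C is twice as hot as 10 °C" is not a meaningful statement, whereas "the rise from 20 to 30 equals the rise from 10 to 20" holds on either scale.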
i) Direct Observation
In this approach, the investigator stays at the place of the survey and records first-hand
information. Direct observation can be used to discover a variety of information,
including consumer behavior, working methods, and other aspects of social and economic
behavior. Direct observation is more experimental and is usually applied in scientific
studies. It is time-consuming and costly, and the method is highly subjective.
ii) Interview
An interview is a conversation between two parties, initiated by the interviewer in order to
obtain the required information. The interviewer prepares a series of questions relevant to
the study in advance and then conducts the interview. Interviewing is a technique that is
primarily used to gain an understanding of the underlying reasons and motivations for
people's attitudes, preferences, or behavior. Interviews can be conducted one-to-one or in
a group, at work, at home, in the street, in a shopping centre, or at some other agreed
location.
Limitation:
The personal bias and prejudice of the interviewer may affect the results.
Types of Questions
1. Open-ended Questions: permit free responses that should be recorded in the
respondent’s own words. The respondent is not given any possible answers to
choose from. Such questions are useful to obtain information on:
- Facts with which the researcher is not very familiar,
- Opinions, attitudes, and suggestions of informants, or
- Sensitive issues
2. Closed Questions: offer a list of possible options or answers from which the
respondents must choose. When designing closed questions one should try to:
- Offer a list of options that are exhaustive and mutually exclusive, and
- Keep the number of options as few as possible.
The following are the major points to take into account when preparing a
questionnaire. The number of questions should be small. Respondents are naturally
uncomfortable with lengthy questionnaires, which usually bore them. If a lengthy
questionnaire is unavoidable, it should preferably be divided into two or more parts.
The questions should be short, clear, simple, and unambiguous. Moreover, the questions
must be arranged in a logical order so that a natural and spontaneous reply to each is
induced. For instance, it is not appropriate to ask a person how many packs of cigarettes
he/she smokes before asking whether he/she smokes at all.
Questions of a sensitive nature should be avoided. Sensitive questions are those that are
too personal or pecuniary, such as source of income, drinking habits, etc. The logic
here is that respondents do not willingly answer sensitive questions. Such information, if
necessary, may be gathered through interviews or through other indirect questions.
Mail questionnaires should be accompanied by a covering letter, which should state the
purpose of the questionnaire, promise confidentiality of responses, etc.
Summary Exercise
1. What is the difference between descriptive and inferential statistics?
2. Clearly state the stages in a statistical investigation.
3. Write the uses and limitations of statistics.
4. Classify each of the following as a nominal, ordinal, interval, or ratio scale of
measurement.
a. Pages in the 25 best-selling mystery novels.
b. Rankings of golfers in a tournament.
c. Temperatures inside 10 pizza ovens.
d. Weights of selected cell phones.
e. Times required to complete a chess game.
f. Ratings of textbooks (poor, fair, good, excellent).
g. Number of amps delivered by battery chargers.
Tabular presentation
Diagrammatic and Graphic presentation.
The process of arranging data into classes or categories according to their similarities
is technically called classification. Classification is a preliminary step that prepares the
ground for proper presentation of data.
Example: The marital status of 25 persons (M = married, S = single, D = divorced, W = widowed):
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital
status: M, S, D, and W. These types will be used as the classes of the distribution. We follow
this procedure to construct the frequency distribution.
Step 1: Make a table as shown.
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class using
$\% = \frac{f}{n} \times 100$, where f = frequency of the class and n = total number of values.
Percentages are not normally part of a frequency distribution, but they can be added since
they are used in certain types of diagrams such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class    Tally      Frequency   Percent
M        //// /         6          24
S        //// //        7          28
D        //// //        7          28
W        ////           5          20
Total                  25         100
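Steps 2 to 5 can be sketched in Python, with `collections.Counter` doing the tallying. The data are the 25 marital-status codes of the example; the printed layout is only one possible arrangement.

```python
from collections import Counter

# Marital-status codes from the example (M, S, D, W), read row by row.
data = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
""".split()

n = len(data)             # total number of values (25)
freq = Counter(data)      # Steps 2-3: tally and count each class

# Step 4: percentage of values in each class, % = (f / n) * 100
table = {cls: (freq[cls], freq[cls] * 100 / n) for cls in ["M", "S", "D", "W"]}

print(f"{'Class':<7}{'Frequency':>10}{'Percent':>9}")
for cls, (f, pct) in table.items():
    print(f"{cls:<7}{f:>10}{pct:>9.0f}")

# Step 5: column totals
total_f = sum(f for f, _ in table.values())
total_pct = sum(p for _, p in table.values())
print(f"{'Total':<7}{total_f:>10}{total_pct:>9.0f}")
```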
2) An ungrouped frequency distribution is a table of all the potential raw score values that
could possibly occur in the data, along with the number of times each actually occurred. It
is often constructed for small data sets or for data on a discrete variable.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Solution:
Value   Tally   Frequency
60      //          2
62      /           1
63      /           1
65      /           1
70      ////        4
74      /           1
75      /           1
76      //          2
80      ///         3
85      ///         3
90      /           1
Total              20
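The same tally-and-count procedure for the ungrouped scores, as a minimal Python sketch. The tally is drawn as one `/` per occurrence, which is adequate here because no value occurs more than four times.

```python
from collections import Counter

scores = [80, 76, 90, 85, 80,
          70, 60, 62, 70, 85,
          65, 60, 63, 74, 75,
          76, 70, 70, 80, 85]

freq = Counter(scores)   # number of times each raw value occurs

# One row per distinct value, in increasing order, with a tally and a count.
for value in sorted(freq):
    f = freq[value]
    print(f"{value:>5}  {'/' * f:<6}{f:>3}")
print(f"Total{'':8}{sum(freq.values()):>3}")
```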
When the range of the data is large, the data must be grouped into classes that are more than
one unit in width.
Definitions:
Example*:
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
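Step 6: Find the upper class limits. Each upper limit is the next lower class limit minus one
unit (U = 1 here); e.g., the first upper class limit = 12 - U = 12 - 1 = 11.
Thus 11, 17, 23, 29, 35, 41 are the upper class limits.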
So combining step 5 and step 6, one can construct the following classes.
Class limits    Class boundaries
6 – 11          5.5 – 11.5
12 – 17         11.5 – 17.5
18 – 23         17.5 – 23.5
24 – 29         23.5 – 29.5
30 – 35         29.5 – 35.5
36 – 41         35.5 – 41.5
Step 9: Write the numeric values for the tallies in the frequency column.
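A Python sketch of the grouped frequency distribution. The width (6) and the number of classes (6) are read off the class limits shown above, since the steps that compute them are not reproduced in this section; `unit = 1` reflects that the data are whole numbers. The code derives each upper limit as the next lower limit minus one unit (Step 6), the boundaries as the limits plus or minus half a unit, and the frequencies by tallying.

```python
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

lowest = min(data)        # L = 6, the first lower class limit
width = 6                 # class width used in the example (assumed from the table)
n_classes = 6             # number of classes used in the example (assumed from the table)
unit = 1                  # data are whole numbers

rows = []
for k in range(n_classes):
    lower = lowest + k * width            # lower limits: 6, 12, 18, ...
    upper = lower + width - unit          # next lower limit minus one unit: 11, 17, ...
    lower_b = lower - unit / 2            # class boundaries: 5.5, 11.5, ...
    upper_b = upper + unit / 2
    f = sum(lower <= x <= upper for x in data)   # tally: count values in the class
    rows.append((lower, upper, lower_b, upper_b, f))

for lower, upper, lower_b, upper_b, f in rows:
    print(f"{lower:>3} - {upper:<3} {lower_b:>5} - {upper_b:<5} {f:>3}")
print("Total", sum(r[4] for r in rows))
```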
The three most commonly used diagrammatic presentations for discrete as well as
qualitative data are:
Pie chart
Bar chart
Pictogram
A) Pie chart
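A pie chart is a circle that is divided into sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of each sector is obtained using:
$\theta = \frac{f}{n} \times 360^\circ$, where f is the frequency of the category and n is the total frequency.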
Solution:
Step 3: Using a protractor and compass, draw each sector and label it with its name and
corresponding percentage.
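Category   Frequency   Percent   Angle (degrees)
Men          2500        25            90
Women        2000        20            72
Boys         1500        15            54
Girls        4000        40           144
Total       10000       100           360
[Figure: pie chart with one sector per category, each labeled with its percentage]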
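The sector angles follow from θ = (f/n) × 360°. A short Python sketch; note that the Girls frequency (4000) is an assumption, inferred from the 40% share shown in the chart and the total implied by Men = 2500 = 25%.

```python
# Category frequencies; Girls = 4000 is inferred from its 40% share (assumption).
counts = {"Men": 2500, "Women": 2000, "Boys": 1500, "Girls": 4000}
n = sum(counts.values())   # total frequency

# Multiply before dividing so these whole-number results come out exact.
percents = {c: f * 100 / n for c, f in counts.items()}   # percentage of total
angles = {c: f * 360 / n for c, f in counts.items()}     # sector angle in degrees

for c in counts:
    print(f"{c:<6}{counts[c]:>6}{percents[c]:>6.0f}{angles[c]:>6.0f}")
```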
B) Bar Charts
Used to represent & compare the frequency distribution of discrete variables and
attributes or categorical series.
Bars can be drawn either vertically or horizontally.
In presenting data using bar diagram,
All bars must have equal width and the distance between bars must be equal.
The height or length of each bar indicates the size (frequency) of the figure represented.
There are different types of bar charts. The most common are:
Solution:
Simple bar chart
[Figure: simple bar chart; vertical axis: Frequency]
Solution:
[Figure: multiple bar chart of Male and Female frequencies by Department (Phys, Maths, Chem, Bio); vertical axis: Frequency]
Example 2.7: The following data represent sales by product, 1957-1959, of a given company for
three products A, B, C.
Product   1957   1958   1959
A          12     14     18
B          24     21     18
C          24     35     54
Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solution:
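As a rough, text-only sketch (not the original figure), the grouping in a multiple bar chart can be shown in Python: for each year, one bar per product is placed side by side, with bar length proportional to sales. The function name `multiple_bar_chart` and its `scale` parameter (one `#` per `scale` units, rounded down) are hypothetical; a printed chart would normally be drawn with a plotting library.

```python
sales = {            # product -> sales in 1957, 1958, 1959 (units as in the table)
    "A": [12, 14, 18],
    "B": [24, 21, 18],
    "C": [24, 35, 54],
}
years = [1957, 1958, 1959]


def multiple_bar_chart(sales, years, scale=1):
    """Return text lines: for each year, one bar per product, side by side."""
    lines = []
    for i, year in enumerate(years):
        lines.append(str(year))
        for product, values in sales.items():
            bar = "#" * (values[i] // scale)   # bar length proportional to sales
            lines.append(f"  {product} |{bar} {values[i]}")
    return lines


lines = multiple_bar_chart(sales, years)
print("\n".join(lines))
```

Grouping by year makes it easy to compare the three products within each year; grouping by product instead would emphasize each product's trend over time.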
C) Pictograph
In this diagram, we represent data by means of picture symbols. We choose a suitable
picture to represent a definite number of units in which the variable is measured.
Histogram
[Figure: histogram; horizontal axis: Class boundaries, vertical axis: Frequency]
Frequency polygon: If we join the mid-points of the tops of adjacent rectangles of the
histogram with line segments, a frequency polygon is obtained. When the polygon is extended to
the x-axis just outside the range of the data (by adding a class with zero frequency at each
end), the total area under the polygon is equal to the total area under the histogram.
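The equal-area property can be checked numerically. This Python sketch reuses the grouped example with classes 6 to 11, ..., 36 to 41 (width 6); the frequencies 2, 2, 7, 4, 3, 2 come from tallying the 20 values of that example and are not printed in the note itself. It closes the polygon with a zero-frequency class mark one width beyond each end, then compares the trapezoid area under the polygon with the histogram area.

```python
# Grouped data from the earlier example (classes 6-11, ..., 36-41; width 6).
class_marks = [8.5, 14.5, 20.5, 26.5, 32.5, 38.5]
freqs = [2, 2, 7, 4, 3, 2]
width = 6

# Close the polygon: add a zero-frequency class mark one width beyond each end.
xs = [class_marks[0] - width] + class_marks + [class_marks[-1] + width]
ys = [0] + freqs + [0]
polygon_points = list(zip(xs, ys))

# Area under the histogram: each rectangle is width * frequency.
histogram_area = sum(width * f for f in freqs)

# Area under the polygon: sum of trapezoids between consecutive points.
polygon_area = sum((x2 - x1) * (y1 + y2) / 2
                   for (x1, y1), (x2, y2) in zip(polygon_points, polygon_points[1:]))

print(polygon_points[0], polygon_points[-1])   # (2.5, 0) (44.5, 0)
print(histogram_area, polygon_area)            # 120 120.0
```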
Example 2.9: Construct a frequency polygon for the previous data in example 2.8.
Solution:
Class limits   Frequency   Class marks   Class boundaries   R.F.   % R.F.   Less-than C.F.   More-than C.F.
Adding two class marks with $f_i = 0$, one at the beginning (9.5) and one at the end (89.5), the
following frequency polygon is plotted:
[Figure: Frequency Polygon; horizontal axis: Class mark (9.5, 19.5, ..., 89.5), vertical axis: Frequency]
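The remaining columns of such a table (R.F., % R.F., less-than and more-than C.F.) are simple running computations. Because the data of Example 2.8 are not reproduced here, this Python sketch substitutes the grouped example with classes 6 to 11, ..., 36 to 41 and frequencies 2, 2, 7, 4, 3, 2, obtained by tallying its 20 values.

```python
from itertools import accumulate

# Class boundaries and frequencies from the earlier grouped example (substitute data).
boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
freqs = [2, 2, 7, 4, 3, 2]
n = sum(freqs)

rel_freq = [f / n for f in freqs]                         # R.F. = f / n
pct_rel_freq = [f * 100 / n for f in freqs]               # % R.F. = (f / n) * 100
less_than_cf = list(accumulate(freqs))                    # values below each upper boundary
more_than_cf = list(accumulate(reversed(freqs)))[::-1]    # values above each lower boundary

print(f"{'Boundaries':<14}{'f':>3}{'R.F.':>7}{'%R.F.':>7}{'<CF':>5}{'>CF':>5}")
for (lo, hi), f, rf, prf, lcf, mcf in zip(boundaries, freqs, rel_freq,
                                          pct_rel_freq, less_than_cf, more_than_cf):
    print(f"{lo:>5} - {hi:<5} {f:>3}{rf:>7.2f}{prf:>7.1f}{lcf:>5}{mcf:>5}")
```

The last less-than C.F. and the first more-than C.F. both equal n, which is a quick check on the arithmetic.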
Summary exercises
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
2. The following table is a grouped frequency distribution of money spent per visit by a random
sample of 100 customers at a department store.
Amount spent        3-7   8-12   13-17   18-22   23-27   Total
No. of customers     10     30      35      20       5     100
i) For each of the above classes, state
a) the class boundaries
b) the class width
c) the class mark
d) Draw a histogram and an ogive curve.