AN INTRODUCTION TO

STATISTICS

Prepared By
Dr Umme Habibah Rahman
Introduction of Statistics
Definition of Statistics:
The word ‘Statistics’ is probably derived from the Latin word ‘status’ or the Italian word
‘statista’. The word is used in both the singular and the plural sense. In the plural,
statistics means numerical data relating to an aggregate of individuals; in the singular,
it means the science of collection, organization, presentation, analysis and
interpretation of numerical data. According to Sir R.A. Fisher, who is known as the “Father
of Modern Statistics”, “The science of Statistics is essentially a branch of applied
mathematics and may be regarded as mathematics applied to observational data”. Fisher’s
definition is regarded as the most exact in the sense that it covers all aspects and fields
of Statistics.

Scope of Statistics:
During the last few decades, statistics has penetrated almost all the sciences, including
agricultural, biological, business, social, engineering and medical sciences. Statistical
methods are commonly used for analysing and interpreting experimental data. Its wide and
varied applications have also led to the growth of many new branches of statistics, such as
Industrial Statistics, Biometrics, Biostatistics, Agricultural Statistics and, most recently,
Statistical Bioinformatics. In brief, we may summarize the scope of statistics as follows:
(a) Statistics has great significance in the field of physical and natural sciences. It is used in
propounding and verifying scientific laws.
(b) Statistics is often used in agricultural and biological research for efficient planning of
experiments and for interpreting experimental data.
(c) Statistical techniques are used to study the various economic phenomena such as wages,
price analysis, analysis of time series, demand analysis etc.
(d) Successful business executives make use of statistical techniques for studying the needs
and future prospects of their products.
(e) In industry, statistical tools are very helpful in quality control and assessment.

Limitations of Statistics:
i) Statistical methods are best applicable to quantitative data.
ii) Statistical decisions are subject to a certain degree of error.
iii) Statistical laws do not deal with individual observations but with a group of observations.
iv) Statistical conclusions are true only on an average.
v) Statistics is liable to be misused. The misuse of statistics may arise because of the use of
statistical tools by inexperienced and untrained persons.
Types of Data
Data collected by an investigator that have not yet been organized numerically or used by
anybody else are known as raw data. An arrangement of raw numerical data in ascending or
descending order of magnitude is called an array. Data can also be classified into two types:
(a) Primary data
(b) Secondary data.
Primary data: The data collected directly from the original source is called the primary data
i.e. the data collected for the first time. The primary data may be collected by:

1. Direct interview method
2. Through mail
3. Through designed experiments
Direct interview method: In this method the investigator contacts the units/individuals and
interviews them personally. The information is recorded on a questionnaire or schedule. This
information will be more reliable and correct, but the method involves more expenditure and
more time, as the investigator has to go from place to place to collect the data.
Through mail: The data may be collected through correspondence. Questionnaires or schedules
are sent by mail with instructions for filling them in and returning them. It is less costly
to get the data by mail. The main drawback of this method is the poor response. Usually the
response by mail in surveys has been found to be about 40%.
Through designed experiments: Data are generated as the outcome of research conducted by
the investigator himself/herself.

Secondary Data: Secondary data are data which have already been collected by a source other
than the present investigator. Data obtained from published or unpublished sources are
known as secondary data.
The chief sources of secondary data can be classified into two groups:
1. Published Sources
2. Unpublished Sources
Published Sources: There are certain national, international or local sources which publish
statistical data on a regular basis. These sources can be summed up as follows:
(i) National/International Publications: Indian and foreign governments and international
agencies publish regular and occasional reports on various topics. These are perennial
sources of information.
(ii) Newspapers and Magazines: Many newspapers and magazines maintain their own research
bureaus and publish original data on important problems.
(iii) Individual research scholars: Individual research scholars of universities and other
allied agencies also supply rich material on matters of importance.
Unpublished Sources: There are various sources of unpublished data such as the records
maintained by the various Government and private offices, studies made by the research
scholars in the universities and other research institutions etc.

Q. State the difference between Primary and Secondary data.


Ans.:
Basis                    Primary data                     Secondary data
1. Originality           It is original, because the      It is not original. The
                         investigator himself/herself     investigator makes use of
                         collects the data.               the data collected by other
                                                          agencies.
2. Collection expenses   It involves large expenses       It is relatively a less
                         in terms of time, energy and     costly method.
                         money.
3. Suitability           If the data have been            It may or may not suit the
                         collected in a systematic        objects of enquiry.
                         manner, their suitability
                         will be positive.
4. Precautions           No extra precautions are         It should be used with
                         needed while making use of       special care.
                         these data.

Classification of data on the basis of Scales


The four levels or scales of data measurement are:
i) Nominal Scale: The lowest level, where only names (categories) are meaningful, e.g. blood group.
ii) Ordinal Scale: Adds an order to the names, e.g. ranks in a class.
iii) Interval Scale: Adds meaningful differences between values, e.g. temperature in degrees Celsius.
iv) Ratio Scale: Adds a true zero so that ratios are meaningful, e.g. height or weight.
Frequency: The number of times an individual item is repeated in a series is called its
frequency. In case of grouped data, the number of observations lying in any class is known
as the frequency of that class.
Frequency Distribution: It is a tabular arrangement of data values along with their
frequencies.
Cumulative Frequency (less than type): The cumulative frequency corresponding to any
value or class is the number of observations less than or equal to that value or upper limit
of that class. It may also be defined as the total of all frequencies up to the value or the
class. On similar lines we can define more than type cumulative frequencies.
Relative Frequency: The relative frequency of a class is the frequency of the class divided
by the total frequency of all the classes and is generally expressed as a percentage.
Relative Frequency = Frequency of the class / Total frequency of all classes.
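The definitions above can be illustrated with a minimal sketch in Python. The marks are made-up values used only for illustration, and the function name frequency_table is an assumption, not part of the original notes. The "more than" type cumulative frequency is taken here to mean the number of observations greater than or equal to the value, which is the usual convention.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, less-than cf, more-than cf, relative %)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0  # total of all frequencies up to and including the current value
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        less_than_cf = running                 # observations <= value
        more_than_cf = total - running + freq  # observations >= value (assumed convention)
        relative_pct = 100 * freq / total      # relative frequency as a percentage
        rows.append((value, freq, less_than_cf, more_than_cf, relative_pct))
    return rows


# Hypothetical marks of 10 students (illustrative data only).
marks = [2, 3, 3, 5, 5, 5, 7, 7, 8, 8]
table = frequency_table(marks)
for row in table:
    print(row)
```

For the value 5, for example, the frequency is 3, the less than type cumulative frequency is 6, and the relative frequency is 30%.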
Rules for Constructing a Frequency Distribution:
The following points should be borne in mind while tabulating or classifying an observed
frequency distribution.

1. The classes should be well defined and non-overlapping.
2. As far as possible, the class intervals should be of equal width.
3. The classes should be exhaustive, i.e. the range of the classes should cover the entire
range of the data.
4. As a general rule, the number of classes should be between 10 and 15; it should never
exceed 20 or fall below 5. However, the exact number depends upon the data in hand.
5. Open-ended classes should be avoided.
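
The rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name grouped_frequencies, the heights and the choice of classes are hypothetical, and each class is assumed to include its lower limit and exclude its upper limit so that the classes do not overlap. Only three classes are used to keep the output short; real data would normally need more, as rule 4 suggests.

```python
def grouped_frequencies(values, lower, width, num_classes):
    """Tally values into equal-width classes [lower + i*width, lower + (i+1)*width)."""
    upper = lower + width * num_classes
    # Rule 3: the classes must cover the entire range of the data.
    if min(values) < lower or max(values) >= upper:
        raise ValueError("classes are not exhaustive for these data")
    freqs = [0] * num_classes
    for v in values:
        index = int((v - lower) // width)  # each value falls in exactly one class
        freqs[index] += 1
    return [(lower + i * width, lower + (i + 1) * width, f)
            for i, f in enumerate(freqs)]


# Hypothetical heights in cm (illustrative data only).
heights = [151, 153, 158, 160, 162, 164, 165, 167, 171, 174, 178, 179]
classes = grouped_frequencies(heights, lower=150, width=10, num_classes=3)
for lo, hi, f in classes:
    print(f"{lo}-{hi}: {f}")
```

Because the upper limit is excluded, a height of 160 is counted in the class 160-170 and not in 150-160, so no observation is counted twice.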

Sturges’ formula: A numerical formula suggested by H.A. Sturges may be used for
determining approximately the number of classes and the class size. According to this
formula, the number of classes (k) is given by
k = 1 + 3.322 log10 N, where N is the number of observations. The class size is then
determined as
Class width (h) = (Largest value − Smallest value) / Number of classes (k).
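
The formula above can be tried out with a short Python sketch. The function names and the data are hypothetical. Rounding k up to the next whole number is an assumption made here, since the notes only say that the formula gives an approximate number of classes.

```python
import math


def sturges_classes(n):
    """Approximate number of classes k = 1 + 3.322 * log10(n), rounded up (assumed)."""
    return math.ceil(1 + 3.322 * math.log10(n))


def class_width(values, k):
    """Class width h = (largest value - smallest value) / k."""
    return (max(values) - min(values)) / k


# Hypothetical set of N = 20 observations (illustrative data only).
data = [12, 15, 17, 21, 22, 25, 28, 30, 31, 34,
        35, 38, 40, 41, 44, 47, 49, 52, 55, 60]
k = sturges_classes(len(data))
h = class_width(data, k)
print(k, h)
```

For N = 20 the formula gives k = 1 + 3.322 × 1.301 ≈ 5.32, which is rounded up to 6 classes, and the class width is (60 − 12) / 6 = 8.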
