Collection of Data
PROCESSING OF DATA
OUTLINE
• Introduction to data
• Classification of data
• Collection of data
• Methods of data collection
• Assessment of qualitative data
• Processing of data
- Editing
- Coding
- Tabulation
- Graphical representation
What is data?
Data Information
Data
Variables
A characteristic or trait in the behavioral sciences which
can be quantified is termed a variable
Variables
1. Continuous variables
2. Discrete variables
Discrete variables, on the other hand, are those which exist only in whole units, not
fractional values (usually units of one)
E.g. number of advertisements that appear during a web series, number of
customers in a departmental store, number of students in an online
class
Attribute vs. Variable
Attribute Variable
The data are classified by counting the individuals having the same
characteristics or attribute and not by measurement
E.g. level of sales, number of employees, number of benefit schemes, weight, etc.
Qualitative Data
• Nominal Scale
• Ordinal Scale
Quantitative Data
• Interval Scale
• Ratio Scale
Nominal Scale
Implies the classification of an item into two or more categories without any extent or
magnitude
Numbers are used only to name the categories; the resulting frequencies may be used for
determining per cent and mode
E.g. boys and girls; pass and fail; rural and urban
Ordinal Scale
The ordinal scale is a more precise scale than the nominal scale
The variables are categorized into levels with a meaningful natural order
But there is no information about the interval between the levels
E.g. Pain: none, mild, moderate, severe
Interval Scale
The interval scale is more precise and refined scale than nominal and ordinal scales
This scale has all the characteristics and relationship of the ordinal scale, besides
which distances between any two numbers on the scale are known
Ratio Scale
It has the same properties as an interval scale, as well as a true or absolute zero value
The ratio scale numerals have the qualities of real numbers, and can be added,
subtracted, multiplied or divided
E.g. mean sales, average salary drawn by the employees, marks obtained by
students
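The four scales differ in which summary statistics are meaningful. The following is a minimal Python sketch of that idea, not a definitive rule set: the sample values, the variable names and the PAIN_ORDER list are assumptions made for illustration only.

```python
from statistics import mean, mode

# Hypothetical sample data, invented for illustration.
residence = ["rural", "urban", "urban", "rural", "urban"]   # nominal scale
pain = ["none", "mild", "severe", "moderate", "mild"]        # ordinal scale
salaries = [30000, 45000, 52000, 38000, 45000]               # ratio scale

# Nominal: categories have no magnitude, so only counts, per cent and mode apply.
nominal_mode = mode(residence)
urban_percent = 100 * residence.count("urban") / len(residence)

# Ordinal: the order is meaningful, so a median level can be found by rank.
# The spacing between levels is unknown, so no mean is computed.
PAIN_ORDER = ["none", "mild", "moderate", "severe"]
ranks = sorted(PAIN_ORDER.index(level) for level in pain)
median_pain = PAIN_ORDER[ranks[len(ranks) // 2]]  # middle value of an odd-sized sample

# Ratio: a true zero exists, so values can be added, divided and averaged.
mean_salary = mean(salaries)

print(nominal_mode, urban_percent, median_pain, mean_salary)
```

The ordinal step deliberately avoids arithmetic on the pain levels, because the scale says nothing about the distance between "mild" and "moderate".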
Collection of Data
The process of systematically gathering data for a particular purpose from various sources;
the data are systematically observed, recorded and organized
The choice of procedures usually depends on the objectives and design of the study
and the availability of time, money and personnel
Purpose of Data Collection
To obtain information
To keep a record
To make decisions about important issues
To pass information onto others
For research study
How Important is data?
• Data collection is an extremely important part of any research because the
conclusions of a study are based on what the data reveal
• It is through data collection that a business or management has the quality
information they need to make informed decisions for further analysis, study,
and research.
• Without data collection, companies would stumble around in the dark using
outdated methods to make their decisions.
• Data collection instead allows them to stay on top of trends, provide answers
to problems, and analyze new insights to great effect.
• Data allows organizations to visualize relationships between what is
happening in different locations, departments, and systems.
Factors to be considered before data collection
Sources of information
Availability of funds
Source of Data
External Internal
It is original in character
OBSERVATION METHOD
Merits
2. The information obtained under this method relates to what is currently happening: it is not
complicated by either the past behavior or future intentions or attitudes.
4. This method is particularly suitable in studies which deal with subjects (i.e., respondents) who
are not capable of giving verbal reports of their feelings for one reason or the other.
Demerits
2. The information provided by this method is very limited.
4. At times, the fact that some people are rarely accessible to direct observation creates an obstacle
for this method to collect data effectively.
OBSERVATION METHOD-TYPES
While using this method, the researcher should keep in mind things like:
What should be observed?
How should the observations be recorded?
How can the accuracy of the observation be ensured?
Two types of observation:
Structured – in case the observation is characterized by a careful definition of the units to be
observed, the style of recording the information, standardized conditions of observation and the
selection of pertinent data of observation.
Unstructured – observation that takes place without the above characteristics being thought out in
advance.
Structured observation is appropriate in descriptive studies, whereas in an exploratory
study, the observational procedure is most likely to be relatively unstructured.
Participant observation – if the observer observes by making himself, more or less, a member of the
group he is observing so that he can experience what the members of the group do.
Non-participant observation – when the observer observes as a detached emissary without an
attempt on his part to experience through participation what others feel. When the observer is observing
in such a manner that his presence may be unknown to the people he is observing, such an observation
is described as disguised observation
Merits
1. The researcher can record the natural behavior of the group.
2. The researcher can even gather information which could not have been easily obtained in other
artificial settings.
3. The researcher can even verify the truth of statements made by informants in the context of a
questionnaire or a schedule.
Demerits
1. The observer may lose the objectivity to the extent he participates emotionally.
2. The problem of observation-control is not solved.
3. It may narrow down the researcher’s range of experience.
Uncontrolled observation
If the observation takes place in the natural setting.
No attempt is made to use precision instruments.
The major aim of this type of observation is to get a
spontaneous picture of life and persons.
It has a tendency to supply naturalness and completeness of behavior, allowing
sufficient time for observing it.
Controlled observation
When observation takes place according to definite pre-arranged plans, involving
experimental procedure.
We use mechanical (or precision) instruments as aids to
accuracy and standardization.
Such observation has a tendency to supply formalized data upon which generalizations
can be built with some degree of assurance.
OBSERVATION
Behavioural Observation:
• Non-verbal behaviour
• Linguistic behaviour
• Extra linguistic behaviour
• Spatial Relationship
SIMULATION
PROCESS OF SIMULATION:
• Identification of the process or system that has to be simulated
• Deciding the purpose
• Collecting the required input data
• Determining the type of simulation
• Analysing the results
INTERVIEW METHOD
Merits
3. Greater flexibility, in case of unstructured interviews.
5. Personal information can be obtained.
Demerits
3. High profile interviewees may not be accessible.
5. Non-realistic answers might be provided just to impress the interviewer.
PROJECTIVE TECHNIQUES
Data Collection Techniques
• Semantic Mapping
• Sentence Completion
• Sensory sorts
• Cartoons
Merits
Cover large population
Less expensive
Demerits
On small scale survey avoided
Time consuming
Information does not penetrate deeply
Merits
2. Is free from the bias of the interviewer. The answers are in respondents’ own words.
3. Respondents have adequate time to give well thought out answers.
4. Respondents, who are not easily approachable, can also be reached conveniently.
5. Large samples can be made use of and thus the results can be made more dependable and reliable.
Demerits
2. Can be used only when respondents are educated and co-operative.
3. Control over questionnaire may be lost once it is sent.
4. Possibility of ambiguous replies or omission of replies altogether to certain questions.
5. It is difficult to know whether willing respondents are truly representative.
PILOT STUDY
Merits
Of all the methods, the mailed questionnaire is the most
economical
It can be widely used, when the area of investigation is
large
It saves money, labor and time
Demerits
Cannot be sure about the accuracy and reliability of the data
There is long delay in receiving questionnaires duly filled in
Data Collection Through Schedules
The enumerator goes to the respondents, asks them the questions from the
proforma in the order listed, and records the responses in the space provided
Merits
2. Enables the researcher to trace out the natural history of the social unit and its relationship
with social factors and the forces involved in its surrounding environment.
4. The researcher can use one or more of the several research methods depending upon the
circumstances.
Demerits
2. Time consuming and expensive.
4. This method can be used only in a limited sphere. Sampling is not possible under a case
study.
Focus Group Discussion
Types (Denzin 1978)
Application and combination of several research methods in the study of the same
phenomenon ("beating the bias")
Researchers can hope to overcome the weaknesses, intrinsic biases and problems
that come from single-method, single-observer and single-theory studies
Secondary data are those data which have been already collected and analysed
by some earlier agency for its own use and later the same data are used by a
different agency
Sources of Secondary Data
Secondary sources interpret, analyse and critique primary sources. They can provide a second-hand
version of events or an interpretation of first-hand accounts. They can tell a story one or more steps
removed from the original person, time, place or event. The following are examples of secondary sources:
• scientific debates
• analyses of clinical trials
• datasets and databases that have been processed, analysed or interpreted
• texts and books that use a variety of primary sources as evidence to back up arguments and/or
conclusions
• analyses/interpretations/critiques of previous research
• book and article reviews
• Biographies
• critiques of literary works
• critiques of art
• television documentaries or science programmes
• analyses of historical events
Published Sources
Secondary data is usually gathered from the published (printed) sources. A
few major sources of published information are mentioned below:
• Published articles of local bodies and Central and State Governments.
• Statistical synopses, census records and other reports issued by different
departments of the Government.
• Official statements and publications of the foreign Governments.
• Publications and Reports of chambers of commerce, financial institutions,
trade associations, etc.
• Magazines, journals and periodicals.
• Publications of Government organisations like the Central Statistical
Organization (CSO), National Sample Survey Organization (NSSO).
• Reports presented by Research Scholars, Bureaus, Economists, etc.
Unpublished Sources
Statistical data can be obtained from several unpublished
references. Some of the major unpublished sources from which
secondary data can be gathered are:
• The research works conducted by teachers, professors and
professionals.
• The records that are maintained by private and business
enterprises.
• Statistics maintained by different departments and agencies of
the Central and State Governments, Undertakings, Corporations,
etc.
Precautions in the use of Secondary Data
Secondary Data must possess the following characteristics
• Reliability of data
• Suitability of data
• Adequacy of data
Reliability of data – may be tested by checking:
Who collected the data?
What were the sources of the data?
Was the data collected properly?
Suitability of data
Data that are suitable for one enquiry may not be necessarily suitable in another
enquiry
Objective, scope and nature of the original enquiry must be studied
Adequacy of data – data are considered inadequate if they relate to an area which may be
either narrower or wider than the area of the present enquiry
Data Processing
Collected data is raw and it must undergo some processing before analysis
The results of the analysis are strongly affected by the form of the data
Sum up the materials so as to prepare tables, charts, graphs and various groupings
and breakdowns for presenting the results
Minimizing the errors which may creep in at various stages of the survey
Types of Data Processing
1. Manual Data Processing
Implies many chances for errors, such as delays in data capture and a high number of
operator misprints
Implies higher labour expenses with regard to spending on equipment and supplies,
rent, etc.
2. Mechanical Data Processing
The use of mechanical machines makes data processing easier and less time-consuming
The chances of errors are also far lower than in manual data processing
3. Electronic Data Processing
E.g. when a bank customer withdraws a sum of money from his or her account, it is
vital that the transaction be processed and the account balance updated as soon as
possible
5. Batch Processing
Batch processing requires separate programs for input, process and output
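The separation of input, process and output in batch processing can be sketched in Python as follows. This is a simplified illustration, not a description of any real banking system: the three stages are written as separate functions rather than separate programs, and the account numbers, amounts and the "account,amount" line format are all assumptions.

```python
def read_batch(lines):
    """Input step: parse 'account,amount' lines collected over a period."""
    batch = []
    for line in lines:
        account, amount = line.split(",")
        batch.append((account.strip(), float(amount)))
    return batch


def process_batch(balances, batch):
    """Process step: apply every withdrawal in the batch in a single run."""
    updated = dict(balances)  # leave the original balances untouched
    for account, amount in batch:
        updated[account] -= amount
    return updated


def write_report(balances):
    """Output step: format the updated balances as report lines."""
    return [f"{account}: {balance:.2f}" for account, balance in sorted(balances.items())]


# Hypothetical opening balances and withdrawals collected during the day.
balances = {"A001": 500.0, "A002": 1200.0}
collected = ["A001, 100", "A002, 250.50", "A001, 50"]

report = write_report(process_batch(balances, read_batch(collected)))
print("\n".join(report))
```

The withdrawals accumulate first and the balances are updated only when the batch runs, unlike a transaction that must be reflected in the account as soon as possible.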
QUESTIONNAIRE CHECKING → EDITING → CODING → CLASSIFICATION → TABULATION → GRAPHICAL REPRESENTATION → DATA CLEANING → DATA ADJUSTING
Questionnaire Checking
When data are collected through questionnaires, the first step of
data processing is to check whether the questionnaires are acceptable or not
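A questionnaire check can be sketched in Python as below. The slides do not state the acceptance criteria, so the rule used here (reject a questionnaire when more than a quarter of its questions are unanswered) is purely an assumption, as are the function name check_questionnaire and the max_missing_share parameter.

```python
def check_questionnaire(answers, max_missing_share=0.25):
    """Return True if a returned questionnaire is acceptable.

    Assumed rule: a questionnaire is rejected when it is empty or when the
    share of unanswered questions exceeds max_missing_share.
    """
    if not answers:
        return False
    missing = sum(
        1 for answer in answers.values()
        if answer is None or str(answer).strip() == ""
    )
    return missing / len(answers) <= max_missing_share


# Hypothetical returned questionnaires.
complete = {"q1": "yes", "q2": "no", "q3": "3", "q4": "urban"}
half_empty = {"q1": "yes", "q2": "", "q3": None, "q4": "no"}

print(check_questionnaire(complete))    # accepted
print(check_questionnaire(half_empty))  # rejected
```

In practice the threshold and any further checks (for example, missing pages or ineligible respondents) would come from the study design.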
Field editing
• Translating or rewriting
Editing for tabulation
• Modification to facilitate tabulation
• Ignoring extremely high/low
Necessity of Coding
The process of arranging the primary data in a definite pattern and presenting it
in a systematic way
Qualitative classification
Geographical Classification
Data are classified by location of occurrence (i.e. area, region), e.g. number of students who
cleared NEET, district-wise
Chronological classification
Data are classified by time of occurrence of the observations or events
The categories are arranged in chronological order, e.g. number of Corona
patients who recovered, recorded from March to September 2020
Qualitative Classification
Qualitative classification (Classification according to attributes)
Data are classified according to some quality such as religion, literacy, sex,
occupation etc.
Simple classification
Classification is made into 2 classes, such as classification by male or female
Manifold classification
Two or more attributes are studied simultaneously
E.g. classification according to sex, then marital status, and then literacy
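The difference between simple and manifold classification can be sketched in Python by counting individuals per class. The respondent records below are invented for illustration, and collections.Counter is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical respondent records, invented for illustration.
respondents = [
    {"sex": "female", "marital_status": "married", "literate": True},
    {"sex": "male", "marital_status": "single", "literate": True},
    {"sex": "female", "marital_status": "single", "literate": False},
    {"sex": "female", "marital_status": "married", "literate": True},
]

# Simple classification: one attribute, here two classes (male or female).
simple = Counter(person["sex"] for person in respondents)

# Manifold classification: several attributes studied simultaneously,
# so each class is a combination of sex, marital status and literacy.
manifold = Counter(
    (person["sex"], person["marital_status"], person["literate"])
    for person in respondents
)

print(simple)
print(manifold)
```

Each key of manifold is one cell of the classification, and its count is the number of individuals sharing that combination of attributes.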
Tabulation
To identify trends
Table number
Title of the table
Caption and stubs
Body
Prefatory or head note
Footnotes
Kinds of Tables
• According to Purpose
• According to Originality
• According to Construction
- Double or Two-Way Table
- Treble Table
- Manifold Table
TABULATION
Commerce 70
Arts 90
Total 210
Faculties    Number of Users
             Girls   Boys   Total
Science      20      30     50
Commerce     30      40     70
Arts         35      55     90
Total        85      125    210
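The row and column totals of the two-way table above can be reproduced with a short Python sketch. The counts are taken from that table; the variable names and the printed layout are illustrative choices only.

```python
# Counts taken from the two-way table above (faculty by sex).
counts = {
    "Science": {"Girls": 20, "Boys": 30},
    "Commerce": {"Girls": 30, "Boys": 40},
    "Arts": {"Girls": 35, "Boys": 55},
}
columns = ["Girls", "Boys"]

# Row totals go in the extreme right column, column totals in the last row.
row_totals = {faculty: sum(row[c] for c in columns) for faculty, row in counts.items()}
column_totals = {c: sum(row[c] for row in counts.values()) for c in columns}
grand_total = sum(row_totals.values())

header = f"{'Faculties':<10}" + "".join(f"{c:>7}" for c in columns) + f"{'Total':>7}"
print(header)
for faculty, row in counts.items():
    cells = "".join(f"{row[c]:>7}" for c in columns)
    print(f"{faculty:<10}{cells}{row_totals[faculty]:>7}")
footer = "".join(f"{column_totals[c]:>7}" for c in columns)
print(f"{'Total':<10}{footer}{grand_total:>7}")
```

Checking that the row totals and the column totals add up to the same grand total is a quick consistency test for any hand-built table.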
Numbers of User
Total
Faculties Girls Boys (1)+(2)
Science 15 20 35 20 30 50 85
Commerce 35 30 65 45 40 85 150
Arts 25 35 60 35 55 90 150
GUIDELINES FOR TABULATION
• Numbering
First, assign a number to the table for its identification and reference in future.
Such number should be put at the top of the table.
• Heading
Then give a proper heading or title to the table, keeping in view the nature of the
data the table is going to present. Such a title should be given in bold and
prominent letters just below the number of the table. The title should be
as short as possible without losing clarity. It should state exactly what
the table exhibits.
• Abbreviations
No abbreviation should be used in the titles and subtitles.
• Ditto Marks
No ditto marks should be used, as at times it creates confusion on the part of the
observers.
• Clarity
The table should be drawn clearly and completely so that it can be easily
• Units
The units of the data presented, such as ‘price in Rs.’ or ‘weights in tonnes’, should be
clearly but briefly stated under the prefatory heading immediately below the title line of the
table. If different data have different units, they should be stated at the top of the
respective columns.
• Fixing the Number of Columns and Rows
Keeping in view the nature and types of data to be presented, the number of rows and
columns should be carefully fixed. Any mistake in this respect will vitiate the whole effort made
in the tabulation.
• Size
The length and width of different columns and rows and those of the table as a whole should
be fixed keeping in mind the size of the paper available and the quantum of data to be
exhibited.
• Marking
Every column and row of the table should be marked with number in a serial order so that it
can be readily referred to as and when needed.
• Overcrowding
Overcrowding of the table with a large amount of data should be avoided. In such cases, the
Minimization of Main Headings
The number of main headings should be few in order that the main points of the table may be easily
grasped. However, the number of sub-headings may be large.
Self-Explanatory
The captions (column headings) and the stubs (row headings) should be self-explanatory, without
leaving any room for further clarification.
Vicinity
The columns to be compared with each other should be kept close to each other. Similarly,
columns of percentages, averages, etc. should be kept close to the columns of the data.
Approximation
Figures to be put in the body of the table should be approximated first
Totality
The totals of the rows should be shown in the extreme right column while the totals of the columns
should be shown in the last row of the table.
Arrangement
The items should be arranged in some logical order, viz. alphabetical, chronological, size, importance or
causal relationship, to facilitate comparison and analysis of the data.
Indicating the Emphasis
When certain figures need emphasis, they should be shown in boxes or circles or between two
thick bars
Logical Sequence
In the preparation of the table logical sequence must be maintained. The table should be
simple, but compact and should be free from overlapping and ambiguity.
Suitability
The table must be so prepared that it suits the purpose of the enquiry.
Aesthetics:
The table should be drawn with an attractive get up so that it would be appealing to one’s eyes
and mind and one can understand it without much strain. For this, the adjacent rows and
columns should be separated by single, double or thick lines keeping in view the broad classes
and sub-classes used.
Explicity
The data should be tabulated in an explicit fashion without leaving any room for implicit
meaning. The expression ‘etc.’ should be avoided as it is likely to create confusion in the mind
of an observer. Similarly, a zero value should be entered as the figure 0 rather than the
word zero. When any data is not available, it should be indicated by the abbreviation N.A. (Not Available).
Graphical Representation
People who are not statistically minded can also easily understand the data and compare
them
The most common graphs are bar charts and pie charts in qualitative studies and
histograms in quantitative studies
Advantages
It is easier to read
Can show relationship between 2 or more sets of observations in one look
Universally applicable
Has high communication power
Simplifies complex data
Has more lasting effect on brain
1. Bar Diagram
• Consists of equally spaced vertical (or horizontal) rectangular bars of equal
width placed on a common horizontal (or vertical) base line
[Figure: simple bar diagram and component bar diagram; categories BPH, MBBS, B.Optom, B.Pharma; axis label "Health Program"; scale 0 to 300]
2. Pie Chart
• Circular diagram divided into segments, where each
segment represents the frequency in a category
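The size of each pie segment follows from its share of the total frequency: the angle is 360 × frequency / total. A minimal Python sketch, using the faculty totals from the tabulation example earlier in these notes (Science 50, Commerce 70, Arts 90); the variable names are illustrative only.

```python
# Faculty totals taken from the tabulation example (Science 50, Commerce 70, Arts 90).
frequencies = {"Science": 50, "Commerce": 70, "Arts": 90}
total = sum(frequencies.values())

# Each segment's angle is proportional to the category's frequency.
angles = {category: 360 * count / total for category, count in frequencies.items()}
percentages = {category: 100 * count / total for category, count in frequencies.items()}

for category in frequencies:
    print(f"{category:<10}{angles[category]:>8.1f} degrees{percentages[category]:>8.1f} %")
```

The angles always sum to 360 degrees, which is a useful check before drawing the chart.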
Line diagram
Pictogram
Cartogram
Frequency Curve
Frequency Polygon
Data Cleaning
Although preliminary consistency checks have been made during editing, the
checks at this stage are more thorough and extensive, because they are made by
computer
Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to
identify out-of-range values for each variable
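An out-of-range check of this kind can be sketched in plain Python. This is not how SPSS, SAS, EXCEL or MINITAB implement it; the variable names, the valid ranges and the records below are assumptions made for illustration.

```python
# Hypothetical valid ranges for each variable (assumed for illustration).
VALID_RANGES = {"age": (18, 65), "satisfaction": (1, 5)}


def find_out_of_range(records, valid_ranges):
    """Return (record index, variable, value) for every missing or out-of-range value."""
    problems = []
    for index, record in enumerate(records):
        for variable, (low, high) in valid_ranges.items():
            value = record.get(variable)
            # A missing value is flagged too, so that it can be reviewed.
            if value is None or not (low <= value <= high):
                problems.append((index, variable, value))
    return problems


# Hypothetical coded responses; the second and third contain entry errors.
records = [
    {"age": 34, "satisfaction": 4},
    {"age": 340, "satisfaction": 3},
    {"age": 29, "satisfaction": 9},
]

flagged = find_out_of_range(records, VALID_RANGES)
print(flagged)
```

Flagged values would then be checked against the original questionnaires rather than corrected automatically.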
Data Adjusting
If any correction needs to be done for the statistical analysis, the data is
adjusted accordingly
Data adjusting is not always necessary but it may improve the quality of
analysis sometimes
Data Analysis