Collection of Data

The document discusses data collection and processing. It defines data and different types of data including qualitative and quantitative data. It also discusses the classification of data as attributes and variables, and different measurement scales for variables. The document outlines different methods of collecting data, both primary and secondary sources. It emphasizes that data collection is an important first step for research as it provides the information and basis for analysis. Factors to consider for effective data collection include the objective, available sources and funds, and techniques.

Uploaded by

uthira
Copyright: © All Rights Reserved

COLLECTION AND PROCESSING OF DATA
OUTLINE
• Introduction to data
• Classification of data
• Collection of data
• Methods of data collection
• Assessment of qualitative data
• Processing of data
- Editing
- Coding
- Tabulation
- Graphical representation
What is data?

• Data are observations or evidence about the social world

• Data can be quantitative or qualitative in nature

Data are characteristics or information, usually numerical, that are collected through observation. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of data) is a single value of a single variable.
Data & Information
• The terms 'data' and 'information' are often used interchangeably
• However, the terms have distinct meanings

Data
- Facts, events and transactions which have been recorded
- Input raw materials from which information is processed

Information
- Data that have been processed in such a way as to be useful to the recipient
- Basic data are processed in some way to form information
Nature of Data
• Research studies in behavioral science are mainly concerned with characteristics or traits
• Thus, tools are administered to quantify these characteristics
- but not all traits or characteristics can be quantified

The data can be classified into two broad categories:
- Qualitative data, or attributes
- Quantitative data, or variables
Nature of Data
1. Qualitative Data or Attributes
The characteristics or traits to which a numerical value cannot be assigned are called attributes, e.g. gender, motivation, etc.

2. Quantitative Data or Variables
The characteristics or traits to which a numerical value can be assigned are called variables, e.g. height, weight, etc.
Constants
A constant is a characteristic or condition that is the same for all the observed units or sample subjects of a study

Variables
A characteristic or trait in behavioral science which can be quantified is termed a variable

Variables are of two kinds: continuous variables and discrete variables

1. Continuous variables
• A characteristic whose observations can take any value over a particular range
• It can assume either fractional or integral values
• E.g. weight of children in kg, height of girls in cm

2. Discrete variables
• Those which, on the other hand, exist only in whole units (usually units of one), not in fractional values
• E.g. no. of advertisements that appear during a web series, no. of customers in a departmental store, no. of students in an online class
Attribute vs. Variable
Attribute
• A category of a characteristic to which a subject either belongs or does not belong, or a property that a subject either possesses or does not possess
• Examples of attributes: preference for a product or an advertisement

Variable
• A variable describes a characteristic in terms of a numerical value, which is expressed in units of measurement
• Examples of variables: height, weight, age, distance, volume, speed, etc.
Qualitative Data
• In such data there is no notion of magnitude or size of the characteristic
• They are just categorized
• The data are classified by counting the individuals having the same characteristic or attribute, and not by measurement
• For example:
- Gender: male/female
- Disease: present/absent
- Smoke: smoking/not smoking
- Preference: like/don't like
- Satisfaction: satisfied/not satisfied
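
The idea of classifying qualitative data by counting can be sketched in a few lines of Python. This is only an illustration: the responses and the helper name `classify_by_count` are made up for the example and do not come from the slides.

```python
from collections import Counter

# Hypothetical responses for one attribute (satisfaction).
responses = ["satisfied", "not satisfied", "satisfied", "satisfied", "not satisfied"]

def classify_by_count(values):
    """Return {category: (count, percent)} for a list of attribute values."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}

summary = classify_by_count(responses)
for category, (count, percent) in summary.items():
    print(f"{category}: {count} ({percent}%)")
```

No magnitude is attached to the categories themselves; only the number of individuals falling in each one is recorded.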
Quantitative Data
• Anything that can be expressed as a number, quantity or magnitude
• Describes characteristics in terms of a numerical value, which is expressed in units of measurement
• E.g. level of sales, no. of employees, no. of benefit schemes, weight, etc.
• The observations are quantitative, as each individual is represented by a number
• These data can be measured on interval and ratio scales


Measurement Scale
• The choice of appropriate statistical technique depends upon the type of data in question

Qualitative Data
• Nominal Scale
• Ordinal Scale

Quantitative Data
• Interval Scale
• Ratio Scale
Nominal Scale
• The least precise, or crudest, of the 4 basic scales of measurement
• Implies the classification of an item into 2 or more categories without any extent or magnitude
• There is no particular order assigned to the categories
• Numbers are used only to name or label the categories; the resulting frequencies may be used for determining percentages and the mode
E.g. boys and girls; pass and fail; rural and urban
Ordinal Scale
• The ordinal scale is a more precise scale than the nominal scale
• The variable is categorized into levels with a meaningful natural order
• But there is no information about the interval between levels
E.g. Pain: none, mild, moderate, severe
Interval Scale
• The interval scale is a more precise and refined scale than the nominal and ordinal scales
• This scale has all the characteristics and relationships of the ordinal scale; in addition, the distances between any two numbers on the scale are known
• The size of the interval between two observations can be measured
E.g. the temperature of a body, time of the day


Ratio Scale
• It has the same properties as an interval scale, as well as a true or absolute zero value
• The ratio scale numerals have the qualities of real numbers, and can be added, subtracted, multiplied or divided
E.g. mean sales, average salary drawn by the employees, marks obtained by students
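
Since the choice of statistical technique depends on the measurement scale, a minimal Python sketch may help. It assumes the common convention that the mode is usable on every scale, the median from the ordinal scale upward, and the mean from the interval scale upward; the function `summarize` and its sample data are hypothetical.

```python
from statistics import mean, median, mode

def summarize(values, scale, order=None):
    """Return the summary statistics that are meaningful for `scale`.

    `order` lists the categories of an ordinal variable from lowest to highest.
    """
    stats = {"mode": mode(values)}  # meaningful on every scale
    if scale == "nominal":
        return stats
    if scale == "ordinal":
        # Rank categories by their natural order and take the middle rank
        # (the upper middle one when the number of values is even).
        ranks = sorted(order.index(v) for v in values)
        stats["median"] = order[ranks[len(ranks) // 2]]
        return stats
    # Interval and ratio scales: distances are known, so the mean is meaningful.
    stats["median"] = median(values)
    stats["mean"] = mean(values)
    return stats

area = summarize(["rural", "urban", "urban"], "nominal")
pain = summarize(["mild", "severe", "mild", "none", "moderate"], "ordinal",
                 order=["none", "mild", "moderate", "severe"])
marks = summarize([40, 60, 60, 80], "ratio")
print(area, pain, marks)
```

The sketch deliberately refuses to compute a mean for nominal or ordinal data, because the intervals between categories are not known on those scales.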
Collection of Data
• The process of systematically gathering data for a particular purpose from various sources, so that the data are systematically observed, recorded and organized
• It is the first step of a statistical study
• There are several ways of collecting data
• The choice of procedures usually depends on the objectives and design of the study and the availability of time, money and personnel
Purpose of Data Collection

 To obtain information
 To keep a record
 To make decisions about important issues
 To pass information onto others
 For research study
How Important is data?
• Data collection is an extremely important part of any research because the
conclusions of a study are based on what the data reveal
• It is through data collection that a business or management has the quality
information they need to make informed decisions for further analysis, study,
and research.
• Without data collection, companies would stumble around in the dark using
outdated methods to make their decisions.
• Data collection instead allows them to stay on top of trends, provide answers
to problems, and analyze new insights to great effect.
• Data allows organizations to visualize relationships between what is
happening in different locations, departments, and systems.
Factors to be considered before data collection

 Nature, scope & objective of the enquiry

 Sources of information

 Availability of funds

 Techniques of data collection

 Availability of trained persons


Sources of Data

Source of data: External / Internal

Primary Data
Examples:
- Interviews
- Observations
- Surveys
- Questionnaires

Secondary Data
Examples:
- Unpublished theses and dissertations
- Published records
- Manuscripts
- Books
- Journals
Internal & External Sources of Data
Internal sources of data
- Many institutions and departments have information about their regular functions, kept for their own internal purposes
- When that information is used in a survey, it is called an internal source of data
- E.g. social welfare society, audit reports, cost reduction data, internal communications

External sources of data
- When information is collected from outside agencies, it is called an external source of data
- Such types of data are either primary or secondary
- This type of information can be collected by the census or sampling method, by conducting a survey
Primary Data
• Data collected by the investigator from personal experimental studies for a specific research goal are called primary data
• The data are collected specially for a research project
• Used when secondary data are unavailable or inappropriate
• Data are to be unique, original, reliable and accurate in nature
• The validity of primary data is greater than that of secondary data


Primary Data
Merits
• Targeted issues are addressed
• Data interpretation is better
• High accuracy of data
• Addresses specific research issues
• Greater control
Demerits
• Escalates the cost
• Time consuming
• More resources are required
• Inaccurate feedback
• Requires a lot of skill


Primary Data Collection Techniques
• Some of the important methods for collecting primary data in descriptive research and surveys are:
- Observation method
- Interview method
- Through questionnaires
- Through schedules
• Other methods:
- Warranty cards
- Distributor audits
- Pantry audits
- Consumer panels
- Using mechanical devices
- Through projective techniques
- Depth interviews
- Content analysis
PRIMARY SOURCES

Primary sources are first-hand narratives, original documents/objects or factual accounts that were written or made during or close to the event or period of time. They have a direct connection to a person, time, event or place. Primary sources have not been subject to processing, manipulation, analysis or interpretation. The following are examples of primary sources:
• historical records, texts and original manuscripts
• government records (if they have not been processed, interpreted or analysed)
• company/organization records (if they have not been processed, interpreted or analysed)
• personal documents (diaries, journals and memoirs, for example)
• recorded or transcribed speeches or interviews
• raw statistical data (if they have not been processed, interpreted or analysed)
• works of literature
• works of art
• theatrical works
• film/video
Direct personal observation
 The data is collected by the investigator personally
 He/she must be a keen observer

 He/she asks or cross-examines the informant and collects necessary information

 It is original in character
OBSERVATION METHOD

• The observational method is the most commonly used method, especially in studies related to behavioral sciences.
• In a way, we all observe things around us, but this sort of observation is not scientific observation.
• Observation becomes a scientific tool and the method of data collection for the researcher when it is systematically planned and recorded and is subjected to checks and controls on validity and reliability.
• Under the observation method, the information is sought by way of the investigator's own direct observation, without asking the respondent.
Suitability of direct personal observation

Direct personal observation is adopted in the following cases

 Where greater accuracy is needed


 Where the field of enquiry is not large
 Where confidential data are to be collected
 Where sufficient time is available
Advantages
1. Subjective bias is eliminated, if observation is done accurately.
2. The information obtained under this method relates to what is currently happening; it is not complicated by either past behavior or future intentions or attitudes.
3. This method is independent of respondents' willingness to respond, i.e., it does not require the active participation of the respondents.
4. This method is particularly suitable in studies which deal with subjects (i.e., respondents) who are not capable of giving verbal reports of their feelings for one reason or another.
Limitations
1. It is an expensive method.
2. The information provided by this method is very limited.
3. Sometimes, unforeseen factors may interfere with the observational task.
4. At times, the fact that some people are rarely accessible to direct observation creates an obstacle to collecting data effectively by this method.
OBSERVATION METHOD-TYPES

• While using this method, the researcher should keep in mind things like:
- What should be observed?
- How should the observations be recorded?
- How can the accuracy of the observation be ensured?
• Types of observation:
- Structured: the observation is characterized by a careful definition of the units to be observed, the style of recording the information, standardized conditions of observation and the selection of pertinent data of observation.
- Unstructured: the observation takes place without these characteristics being thought out in advance.
- Structured observation is appropriate in descriptive studies, whereas in an exploratory study the observational procedure is most likely to be relatively unstructured.
- Participant observation: the observer observes by making himself, more or less, a member of the group he is observing, so that he can experience what the members of the group do.
- Non-participant observation: the observer observes as a detached emissary, without any attempt on his part to experience through participation what others feel.
- Disguised observation: the observer observes in such a manner that his presence may be unknown to the people he is observing.

Merits
1. The researcher can record the natural behavior of the group.
2. The researcher can even gather information which could not easily have been obtained in other, artificial settings.
3. The researcher can even verify the truth of statements made by informants in the context of a questionnaire or a schedule.
Demerits
1. The observer may lose objectivity to the extent that he participates emotionally.
2. The problem of observation control is not solved.
3. It may narrow down the researcher's range of experience.
• Uncontrolled observation
- The observation takes place in the natural setting.
- No attempt is made to use precision instruments.
- The major aim of this type of observation is to get a spontaneous picture of life and persons.
- It has a tendency to supply naturalness and completeness of behavior, allowing sufficient time for observing it.
• Controlled observation
- The observation takes place according to definite pre-arranged plans, involving experimental procedure.
- Mechanical (or precision) instruments are used as aids to accuracy and standardization.
- Such observation has a tendency to supply formalized data upon which generalizations can be built with some degree of assurance.
OBSERVATION

Non-behavioural observation:
• Record analysis
• Physical condition analysis
• Process or activity analysis

Behavioural observation:
• Non-verbal behaviour
• Linguistic behaviour
• Extra-linguistic behaviour
• Spatial relationship
SIMULATION

Simulation is a process of conducting experiments on a symbolic model representing a phenomenon


TYPES OF SIMULATION:
• Man simulation
• Computer simulation
• Man- Computer simulation
• Field Simulation
• Organisational Simulation

PROCESS OF SIMULATION:
• Identification of the process or system that has to be simulated
• Deciding the purpose
• Collecting the required input data
• Determining the type of simulation
• Analysing the results
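
As a rough illustration of a computer simulation that follows the steps above, the Python sketch below simulates the number of customers visiting a store counter in a day. Everything in it is assumed for the example: the function name `simulate_daily_customers`, the hourly arrival probabilities, the 8-hour day and the number of trials.

```python
import random

def simulate_daily_customers(hours, arrival_probs, trials, seed=0):
    """Estimate the average number of customers per day by repeated trials.

    arrival_probs maps 'customers arriving in one hour' to its probability.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts, weights = zip(*arrival_probs.items())
    totals = []
    for _ in range(trials):
        # One trial = one simulated day of `hours` hourly arrivals.
        totals.append(sum(rng.choices(counts, weights=weights, k=hours)))
    return sum(totals) / trials

# Input data (hypothetical): 0, 1 or 2 customers arrive in any given hour.
arrival_probs = {0: 0.2, 1: 0.5, 2: 0.3}
average = simulate_daily_customers(hours=8, arrival_probs=arrival_probs, trials=2000)
print(f"Estimated customers per day: {average:.2f}")
```

With these inputs the expected value is 8 × 1.1 = 8.8 customers per day, so the simulated average should fall close to that figure; analysing the gap between the two is the last step of the process.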
INTERVIEW METHOD

• The interview method of collecting data involves presentation of oral-verbal stimuli and reply in terms of oral-verbal responses.
• This method can be used through personal interviews and, if possible, through telephone interviews.
- Personal interviews
- Telephone interviews
PERSONAL INTERVIEW
• The personal interview method requires a person known as the interviewer to ask questions, generally in face-to-face contact, of the other person or persons.
• At times, the interviewee may also ask certain questions and the interviewer responds to these, but usually the interviewer initiates the interview and collects the information.
• Can be of two types: direct personal investigation and indirect oral examination.
• The method of collecting information through personal interviews is usually carried out in a structured way (structured interview), as adopted by descriptive studies.
• Unstructured interviews, on the other hand, are characterized by a flexibility of approach to questioning, as adopted by exploratory or formulative studies.
• Other types of interviews: focused interviews, clinical interviews

Merits
1. More information, and in greater depth, can be obtained.
2. The resistance on the part of the respondents can be overcome.
3. Greater flexibility, in case of unstructured interviews.
4. Observation method can also be applied.
5. Personal information can be obtained.
6. Non-response is addressed.
7. Spontaneous and more real information can be collected.
8. Ambiguities in questions can be resolved.
Demerits
1. Very expensive method, especially when a large and widely spread geographical sample is chosen.
2. Possibility of bias prevails.
3. High-profile interviewees may not be accessible.
4. More time consuming.
5. Non-realistic answers might be provided just to impress the interviewer.
6. Effective interviewing presupposes proper rapport with respondents, which may not always exist.


CONDUCTING INTERVIEWS
• Pre-requisites and basic tenets of interviewing
- For successful implementation of the interview method, interviewers need to be carefully selected, trained and briefed.
- They should be honest, sincere, hardworking and impartial, and must possess the technical competence and necessary practical experience.
- The interviewer must ask questions properly and intelligently and must record the responses accurately and completely.
- The interviewer's approach should be friendly, courteous, conversational and unbiased.
- If the conversation goes off track, it is the responsibility of the interviewer to bring it back on track.
TYPES OF INTERVIEWS

• Structured or directive interview


• Semi-structured Interview
• Unstructured or non-directive interview
• Focussed Interview
• Clinical Interview
• Depth interview
• Group Interview
• Focus group interview
• Telephonic Interview
• Projective Techniques

PROJECTIVE TECHNIQUES

Data collection techniques:
• Laddering
• MET
• Association
• Semantic Mapping
• Sentence Completion
• Sensory Sorts
• Cartoons
• Component Sorts
• Imagination Exercises
• Thematic Apperception

Example responses:
• That makes me think of the garden.
• It is the city in the country, very much so.
• It looks like New York, with the Empire State Building right there.
• Calming, relaxing. There's a tree there so you can see the countryside and you've got the background with the city and the buildings, so it's a regional focus.
Survey
• A detailed study of a geographical area to gather data, attitudes, impressions, opinions, satisfaction level etc., by polling a section of the population

Types
Census Survey
• Conducted regularly at large intervals of time
Continuous Survey
• Conducted regularly and frequently
Ad-hoc Survey
• Conducted at specific times for a specific need
• 'As and when' required
Survey

Merits
• Covers a large population
• Less expensive
• Information is accurate
Demerits
• Surveys are avoided on a small scale
• Time consuming
• Information does not penetrate deeply
• Researcher must have good knowledge
COLLECTION OF DATA THROUGH QUESTIONNAIRES

• This method of data collection is quite popular, particularly in case of big enquiries.
• In this method, a questionnaire is sent to the persons concerned with a request to answer the questions and return the questionnaire.
• A questionnaire consists of a number of questions printed or typed in a definite order on a form or set of forms.
• The respondents have to answer the questions on their own.
Merits
1. Low cost even when the universe is large and is widely spread geographically.
2. Free from the bias of the interviewer; the answers are in respondents' own words.
3. Respondents have adequate time to give well thought out answers.
4. Respondents who are not easily approachable can also be reached conveniently.
5. Large samples can be made use of, and thus the results can be made more dependable and reliable.
Demerits
1. Low rate of return of the duly filled in questionnaires; bias due to non-response is often indeterminate.
2. Can be used only when respondents are educated and co-operative.
3. Control over the questionnaire may be lost once it is sent.
4. Possibility of ambiguous replies or omission of replies altogether to certain questions.
5. It is difficult to know whether willing respondents are truly representative.
PILOT STUDY

Before using this method, it is always advisable to conduct a 'pilot study' for testing the questionnaires. This study is the replica or rehearsal of the main survey. It brings to light the weaknesses (if any) of the questionnaires and also of the survey techniques.
MAIN ASPECTS OF THE QUESTIONNAIRE
• General form
- Either structured or unstructured.
• Question sequence
- Must be clear and smoothly moving, meaning that the relation between questions should be readily apparent to the respondent, with the questions that are easiest to answer placed at the beginning.
- The opening questions must be such that they arouse interest in answering further.
- Questions that put too much strain on the memory or intellect of the respondent, personal questions, etc. should generally be avoided.
- Relatively difficult questions can be put towards the end so that, if there is no response, considerable information will already have been obtained.
• Question formulation and wording
- Questions should be easily understood, simple and concrete, and should conform to the way the respondent thinks.
• The questionnaire should be comparatively short and simple, i.e., the size of the questionnaire should be kept to a minimum.
FURTHER ESSENTIALS OF A GOOD QUESTIONNAIRE

• Questions affecting the sentiments of respondents should be avoided.
• Adequate space should be provided in the questionnaire to help editing and tabulation.
• There should always be provision for indications of uncertainty, e.g., "do not know", "no preference" and so on.
• Brief directions with regard to filling in the questionnaire should be given in the questionnaire itself.
• Finally, the physical appearance of the questionnaire should also be attractive.
Mailed questionnaires

• The questionnaire is sent to the respondents, with blank spaces for answers
• A covering letter is also sent along with the questionnaire, requesting the respondents to extend their full cooperation
• Adopted by research workers, private individuals, non-official agencies and government
• Appropriate in cases where informants are spread over a wide area
Mailed questionnaires

Merits
 Of all the methods, the mailed questionnaire is the most
economical
 It can be widely used, when the area of investigation is
large
 It saves money, labor and time
Demerits
• One cannot be sure about the accuracy and reliability of the data
• There is a long delay in receiving the duly filled in questionnaires
Data Collection Through Schedules

• Very similar to the questionnaire method
• The main difference is that a schedule is filled in by an enumerator who is specially appointed for the purpose
• The enumerator goes to the respondents, asks them the questions from the proforma in the order listed, and records the responses in the space provided
• Enumerators must be trained in administering the schedule


CASE STUDY METHOD

• The case study method is a very popular form of qualitative analysis and involves a careful and complete observation of a social unit, be that unit a person, a family, an institution, a cultural group or even the entire community.
• It is a method of study in depth rather than breadth.
• The case study places more emphasis on the full analysis of a limited number of events or conditions and their interrelations.
Advantages
1. Enables us to understand fully the behavior pattern of the concerned unit.
2. Enables us to trace out the natural history of the social unit and its relationship with social factors and the forces involved in its surrounding environment.
3. It helps in formulating relevant hypotheses along with the data which may be helpful in testing them.
4. The researcher can use one or more of the several research methods depending upon the circumstances.
Limitations
1. Case situations are seldom comparable and as such the information gathered in case studies is often not comparable.
2. Time consuming and expensive.
3. The case study method is based on several assumptions which may not be very realistic at times, and as such the usefulness of case data is always subject to doubt.
4. This method can be used only in a limited sphere. Sampling is not possible under a case study.
Focus Group Discussion

• Useful to further explore a topic, providing a broader understanding of why the target group may behave or think in a particular way
• Assists in determining the reasons for attitudes and beliefs
• Conducted with a small sample of the target group
• Used to stimulate discussion and gain greater insights


Focus Group Discussion
Merits
 Useful when exploring cultural values and health beliefs
 Can be used to explore complex issues
 Can be used to develop hypothesis for further research
 Do not require participants to be literate
Demerits
 Lack of privacy/anonymity
 Potential for the risk of ‘group think’
 Potential for group to be dominated by one or two people
 Group leader needs to be skilled at conducting focus groups, dealing with conflict, drawing out
passive participants
 Time consuming to conduct and analyse
Triangulation

Types (Denzin 1978)
• Data Triangulation
• Investigator Triangulation
• Theory Triangulation
• Methodological Triangulation
TRIANGULATION

• Application and combination of several research methods in the study of the same phenomenon (beating the bias)
• Researchers can hope to overcome the weaknesses or intrinsic biases and the problems that come from single-method, single-observer and single-theory studies
• The purpose of triangulation in qualitative research is to increase the credibility and validity of the results
Secondary Data

• Secondary data are data which have already been collected and analysed by some earlier agency for its own use, and which are later used by a different agency

Sources of Secondary Data
• Published sources
• Unpublished sources
SECONDARY SOURCES

Secondary sources interpret, analyse and critique primary sources. They can provide a second-hand version of events or an interpretation of first-hand accounts. They can tell a story one or more steps removed from the original person, time, place or event. The following are examples of secondary sources:
• scientific debates
• analyses of clinical trials
• datasets and databases that have been processed, analysed or interpreted
• texts and books that use a variety of primary sources as evidence to back up arguments and/or conclusions
• analyses/interpretations/critiques of previous research
• book and article reviews
• biographies
• critiques of literary works
• critiques of art
• television documentaries or science programmes
• analyses of historical events
Published Sources
Secondary data is usually gathered from published (printed) sources. A few major sources of published information are mentioned below:
• Published articles of local bodies and Central and State Governments.
• Statistical synopses, census records and other reports issued by different departments of the Government.
• Official statements and publications of foreign Governments.
• Publications and reports of chambers of commerce, financial institutions, trade associations, etc.
• Magazines, journals and periodicals.
• Publications of Government organisations like the Central Statistical Organization (CSO) and the National Sample Survey Organization (NSSO).
• Reports presented by research scholars, bureaus, economists, etc.
Unpublished Sources
Statistical data can be obtained from several unpublished references. Some of the major unpublished sources from which secondary data can be gathered are:
• The research works conducted by teachers, professors and professionals.
• The records that are maintained by private and business enterprises.
• Statistics maintained by different departments and agencies of the Central and State Governments, Undertakings, Corporations, etc.
Precautions in the use of Secondary Data

Before using secondary data, the investigator should consider the following factors:
• Reliability of data
• Suitability of data
• Adequacy of data

Secondary data must possess the following characteristics:

Reliability of data: may be tested by checking
- Who collected the data?
- What were the sources of the data?
- Were the data collected properly?

Suitability of data: data that are suitable for one enquiry may not necessarily be suitable for another enquiry
- The objective, scope and nature of the original enquiry must be studied

Adequacy of data: data are considered inadequate if they relate to an area which is either narrower or wider than the area of the present enquiry
Data Processing

• The data, after collection, have to be prepared for analysis
• Collected data are raw and must undergo some processing before analysis
• The results of the analysis are affected a lot by the form of the data
• So, proper data processing is a must to get reliable results


Objectives of Data Processing

• Checking the questionnaires and schedules
• Reducing the mass of data to manageable proportions
• Summing up the materials so as to prepare tables, charts, graphs and various groupings and breakdowns for presenting the results
• Minimizing the errors which may creep in at various stages of the survey
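
As one way to picture the reduction of mass data to manageable proportions, the Python sketch below groups raw scores into class intervals of equal width. The marks, the class width of 10 and the helper name `frequency_table` are all assumptions made for the illustration.

```python
def frequency_table(values, width):
    """Group numeric values into class intervals [lower, lower + width)."""
    table = {}
    for v in values:
        lower = (v // width) * width  # lower limit of the class containing v
        table[lower] = table.get(lower, 0) + 1
    return dict(sorted(table.items()))

# Hypothetical raw data: marks obtained by ten students.
marks = [35, 42, 47, 51, 58, 58, 63, 67, 71, 88]
table = frequency_table(marks, 10)
for lower, count in table.items():
    # For whole-number marks, the class runs from lower to lower + width - 1.
    print(f"{lower}-{lower + 9}: {count}")
```

Ten raw observations are reduced to six classes, which is the kind of grouping that later feeds tables and graphs.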
Types of Data Processing

1. Manual Data Processing
• Involves human intervention
• Implies many chances for errors, such as delays in data capture and a high number of operator mistakes
• Implies higher labour expenses, as well as spending on equipment and supplies, rent, etc.
Types of Data Processing

2. Mechanical Data Processing
• Different calculations and processing are performed using mechanical machines like calculators etc.
• The use of mechanical machines makes data processing easier and less time-consuming
• The chances of errors also become far fewer than in manual data processing
Types of Data Processing
3. Electronic Data Processing

 Processing of data by use of computer and its programs


Types of Data Processing
4. Real Time Processing

 There is a continual input, process and output of data

 Data has to be processed in a small stipulated time period (real time)

 Eg, when a bank customer withdraws a sum of money from his or her account it is
vital that the transaction be processed and the account balance updated as soon as
possible
5. Batch Processing

 In batch processing, a group of transactions is collected over a period of time,
entered and processed, and then the batch results are produced

 Batch processing requires separate programs for input, process and output

 It is an efficient way of processing high volume of data

 Eg, Payroll system, examination system and billing system
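
The difference between real time and batch processing can be sketched in a few lines of Python. This is only an illustration: the account names, balances and the functions `process_real_time` and `process_batch` are hypothetical and not part of any banking or payroll system.

```python
from collections import defaultdict


def process_real_time(balances, account, amount):
    """Apply one withdrawal immediately and return the updated balance."""
    balances[account] -= amount
    return balances[account]


def process_batch(balances, transactions):
    """Apply a whole group of collected withdrawals in a single run."""
    totals = defaultdict(int)
    for account, amount in transactions:
        totals[account] += amount          # collect and group first
    for account, total in totals.items():
        balances[account] -= total         # then update once per account
    return balances


# Real time: the balance is updated as soon as the withdrawal happens.
balances_rt = {"A": 1000, "B": 500}
new_balance = process_real_time(balances_rt, "A", 200)

# Batch: withdrawals are collected over a period and processed together.
balances_batch = {"A": 1000, "B": 500}
collected = [("A", 200), ("B", 50), ("A", 100)]
process_batch(balances_batch, collected)
```

In the real time case the result is available after every transaction; in the batch case nothing changes until the whole batch is run.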


Important Steps in Data Processing

The processing of data involves activities such as:

1. Questionnaire checking
2. Editing
3. Coding
4. Classification
5. Tabulation
6. Graphical representation
7. Data cleaning
8. Data adjusting
Questionnaire Checking
 When data are collected through questionnaires, the first step of
data processing is to check whether each questionnaire is acceptable

A questionnaire is not accepted if it:

 Gives the impression that the respondent could not understand
the questions
 Is partially or fully incomplete
 Was answered by a person who has
inadequate knowledge
Data Editing

 Process of examining the data collected in questionnaires/schedules
 to detect errors and omissions
 to correct these when possible
 to make sure the schedules are ready for tabulation

 The editor is responsible for seeing that the data are:

 As accurate as possible
 Consistent with other facts secured
 Uniformly entered
 As complete as possible
 Acceptable for tabulation and arranged to facilitate coding and
tabulation
Types of Editing

Editing for quality
• Data form complete
• Free of bias, errors, inconsistency and dishonesty
Editing for tabulation
• Modification to facilitate tabulation
• Ignoring extremely high/low values
Field editing
• Translating or rewriting
Central editing
• Wrong and replacement
Necessity of Editing
 To gather information

 To make data relevant and appropriate for analysis

 To find errors and modify them

 To ensure that the information provided is accurate

 To establish the consistency of data

 To determine whether or not the data are complete

 To obtain the best possible data available


Coding of Data

 Process of assigning numerals or other symbols to answers so that responses can
be put into a limited number of categories or classes
 Translating answers into numerical values or assigning numbers to the various
categories of a variable to be used in data analysis
 Coding is done by using a code book, code sheet, and a computer card
 Coding is done on the basis of the instructions given in the codebook
 The codebook gives a numerical code for each variable
Codebook
• A codebook contains coding instructions and the necessary
information about variables in the data set

• A codebook generally contains the following information:


- column number
- record number
- variable number
- variable name
- question number
- instructions for coding
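
As a rough illustration of coding with a codebook, the sketch below translates text answers into numerical codes. The variables, the code values and the missing-value code 9 are assumptions made for this example, not a standard scheme.

```python
# Hypothetical codebook: variable name -> {answer text: numerical code}
CODEBOOK = {
    "sex": {"male": 1, "female": 2},
    "literacy": {"illiterate": 0, "literate": 1},
}
MISSING = 9  # assumed code for blank or unrecognised answers


def code_response(response):
    """Translate one respondent's answers into numerical codes."""
    coded = {}
    for variable, mapping in CODEBOOK.items():
        answer = str(response.get(variable, "")).strip().lower()
        coded[variable] = mapping.get(answer, MISSING)
    return coded


coded = code_response({"sex": "Female", "literacy": "literate"})
partial = code_response({"sex": "male"})  # literacy was left unanswered
```

Keeping the mapping in one place, as a codebook does, means every coder (or program) assigns the same number to the same answer.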

Necessity of Coding

 To organize the coded data

 To form a structure for coding

 For interpretation of data

 For drawing conclusions from the coded data

 To translate answers into numerical values

 To assign numbers to the various categories for data analysis

 It is necessary for efficient analysis


Classification of Data

 The process of arranging the primary data in a definite pattern and presenting it
in a systematic way

 The crude data obtained from an experiment or survey are classified according to
their properties

 Classification can be done qualitatively or quantitatively


Objectives of classification

 The classified data is more easily understood

 It presents the facts in a simpler form

 It facilitates quick comparison

 It helps in further statistical treatment such as average, dispersion etc.

 It makes errors easier to detect


Types of classification

Qualitative classification
• Geographical classification
• Chronological classification
• Qualitative classification (by attributes)
Quantitative classification
• Discrete classification
• Continuous classification

Qualitative Classification

Geographical Classification
 Data are classified by location of occurrence (i.e. area, region), e.g. number of students who
cleared NEET, district-wise

Chronological classification
 Data are classified by time of occurrence of the observations or events
 The categories are arranged in chronological order, e.g. number of Corona
patients who recovered, recorded from March to September 2020
Qualitative classification (Classification according to attributes)
 Data are classified according to some quality such as religion, literacy, sex,
occupation etc.

 Simple classification
 Classification is made into 2 classes on a single attribute, e.g. male or female

 Manifold classification
 2 or more attributes are studied simultaneously
 E.g. classification according to sex, then marital status, then literacy
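
A minimal sketch of simple versus manifold classification, assuming a small set of made-up records; the function `classify` is a hypothetical helper written for this example.

```python
from collections import Counter

# Made-up records used only for illustration
records = [
    {"sex": "female", "marital_status": "married", "literacy": "literate"},
    {"sex": "male", "marital_status": "single", "literacy": "literate"},
    {"sex": "female", "marital_status": "married", "literacy": "illiterate"},
    {"sex": "male", "marital_status": "married", "literacy": "literate"},
]


def classify(records, attributes):
    """Count the records falling in each class formed by the given attributes."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


# Simple classification: one attribute
simple = classify(records, ["sex"])

# Manifold classification: several attributes studied simultaneously
manifold = classify(records, ["sex", "marital_status", "literacy"])
```

With one attribute each record falls into one of two classes; with three attributes the classes are the combinations that actually occur in the data.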
Tabulation

 Process of systematically organizing and recording a long series of data in rows
and columns for further analysis and interpretation

 It is a concise, logical and orderly arrangement of data in columns and rows


Usefulness of Tabulation
 It presents an overall view of findings in a simpler way

 It helps to identify trends

 It displays relationships in a comparable way between parts of the findings

 It conserves space and reduces explanatory and descriptive statements to a
minimum

 It facilitates the process of comparison

 It provides a basis for various statistical computations


PARTS OF A TABLE

 Table number
 Title of the table
 Caption and stubs
 Body
 Prefatory or head note
 Footnotes
Kinds of Tables

According to Purpose
• General Purpose Table
• Special Purpose Table
According to Originality
• Original Table
• Derived Table
According to Construction
• Simple or One-Way Table
• Complex Table: Double or Two-Way Table, Treble Table, Manifold Table
TABULATION

 Tabulation may be classified as simple and complex tabulation.


 Simple tabulation gives information about one or more groups of independent questions,
whereas complex tabulation shows the division of data into two or more categories and as
such is designed to give information concerning one or more sets of inter-related
questions.
 Simple tabulation generally results in one-way tables which supply answers to questions
about one characteristic of data only.
 Complex tabulation usually results in two-way tables (which give information about two
interrelated characteristics of data), three-way tables (giving information about three
interrelated characteristics of data) or still higher ordered tables, also known as manifold
tables, which supply information about several interrelated characteristics of data.
 Two-way, three-way and manifold tables are all examples of what is sometimes described
as cross-tabulation.

Faculties    Number of Users
Science      50
Commerce     70
Arts         90
Total        210

             Number of Users
Faculties    Girls   Boys   Total
Science      20      30     50
Commerce     30      40     70
Arts         35      55     90
Total        85      125    210

             Number of Users
             Girls                        Boys                         Total
Faculties    I Sem   II Sem   Total (1)   I Sem   II Sem   Total (2)   (1)+(2)
Science      15      20       35          20      30       50          85
Commerce     35      30       65          45      40       85          150
Arts         25      35       60          35      55       90          150
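
As a hedged sketch, the row and column totals of the two-way table of faculties by girls and boys can be computed from the cell counts as follows. The cell values are taken from that table; the function `margins` is a hypothetical helper.

```python
# Cell counts from the two-way table: (faculty, sex) -> number of users
cells = {
    ("Science", "Girls"): 20, ("Science", "Boys"): 30,
    ("Commerce", "Girls"): 30, ("Commerce", "Boys"): 40,
    ("Arts", "Girls"): 35, ("Arts", "Boys"): 55,
}


def margins(cells):
    """Return row totals, column totals and the grand total of a two-way table."""
    row_totals, col_totals = {}, {}
    for (row, col), n in cells.items():
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals, sum(cells.values())


row_totals, col_totals, grand_total = margins(cells)
```

The row totals reproduce the one-way table (50, 70, 90), which shows how a simple table can be derived from a complex one.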
GUIDELINES FOR TABULATION
• Numbering
First, assign a number to the table for its identification and reference in future.
Such number should be put at the top of the table.
• Heading
Then give a proper heading or title to the table, keeping in view the nature of the
data the table is going to present. Such a title should be given in bold and
prominent letters just below the number of the table. The title should be
as short as possible without losing clarity. It should state exactly what
the table exhibits.
• Abbreviations
No abbreviation should be used in the titles and subtitles.
• Ditto Marks
No ditto marks should be used, as at times it creates confusion on the part of the
observers.
• Clarity
The table should be drawn clearly and completely so that it can be easily
• Units
The units of the data presented, such as ‘price in Rs.’ or ‘weights in tonnes’, should be
clearly but briefly stated under the prefatory heading immediately below the title line of the
table. If different data have different units, they should be stated at the top of the
respective columns.
• Fixing the Number of Columns and Rows
Keeping in view the nature and types of data to be presented, the number of rows and
columns should be carefully fixed. Any mistake in this respect will vitiate the whole effort made
in the tabulation.
• Size
The length and width of different columns and rows and those of the table as a whole should
be fixed keeping in mind the size of the paper available and the quantum of data to be
exhibited.
• Marking
Every column and row of the table should be marked with a number in serial order so that it
can be readily referred to as and when needed.
• Overcrowding
Overcrowding of the table with a large amount of data should be avoided. In such cases, the
• Minimization of Main Headings
The number of main headings should be few in order that the main points of the table may be easily
grasped. However, the number of sub-headings may be large.
• Self-Explanatory
The caption (column headings) and the stubs (row headings) should be self-explanatory, without
leaving any room for further clarification.
• Vicinity
The columns to be compared with each other should be kept close to each other. Similarly,
columns of percentages, averages, etc. should be kept close to the columns of the data.
• Approximation
Figures to be put in the body of the table should be approximated first.
• Totality
The totals of the rows should be shown in the extreme right column while the totals of the columns
should be shown in the last row of the table.
• Arrangement
The items should be arranged in some logical order, viz. alphabetical, chronological, size, importance or
causal relationship, to facilitate comparison and analysis of the data.
• Indicating the Emphasis
When certain figures need emphasis, they should be shown in boxes or circles or between two
thick bars.
• Logical Sequence
In the preparation of the table a logical sequence must be maintained. The table should be
simple but compact, and should be free from overlapping and ambiguity.
• Suitability
The table must be so prepared that it suits the purpose of the enquiry.
• Aesthetics
The table should be drawn with an attractive get-up so that it is appealing to the eye
and mind and can be understood without much strain. For this, the adjacent rows and
columns should be separated by single, double or thick lines, keeping in view the broad classes
and sub-classes used.
• Explicitness
The data should be tabulated in an explicit fashion without leaving any room for implicit
meaning. The expression ‘etc.’ should be avoided as it is likely to create confusion in the mind
of an observer. Similarly, where a value is zero, the figure 0 should be entered rather than the
word ‘zero’. When any data is not available, it should be indicated by the abbreviation N.A. (Not
Available).
Graphical Representation

 Graphs help to understand the data easily

 A single picture is worth a thousand words, as the common saying goes

 People who are not statistically minded can also easily understand the data and compare
them

 The most common graphs are bar charts and pie charts in qualitative studies and
histograms in quantitative studies

Advantages
 It is easier to read
 Can show the relationship between 2 or more sets of observations at a glance
 Universally applicable
 Has high communication power
 Simplifies complex data
 Has a more lasting effect on the mind

Presentation of Qualitative data

1. Bar Diagram
• Consists of equally spaced vertical (or horizontal) rectangular bars of equal
width placed on a common horizontal (or vertical) base line

• The categories are placed on X-axis and their frequencies on Y-axis


[Figures: simple bar diagram, component bar diagram and multiple bar diagram. Example chart
"Health Program at IOM": x-axis health program (BPH, MBBS, B.Optom, B.Pharma), y-axis no. of
students (0 to 400)]



2. Pie Chart
• Circular diagram divided into segments, where each
segment represents the frequency in a category
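
A short sketch of how the segment sizes of a pie chart can be worked out, assuming each segment's angle is proportional to its frequency (frequency / total × 360°). The frequencies are the faculty counts from the one-way table; `pie_angles` is a hypothetical helper.

```python
def pie_angles(frequencies):
    """Return the angle in degrees of each segment, proportional to its frequency."""
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}


angles = pie_angles({"Science": 50, "Commerce": 70, "Arts": 90})
```

Here Commerce, with 70 of 210 users (one third), takes a 120° segment, and all segments together make up the full 360°.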
[Figures: line diagram, pictogram and cartogram; one panel titled "Production of health
manpower yearly"]


Presentation of Quantitative Data


1. Histogram
• Graphical representation of a set of contiguously drawn bars
• Most popular graph for continuous variable
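
The bar heights of a histogram are the frequencies of contiguous class intervals. The sketch below counts them for a made-up set of heights; the data, the class width of 10 and the function `histogram_counts` are assumptions for illustration only.

```python
def histogram_counts(values, start, width, n_classes):
    """Count values in contiguous classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - start) // width)   # index of the class the value falls in
        if 0 <= i < n_classes:          # values outside all classes are ignored
            counts[i] += 1
    return counts


heights_cm = [151, 158, 162, 164, 167, 171, 173, 178]  # made-up data
counts = histogram_counts(heights_cm, start=150, width=10, n_classes=3)
```

Because the classes 150-160, 160-170 and 170-180 share their boundaries, the bars are drawn touching each other, unlike the separated bars of a bar diagram.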
[Figures: frequency polygon, frequency curve, scatter diagram and time plot]



Data Cleaning

 Includes consistency checks and treatment of missing responses

 Although preliminary consistency checks have been made during editing, the
checks at this stage are more thorough and extensive, because they are made by
computer

 Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to
identify out-of-range values for each variable
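
The kind of check a statistical package performs at this stage can be sketched in plain Python. The valid ranges and records below are hypothetical, and `find_problems` is not a function of any of the packages named above.

```python
# Hypothetical valid ranges (inclusive) for each coded variable
VALID_RANGES = {"age": (0, 120), "sex": (1, 2)}


def find_problems(records):
    """List (record index, variable, problem) for missing or out-of-range values."""
    problems = []
    for index, record in enumerate(records):
        for variable, (low, high) in VALID_RANGES.items():
            value = record.get(variable)
            if value is None:
                problems.append((index, variable, "missing"))
            elif not low <= value <= high:
                problems.append((index, variable, "out of range"))
    return problems


records = [{"age": 34, "sex": 1}, {"age": 340, "sex": 2}, {"age": 27, "sex": None}]
problems = find_problems(records)
```

The flagged records would then be traced back to the original questionnaires for correction or treated as missing responses.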
Data Adjusting

 If any correction needs to be done for the statistical analysis, the data is
adjusted accordingly

 Data adjusting is not always necessary but it may improve the quality of
analysis sometimes

Data Analysis
