
Session 3 Data Collection Analysis and Interpretation

The document outlines the processes and methods of data collection, analysis, and interpretation. It emphasizes the importance of data quality and describes collection techniques such as surveys, interviews, and observations. It also discusses common challenges in data collection, including inaccurate, inconsistent, and excessive data. Finally, it presents key steps and best practices for effective data collection and introduces quantitative (statistical) analysis methods such as ANOVA and MANOVA.

Uploaded by Jahnvi Patel
Copyright © All Rights Reserved

DATA COLLECTION, ANALYSIS, AND INTERPRETATION

PROF. PRAVIN KUMAR


DEPT. OF MECHANICAL ENGINEERING
DELHI TECHNOLOGICAL UNIVERSITY
DELHI-110042
DATA COLLECTION

• Data collection is the process of collecting and evaluating information or data from multiple sources to find answers to research problems, answer questions, evaluate outcomes, and forecast trends and probabilities. It is an essential phase in all types of research, analysis, and decision-making.
IMPORTANT POINTS FOR DATA COLLECTION

• What is the goal or purpose of this research?
• What kinds of data do you plan to gather?
• What methods and procedures will be used to collect, store, and process the information?
DIFFERENT DATA COLLECTION METHODS

1. Primary Data Collection


• Surveys and Questionnaires: Researchers design structured questionnaires
or surveys to collect data from individuals or groups. These can be conducted
through face-to-face interviews, telephone calls, mail, or online platforms.
• Interviews: Interviews involve direct interaction between the researcher and
the respondent. They can be conducted in person, over the phone, or through
video conferencing. Interviews can be structured (with predefined questions),
semi-structured (allowing flexibility), or unstructured (more conversational).

• Observations: Researchers observe and record behaviors, actions, or events in their natural setting. This method is useful for gathering data on human behavior, interactions, or phenomena without direct intervention.
• Experiments: Experimental studies involve manipulating variables to observe their impact on the outcome. Researchers control the conditions and collect data to draw conclusions about cause-and-effect relationships.
• Focus Groups: Focus groups bring together a small group of individuals who discuss
specific topics in a moderated setting. This method helps in understanding the
opinions, perceptions, and experiences shared by the participants.

2. Secondary Data Collection


• Published Sources: Researchers refer to books, academic journals, magazines,
newspapers, government reports, and other published materials that contain relevant
data.
• Online Databases: Numerous online databases provide access to a wide range of
secondary data, such as research articles, statistical information, economic data, and
social surveys.
• Government and Institutional Records: Government agencies, research institutions,
and organizations often maintain databases or records that can be used for research
purposes.

• Publicly Available Data: Data shared by individuals, organizations, or communities on public platforms, websites, or social media can be accessed and used for research.
• Past Research Studies: Previous research studies and their
findings can serve as valuable secondary data sources.
Researchers can review and analyze the data to gain insights or
build upon existing knowledge.
DATA COLLECTION TOOLS

• Word Association
The researcher gives the respondent a set of words and asks them what comes to mind
when they hear each word.
• Sentence Completion
Researchers use sentence completion to understand the respondent's ideas. This tool
involves giving an incomplete sentence and seeing how the interviewee finishes it.
• Role-Playing
Respondents are presented with an imaginary situation and asked how they would act or
react if it were real.

• In-Person Surveys
The researcher asks questions face to face.
• Online/Web Surveys
These surveys are easy to administer, but some respondents may answer untruthfully or not at all.
• Mobile Surveys
These surveys take advantage of the widespread use of mobile technology. Mobile surveys rely on devices such as tablets or smartphones and are conducted via SMS or mobile apps.

• Phone Surveys
A single researcher cannot call thousands of people, so large phone surveys are often handled by a third party. However, many people screen their calls and will not answer.
• Observation
Sometimes the simplest method is the best. Researchers who make direct observations can collect data quickly and easily, with little intrusion or third-party bias. However, this method is practical mainly for small-scale studies.
COMMON CHALLENGES IN DATA COLLECTION

• Data Quality Issues
Poor data quality is the main threat to the broad and successful application of machine learning. If you want technologies like machine learning to work for you, data quality must be your top priority. Some of the most prevalent data quality problems are described below.
• Inconsistent Data
When working with multiple data sources, the same information may differ from one source to another. The differences can be in formats, units, or occasionally spellings. If they are not continually resolved, inconsistencies tend to accumulate and reduce the value of the data. Organizations that focus heavily on data consistency do so because they want only reliable data to support their analytics.
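To illustrate resolving inconsistencies before data from several sources is combined, here is a minimal Python sketch that normalizes spellings and units. The field names (`city`, `price`, `unit`), the `CITY_ALIASES` lookup table, and the rupees/paise convention are hypothetical assumptions chosen for the example.

```python
# Hypothetical lookup table mapping spelling variants to one canonical city name.
CITY_ALIASES = {
    "new delhi": "Delhi",
    "delhi": "Delhi",
    "bombay": "Mumbai",
    "mumbai": "Mumbai",
}


def normalize(record: dict) -> dict:
    """Return a normalized record with a canonical city name and the price in rupees."""
    raw_city = record["city"].strip()
    # Fall back to title case for cities missing from the alias table.
    city = CITY_ALIASES.get(raw_city.lower(), raw_city.title())
    price = float(record["price"])
    # Assumption: some sources report prices in paise (100 paise = 1 rupee).
    if record.get("unit") == "paise":
        price /= 100
    return {"city": city, "price_inr": round(price, 2)}


source_a = [{"city": "New Delhi", "price": "250", "unit": "rupees"}]
source_b = [{"city": " bombay ", "price": "31050", "unit": "paise"}]
cleaned = [normalize(r) for r in source_a + source_b]
print(cleaned)
```

Both records now share one spelling convention and one unit, so they can be compared or aggregated directly.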
• Data Downtime
Data drives the decisions and operations of data-driven businesses. However, there may be brief periods when their data is unreliable or not ready for use. This data unavailability can significantly affect a business, for example through customer complaints and poor analytical results. Data engineers spend a significant amount of their time updating and maintaining data and guaranteeing its integrity. Because the operational lead time from data capture to insight is long, answering each new business question carries a high marginal cost.
• Ambiguous Data
Even with thorough oversight, errors can still occur in massive databases or data lakes, and the problem becomes harder to manage when data streams in at high speed. Spelling mistakes can go unnoticed, formatting problems can occur, and column headers can be misleading. Such ambiguous data can cause several problems for reporting and analytics.
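One way to catch ambiguous data early is to check every incoming row against an expected schema. The sketch below assumes a hypothetical survey table with the columns `respondent_id`, `survey_date` (ISO `YYYY-MM-DD` dates), and `score`. It flags missing or misspelled column headers and values that fail to parse.

```python
from datetime import date

# Hypothetical schema: expected column name -> function that parses a raw string value.
SCHEMA = {
    "respondent_id": int,
    "survey_date": date.fromisoformat,  # expects YYYY-MM-DD
    "score": float,
}


def validate(rows: list[dict]) -> list[str]:
    """Return the problems found in `rows`; an empty list means none were detected."""
    problems = []
    for i, row in enumerate(rows):
        missing = SCHEMA.keys() - row.keys()
        unexpected = row.keys() - SCHEMA.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if unexpected:
            problems.append(f"row {i}: unexpected columns {sorted(unexpected)}")
        # Try to parse every expected column that is present.
        for column, parse in SCHEMA.items():
            if column in row:
                try:
                    parse(row[column])
                except ValueError:
                    problems.append(f"row {i}: bad value {row[column]!r} in {column!r}")
    return problems


rows = [
    {"respondent_id": "101", "survey_date": "2024-03-12", "score": "7.5"},
    {"respondent_id": "102", "survey_date": "12/03/2024", "scroe": "8"},  # misspelled header, ambiguous date
]
for problem in validate(rows):
    print(problem)
```

The second row is reported three times: the `score` column is missing, `scroe` is unexpected, and `12/03/2024` could mean either 12 March or 3 December, so it is rejected.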
• Duplicate Data
Modern enterprises must contend with many data sources, such as streaming data, local databases, and cloud data lakes, and these sources often overlap and duplicate one another. Duplicate contact information, for instance, has a substantial impact on customer experience: marketing campaigns suffer when some prospects are ignored while others are contacted repeatedly. Duplicate data also increases the likelihood of biased analytical results, and it can bias the training data of ML models.
• Abundance of Data
Despite the advantages of data-driven analytics, too much data is itself a data quality problem. When searching for information relevant to an analysis, it is easy to get lost in an abundance of data. By a commonly cited estimate, data scientists, data analysts, and business users spend about 80% of their time finding and organizing the right data. As data volume grows, other data quality problems become more serious, especially with streaming data and large files or databases.
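A minimal deduplication sketch for the duplicate-contact example follows. Two records are treated as duplicates when their normalized email address and phone number match. This matching rule, including comparing only the last 10 digits of a phone number, is an illustrative assumption. Real-world entity resolution usually needs fuzzier matching.

```python
def contact_key(contact: dict) -> tuple[str, str]:
    """Build a normalized (email, phone) key for duplicate detection."""
    email = contact["email"].strip().lower()
    digits = "".join(ch for ch in contact["phone"] if ch.isdigit())
    # Assumption: compare only the last 10 digits so prefixes such as +91 or 0 are ignored.
    return email, digits[-10:]


def dedupe(contacts: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized key, preserving order."""
    seen = set()
    unique = []
    for contact in contacts:
        key = contact_key(contact)
        if key not in seen:
            seen.add(key)
            unique.append(contact)
    return unique


contacts = [
    {"name": "A. Sharma", "email": "a.sharma@example.com", "phone": "+91 98100 12345"},
    {"name": "Anil Sharma", "email": "A.Sharma@Example.com ", "phone": "098100-12345"},
    {"name": "R. Gupta", "email": "r.gupta@example.com", "phone": "98111 22233"},
]
unique = dedupe(contacts)
print([c["name"] for c in unique])
```

Keeping the first occurrence is the simplest policy. An alternative is to merge the fields of matching records.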
• Inaccurate Data
Data accuracy is crucial in highly regulated industries such as healthcare. Inaccurate information does not give a true picture of the situation and cannot be used to plan the best course of action. Data inaccuracies can stem from several causes, including data degradation, human error, and data drift. By one commonly cited estimate, data decays worldwide at a rate of about 3% per month, which is quite concerning. Data integrity can also be compromised when data is transferred between systems, and data quality can deteriorate over time.
• Hidden Data
Most businesses use only a portion of their data; the rest is often lost in data silos or discarded in data graveyards. For instance, the customer service team might not receive client data from sales, missing an opportunity to build more precise and comprehensive customer profiles. Hidden data causes organizations to miss opportunities to develop new products, improve services, and streamline procedures.
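A decay rate of 3% per month does not simply add up to 12 × 3% = 36% per year, because each month's decay applies only to the records that are still accurate. The sketch below assumes the rate compounds monthly; the source does not state this. Under that assumption, the share of records still accurate after n months is (1 - 0.03)^n.

```python
MONTHLY_DECAY = 0.03  # rate quoted in the slide; compounding monthly is an assumption


def share_still_accurate(months: int, monthly_decay: float = MONTHLY_DECAY) -> float:
    """Fraction of records still accurate after `months` months of compounding decay."""
    return (1 - monthly_decay) ** months


for months in (1, 6, 12):
    print(f"{months:>2} months: {share_still_accurate(months):.1%} still accurate")
```

Under this assumption, about 69% of records remain accurate after a year. Roughly 31% have decayed, somewhat less than the naive 36% but still a large share.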
KEY STEPS IN THE DATA COLLECTION PROCESS

• Decide What Data You Want to Gather
• Establish a Deadline for Data Collection
• Select a Data Collection Approach
• Gather Information
• Examine the Information and Apply Your Findings
DATA COLLECTION CONSIDERATIONS AND BEST PRACTICES

• Take Into Account the Cost of Each Extra Data Point
• Plan How to Gather Each Data Piece
• Think About Your Options for Collecting Data Using Mobile Devices
  - IVRS (interactive voice response system): calls respondents and asks them pre-recorded questions.
  - SMS data collection: sends a text message to respondents, who can then answer the questions by text on their phones.
  - Field surveyors: can enter data directly into an interactive questionnaire on a smartphone app while speaking to each respondent.

• Carefully Consider the Data You Need to Gather
  - What details will be helpful?
  - What details are available?
  - What specific details do you require?
• Remember to Consider Identifiers
Identifiers, or details describing the context and source of a survey response, are just as crucial as the information about the subject or program being researched. Adding more identifiers lets us pinpoint a program's successes and failures more accurately, but moderation is key.
A "data identifier" is a specific piece of personal information within a dataset that can be used to identify an individual directly or indirectly; such identifiers often require special protection under privacy regulations.
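To show how context identifiers help pinpoint successes and failures, here is a small sketch. Each survey response carries two hypothetical identifiers (collection site and collection method) alongside the answer being studied, and results can be broken down by either one. The field names and the 1-5 satisfaction scale are assumptions. No directly identifying personal data is stored.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class SurveyResponse:
    # Hypothetical context identifiers: where and how the response was collected.
    site: str
    method: str  # e.g. "SMS" or "in-person"
    # The substantive answer being studied (assumed 1-5 satisfaction scale).
    satisfaction: int


def mean_by(responses: list[SurveyResponse], identifier: str) -> dict[str, float]:
    """Average satisfaction for each value of the chosen identifier field."""
    buckets = defaultdict(list)
    for response in responses:
        buckets[getattr(response, identifier)].append(response.satisfaction)
    return {key: fmean(values) for key, values in buckets.items()}


responses = [
    SurveyResponse("Site A", "SMS", 4),
    SurveyResponse("Site A", "in-person", 5),
    SurveyResponse("Site B", "SMS", 2),
    SurveyResponse("Site B", "in-person", 3),
]
print(mean_by(responses, "site"))    # breakdown by collection site
print(mean_by(responses, "method"))  # breakdown by collection method
```

Every extra identifier adds another possible breakdown, but it also adds collection cost and privacy risk. This is why the text recommends moderation.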
QUANTITATIVE DATA ANALYSIS

ANOVA
• An "Analysis of Variance" (ANOVA) tests three or more groups for mean differences on a continuous (i.e. ratio or interval) response variable (dependent variable).
• The term "factor" refers to the categorical variable that distinguishes group membership. Race, level of education, and treatment condition are examples of factors.

One-Way ANOVA
• A "one-way" ANOVA compares the levels (i.e. groups) of a single factor on a single continuous response variable (e.g. comparing test scores by level of education).
[Figure: one-way ANOVA example. Factor (categorical variable): company or city; continuous response variable: price of a product.]
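To make the one-way ANOVA concrete, the sketch below computes the F statistic by hand with Python's standard library. F is the ratio of between-group variance to within-group variance. The test scores for three education levels are made-up illustration data.

```python
from statistics import fmean


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = fmean(all_values)
    k, n_total = len(groups), len(all_values)

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - fmean(v)) ** 2 for v in groups.values() for x in v)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical test scores grouped by level of education (the factor).
scores = {
    "high school": [60, 65, 70],
    "bachelor's": [70, 75, 80],
    "master's": [80, 85, 90],
}
f_stat, df_b, df_w = one_way_anova(scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # prints "F(2, 6) = 12.00"
```

Here the group means are 65, 75, and 85. F = 12.0 is well above the tabulated 5% critical value for F(2, 6) of about 5.14, so the means would be judged to differ. In practice, `scipy.stats.f_oneway` computes the same statistic together with its p-value.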

Two-Way ANOVA
• A "two-way" ANOVA compares the levels of two factors for mean differences on a single continuous response variable (e.g. comparing the price of a product by city and company).
[Figure: two-way ANOVA example. Factors (categorical variables): city and company; continuous response variable: price of a product. Also labeled: store size.]
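In a balanced two-way design, every City × Company cell has the same number of observations. The total variation then splits cleanly into the two main effects, their interaction, and error. This sketch assumes such a balanced design with at least two observations per cell and uses made-up product prices. Unbalanced designs need a regression-based approach instead.

```python
from statistics import fmean


def two_way_anova(cells: dict[tuple[str, str], list[float]]) -> dict[str, float]:
    """F statistics for factor A, factor B, and their interaction (balanced design only)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    balanced = len(cells) == len(a_levels) * len(b_levels) and all(
        len(v) == n for v in cells.values()
    )
    if not balanced or n < 2:
        raise ValueError("this sketch needs every cell filled with the same number (>= 2) of values")

    cell_mean = {key: fmean(values) for key, values in cells.items()}
    grand = fmean(cell_mean.values())  # equals the overall mean in a balanced design
    a_mean = {a: fmean(cell_mean[a, b] for b in b_levels) for a in a_levels}
    b_mean = {b: fmean(cell_mean[a, b] for a in a_levels) for b in b_levels}

    # Sums of squares for each main effect, the interaction, and the residual error.
    ss_a = len(b_levels) * n * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = len(a_levels) * n * sum((m - grand) ** 2 for m in b_mean.values())
    ss_ab = n * sum(
        (cell_mean[a, b] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_error = sum((x - cell_mean[key]) ** 2 for key, values in cells.items() for x in values)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    ms_error = ss_error / (len(a_levels) * len(b_levels) * (n - 1))
    return {
        "F_A": ss_a / df_a / ms_error,
        "F_B": ss_b / df_b / ms_error,
        "F_AxB": ss_ab / (df_a * df_b) / ms_error,
    }


# Factor A = city, factor B = company, response = price of a product (made-up data).
prices = {
    ("Delhi", "Company X"): [100, 102],
    ("Delhi", "Company Y"): [110, 112],
    ("Mumbai", "Company X"): [120, 122],
    ("Mumbai", "Company Y"): [134, 136],
}
print(two_way_anova(prices))
```

Each F value is compared with an F distribution using the matching degrees of freedom. Here that is 1 and 4 for every effect. A large F for the interaction would mean that the effect of the company on price depends on the city.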
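ANCOVA - ANALYSIS OF COVARIANCE

• An ANCOVA compares the means of a continuous response variable across the levels of a categorical factor while controlling for a continuous covariate.
[Figure: ANCOVA example. Factor (categorical variable): level of education; continuous response variable: test score; covariate (continuous independent variable): number of hours spent.]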
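ANCOVA adjusts each group's mean response for differences in the covariate. The minimal sketch below assumes a single covariate and uses the standard adjustment, adjusted mean = group mean - b_w × (group covariate mean - overall covariate mean), where b_w is the pooled within-group slope. The data are made up. The sketch stops at the adjusted means rather than performing the full ANCOVA F test.

```python
from statistics import fmean


def ancova_adjusted_means(
    groups: dict[str, dict[str, list[float]]],
) -> tuple[float, dict[str, float]]:
    """Return the pooled within-group slope and the covariate-adjusted group means."""
    # Overall covariate mean across every observation.
    grand_hours = fmean(x for g in groups.values() for x in g["hours"])
    sxy = sxx = 0.0
    for g in groups.values():
        mean_x, mean_y = fmean(g["hours"]), fmean(g["score"])
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in zip(g["hours"], g["score"]))
        sxx += sum((x - mean_x) ** 2 for x in g["hours"])
    slope = sxy / sxx  # pooled within-group slope of score on hours

    # Shift each group mean to its expected value at the overall mean number of hours.
    adjusted = {
        name: fmean(g["score"]) - slope * (fmean(g["hours"]) - grand_hours)
        for name, g in groups.items()
    }
    return slope, adjusted


# Factor: level of education; response: test score; covariate: hours spent studying (made-up data).
groups = {
    "high school": {"hours": [2, 4, 6], "score": [55, 60, 65]},
    "bachelor's": {"hours": [4, 6, 8], "score": [65, 70, 75]},
    "master's": {"hours": [6, 8, 10], "score": [75, 80, 85]},
}
slope, adjusted = ancova_adjusted_means(groups)
print(slope, adjusted)
```

The raw mean scores (60, 70, 80) differ by 10 points between adjacent education levels. Part of that gap comes from the higher-education groups studying more hours. After adjustment, the means are 65, 70, and 75, so the gap shrinks to 5 points. The method assumes the slope is the same in every group (homogeneity of regression slopes).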
MANOVA - MULTIVARIATE ANALYSIS OF VARIANCE

One-Way MANOVA
• A "one-way" MANOVA compares the levels (i.e. groups) of a single factor on multiple continuous response variables (e.g. comparing test score and annual income by level of education).
[Figure: one-way MANOVA example. Factor (categorical variable): level of education; continuous response variables: test score and annual income.]
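A one-way MANOVA is often summarized with Wilks' lambda, Λ = det(W) / det(T). W is the within-group sums-of-squares-and-cross-products (SSCP) matrix and T is the total SSCP matrix (T = W + B, where B is the between-group SSCP). The sketch below handles exactly two response variables, test score and annual income (in lakh rupees), so that the 2 × 2 determinants can be written out directly. The data are made up.

```python
from statistics import fmean

Pair = tuple[float, float]  # (test score, annual income)


def sscp(rows: list[Pair], center: Pair) -> list[list[float]]:
    """2x2 sums of squares and cross-products of `rows` around `center`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for row in rows:
        d = (row[0] - center[0], row[1] - center[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def det2(m: list[list[float]]) -> float:
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups: dict[str, list[Pair]]) -> float:
    """Wilks' lambda = det(W) / det(T) for two continuous response variables."""
    w = [[0.0, 0.0], [0.0, 0.0]]  # within-group SSCP, pooled over groups
    for rows in groups.values():
        group_mean = (fmean(r[0] for r in rows), fmean(r[1] for r in rows))
        within = sscp(rows, group_mean)
        for i in range(2):
            for j in range(2):
                w[i][j] += within[i][j]
    all_rows = [r for rows in groups.values() for r in rows]
    grand_mean = (fmean(r[0] for r in all_rows), fmean(r[1] for r in all_rows))
    total = sscp(all_rows, grand_mean)  # total SSCP = within + between
    return det2(w) / det2(total)


# Factor: level of education; responses: (test score, annual income), made-up data.
groups = {
    "high school": [(60, 3), (65, 5), (70, 4)],
    "bachelor's": [(70, 5), (75, 7), (80, 6)],
    "master's": [(80, 7), (85, 9), (90, 8)],
}
print(f"Wilks' lambda = {wilks_lambda(groups):.3f}")
```

Values of Λ near 0 indicate strong separation between the groups, and values near 1 indicate little separation. Here Λ = 3/19 ≈ 0.158. Statistics packages convert Λ into an F statistic for significance testing. For example, the `MANOVA` class in `statsmodels` reports Wilks' lambda alongside other multivariate test statistics.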

Two-Way MANOVA
• A "two-way" MANOVA compares the levels (i.e. groups) of two factors on multiple continuous response variables (e.g. comparing test score and annual income by level of education and zodiac sign).
[Figure: two-way MANOVA example. Factors (categorical variables): zodiac sign and level of education; continuous response variables: test score and annual income.]

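MANCOVA

• A MANCOVA extends MANOVA by controlling for one or more continuous covariates.
[Figure: MANCOVA example. Factor (categorical variable): level of education; continuous response variables: test score and annual income; covariate (continuous independent variable): number of hours studying.]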
THANKS
