Unit-II

The document outlines the essential steps for preparing data for multivariate analysis, including defining objectives, selecting and cleaning data, organizing it, and visualizing insights. It also discusses approaches for handling missing data, such as case deletion, mean substitution, regression computation, multiple imputation, and hot-deck methods. Lastly, it emphasizes the importance of ensuring data quality and structure to facilitate accurate analysis.

PREPARING FOR MULTIVARIATE ANALYSIS
Conceptualization of Research Model with Variables
Preparing data for analysis is an important step in data science
and machine learning that ensures the data is appropriate for
analysis and that any insights are accurate and meaningful.
Steps for preparing data:
• Define objectives and questions
• Select and Collect Data
• Clean and validate data
• Organize and Structure the data
• Transform and enrich data
• Explore and visualize data
Define Objectives and Questions:
• State the objectives of the study and the questions the analysis must answer.
• Consider whether the data is complete, precise, and up to date, and whether it
can answer those questions.
Select and Collect Data:
• Find reliable data on public sites or buy it from private organizations
Clean and validate data:
• This is one of the most important and time-consuming steps in data analysis.
• It involves checking and fixing any errors, inconsistencies, outliers, missing
values, or duplicates in your data.
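The checks named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey records are hypothetical, `None` is assumed to mark a missing value, and the 1.5 × IQR rule is one common outlier convention, not one prescribed by these slides.

```python
from statistics import quantiles

# Hypothetical survey records (illustrative values); None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 29, "income": None},
    {"id": 2, "age": 29, "income": None},    # exact duplicate of the previous row
    {"id": 3, "age": 41, "income": 61000},
    {"id": 4, "age": 38, "income": 58000},
    {"id": 5, "age": 45, "income": 950000},  # suspiciously large income
    {"id": 6, "age": 31, "income": 49000},
    {"id": 7, "age": 36, "income": 55000},
    {"id": 8, "age": 50, "income": 63000},
    {"id": 9, "age": 27, "income": 60000},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def count_missing(rows, field):
    """Count rows whose value for `field` is missing."""
    return sum(1 for row in rows if row[field] is None)

def iqr_outliers(rows, field, k=1.5):
    """Return rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[field] for row in rows if row[field] is not None]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [row for row in rows
            if row[field] is not None and not (low <= row[field] <= high)]

clean = drop_duplicates(records)
missing_income = count_missing(clean, "income")
outliers = iqr_outliers(clean, "income")
print(len(clean), missing_income, [row["id"] for row in outliers])
```

Flagged rows are reported rather than deleted, so the analyst can decide whether each one is an error or a genuine extreme value.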
Organize and structure the Data:
• Organize data in a tabular format and use one consistent format for each variable
to make it easier to process and analyze
• Different types of data require different forms of visualization. For example, bar
graphs are good for discrete categories, while line graphs show changes over time.
• Semi-structured data is not as organized as structured data, but it's easier to analyze
than unstructured data. Qualitative data is unstructured, so you may need
transcription software to convert audio to text for analysis.
Transform and enrich data:
This ensures the data is in a format that can be easily queried and manipulated. For
example, a marketing team may need to transform customer data to create targeted
marketing campaigns based on demographics or behavior.
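As a rough illustration of the marketing example, the sketch below derives two new fields from raw customer records and groups customers into segments. The records, the age bands, and the `frequent_threshold` cut-off are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical raw customer records (illustrative values only).
customers = [
    {"name": "Asha", "age": 23, "orders": 2},
    {"name": "Ben", "age": 37, "orders": 9},
    {"name": "Chen", "age": 41, "orders": 1},
    {"name": "Dina", "age": 29, "orders": 12},
    {"name": "Eli", "age": 65, "orders": 4},
]

def age_band(age):
    """Map a raw age to a coarse demographic band (assumed cut-offs)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

def enrich(customer, frequent_threshold=5):
    """Return a copy of the record with derived demographic and behavior fields."""
    enriched = dict(customer)
    enriched["age_band"] = age_band(customer["age"])
    enriched["frequent_buyer"] = customer["orders"] >= frequent_threshold
    return enriched

enriched = [enrich(c) for c in customers]

# Group customer names by (age band, frequent buyer) so each segment can be targeted.
segments = defaultdict(list)
for c in enriched:
    segments[(c["age_band"], c["frequent_buyer"])].append(c["name"])

for key, names in sorted(segments.items()):
    print(key, names)
```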
Explore and Visualize data:
Data visualization is a crucial part of data analysis and refers to the visual
representation of data in the form of a graph, chart, or any other visual format. The
purpose of data visualization is to reveal patterns and relationships in the data
through images.
Approaches for Dealing with Missing Data
Case Deletion:
Listwise deletion:
The most common approach is to
remove every case that has any missing value
and analyze only the complete cases.
Pairwise deletion:
This method drops a case only from
the calculations that need the missing
value, so the case still contributes to
analyses of the variables it does have.
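The difference between the two deletion strategies is easiest to see in the number of cases each one keeps. The sketch below uses a small hypothetical dataset with three variables and assumes `None` marks a missing value.

```python
# Hypothetical cases with three variables; None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0, "z": 5.0},
    {"x": 2.0, "y": None, "z": 4.0},
    {"x": 3.0, "y": 6.5, "z": None},
    {"x": 4.0, "y": 8.0, "z": 1.0},
    {"x": None, "y": 9.0, "z": 2.0},
]

def listwise(rows):
    """Keep only the cases with no missing value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, a, b):
    """Keep the (a, b) value pairs from cases where both variables are observed."""
    return [(r[a], r[b]) for r in rows
            if r[a] is not None and r[b] is not None]

complete_cases = listwise(rows)
pairs_xy = pairwise(rows, "x", "y")
pairs_xz = pairwise(rows, "x", "z")
pairs_yz = pairwise(rows, "y", "z")

print("listwise cases:", len(complete_cases))
print("pairwise n for (x, y), (x, z), (y, z):",
      len(pairs_xy), len(pairs_xz), len(pairs_yz))
```

Pairwise deletion keeps more data for each pair of variables, at the cost of every statistic being based on a different subset of cases.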
Analysis of the Variable Containing Missing Data:
Mean:
The mean value of a variable is used to
replace missing data values for that
variable.
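A minimal sketch of mean substitution, assuming a single numeric variable stored as a list in which `None` marks a missing value (the scores are hypothetical):

```python
from statistics import mean

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical scores with two missing entries.
scores = [3, 7, None, 5, 10, None]
imputed = mean_impute(scores)
print(imputed)
```

Because every gap receives the same value, the imputed variable has less spread than the true data would, which is one reason the other approaches in this unit exist.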
Regression Computation:
This method uses existing variables to
make predictions, which are then
substituted as if they were actual
values. This approach retains more data
than listwise or pairwise deletion.
Missing data may be Missing Completely At Random (MCAR),
Missing At Random (MAR), or Missing Not At Random (MNAR).
Regression Computation (example data):
x    y    x^2   xy
2    3    4     6
4    7    16    28
6    5    36    30
8    10   64    80
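The x^2 and xy columns are the quantities needed for a least-squares line, so the table can plausibly be read as a simple linear regression of y on x. Under that assumption, the sketch below computes the slope and intercept from the column sums and then predicts y for a case whose y is missing. The case with x = 5 is hypothetical and does not appear in the table.

```python
# Data from the table above.
xs = [2, 4, 6, 8]
ys = [3, 7, 5, 10]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)              # total of the x^2 column
sum_xy = sum(x * y for x, y in zip(xs, ys))  # total of the xy column

# Least-squares estimates for the line y = intercept + slope * x.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

def predict(x):
    """Predicted y, used as the substitute for a missing y value."""
    return intercept + slope * x

# Hypothetical case: x = 5 was observed but y is missing.
imputed_y = predict(5)
print(sum_x, sum_y, sum_x2, sum_xy)
print(round(slope, 2), round(intercept, 2), round(imputed_y, 2))
```

With these four points the sums are 20, 25, 120, and 144, which give a slope of 0.95 and an intercept of 1.5.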
Multiple Imputation:
• This method generates several
plausible values for each missing value,
which are then used to create multiple
completed datasets.
• Each dataset is analyzed separately and
the results are combined, so the final
estimates reflect the uncertainty
associated with the missing values.
• Multiple imputation software can
make the process more accessible.
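The sketch below shows the overall shape of the procedure: impute several times, analyze each completed dataset, then pool. It is deliberately simplified. It assumes one numeric variable and draws each imputed value from a normal distribution fitted to the observed values, whereas real multiple imputation software usually draws from a model that also conditions on the other variables. The function names, the data, and the choice of m = 5 are illustrative; pooling follows Rubin's rules for the mean.

```python
import random
from statistics import mean, stdev, variance

def multiple_impute(values, m=5, seed=0):
    """Create m completed copies of `values`, filling each None with a random draw."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = [v for v in values if v is not None]
    mu, sigma = mean(observed), stdev(observed)
    return [
        [rng.gauss(mu, sigma) if v is None else v for v in values]
        for _ in range(m)
    ]

def pool_means(datasets):
    """Pool the per-dataset means with Rubin's rules; return (estimate, variance)."""
    m = len(datasets)
    estimates = [mean(d) for d in datasets]
    # Average sampling variance of the mean within each completed dataset.
    within = mean([variance(d) / len(d) for d in datasets])
    # Variance of the estimates across the completed datasets.
    between = variance(estimates)
    total_variance = within + (1 + 1 / m) * between
    return mean(estimates), total_variance

# Hypothetical variable with two missing entries.
values = [3.0, 7.0, None, 5.0, 10.0, None]
datasets = multiple_impute(values, m=5)
pooled_mean, pooled_variance = pool_means(datasets)
print(round(pooled_mean, 3), round(pooled_variance, 3))
```

The between-dataset term is what separates this from single imputation: it adds the extra uncertainty caused by not knowing the missing values.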
Hot-Deck:
It imputes missing values within a data matrix
by using observed values borrowed from other
cases (donors) in the same matrix.
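A minimal hot-deck sketch follows. The slides say only that the donor value comes from the same data matrix. Choosing the donor at random from cases that share a grouping variable (here a hypothetical `region` field) is an added assumption, and a common way to make donors resemble the recipient.

```python
import random

def hot_deck(rows, field, group_field, seed=0):
    """Fill missing `field` values with a value from a random donor in the same group."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    donors = {}
    for row in rows:
        if row[field] is not None:
            donors.setdefault(row[group_field], []).append(row[field])
    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[field] is None:
            pool = donors.get(new_row[group_field])
            if pool:  # the value stays missing when the group has no donor
                new_row[field] = rng.choice(pool)
        filled.append(new_row)
    return filled

# Hypothetical data matrix; None marks a missing income.
rows = [
    {"region": "north", "income": 40},
    {"region": "north", "income": 44},
    {"region": "north", "income": None},
    {"region": "south", "income": 61},
    {"region": "south", "income": None},
    {"region": "south", "income": 58},
]
filled = hot_deck(rows, "income", "region")
print([r["income"] for r in filled])
```

Unlike mean substitution, every imputed value is one that actually occurs in the data.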
Testing the Assumptions of Multivariate Analysis