RM-4

The document outlines the essential steps in data preparation and analysis within research methodology, emphasizing the importance of data collection, cleaning, transformation, and analysis techniques. It details processes for editing, coding, data entry, and ensuring data validity, while distinguishing between qualitative and quantitative data analysis methods. Additionally, it discusses bivariate and multivariate statistical techniques, including factor analysis, to analyze relationships between variables and improve measurement validity.


DATA PREPARATION AND ANALYSIS
• In research methodology, data preparation and analysis are crucial
steps that ensure the validity, reliability, and meaningful
interpretation of research findings.

• These steps help transform raw data into useful insights through
systematic processing and statistical evaluation.
DATA PREPARATION
Step 1: Data Collection

• Primary Data: Collected directly through experiments, surveys, interviews, or observations.
• Secondary Data: Gathered from existing sources like journals,
government reports, and databases.
• Ensure data accuracy, completeness, and relevance before
proceeding.
Step 2: Data Cleaning

• Remove Duplicates: Eliminate repeated entries.


• Handle Missing Data:
• Remove incomplete cases if the dataset is large.
• Use mean/median imputation for missing values.
• Apply interpolation for time-series data.
• Detect and Remove Outliers:
• Use statistical methods like Z-score (>3 or <-3) or IQR method.
• Normalize Data (if needed):
• Convert values to a common scale (e.g., Min-Max Scaling, Z-score Standardization).
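The cleaning steps above can be sketched in a few lines of Python. The sample values are hypothetical; the outlier fence uses the IQR method mentioned above, followed by Min-Max scaling of the cleaned values.

```python
import statistics

values = [47, 48, 49, 50, 51, 52, 95]  # hypothetical sample; 95 looks like an outlier

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [v for v in values if low <= v <= high]

# Min-Max scaling onto [0, 1] after cleaning
lo, hi = min(cleaned), max(cleaned)
scaled = [round((v - lo) / (hi - lo), 2) for v in cleaned]
```

Here the fences evaluate to [42, 58], so 95 is dropped before scaling; with real data the thresholds (1.5 × IQR, or |z| > 3) are conventions, not laws, and should be justified for the dataset at hand.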
Step 3: Data Transformation

• Convert categorical data into numerical form (e.g., One-Hot Encoding for categories).
• Aggregate data if needed (e.g., compute averages, totals,
percentages).
• Ensure all data is in a structured format before analysis.
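One-hot encoding, as described above, can be sketched without any libraries; the category values are hypothetical.

```python
# Hypothetical categorical responses to one-hot encode
colors = ["Red", "Green", "Blue", "Green"]

# One column per distinct category, 1 where the response matches it
categories = sorted(set(colors))               # ['Blue', 'Green', 'Red']
encoded = [[1 if c == cat else 0 for cat in categories] for c in colors]
```

Each encoded row contains exactly one 1, which is what makes the representation safe for numerical analysis of unordered categories.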
DATA ANALYSIS
Step 1: Descriptive Statistics

Used to summarize and describe data.

• Measures of Central Tendency: Mean, Median, Mode.

• Measures of Dispersion: Standard Deviation, Variance, Range.

• Visualizations: Histograms, Box Plots, Bar Charts.
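The central tendency and dispersion measures above map directly onto Python's standard library; the scores are hypothetical.

```python
import statistics

scores = [55, 60, 60, 70, 75, 80, 90]  # hypothetical exam scores

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
stdev = statistics.stdev(scores)          # sample standard deviation
value_range = max(scores) - min(scores)
```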


Step 2: Inferential Statistics

Used to test hypotheses and draw conclusions.

A. Parametric Tests (if data follows a normal distribution)


• t-Test: Compares means between two groups.
• Example: Does teaching method A result in higher scores than method B?
• ANOVA (Analysis of Variance): Compares means across more than two groups.
• Example: Do different dosages of a drug have significantly different effects?
• Regression Analysis:
• Linear Regression: Determines relationships between dependent and independent variables.
• Multiple Regression: Used for multiple independent variables.
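A two-group comparison like the teaching-method example can be computed directly from the t-statistic formula. This sketch uses Welch's version (which does not assume equal variances) on hypothetical scores; a full test would also compare t against a critical value or compute a p-value.

```python
import statistics

# Hypothetical exam scores under two teaching methods
method_a = [78, 85, 82, 88, 80, 84]
method_b = [72, 75, 70, 78, 74, 73]

mean_a, mean_b = statistics.mean(method_a), statistics.mean(method_b)
var_a, var_b = statistics.variance(method_a), statistics.variance(method_b)
n_a, n_b = len(method_a), len(method_b)

# Welch's t statistic: mean difference over its standard error
se = (var_a / n_a + var_b / n_b) ** 0.5
t = (mean_a - mean_b) / se
```

For these values t is roughly 5, well above the usual critical values near 2, suggesting method A's higher mean is unlikely to be chance alone.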

B. Non-Parametric Tests (if data is not normally distributed)


• Mann-Whitney U Test: Alternative to t-test (for two groups).
• Kruskal-Wallis Test: Alternative to ANOVA (for multiple groups).
Step 3: Factorial Design Analysis

• ANOVA for Factorial Design:
• Used to analyze the effects of multiple independent variables (factors) and their interactions.
• Example: A 2 × 2 factorial design with two factors (e.g., Teaching Method &
Exam Difficulty) will have:
• Main Effects: The effect of each factor separately.
• Interaction Effect: Whether the effect of one factor depends on another.
• Regression Models:
• Used when factorial designs involve continuous variables.
Editing and Coding
Editing and Coding in Research Methods
• Editing and coding are essential steps in research methods that
ensure data accuracy, consistency, and readiness for analysis.
Editing in Research

Editing is the process of reviewing and correcting collected data to ensure completeness, accuracy, and consistency.

Types of Editing

• Field Editing – Done by researchers or field supervisors immediately after data collection to clarify incomplete or ambiguous responses.
• Central Editing – Conducted by data editors at a later stage to check for logical consistency, missing values, and
errors.

Key Editing Checks

• Completeness – Are all required fields filled?


• Legibility – Are responses readable and interpretable?
• Consistency – Do responses logically align? (e.g., A respondent claiming to be 15 years old but stating they have a
Ph.D. is inconsistent.)
• Accuracy – Identifying and correcting errors in responses or mis-recorded data.
Coding in Research
• Coding involves assigning numerical or symbolic values to qualitative responses to facilitate analysis.
It is commonly used in survey research and qualitative data analysis.

Types of Coding

• Pre-coding – Assigning codes before data collection, typically for structured questionnaires with fixed
response options. Example:
• Gender: Male = 1, Female = 2, Other = 3
• Post-coding – Done after data collection for open-ended responses by categorizing similar answers.
Example:
• Reasons for choosing a brand:
• Quality = 1
• Price = 2
• Availability = 3
• Thematic Coding – Used in qualitative research to identify recurring themes and patterns.
Coding Process

1. Develop a coding scheme – Define categories and assign numbers/symbols.
2. Test the coding scheme – Apply it to a small sample and refine it if needed.
3. Apply codes to the dataset – Ensure consistency in coding across all responses.
4. Verify accuracy – Double-check for errors and inconsistencies.
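Applying a coding scheme is mechanical once it is defined. A minimal sketch, using the brand-choice scheme from the example above (the responses themselves are hypothetical):

```python
# Coding scheme for an open-ended "reason for choosing a brand" question
scheme = {"Quality": 1, "Price": 2, "Availability": 3}

responses = ["Price", "Quality", "Price", "Availability", "Quality"]

# Consistency check: every response must be covered by the scheme
assert all(r in scheme for r in responses)

coded = [scheme[r] for r in responses]
```

Centralising the mapping in one dictionary is what keeps coding consistent across all responses (step 3 above); uncovered responses fail loudly instead of being coded ad hoc.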
DATA ENTRY IN RESEARCH METHODS

• Data entry is the process of inputting collected data into a digital format for analysis.

• It is a critical step in research to ensure accuracy and reliability.


STEPS IN DATA ENTRY

Step 1: Data Collection Preparation


• Organize data (e.g., surveys, interviews, or experiment results).
• Ensure data is clean and properly formatted.
Step 2: Choosing a Data Entry Method
• Manual Entry: Typing data into a spreadsheet or database (e.g., Excel, SPSS, Google Sheets).
• Automated Entry: Using scanning technology (e.g., Optical Character Recognition (OCR) for handwritten text, or
survey software like Google Forms).
• Direct Digital Collection: Data is recorded directly via online surveys or electronic devices, minimizing human
error.
Step 3: Data Entry Execution
• Enter responses accurately in the correct format.
• Use predefined codes for categorical data (e.g., Male = 1, Female = 2).
• Apply consistency checks (e.g., ensuring numerical data fits expected ranges).
Step 4: Data Validation and Cleaning
• Double-entry method: Entering data twice and comparing for errors.
• Logical checks: Ensuring consistency (e.g., age and education level should be logical).
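Both validation checks above can be sketched in Python; the records and the "PhD under 18" rule are hypothetical illustrations.

```python
# Double-entry verification: the same questionnaires keyed twice
entry_1 = [{"id": 1, "age": 25, "education": "PhD"},
           {"id": 2, "age": 34, "education": "BSc"}]
entry_2 = [{"id": 1, "age": 25, "education": "PhD"},
           {"id": 2, "age": 43, "education": "BSc"}]   # typo: 34 keyed as 43

# Records that differ between the two passes need re-checking
mismatches = [r1["id"] for r1, r2 in zip(entry_1, entry_2) if r1 != r2]

# Logical check: a PhD holder under 18 is almost certainly an entry error
suspicious = [r["id"] for r in entry_1
              if r["education"] == "PhD" and r["age"] < 18]
```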
Data Entry Tools and Software

Basic Tools:
• Microsoft Excel / Google Sheets – Useful for small datasets and basic analysis.
• Google Forms / SurveyMonkey – Automatically collects and structures responses.

Statistical Software:
• SPSS – Used for structured data entry and advanced statistical analysis.
• Stata – Helpful for econometric and social science research.
• R / Python – For large datasets and advanced data processing.

Database Management Tools:


• Microsoft Access – Good for relational databases and structured data.
• SQL Databases (MySQL, PostgreSQL) – Useful for large datasets.
VALIDITY OF DATA IN RESEARCH

• Data validity refers to the accuracy and credibility of data in measuring what it is intended to measure. Valid data ensures that research findings are reliable, meaningful, and applicable.
Types of Data Validity

A. Internal Validity
• Ensures that changes in the dependent variable are due to the independent variable and not other
factors.
• Example: In a study on a new teaching method, internal validity ensures that improvements in student
performance are due to the method, not external factors like prior knowledge.

B. External Validity
• Determines if the study results can be generalized to a broader population.
• Example: A study on employee motivation in one company may not apply to all industries.

C. Content Validity
• Checks if the research covers all relevant aspects of the topic.
• Example: A job satisfaction survey should include factors like salary, work environment, and job security.
D. Construct Validity
• Ensures that a test or tool measures the intended concept (construct).
• Example: A stress questionnaire should measure psychological stress, not just
physical fatigue.

E. Criterion Validity
• Compares study results with an external standard.
• Types:
• Concurrent validity: Comparing new measures with established ones (e.g., a new IQ test vs.
an existing IQ test).
• Predictive validity: Checking if a measure predicts future outcomes (e.g., SAT scores
predicting college performance).
QUALITATIVE VS. QUANTITATIVE DATA ANALYSIS

• Data analysis in research can be classified into two main types: qualitative and quantitative. The two approaches differ in their
- methods,
- purposes, and
- interpretations.
1. Quantitative Data Analysis

Quantitative analysis deals with numerical data and focuses on statistical or mathematical interpretations.
Characteristics:
✅ Uses numbers and measurable data
✅ Structured and objective
✅ Often uses large sample sizes
✅ Results are generalizable

Methods of Quantitative Analysis:


• Descriptive Statistics: Mean, median, mode, percentages, and frequency distribution.
• Inferential Statistics: Hypothesis testing, regression analysis, correlation, ANOVA, chi-square tests.
• Data Visualization: Graphs, charts, histograms, scatter plots.

Examples:
• A survey measuring customer satisfaction on a scale of 1-10.
• Analyzing student test scores to compare performance across schools.
2. Qualitative Data Analysis

Qualitative analysis deals with non-numerical data such as text, audio, video, and open-ended responses.
It focuses on understanding meanings, patterns, and themes.
Characteristics:
✅ Uses words, descriptions, and subjective interpretations
✅ Exploratory and flexible
✅ Often based on small, in-depth samples
✅ Results are context-specific
Methods of Qualitative Analysis:
• Thematic Analysis: Identifying patterns and themes in textual data.
• Content Analysis: Categorizing and coding words, phrases, or sentences.
• Narrative Analysis: Studying stories and personal experiences.
• Discourse Analysis: Examining how language is used in communication.
Examples:
• Analyzing interview responses on job satisfaction.
• Studying social media comments to understand customer opinions.
Key Differences: Qualitative vs. Quantitative

Aspect | Quantitative Analysis | Qualitative Analysis
Data Type | Numbers, statistics | Words, descriptions, themes
Objective | Measure variables and test hypotheses | Explore concepts and meanings
Approach | Deductive (testing theories) | Inductive (building theories)
Data Collection | Surveys, experiments, structured observations | Interviews, focus groups, open-ended surveys
Sample Size | Large and representative | Small and in-depth
Results | Generalizable to a population | Context-specific and detailed
Analysis Tools | SPSS, Excel, R, Python | NVivo, ATLAS.ti, manual coding
Mixed-Methods Approach

Some studies combine both qualitative and quantitative analysis, known as mixed-methods research.

✅ Example: A study on employee engagement may use surveys (quantitative) and interviews (qualitative) to get deeper insights.
APPLICATIONS OF BIVARIATE AND MULTIVARIATE STATISTICAL
TECHNIQUES

• Bivariate and multivariate statistical techniques are widely used in research to analyze relationships between variables and make predictions.

Bivariate Statistical Techniques

• Bivariate analysis examines the relationship between two variables. It helps understand how one variable influences another.

Multivariate Statistical Techniques

• Multivariate analysis involves three or more variables and helps in understanding complex relationships.

Common Bivariate Techniques & Their Applications

Technique | Description | Applications
Correlation Analysis | Measures the strength and direction of the relationship between two variables (e.g., Pearson's correlation). | Studying the relationship between study hours and exam scores; analyzing the link between income and spending habits.
T-tests | Compares the means of two groups to check if they are significantly different. | Comparing male vs. female job satisfaction levels; testing the effectiveness of a new drug compared to a placebo.
Chi-Square Test | Examines the relationship between two categorical variables. | Determining if gender influences shopping preferences; checking if customer satisfaction is related to product quality ratings.
Simple Linear Regression | Predicts the value of one variable based on another (Y = a + bX). | Forecasting sales based on advertising spending; predicting student performance based on attendance.
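Simple linear regression (Y = a + bX) reduces to two closed-form formulas for the slope and intercept. A sketch on hypothetical attendance/score data:

```python
# Least-squares fit of Y = a + bX
x = [60, 70, 80, 90, 100]   # attendance (%), hypothetical
y = [55, 65, 70, 80, 90]    # exam score, hypothetical

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Slope: covariance of X and Y over variance of X
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x     # intercept passes through the means

predicted_85 = a + b * 85   # predicted score at 85% attendance
```

For this data the fit is Y = 4 + 0.85X, so 85% attendance predicts a score of 76.25.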
Common Multivariate Techniques & Their Applications

Technique | Description | Applications
Multiple Regression Analysis | Examines how multiple independent variables influence a dependent variable. | Predicting house prices based on size, location, and number of bedrooms; analyzing factors affecting employee productivity (e.g., training, workload, motivation).
Factor Analysis | Reduces a large number of variables into fewer underlying factors. | Identifying key dimensions of customer satisfaction; grouping psychological traits into broader personality factors.
Cluster Analysis | Groups similar observations into clusters based on shared characteristics. | Market segmentation (e.g., grouping customers by purchasing behavior); identifying different student learning styles.
Discriminant Analysis | Classifies data into predefined groups based on multiple predictors. | Predicting whether a customer is likely to default on a loan; classifying patients into risk categories for a disease.
MANOVA (Multivariate Analysis of Variance) | Examines differences in multiple dependent variables across groups. | Analyzing the impact of education level on income and job satisfaction; studying how different diets affect weight and cholesterol levels.
FACTOR ANALYSIS
Factor Analysis in Research

• Factor Analysis is a statistical technique used to identify underlying patterns in a dataset by grouping correlated variables into a smaller set of factors. It helps reduce complexity while retaining essential information.

Purpose of Factor Analysis

✅ Data Reduction – Simplifies large datasets by grouping related variables.
✅ Identifying Latent Constructs – Reveals hidden patterns that are not directly observable.
✅ Improving Measurement Validity – Ensures that survey or questionnaire items measure the intended concept.
Types of Factor Analysis

A. Exploratory Factor Analysis (EFA)

• Used when researchers do not know the structure of relationships between variables.
• Helps discover underlying factors without predefined expectations.
• Example: Identifying dimensions of job satisfaction (e.g., work environment, salary, career
growth).

B. Confirmatory Factor Analysis (CFA)

• Used when researchers already have a hypothesis about factor structure.


• Confirms whether the expected factor structure fits the data.
• Example: Testing if a personality test accurately measures five personality traits.
Steps in Factor Analysis

Step 1: Data Collection


• Gather quantitative data from surveys, experiments, or datasets.
• Ensure a large sample size (preferably 5-10 responses per variable).
Step 2: Check for Suitability
• Kaiser-Meyer-Olkin (KMO) Test: Checks if data is adequate for factor analysis (values >0.6 are acceptable).
• Bartlett’s Test of Sphericity: Ensures significant correlations among variables.
Step 3: Extract Factors
• Use Principal Component Analysis (PCA) or common factor analysis to extract factors.
• Factors are typically retained when their eigenvalues exceed 1 (the Kaiser criterion).
Step 4: Factor Rotation
• Rotates factors to improve interpretation.
• Varimax Rotation: Maximizes variance for better separation.
• Oblimin Rotation: Allows factors to be correlated.
Step 5: Interpret and Name Factors
• Examine which variables load onto each factor.
• Assign meaningful names to factors based on the grouped variables.
Example of Factor Analysis

• Let’s say a company conducts a survey to measure customer satisfaction across multiple factors such as product quality, customer service, and pricing.

• The survey includes 10 questions, each rated on a 1-5 scale (Strongly Disagree to Strongly Agree).
Step 1: Collect Data
Survey Question | Description
Q1 | The product is durable.
Q2 | The product performs well.
Q3 | The product has an appealing design.
Q4 | Customer support is helpful.
Q5 | Customer service is responsive.
Q6 | Staff are friendly and professional.
Q7 | The product is reasonably priced.
Q8 | The pricing is fair compared to competitors.
Q9 | Discounts and promotions are beneficial.
Q10 | I am satisfied with my overall experience.
Step 2: Perform Factor Analysis

Using Principal Component Analysis (PCA) with Varimax Rotation, we analyze correlations among variables.

• After running factor analysis, we get the following factor loadings (values above 0.5 indicate strong relationships):
Survey Question | Factor 1: Product Quality | Factor 2: Customer Service | Factor 3: Pricing & Value
Q1 (Durability) | 0.78 | 0.22 | 0.14
Q2 (Performance) | 0.82 | 0.19 | 0.11
Q3 (Design) | 0.74 | 0.25 | 0.16
Q4 (Support) | 0.18 | 0.81 | 0.24
Q5 (Service Response) | 0.22 | 0.85 | 0.26
Q6 (Staff Friendliness) | 0.25 | 0.79 | 0.28
Q7 (Reasonable Pricing) | 0.14 | 0.22 | 0.76
Q8 (Fair Pricing) | 0.15 | 0.24 | 0.81
Q9 (Discounts & Offers) | 0.18 | 0.26 | 0.78
Q10 (Overall Satisfaction) | 0.35 | 0.41 | 0.44
Step 3: Interpreting the Factors

Based on factor loadings:

• Factor 1 (Product Quality): Q1, Q2, Q3


• Factor 2 (Customer Service): Q4, Q5, Q6
• Factor 3 (Pricing & Value): Q7, Q8, Q9
• Q10 (Overall Satisfaction) has moderate correlations with all factors,
meaning it’s influenced by multiple factors.
Step 4: Practical Application

How the company benefits from factor analysis:

✅ Instead of analyzing all 10 questions separately, they now focus on 3 main areas.

✅ If customers are unhappy, the company can pinpoint whether it’s product quality, service, or pricing causing the issue.

✅ Helps marketing and operations teams prioritize improvements efficiently.
Discriminant Analysis in Research
• Discriminant Analysis is a statistical technique used to classify observations into
predefined groups based on multiple predictor variables. It is commonly applied in
marketing, finance, medicine, and social sciences.

Purpose of Discriminant Analysis

✅ Classification: Assigns cases to categories (e.g., high-risk vs. low-risk customers).
✅ Prediction: Forecasts group membership based on independent variables.
✅ Understanding Group Differences: Identifies which variables contribute most to distinguishing groups.
Types of Discriminant Analysis

A. Linear Discriminant Analysis (LDA)

• Used when groups are well-separated and assumptions of normality are met.
• Creates a linear boundary between groups.
• Example: Classifying students as pass or fail based on attendance, study hours, and
test scores.

B. Quadratic Discriminant Analysis (QDA)

• Used when group separation is nonlinear (more flexible but requires more data).
• Example: Distinguishing between different types of cancer based on genetic markers.
Steps in Discriminant Analysis

Step 1: Data Collection

• Collect numerical data with at least one categorical dependent variable (group label) and several independent variables.

• Example dataset for classifying customers as loyal or non-loyal:

Customer ID | Annual Income | Spending Score | Purchase Frequency | Loyalty Status (1 = Loyal, 0 = Non-loyal)
001 | 50,000 | 75 | 30 | 1
002 | 35,000 | 40 | 15 | 0
003 | 60,000 | 80 | 45 | 1
004 | 25,000 | 35 | 10 | 0
Step 2: Check Assumptions

• Multivariate normality: Independent variables should be normally distributed.
• Homogeneity of variance-covariance: Variances should be similar
across groups.
• No multicollinearity: Variables should not be highly correlated.
Step 3: Perform Discriminant Analysis

(i) Calculate the Discriminant Function:

D = b1X1 + b2X2 + ... + bnXn + C

Where:
• D = Discriminant score
• Xn = Independent variables
• bn = Coefficients
• C = Constant

(ii) Compute Centroids:

• Average discriminant scores for each group.
• Helps in classification (e.g., customers with scores above a threshold are classified as loyal).

(iii) Assess Model Performance:

• Wilks' Lambda: Measures how well the function separates groups (values closer to 0 indicate a better model).
• Classification Accuracy: Percentage of correctly classified cases.
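The scoring and centroid-cutoff steps can be sketched on the loyalty dataset above. The coefficients and constant here are illustrative assumptions, not fitted values; real software (SPSS, R, scikit-learn) estimates them from the data.

```python
# Illustrative discriminant function D = b1*X1 + b2*X2 + b3*X3 + C
b = [0.00002, 0.03, 0.05]   # assumed weights for income, spending score, frequency
C = -5.0

customers = {  # (annual income, spending score, purchase frequency)
    "001": (50_000, 75, 30),   # loyal
    "002": (35_000, 40, 15),   # non-loyal
    "003": (60_000, 80, 45),   # loyal
    "004": (25_000, 35, 10),   # non-loyal
}

def discriminant_score(x):
    return sum(bi * xi for bi, xi in zip(b, x)) + C

scores = {cid: discriminant_score(x) for cid, x in customers.items()}

# Classify relative to the midpoint of the two group centroids
loyal_centroid = (scores["001"] + scores["003"]) / 2
nonloyal_centroid = (scores["002"] + scores["004"]) / 2
cutoff = (loyal_centroid + nonloyal_centroid) / 2
predicted = {cid: int(s > cutoff) for cid, s in scores.items()}
```

With these assumed weights all four customers are classified correctly, which is what a good function (Wilks' Lambda near 0, high classification accuracy) should achieve.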
Applications of Discriminant Analysis

Field | Application
Marketing | Predicting whether a customer will remain loyal or switch brands.
Finance | Identifying creditworthy vs. non-creditworthy customers.
Medicine | Classifying patients into disease risk categories.
Education | Predicting student dropout risk based on academic performance.
Human Resources | Distinguishing between high-performing and low-performing employees.
Example of Discriminant Analysis

A bank wants to classify customers as high-risk vs. low-risk borrowers based on income, credit score, and debt.

1. Run Discriminant Analysis → Create a function to classify new customers.
2. Interpret Results → Find key variables that determine credit risk.
3. Apply the Model → Automatically classify new loan applicants.
CLUSTER ANALYSIS
Cluster Analysis is a statistical technique used to group similar observations based on
their characteristics.

Unlike classification, where groups are predefined, clustering finds hidden patterns in
data.

✅ Data Segmentation – Groups similar individuals, products, or behaviors.

✅ Pattern Recognition – Identifies underlying structures in data.

✅ Unsupervised Learning – No predefined categories; the algorithm discovers clusters automatically.
Types of Cluster Analysis

A. Hierarchical Clustering

• Creates a tree-like structure (dendrogram) that shows how data points are merged
into clusters.

• Two types:
• Agglomerative (Bottom-Up) → Starts with individual points and merges them into clusters.
• Divisive (Top-Down) → Starts with all points in one cluster and splits them into smaller
clusters.

• Example: Grouping customers based on purchasing behavior.


B. K-Means Clustering

• Divides data into 'K' clusters based on similarity.


• Finds cluster centers (centroids) and assigns data points to the closest centroid.
• Requires selecting the number of clusters K in advance.
• Example: Segmenting students into different learning styles.

C. DBSCAN (Density-Based Clustering)

• Groups points based on density rather than fixed numbers of clusters.


• Can detect arbitrarily shaped clusters and identify outliers.
• Example: Detecting fraudulent transactions in banking.
D. Fuzzy C-Means Clustering

• Unlike K-Means, where each point belongs to one cluster, this allows
points to belong to multiple clusters with probabilities.
• Example: Identifying customers who fall into multiple market
segments.
Steps in Cluster Analysis

Step 1: Prepare the Data


• Select relevant variables (e.g., age, income, spending habits).
• Standardize the data (e.g., using Z-scores) to ensure equal weighting.

Step 2: Choose Clustering Method


• Use Hierarchical Clustering for small datasets where relationships matter.
• Use K-Means for larger datasets with well-defined groups.
• Use DBSCAN if clusters have irregular shapes or contain noise.

Step 3: Determine the Number of Clusters


• Elbow Method: Plots within-cluster variation vs. number of clusters and identifies the point where adding clusters does not improve the
model significantly.
• Silhouette Score: Measures how well each point fits into its assigned cluster.

Step 4: Run Cluster Analysis


• Apply clustering algorithm (e.g., K-Means, Hierarchical).
• Visualize clusters using scatter plots, dendrograms, or heatmaps.

Step 5: Interpret and Validate Clusters


• Check cluster characteristics (e.g., demographics, behaviors).
Applications of Cluster Analysis

Field | Application
Marketing | Segmenting customers based on purchasing behavior.
Healthcare | Identifying disease subtypes from medical data.
Education | Grouping students based on learning preferences.
Finance | Detecting fraudulent transactions.
Social Media | Identifying communities in social networks.

Example: Customer Segmentation with K-Means
A retail company wants to group customers based on their shopping
behavior.

Customer ID | Annual Income | Spending Score
001 | 50,000 | 70
002 | 20,000 | 30
003 | 80,000 | 90
004 | 35,000 | 40
Process:

1. Run K-Means with K = 3.
2. Clusters formed:
• Cluster 1 (High Income, High Spending) → Luxury Buyers
• Cluster 2 (Low Income, Low Spending) → Budget Shoppers
• Cluster 3 (Mid Income, Mid Spending) → Occasional Buyers
3. Marketing strategy:
• Target Cluster 1 with premium products.
• Offer discounts to Cluster 2 to increase spending.
• Improve engagement with Cluster 3.
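The assign-then-update loop of K-Means can be written in a few lines of pure Python. This sketch runs on the four customers above with income rescaled to thousands (so both features carry comparable weight) and uses K = 2 rather than K = 3, simply to keep the example small; real work would standardize features and use a library implementation.

```python
# Minimal K-Means sketch: (income in $000s, spending score)
points = [(50, 70), (20, 30), (80, 90), (35, 40)]

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to its cluster's mean
        # (empty clusters are simply dropped in this sketch)
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl))
                     for cl in clusters if cl]
    return clusters

clusters = kmeans(points, centroids=[(20, 30), (80, 90)])
```

The loop converges after two iterations here, separating the low-income/low-spending pair from the high-income/high-spending pair.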
MULTIPLE REGRESSION AND
CORRELATION IN RESEARCH
What is Multiple Regression?

Multiple Regression is a statistical technique used to predict the value of a dependent variable (Y) based on multiple independent variables (X1, X2, X3, ... Xn).
✅ Helps understand how multiple factors influence an outcome.
✅ Determines the strength and direction of relationships.
✅ Used for prediction and forecasting.

Example:
• A company wants to predict employee performance (Y) based on years of
experience (X1), education level (X2), and training hours (X3).
The Multiple Regression Equation:

Y = b0 + b1X1 + b2X2 + b3X3 + ε

Where:
• Y = Employee Performance (Dependent Variable)
• X1 = Years of Experience
• X2 = Education Level
• X3 = Training Hours
• b0 = Intercept
• b1, b2, b3 = Regression Coefficients
• ε = Error Term
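The coefficients of this equation can be estimated by solving the normal equations (XᵀX)b = Xᵀy. A sketch on hypothetical, noise-free data with two predictors (so the fit recovers the exact generating coefficients b0 = 1, b1 = 2, b2 = 3):

```python
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y  = [9, 8, 19, 18, 29, 28]   # generated from Y = 1 + 2*X1 + 3*X2

X = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix with intercept column

def solve(A, v):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    out = [0.0] * n
    for i in reversed(range(n)):
        out[i] = (M[i][n] - sum(M[i][j] * out[j] for j in range(i + 1, n))) / M[i][i]
    return out

# Normal equations: (X^T X) b = X^T y
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
b0, b1, b2 = solve(XtX, Xty)
```

With real (noisy) data the same computation gives the least-squares estimates; statistical software adds standard errors, t-values, and diagnostics on top.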
Types of Multiple Regression

A. Standard Multiple Regression


• All independent variables are entered into the model at once.
• Example: Predicting house prices based on size, location, and number of bedrooms.

B. Stepwise Regression
• Variables are entered one at a time, based on statistical significance.
• Example: Selecting the best predictors for student exam scores.

C. Hierarchical Regression
• Variables are entered in a predefined order based on theory.
• Example: Testing how social media influences consumer purchases while controlling for income.

D. Ridge & Lasso Regression (for large datasets)


• Used when independent variables are highly correlated (multicollinearity).
• Example: Predicting stock prices with many economic indicators.
CORRELATION

Correlation
• Correlation measures the strength and direction of the relationship between two variables.

Types of Correlation

1. Positive Correlation → As one variable increases, the other also increases.


1. Example: Study hours & Exam scores (More study leads to higher scores).

2. Negative Correlation → As one variable increases, the other decreases.


1. Example: Stress & Productivity (Higher stress reduces productivity).

3. No Correlation → No relationship between the variables.


1. Example: Shoe size & Intelligence.

Correlation Coefficient (r):


• r = +1 → Perfect Positive Correlation
• r = -1 → Perfect Negative Correlation
• r = 0 → No Correlation
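Pearson's r follows directly from its definition (covariance over the product of standard deviations); the study-hours data here is hypothetical.

```python
hours  = [1, 2, 3, 4, 5]      # hypothetical study hours
scores = [52, 58, 65, 70, 80] # hypothetical exam scores

n = len(hours)
mx, my = sum(hours) / n, sum(scores) / n

# Sum of cross-deviations over the product of deviation norms
cov = sum((x - mx) * (y - my) for x, y in zip(hours, scores))
sx = sum((x - mx) ** 2 for x in hours) ** 0.5
sy = sum((y - my) ** 2 for y in scores) ** 0.5
r = cov / (sx * sy)
```

For this data r is about 0.99, a near-perfect positive correlation: more study hours go with higher scores.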
Differences Between Regression and Correlation

Feature | Correlation | Regression
Purpose | Measures strength & direction of relationship | Predicts the value of one variable based on another
Direction | No distinction between dependent & independent variables | One variable is dependent (Y), others are independent (X)
Equation | No equation, just a coefficient (r) | Uses the regression equation Y = b0 + b1X1 + ε
Causation | No causation | Can suggest causation if other conditions are met
Applications of Multiple Regression and Correlation

Field | Application
Marketing | Predicting sales based on advertising spend, pricing, and competition.
Finance | Analyzing how GDP, interest rates, and inflation affect stock prices.
Healthcare | Studying the impact of diet, exercise, and genetics on heart disease.
Education | Determining the effect of teacher experience, class size, and student background on academic performance.
Multidimensional Scaling (MDS) in Research
What is Multidimensional Scaling (MDS)?

Multidimensional Scaling (MDS) is a statistical technique used to visualize similarities or dissimilarities among data points in a lower-dimensional space (typically 2D or 3D).

✅ Transforms complex relationships into an easy-to-interpret spatial representation.

✅ Used when distances or dissimilarities between objects are known but not explicit features.

✅ Finds hidden structures in data by preserving relative distances.

Example:
• A marketing team wants to understand how customers perceive different brands. MDS can create a
map where brands placed closer together are more similar in perception.
Types of MDS

A. Classical MDS (Metric MDS)


• Uses known distances (Euclidean distances or correlation-based dissimilarities).
• Assumes a linear relationship between distances and spatial representation.
• Example: Mapping cities based on actual travel distances.

B. Non-Metric MDS
• Works with ranked similarities or dissimilarities instead of exact distances.
• Focuses on ordinal relationships (preserving order rather than exact values).
• Example: Mapping people's perceptions of different smartphone brands based
on survey rankings.
Steps in MDS Analysis

Step 1: Collect Similarity or Dissimilarity Data


• Gather a distance matrix representing the perceived or actual dissimilarities between objects. For example:

Brand | A | B | C | D
A | 0 | 3 | 6 | 9
B | 3 | 0 | 5 | 8
C | 6 | 5 | 0 | 4
D | 9 | 8 | 4 | 0

• Interpretation: A and B are closer in perception than A and D.
Step 2: Choose Number of Dimensions (Typically 2 or 3)
• Fewer dimensions = easier visualization but potential loss of
accuracy.
• More dimensions = more accuracy but harder to interpret.

Step 3: Apply MDS Algorithm


• Classical MDS → Uses eigenvalue decomposition.
• Non-Metric MDS → Uses stress function minimization.
Step 4: Plot the MDS Map

• Objects closer together are more similar.


• Objects farther apart are more dissimilar.

Step 5: Interpret the Map

• Identify clusters or patterns.


• Check axes meaning (if applicable).
Applications of MDS

Field | Application
Marketing | Understanding brand perception and customer preferences.
Psychology | Mapping relationships between personality traits.
Healthcare | Analyzing similarities in symptoms across diseases.
Finance | Identifying groups of stocks based on price movements.
Sociology | Understanding how people perceive social issues.


Example: Perceptual Mapping of Soft Drinks
• A company surveys customers on their perception of different soft drinks (Coke, Pepsi, Sprite,
Fanta).
Step 1: Create Dissimilarity Matrix
• Based on survey responses, where 1 = very similar and 10 = very different.

Drink | Coke | Pepsi | Sprite | Fanta
Coke | 0 | 2 | 7 | 8
Pepsi | 2 | 0 | 6 | 7
Sprite | 7 | 6 | 0 | 3
Fanta | 8 | 7 | 3 | 0
Step 2: Run MDS and Generate 2D Map

• Coke & Pepsi are placed close together (high similarity).


• Sprite & Fanta form a different cluster (fruit-flavored drinks).
• Coke & Sprite are far apart (distinct taste profiles).
Conjoint Analysis in Research

• Conjoint Analysis is a statistical technique used to understand how people value different
attributes of a product or service. It helps businesses and researchers determine which
features are most important in decision-making.

✅ Simulates real-world decision-making by asking respondents to choose between
different product profiles.
✅ Identifies the most valued features to optimize product design and pricing.
✅ Used in marketing, product development, and pricing strategies.

Example:
• A smartphone company wants to know whether customers value battery life, camera
quality, or price more when buying a phone.
Example: Conjoint Analysis for
Fast Food Menu
A fast-food chain wants to launch a new burger and needs to decide on
pricing, patty type, and portion size.
• Attributes & Levels:
Attribute Level 1 Level 2 Level 3

Price $5 $7 $9

Patty Type Beef Chicken Plant-Based

Portion Size Small Medium Large


Survey Question Example:
"Which burger would you buy?"
• 🍔 Burger A: $5, Beef Patty, Small
• 🍔 Burger B: $7, Chicken Patty, Medium
• 🍔 Burger C: $9, Plant-Based Patty, Large
Findings from Conjoint Analysis

Attribute Importance Score

Price 40%
Patty Type 35%
Portion Size 25%

Conclusion: Customers care most about price, followed by patty type. The company should focus on offering a
competitively priced burger with preferred patty options.
Types of Conjoint Analysis

A. Full-Profile Conjoint Analysis


• Respondents evaluate multiple product profiles, each with different attribute combinations.
• Example: Choosing between laptops with different RAM, screen sizes, and prices.

B. Choice-Based Conjoint (CBC) Analysis (Most Common)


• Participants choose between two or more options, just like in a real purchase scenario.
• Example: Choosing between different car models based on fuel efficiency, price, and brand.

C. Adaptive Conjoint Analysis (ACA)


• The survey adapts based on previous responses, focusing on attributes important to each respondent.
• Example: Personalizing a vacation package based on individual preferences.

D. Rating-Based Conjoint Analysis


• Respondents rate different product profiles instead of choosing between them.
• Example: Rating different coffee blends on a scale of 1-10.
Steps in Conjoint Analysis
Step 1: Identify Key Attributes and Levels
• Choose the attributes (features) that influence decision-making.
• Define levels for each attribute.

Attribute Level 1 Level 2 Level 3
Price $500 $800 $1200
Battery Life 12 hours 24 hours 36 hours
Camera Quality 12 MP 24 MP 48 MP
Step 2: Create Product Profiles
• Combine different attribute levels to generate product profiles.
• Example:

Product Option Price Battery Life Camera Quality
A $500 12 hours 12 MP
B $800 24 hours 24 MP
C $1200 36 hours 48 MP
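Step 2's combination of attribute levels can be sketched in Python. A full-factorial design simply takes every combination of levels; `itertools.product` does this directly (in practice, researchers usually present only a fractional subset of these profiles to respondents):

```python
# Sketch: generating all candidate product profiles from the
# attribute levels above (a full-factorial design).
from itertools import product

levels = {
    "Price": ["$500", "$800", "$1200"],
    "Battery Life": ["12 hours", "24 hours", "36 hours"],
    "Camera Quality": ["12 MP", "24 MP", "48 MP"],
}

# Every combination of one level per attribute: 3 x 3 x 3 = 27 profiles
profiles = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(profiles))
print(profiles[0])  # e.g. the $500 / 12-hour / 12 MP profile
```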
Step 3: Collect Responses

• Use surveys to ask respondents which option they prefer (Choice-Based
Conjoint) or to rank/rate different profiles.
• Example survey question:
Which smartphone would you buy?
• Option A: $500, 12-hour battery, 12 MP camera
• Option B: $800, 24-hour battery, 24 MP camera
• Option C: $1200, 36-hour battery, 48 MP camera
Step 4: Analyze Data
• Use regression models to calculate part-worth utilities (importance of each feature).
• Find out which attribute influences customer choice the most.
Attribute        Importance Score
Price            50%
Battery Life     30%
Camera Quality   20%

• Interpretation: Customers care most about price, followed by battery life and camera quality.


Applications of Conjoint Analysis

Field        Application
Marketing    Optimizing product features based on customer preferences.
Healthcare   Designing health insurance plans based on patient preferences.
Automobile   Understanding which car features customers value most.
Retail       Determining the best pricing strategy for new products.
Application of Statistical Software
for Data Analysis
Statistical software is used in research and business to analyze data,
visualize trends, and test hypotheses. These tools help process large
datasets, apply statistical tests, and generate insights efficiently.

✅ Automates data analysis & reduces errors


✅ Performs complex statistical tests (Regression, ANOVA, Factor
Analysis, etc.)
✅ Creates data visualizations (graphs, heatmaps, etc.)
Popular Statistical Software & Their
Applications
Software                                              Best For                          Common Uses
SPSS (IBM)                                            Social sciences, business         Regression, ANOVA, Factor Analysis
R                                                     Data science, academic research   Machine learning, time-series analysis
Python (Pandas, NumPy, Statsmodels, Scikit-learn)     AI, data science                  Predictive modeling, visualization
Excel (with Analysis ToolPak)                         Basic statistical analysis        Descriptive stats, correlation, regression
STATA                                                 Economics, public health          Panel data analysis, survival analysis
SAS                                                   Healthcare, finance               Predictive analytics, big data handling
Minitab                                               Manufacturing, quality control    Six Sigma, process improvement
EViews                                                Economics, finance                Time series forecasting
Applications of Statistical Software
in Data Analysis
A. Data Preparation & Cleaning
✅ Software Used: Excel, Python (Pandas), R
✅ Tasks:
• Handling missing values, removing duplicates, and recoding variables

B. Descriptive Statistics
✅ Software Used: Excel, SPSS, R
✅ Tasks:
• Mean, median, mode, standard deviation, and frequency tables

C. Hypothesis Testing
✅ Software Used: SPSS, R, Python, STATA
✅ Common Tests:
• T-tests (compare two groups)
• Chi-Square Test (association between categorical variables)
• ANOVA (compare more than two groups)
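The three tests listed above can all be run in a few lines of Python with SciPy; the sample values below are invented purely for illustration:

```python
# Sketch: t-test, chi-square test, and one-way ANOVA with SciPy
# on small made-up samples.
from scipy import stats

group_a = [5.1, 4.9, 5.4, 5.0, 5.2]
group_b = [5.8, 6.1, 5.9, 6.0, 5.7]
group_c = [4.2, 4.5, 4.1, 4.4, 4.3]

# T-test: compare the means of two groups
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# Chi-square: association between two categorical variables
# (rows/columns of a 2x2 contingency table of counts)
chi2, p_chi, dof, expected = stats.chi2_contingency([[30, 10], [20, 40]])

# ANOVA: compare the means of more than two groups
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

print(f"t-test p={p_t:.4f}, chi-square p={p_chi:.4f}, ANOVA p={p_anova:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) leads to rejecting the null hypothesis in each case.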

D. Correlation & Regression Analysis


✅ Software Used: SPSS, Excel, Python, R
✅ Tasks:
• Find relationships between variables
• Predict outcomes with multiple regression
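Both tasks can be sketched with SciPy; the advertising-spend and sales figures below are toy numbers invented for illustration:

```python
# Sketch: Pearson correlation and simple linear regression with SciPy
# on toy advertising-spend vs. sales data.
from scipy import stats

ad_spend = [10, 20, 30, 40, 50]
sales    = [25, 44, 66, 83, 105]

# Correlation: strength and direction of the relationship
r, p_value = stats.pearsonr(ad_spend, sales)

# Regression: slope and intercept for predicting sales from spend
fit = stats.linregress(ad_spend, sales)
predicted = fit.intercept + fit.slope * 60  # predicted sales at spend = 60

print(f"r={r:.3f}, predicted sales at spend=60: {predicted:.1f}")
```

For multiple regression (several predictors at once), the same workflow extends to `statsmodels` or scikit-learn's `LinearRegression`.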
E. Factor Analysis & Principal Component Analysis (PCA)
✅ Software Used: SPSS, R, Python (Scikit-learn)
✅ Used For:
• Reducing large datasets into key components
• Identifying hidden patterns
🔹 Example: In SPSS, conducting factor analysis:
• Click Analyze → Dimension Reduction → Factor
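The same reduction can be done in Python. A minimal PCA sketch with scikit-learn, on a small made-up dataset of four correlated variables (the data and choice of two components are illustrative):

```python
# Sketch: PCA with scikit-learn, reducing 4 variables to 2 components.
import numpy as np
from sklearn.decomposition import PCA

# Toy dataset: 5 observations x 4 variables (invented values)
X = np.array([
    [2.5, 2.4, 0.5, 0.7],
    [0.5, 0.7, 2.2, 2.9],
    [2.2, 2.9, 1.9, 2.2],
    [1.9, 2.2, 3.1, 3.0],
    [3.1, 3.0, 2.3, 2.7],
])

pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # each observation projected onto 2 components

# Share of total variance retained by each component
print(pca.explained_variance_ratio_)
```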

F. Cluster Analysis (Segmentation)


✅ Software Used: R, Python, SPSS
✅ Used For:
• Customer segmentation
• Grouping data points based on similarity
🔹 Example: In Python, K-Means clustering:
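A minimal K-Means sketch with scikit-learn, on toy customer data (annual spend vs. visit frequency; the numbers and the choice of two clusters are invented for illustration):

```python
# Sketch: K-Means customer segmentation with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

# Toy data: [annual spend, visits per year] for six customers
customers = np.array([
    [200, 2], [220, 3], [250, 2],      # low spend, infrequent visitors
    [900, 20], [950, 22], [880, 19],   # high spend, frequent visitors
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)  # cluster assignment per customer

print(labels)
print(kmeans.cluster_centers_)  # the "average customer" of each segment
```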
G. Time Series Analysis & Forecasting
✅ Software Used: R, Python (Statsmodels), EViews
✅ Used For:
• Predicting sales, stock prices, weather trends
• Identifying seasonal trends
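As a minimal stand-in for the dedicated forecasting tools named above (R, Statsmodels, EViews), a moving-average forecast and a trend estimate can be computed in plain NumPy; the monthly sales figures are invented for illustration:

```python
# Sketch: naive moving-average forecast and trend estimate in NumPy
# on made-up monthly sales data.
import numpy as np

sales = np.array([100, 104, 108, 115, 112, 120, 125, 123, 130, 135, 133, 140])

# Next-month forecast = mean of the last 3 months
window = 3
forecast = sales[-window:].mean()

# Trend = slope of a straight line fitted through the series
trend = np.polyfit(np.arange(len(sales)), sales, 1)[0]

print(f"3-month moving-average forecast: {forecast:.1f}, trend: {trend:.2f}/month")
```

Real forecasting work would typically use ARIMA or exponential smoothing models from `statsmodels`, which also handle seasonality.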

H. Data Visualization
✅ Software Used: Tableau, Python (Matplotlib, Seaborn), R (ggplot2)
✅ Used For:
• Creating charts, heatmaps, histograms
• Making data insights easier to interpret
Choosing the Right Statistical
Software
Need Best Software

Basic Analysis (Mean, Median, T-tests, Regression) SPSS, Excel, R

Big Data Analysis & Machine Learning Python, R, SAS

Survey & Market Research SPSS, Qualtrics, STATA

Time Series & Forecasting EViews, R, Python

Business Intelligence & Dashboards Tableau, Power BI
