100% found this document useful (1 vote)
238 views25 pages

Factor Hair Revised PDF

The passage discusses the challenges of summarizing long documents in a concise manner. It notes that automatically generating summaries requires identifying the most important concepts and events while removing unnecessary details. The goal is to produce a high-level overview that conveys the key topics and storyline in just a few sentences.

Uploaded by

Sampada Halve
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
238 views25 pages

Factor Hair Revised PDF

The passage discusses the challenges of summarizing long documents in a concise manner. It notes that automatically generating summaries requires identifying the most important concepts and events while removing unnecessary details. The goal is to produce a high-level overview that conveys the key topics and storyline in just a few sentences.

Uploaded by

Sampada Halve
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 25

8/30/2019

Project Report: Advance


Statistics
Factor Hair Revised

Submitted By: Sampada Sumant


Table of Content

A. Factor Hair Revised Case Study Background.

B. Questions to be solved.

1. Perform exploratory data analysis on the dataset. Showcase some

chart, graphs. Check for Outliers and missing value.

2. Is there evidence of multicollinearity? Showcase your analysis.

3. Perform simple linear regression for the dependent variable with

every independent variable.

4. Perform PCA/Factor Analysis by extracting 4 factors. Interpret the

output and name the factors.

5. Perform multiple linear regression with customer satisfaction as

dependent variable and the four factor as independent variables.

Comment on the model output and validity. Your remark should

make it meaningful for everybody.


1. Perform exploratory data analysis on the dataset. Showcase some
chart, graphs. Check for Outliers and missing value.
setwd("C:/Users/Sampada/Desktop/BABI/Assignment/Factor-Hair-Revised")

factor_hair = read.csv("Factor-Hair-Revised.csv", header = TRUE)


summary(factor_hair)
ID ProdQual Ecom TechSup CompRes
Min. : 1.00 Min. : 5.000 Min. :2.200 Min. :1.300 Min. :2.600
1st Qu.: 25.75 1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250 1st Qu.:4.600
Median : 50.50 Median : 8.000 Median :3.600 Median :5.400 Median :5.450
Mean : 50.50 Mean : 7.810 Mean :3.672 Mean :5.365 Mean :5.442
3rd Qu.: 75.25 3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625 3rd Qu.:6.325
Max. :100.00 Max. :10.000 Max. :5.700 Max. :8.500 Max. :7.800
Advertising ProdLine SalesFImage ComPricing WartyClaim
Min. :1.900 Min. :2.300 Min. :2.900 Min. :3.700 Min. :4.100
1st Qu.:3.175 1st Qu.:4.700 1st Qu.:4.500 1st Qu.:5.875 1st Qu.:5.400
Median :4.000 Median :5.750 Median :4.900 Median :7.100 Median :6.100
Mean :4.010 Mean :5.805 Mean :5.123 Mean :6.974 Mean :6.043
3rd Qu.:4.800 3rd Qu.:6.800 3rd Qu.:5.800 3rd Qu.:8.400 3rd Qu.:6.600
Max. :6.500 Max. :8.400 Max. :8.200 Max. :9.900 Max. :8.100
OrdBilling DelSpeed Satisfaction
Min. :2.000 Min. :1.600 Min. :4.700
1st Qu.:3.700 1st Qu.:3.400 1st Qu.:6.000
Median :4.400 Median :3.900 Median :7.050
Mean :4.278 Mean :3.886 Mean :6.918
3rd Qu.:4.800 3rd Qu.:4.425 3rd Qu.:7.625
Max. :6.700 Max. :5.500 Max. :9.900
attach(factor_hair)

names(factor_hair)
[1] "ID" "ProdQual" "Ecom" "TechSup" "CompRes" "Advert
ising"
[7] "ProdLine" "SalesFImage" "ComPricing" "WartyClaim" "OrdBilling" "DelSp
eed"
[13] "Satisfaction"

dim(factor_hair)
[1] 100 13
str(factor_hair)
'data.frame': 100 obs. of 13 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
$ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
$ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
$ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
$ Advertising : num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
$ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
$ SalesFImage : num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
$ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
$ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
$ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
$ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...
$ Satisfaction: num 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7 5.5 ...
head(factor_hair)
ID ProdQual Ecom TechSup CompRes Advertising ProdLine SalesFImage ComPricing Wart
ycl aim
1 1 8.5 3.9 2.5 5.9 4.8 4.9 6.0 6.8
4.7
2 2 8.2 2.7 5.1 7.2 3.4 7.9 3.1 5.3
5.5
3 3 9.2 3.4 5.6 5.6 5.4 7.4 5.8 4.5
6.2
4 4 6.4 3.3 7.0 3.7 4.7 4.7 4.5 8.8
7.0
5 5 9.0 3.4 5.2 4.6 2.2 6.0 4.5 6.8
6.1
6 6 6.5 2.8 3.1 4.1 4.0 4.3 3.7 8.5
5.1
OrdBilling DelSpeed Satisfaction
1 5.0 3.7 8.2
2 3.9 4.9 5.7
3 5.4 4.5 8.9
4 4.3 3.0 4.8
5 4.5 3.5 7.1
6 3.6 3.3 4.7
tail(factor_hair)
ID ProdQual Ecom TechSup CompRes Advertising ProdLine SalesFImage ComPricing Wart
yClaim OrdBilling DelSpeed Satisfaction
95 95 9.3 3.8 4.0 4.6 4.7 6.4 5.5 7.4
5.3 3.6 3.4 7.7
96 96 8.6 4.8 5.6 5.3 2.3 6.0 5.7 6.7
5.8 4.9 3.6 7.3
97 97 7.4 3.4 2.6 5.0 4.1 4.4 4.8 7.2
4.5 4.2 3.7 6.3
98 98 8.7 3.2 3.3 3.2 3.1 6.1 2.9 5.6
5.0 3.1 2.5 5.4
99 99 7.8 4.9 5.8 5.3 5.2 5.3 7.1 7.9
6.0 4.3 3.9 6.4
100 100 7.9 3.0 4.4 5.1 5.9 4.2 4.8 9.7
5.7 3.4 3.5 6.4

anyNA(factor_hair)
[1] FALSE
boxplot(factor_hair[,-1], las=2)

From box plot it can be viewed that outliers are present in E-

Commerce, Sales Force Image, Order & Billing & Delivery Speed.

library(reshape2)
> dat.m = melt(factor_hair, id.vars = "ID", measure.vars = names(factor_hair[,-1]))
> ggplot(dat.m)+geom_boxplot(aes(x=ID, y=value, color=variable))

corrplot(cor(factor_hair[2:12]))
hist(factor_hair$Satisfaction, breaks = (0:10), labels = TRUE, main = "Histogram of sa
tisfaction", col = "Brown")

par(mfrow = c(3,4))
hist(factor_hair$Satisfaction, breaks = (0:10), labels = TRUE, main = "Histogram of sa
tisfaction", col = "Brown")

hist(factor_hair$ProdQual, breaks = (0:10), labels = TRUE, main = "Histogram of Produc


t Quality", col = "Brown")

> hist(factor_hair$Ecom, breaks = (0:10), labels = TRUE, main = "Histogram of Ecommerc


e", col = "Brown")

> hist(factor_hair$TechSup, breaks = (0:10), labels = TRUE, main = "Histogram of Techn


ial Support", col = "Brown")

> hist(factor_hair$CompRes, breaks = (0:10), labels = TRUE, main = "Histogram of Compl


aint Resoultion", col = "Brown")

> hist(factor_hair$Advertising, breaks = (0:10), labels = TRUE, main = "Histogram of A


dvertising", col = "Brown")

> hist(factor_hair$ProdLine, breaks = (0:10), labels = TRUE, main = "Histogram of Prod


uct Line", col = "Brown")

> hist(factor_hair$SalesFImage, breaks = (0:10), labels = TRUE, main = "Histogram of S


ales Force Image", col = "Brown")

> hist(factor_hair$ComPricing, breaks = (0:10), labels = TRUE, main = "Histogram of Co


mpetitive Pricing", col = "Brown")

> hist(factor_hair$WartyClaim, breaks = (0:10), labels = TRUE, main = "Histogram of Wa


rranty & Claims", col = "Brown")

> hist(factor_hair$OrdBilling, breaks = (0:10), labels = TRUE, main = "Histogram of Or


der & Billing", col = "Brown")

> hist(factor_hair$DelSpeed, breaks = (0:10), labels = TRUE, main = "Histogram of Deli


very Speed", col = "Brown")
par(mar = rep(2,4))
> plot(ProdQual, Satisfaction, main = "Plot of Product Quality against Satisfaction",
col = "Red")
> plot(Ecom, Satisfaction, main = "Plot of E-commerce against Satisfaction", col = "Re
d")
> plot(TechSup, Satisfaction, main = "Plot of Technical Support against Satisfaction",
col = "Red")
> plot(CompRes, Satisfaction, main = "Plot of Complaint Resolution against Satisfactio
n", col = "Red")
> plot(Advertising, Satisfaction, main = "Plot of Advertising against Satisfaction", c
ol = "Red")
> plot(ProdLine, Satisfaction, main = "Plot of Product Line against Satisfaction", col
= "Red")
> plot(SalesFImage, Satisfaction, main = "Plot of Sales Force Image against Satisfacti
on", col = "Red")
> plot(ComPricing, Satisfaction, main = "Plot of Competitive Pricing against Satisfact
ion", col = "Red")
> plot(WartyClaim, Satisfaction, main = "Plot of Warranty & Claim against Satisfaction
", col = "Red")
> plot(OrdBilling, Satisfaction, main = "Plot of Order & Billing against Satisfaction"
, col = "Red")
> plot(DelSpeed, Satisfaction, main = "Plot of Delivery Speed against Satisfaction", c
ol = "Red")
2. Is there evidence of multicollinearity? Showcase your analysis.
Fh.core = round(cor(myfactor_hair[,1:12]),2)

Fh.core

myfactor_hair = data.frame(factor_hair[,-1])
summary(myfactor_hair)
ProdQual Ecom TechSup CompRes Advertising P
rodLine SalesFImage ComPricing
Min. : 5.000 Min. :2.200 Min. :1.300 Min. :2.600 Min. :1.900 Min.
:2.300 Min. :2.900 Min. :3.700
1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250 1st Qu.:4.600 1st Qu.:3.175 1st
Qu.:4.700 1st Qu.:4.500 1st Qu.:5.875
Median : 8.000 Median :3.600 Median :5.400 Median :5.450 Median :4.000 Medi
an :5.750 Median :4.900 Median :7.100
Mean : 7.810 Mean :3.672 Mean :5.365 Mean :5.442 Mean :4.010 Mean
:5.805 Mean :5.123 Mean :6.974
3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625 3rd Qu.:6.325 3rd Qu.:4.800 3rd
Qu.:6.800 3rd Qu.:5.800 3rd Qu.:8.400
Max. :10.000 Max. :5.700 Max. :8.500 Max. :7.800 Max. :6.500 Max.
:8.400 Max. :8.200 Max. :9.900
For further analysis out of 13 variables, we have nullified ID column
which is not required for the analysis

fhmodel = lm(Satisfaction~.,myfactor_hair)

summary(fhmodel)

Call:
lm(formula = Satisfaction ~ ., data = myfactor_hair)

Residuals:
Min 1Q Median 3Q Max
-1.43005 -0.31165 0.07621 0.37190 0.90120
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.66961 0.81233 -0.824 0.41199
ProdQual 0.37137 0.05177 7.173 2.18e-10 ***
Ecom -0.44056 0.13396 -3.289 0.00145 **
TechSup 0.03299 0.06372 0.518 0.60591
CompRes 0.16703 0.10173 1.642 0.10416
Advertising -0.02602 0.06161 -0.422 0.67382
ProdLine 0.14034 0.08025 1.749 0.08384 .
SalesFImage 0.80611 0.09775 8.247 1.45e-12 ***
ComPricing -0.03853 0.04677 -0.824 0.41235
WartyClaim -0.10298 0.12330 -0.835 0.40587
OrdBilling 0.14635 0.10367 1.412 0.16160
DelSpeed 0.16570 0.19644 0.844 0.40124
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5623 on 88 degrees of freedom


Multiple R-squared: 0.8021, Adjusted R-squared: 0.7774
F-statistic: 32.43 on 11 and 88 DF, p-value: < 2.2e-16

fhmodel
Call:
lm(formula = Satisfaction ~ ., data = myfactor_hair)
Coefficients:
(Intercept) ProdQual Ecom TechSup CompRes Advertising ProdLine SalesFImage
-0.66961 0.37137 -0.44056 0.03299 0.16703 -0.02602 0.14034 0.80611
ComPricing WartyClaim OrdBilling DelSpeed
-0.03853 -0.10298 0.14635 0.16570

vif(fhmodel)
ProdQual Ecom TechSup CompRes Advertising ProdLine SalesFImage ComPricing
1.635797 2.756694 2.976796 4.730448 1.508933 3.488185 3.439420 1.635000
WartyClaim OrdBilling DelSpeed
3.198337 2.902999 6.516014

The above analysis shows the low VIF value in respect of the variables.
However, the VIF value for Delivery Speed is 6.51 which can be
considered for this analysis
3. Perform simple linear regression for the dependent variable with
every independent variable.
#simple linear regression of dependent variable with independent
variable
#SLM for Product quality

plot(ProdQual,Satisfaction,col="Red", abline(lm(Satisfaction~ProdQual),col="Blue"))

> cor(Satisfaction, ProdQual)


[1] 0.486325
> Model1 = lm(Satisfaction~ProdQual, data = myfactor_hair)
> summary(Model1)

Call:
lm(formula = Satisfaction ~ ProdQual, data = myfactor_hair)

Residuals:
Min 1Q Median 3Q Max
-1.88746 -0.72711 -0.01577 0.85641 2.25220
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.67593 0.59765 6.151 1.68e-08 ***
ProdQual 0.41512 0.07534 5.510 2.90e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.047 on 98 degrees of freedom


Multiple R-squared: 0.2365, Adjusted R-squared: 0.2287
F-statistic: 30.36 on 1 and 98 DF, p-value: 2.901e-07
#SLM For E-Commerce

plot(Ecom,Satisfaction,col="Red", abline(lm(Satisfaction~Ecom),col="Blue"))
> cor(Satisfaction, Ecom)
[1] 0.282745
> Model2 = lm(Satisfaction~Ecom, data = myfactor_hair)
> summary(Model2)

Call:
lm(formula = Satisfaction ~ Ecom, data = myfactor_hair)

Residuals:
Min 1Q Median 3Q Max
-2.37200 -0.78971 0.04959 0.68085 2.34580

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.1516 0.6161 8.361 4.28e-13 ***
Ecom 0.4811 0.1649 2.918 0.00437 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.149 on 98 degrees of freedom
Multiple R-squared: 0.07994, Adjusted R-squared: 0.07056
F-statistic: 8.515 on 1 and 98 DF, p-value: 0.004368
#SLM for Technical Support

plot(TechSup,Satisfaction,col="Red", abline(lm(Satisfaction~TechSup),col="Blue"))

> cor(Satisfaction, TechSup)


[1] 0.1125972

> Model3 = lm(Satisfaction~TechSup, data = myfactor_hair)

> summary(Model3)
Call:
lm(formula = Satisfaction ~ TechSup, data = myfactor_hair)
Residuals:
Min 1Q Median 3Q Max
-2.26136 -0.93297 0.04302 0.82501 2.85617

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.44757 0.43592 14.791 <2e-16 ***
TechSup 0.08768 0.07817 1.122 0.265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.19 on 98 degrees of freedom


Multiple R-squared: 0.01268, Adjusted R-squared: 0.002603
F-statistic: 1.258 on 1 and 98 DF, p-value: 0.2647
#SLM for Complaint Resolution

plot(CompRes,Satisfaction,col="Red", abline(lm(Satisfaction~CompRes),col="Blue"))

> cor(Satisfaction, CompRes)


[1] 0.6032626
> Model4 = lm(Satisfaction~CompRes, data = myfactor_hair)
> summary(Model4)

Call:
lm(formula = Satisfaction ~ CompRes, data = myfactor_hair)

Residuals:
Min 1Q Median 3Q Max
-2.40450 -0.66164 0.04499 0.63037 2.70949

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.68005 0.44285 8.310 5.51e-13 ***
CompRes 0.59499 0.07946 7.488 3.09e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9554 on 98 degrees of freedom
Multiple R-squared: 0.3639, Adjusted R-squared: 0.3574
F-statistic: 56.07 on 1 and 98 DF, p-value: 3.085e-11
#SLM for Advertising

plot(Advertising,Satisfaction,col="Red", abline(lm(Satisfaction~Advertising),col="Blue
"))

> cor(Satisfaction, Advertising)


[1] 0.3046695

> Model5 = lm(Satisfaction~Advertising, data = myfactor_hair)


> summary(Model5)

Call:
lm(formula = Satisfaction ~ Advertising, data = myfactor_hair)
Residuals:
Min 1Q Median 3Q Max
-2.34033 -0.92755 0.05577 0.79773 2.53412

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.6259 0.4237 13.279 < 2e-16 ***
Advertising 0.3222 0.1018 3.167 0.00206 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.141 on 98 degrees of freedom


Multiple R-squared: 0.09282, Adjusted R-squared: 0.08357
F-statistic: 10.03 on 1 and 98 DF, p-value: 0.002056

#SLM For Product Line

plot(ProdLine,Satisfaction,col="Red", abline(lm(Satisfaction~ProdLine),col="Blue"))

> cor(Satisfaction, ProdLine)


[1] 0.5505459
> Model6 = lm(Satisfaction~ProdLine, data = myfactor_hair)
> summary(Model6)
Call:
lm(formula = Satisfaction ~ ProdLine, data = myfactor_hair)
Residuals:
Min 1Q Median 3Q Max
-2.3634 -0.7795 0.1097 0.7604 1.7373

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.02203 0.45471 8.845 3.87e-14 ***
ProdLine 0.49887 0.07641 6.529 2.95e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1 on 98 degrees of freedom


Multiple R-squared: 0.3031, Adjusted R-squared: 0.296
F-statistic: 42.62 on 1 and 98 DF, p-value: 2.953e-09
#SLM for Sales Force Image
plot(SalesFImage,Satisfaction,col="Red", abline(lm(Satisfaction~SalesFImage),col="Blue
"))

> cor(Satisfaction, SalesFImage)


[1] 0.5002053
> Model7 = lm(Satisfaction~SalesFImage, data = myfactor_hair)
> summary(Model7)

Call:
lm(formula = Satisfaction ~ SalesFImage, data = myfactor_hair)

Residuals:
Min 1Q Median 3Q Max
-2.2164 -0.5884 0.1838 0.6922 2.0728
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.06983 0.50874 8.000 2.54e-12 ***
SalesFImage 0.55596 0.09722 5.719 1.16e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.037 on 98 degrees of freedom


Multiple R-squared: 0.2502, Adjusted R-squared: 0.2426
F-statistic: 32.7 on 1 and 98 DF, p-value: 1.164e-07

#SLM for Competitive Pricing

plot(ComPricing,Satisfaction,col="Red", abline(lm(Satisfaction~ComPricing),col="Blue")
)

> cor(Satisfaction, ComPricing)


[1] -0.2082957
> Model8 = lm(Satisfaction~ComPricing, data = myfactor_hair)
> summary(Model8)

Call:
lm(formula = Satisfaction ~ ComPricing, data = myfactor_hair)

Residuals:
Min 1Q Median 3Q Max
-1.9728 -0.9915 -0.1156 0.9111 2.5845
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.03856 0.54427 14.769 <2e-16 ***
ComPricing -0.16068 0.07621 -2.108 0.0376 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.172 on 98 degrees of freedom


Multiple R-squared: 0.04339, Adjusted R-squared: 0.03363
F-statistic: 4.445 on 1 and 98 DF, p-value: 0.03756

#SLM for Warranty Claim

plot(WartyClaim,Satisfaction,col="Red", abline(lm(Satisfaction~WartyClaim),col="Blue")
)

> cor(Satisfaction, WartyClaim)


[1] 0.1775448
> Model9 = lm(Satisfaction~WartyClaim, data = myfactor_hair)
> summary(Model9)
Call:
lm(formula = Satisfaction ~ WartyClaim, data = myfactor_hair)
Residuals:
Min 1Q Median 3Q Max
-2.36504 -0.90202 0.03019 0.90763 2.88985

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.3581 0.8813 6.079 2.32e-08 ***
WartyClaim 0.2581 0.1445 1.786 0.0772 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.179 on 98 degrees of freedom


Multiple R-squared: 0.03152, Adjusted R-squared: 0.02164
F-statistic: 3.19 on 1 and 98 DF, p-value: 0.0772
#SLM for Order & Billing

plot(OrdBilling,Satisfaction,col="Red", abline(lm(Satisfaction~OrdBilling),col="Blue")
)

> cor(Satisfaction, OrdBilling)


[1] 0.5217319
> Model10 = lm(Satisfaction~OrdBilling, data = myfactor_hair)
> summary(Model10)

Call:
lm(formula = Satisfaction ~ OrdBilling, data = myfactor_hair)

Residuals:
Min 1Q Median 3Q Max
-2.4005 -0.7071 -0.0344 0.7340 2.9673
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0541 0.4840 8.377 3.96e-13 ***
OrdBilling 0.6695 0.1106 6.054 2.60e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.022 on 98 degrees of freedom


Multiple R-squared: 0.2722, Adjusted R-squared: 0.2648
F-statistic: 36.65 on 1 and 98 DF, p-value: 2.602e-08

#SLM for Delivery Speed

plot(DelSpeed,Satisfaction,col="Red", abline(lm(Satisfaction~DelSpeed),col="Blue"))

> cor(Satisfaction, DelSpeed)


[1] 0.5770423
> Model11 = lm(Satisfaction~DelSpeed, data = myfactor_hair)
> summary(Model11)

Call:
lm(formula = Satisfaction ~ DelSpeed, data = myfactor_hair)

Residuals:
Min 1Q Median 3Q Max
-2.22475 -0.54846 0.08796 0.54462 2.59432
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.2791 0.5294 6.194 1.38e-08 ***
DelSpeed 0.9364 0.1339 6.994 3.30e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9783 on 98 degrees of freedom


Multiple R-squared: 0.333, Adjusted R-squared: 0.3262
F-statistic: 48.92 on 1 and 98 DF, p-value: 3.3e-10

4. Perform PCA/Factor Analysis by extracting 4 factors. Interpret the


output and name the factors.
#Initial Data Analysis
myfactorhair_pca = as.data.frame(myfactor_hair)
myfactorhair_pca$Satisfaction = NULL
Since we will not require dependent variable Satisfaction for the
PCA/ factor analysis, so we have nullified it.

summary(myfactorhair_pca)
ProdQual Ecom TechSup CompRes Advertising P
rodLine
Min. : 5.000 Min. :2.200 Min. :1.300 Min. :2.600 Min. :1.900 Min.
:2.300
1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250 1st Qu.:4.600 1st Qu.:3.175 1st
Qu.:4.700
Median : 8.000 Median :3.600 Median :5.400 Median :5.450 Median :4.000 Medi
an :5.750
Mean : 7.810 Mean :3.672 Mean :5.365 Mean :5.442 Mean :4.010 Mean
:5.805
3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625 3rd Qu.:6.325 3rd Qu.:4.800 3rd
Qu.:6.800
Max. :10.000 Max. :5.700 Max. :8.500 Max. :7.800 Max. :6.500 Max.
:8.400
SalesFImage ComPricing WartyClaim OrdBilling DelSpeed
Min. :2.900 Min. :3.700 Min. :4.100 Min. :2.000 Min. :1.600
1st Qu.:4.500 1st Qu.:5.875 1st Qu.:5.400 1st Qu.:3.700 1st Qu.:3.400
Median :4.900 Median :7.100 Median :6.100 Median :4.400 Median :3.900
Mean :5.123 Mean :6.974 Mean :6.043 Mean :4.278 Mean :3.886
3rd Qu.:5.800 3rd Qu.:8.400 3rd Qu.:6.600 3rd Qu.:4.800 3rd Qu.:4.425
Max. :8.200 Max. :9.900 Max. :8.100 Max. :6.700 Max. :5.500

anyNA(myfactorhair_pca)
[1] FALSE

dim(myfactorhair_pca)
[1] 100 11

str(myfactorhair_pca)
'data.frame': 100 obs. of 11 variables:
$ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
$ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
$ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
$ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
$ Advertising: num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
$ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
$ SalesFImage: num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
$ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
$ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
$ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
$ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...

#Check for normality


shapiorfactorhair = sapply(myfactorhair_pca,shapiro.test)
> shapiorfactorhair
ProdQual Ecom TechSup
statistic 0.9497214 0.9585167 0.9862636
p.value 0.000795287 0.003156537 0.3900381
method "Shapiro-Wilk normality test" "Shapiro-Wilk normality test" "Shapiro-Wilk no
rmality test"
data.name "X[[i]]" "X[[i]]" "X[[i]]"
CompRes Advertising ProdLine
statistic 0.9864587 0.9762572 0.9869227
p.value 0.4022672 0.06769457 0.4324464
method "Shapiro-Wilk normality test" "Shapiro-Wilk normality test" "Shapiro-Wilk no
rmality test"
data.name "X[[i]]" "X[[i]]" "X[[i]]"
SalesFImage ComPricing WartyClaim
statistic 0.974033 0.9675841 0.9909419
p.value 0.04533896 0.01448414 0.7403655
method "Shapiro-Wilk normality test" "Shapiro-Wilk normality test" "Shapiro-Wilk no
rmality test"
data.name "X[[i]]" "X[[i]]" "X[[i]]"
OrdBilling DelSpeed
statistic 0.9740518 0.9816082
p.value 0.04549239 0.1770483
method "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
data.name "X[[i]]" "X[[i]]"

#Run barlett sphericity test for checking the possibility of data


dimension reduction
cortest.bartlett(myfactorcor, nrow(myfactorhair_pca))
$chisq
[1] 619.2726
$p.value
[1] 1.79337e-96
$df
[1] 55

P value is low can be process for dimension reduction


#kaiser meyer test
myfactor.kmo = KMO(cor(myfactorhair_pca))
> myfactor.kmo
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = cor(myfactorhair_pca))
Overall MSA = 0.65
MSA for each item =
ProdQual Ecom TechSup CompRes Advertising ProdLine SalesFImage C
omPricing
0.51 0.63 0.52 0.79 0.78 0.62 0.62
0.75
WartyClaim OrdBilling DelSpeed
0.51 0.76 0.67
> A = eigen(myfactorcor)
> ev = A$values
> ev
[1] 3.42697133 2.55089671 1.69097648 1.08655606 0.60942409 0.55188378 0.40151815 0.24
695154
[9] 0.20355327 0.13284158 0.09842702
> plot(ev, type = "l",main = "Screeplot", xlab = "Principal Components", ylab = "Eigen
Values")

Factor Analysis using method PA


myfactor.solution1 = fa(r = cor(myfactorhair_pca), nfactors = 4, rotate = "none",
fm = "pa")

myfactor.solution1
Factor Analysis using method = pa
Call: fa(r = cor(myfactorhair_pca), nfactors = 4, rotate = "none",
fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 PA4 h2 u2 com
ProdQual 0.20 -0.41 -0.06 0.46 0.42 0.576 2.4
Ecom 0.29 0.66 0.27 0.22 0.64 0.362 2.0
TechSup 0.28 -0.38 0.74 -0.17 0.79 0.205 1.9
CompRes 0.86 0.01 -0.26 -0.18 0.84 0.157 1.3
Advertising 0.29 0.46 0.08 0.13 0.31 0.686 1.9
ProdLine 0.69 -0.45 -0.14 0.31 0.80 0.200 2.3
SalesFImage 0.39 0.80 0.35 0.25 0.98 0.021 2.1
ComPricing -0.23 0.55 -0.04 -0.29 0.44 0.557 1.9
WartyClaim 0.38 -0.32 0.74 -0.15 0.81 0.186 2.0
OrdBilling 0.75 0.02 -0.18 -0.18 0.62 0.378 1.2
DelSpeed 0.90 0.10 -0.30 -0.20 0.94 0.058 1.4
PA1 PA2 PA3 PA4
SS loadings 3.21 2.22 1.50 0.68
Proportion Var 0.29 0.20 0.14 0.06
Cumulative Var 0.29 0.49 0.63 0.69
Proportion Explained 0.42 0.29 0.20 0.09
Cumulative Proportion 0.42 0.71 0.91 1.00

Mean item complexity = 1.9


Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are 55 and the objective function was 6.5
5
The degrees of freedom for the model are 17 and the objective function was 0.33

The root mean square of the residuals (RMSR) is 0.02


The df corrected root mean square of the residuals is 0.03

Fit based upon off diagonal values = 1


Measures of factor score adequacy
PA1 PA2 PA3 PA4
Correlation of (regression) scores with factors 0.98 0.97 0.95 0.88
Multiple R square of scores with factors 0.96 0.95 0.91 0.78
Minimum correlation of possible factor scores 0.92 0.90 0.82 0.56
summary(myfactor.solution1)

Factor analysis with Call: fa(r = myfactorhair_pca, nfactors = 4, rotate = "none",


fm = "pa")

Test of the hypothesis that 4 factors are sufficient.


The degrees of freedom for the model is 17 and the objective function was 0.33
The number of observations was 100 with Chi Square = 30.27 with prob < 0.024
The root mean square of the residuals (RMSA) is 0.02
The df corrected root mean square of the residuals is 0.03
Tucker Lewis Index of factoring reliability = 0.921
RMSEA index = 0.096 and the 10 % confidence intervals are 0.032 0.139
BIC = -48.01

fa.diagram(myfactor.solution1)

Explore loading if factor can be balanced


myfactor.solution2 = fa(r = myfactorhair_pca, nfactors = 4, rotate = "varimax", fm = "
pa")
myfactor.solution2
Factor Analysis using method = pa
Call: fa(r = myfactorhair_pca, nfactors = 4, rotate = "varimax",
fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 PA4 h2 u2 com
ProdQual 0.02 -0.07 0.02 0.65 0.42 0.576 1.0
Ecom 0.07 0.79 0.03 -0.11 0.64 0.362 1.1
TechSup 0.02 -0.03 0.88 0.12 0.79 0.205 1.0
CompRes 0.90 0.13 0.05 0.13 0.84 0.157 1.1
Advertising 0.17 0.53 -0.04 -0.06 0.31 0.686 1.2
ProdLine 0.53 -0.04 0.13 0.71 0.80 0.200 1.9
SalesFImage 0.12 0.97 0.06 -0.13 0.98 0.021 1.1
ComPricing -0.08 0.21 -0.21 -0.59 0.44 0.557 1.6
WartyClaim 0.10 0.06 0.89 0.13 0.81 0.186 1.1
OrdBilling 0.77 0.13 0.09 0.09 0.62 0.378 1.1
DelSpeed 0.95 0.19 0.00 0.09 0.94 0.058 1.1

PA1 PA2 PA3 PA4


SS loadings 2.63 1.97 1.64 1.37
Proportion Var 0.24 0.18 0.15 0.12
Cumulative Var 0.24 0.42 0.57 0.69
Proportion Explained 0.35 0.26 0.22 0.18
Cumulative Proportion 0.35 0.60 0.82 1.00
Mean item complexity = 1.2
Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the null model are 55 and the objective function was 6.5
5 with Chi Square of 619.27
The degrees of freedom for the model are 17 and the objective function was 0.33

The root mean square of the residuals (RMSR) is 0.02


The df corrected root mean square of the residuals is 0.03

The harmonic number of observations is 100 with the empirical chi square 3.19 with
prob < 1
The total number of observations was 100 with Likelihood Chi Square = 30.27 with p
rob < 0.024
Tucker Lewis Index of factoring reliability = 0.921
RMSEA index = 0.096 and the 90 % confidence intervals are 0.032 0.139
BIC = -48.01
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2 PA3 PA4
Correlation of (regression) scores with factors 0.98 0.99 0.94 0.88
Multiple R square of scores with factors 0.96 0.97 0.88 0.78
Minimum correlation of possible factor scores 0.93 0.94 0.77 0.55

summary(myfactor.solution2)
Factor analysis with Call: fa(r = myfactorhair_pca, nfactors = 4, rotate = "varimax",
fm = "pa")
Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the model is 17 and the objective function was 0.33
The number of observations was 100 with Chi Square = 30.27 with prob < 0.024

The root mean square of the residuals (RMSA) is 0.02


The df corrected root mean square of the residuals is 0.03

Tucker Lewis Index of factoring reliability = 0.921


RMSEA index = 0.096 and the 10 % confidence intervals are 0.032 0.139
BIC = -48.01
#solution 2 is giving better balanced loading.
str(myfactor_hair)
'data.frame': 100 obs. of 12 variables:
$ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
$ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
$ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
$ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
$ Advertising : num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
$ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
$ SalesFImage : num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
$ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
$ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
$ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
$ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...
$ Satisfaction: num 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7 5.5 ...
dim(myfactor_hair)
[1] 100 12
print(myfactor.solution2$scores, digits = 2)

myfactordata = cbind(myfactor_hair[,12], myfactor.solution2$scores)

summary(myfactordata)
V1 PA1 PA2 PA3 PA4
Min. :4.700 Min. :-2.55956 Min. :-2.0373 Min. :-2.20200 Min. :-1.42
620
1st Qu.:6.000 1st Qu.:-0.61566 1st Qu.:-0.4663 1st Qu.:-0.73427 1st Qu.:-0.83
402
Median :7.050 Median : 0.07914 Median :-0.2038 Median : 0.09067 Median : 0.03
373
Mean :6.918 Mean : 0.00000 Mean : 0.0000 Mean : 0.00000 Mean : 0.00
000
3rd Qu.:7.625 3rd Qu.: 0.74181 3rd Qu.: 0.5719 3rd Qu.: 0.56502 3rd Qu.: 0.70
675
Max. :9.900 Max. : 1.99193 Max. : 2.8326 Max. : 2.08285 Max. : 2.15
737

head(myfactordata)
PA1 PA2 PA3 PA4
[1,] 8.2 -0.1338871 0.9175166 -1.719604873 0.09135411
[2,] 5.7 1.6297604 -2.0090053 -0.596361722 0.65808192
[3,] 8.9 0.3637658 0.8361736 0.002979966 1.37548765
[4,] 4.8 -1.2225230 -0.5491336 1.245473305 -0.64421384
[5,] 7.1 -0.4854209 -0.4276223 -0.026980304 0.47360747
[6,] 4.7 -0.5950924 -1.3035333 -1.183019401 -0.9591357

colnames(myfactordata) = c("Satisfaction", "Sales Support", "Sales & Marketing", "Post


Sales Service", "Quality Check")

head(myfactordata)
Satisfaction Sales Support Sales & Marketing Post Sales Service Quality Check
[1,] 8.2 -0.1338871 0.9175166 -1.719604873 0.09135411
[2,] 5.7 1.6297604 -2.0090053 -0.596361722 0.65808192
[3,] 8.9 0.3637658 0.8361736 0.002979966 1.37548765
[4,] 4.8 -1.2225230 -0.5491336 1.245473305 -0.64421384
[5,] 7.1 -0.4854209 -0.4276223 -0.026980304 0.47360747
[6,] 4.7 -0.5950924 -1.3035333 -1.183019401 -0.95913571

5. Perform multiple linear regression with customer satisfaction as


dependent variable and the four factor as independent variables.
Comment on the model output and validity. Your remark should
make it meaningful for everybody.
write.csv(myfactordata, "myfactordata.csv")

mymldata = read.csv("myfactordata.csv")
attach(mymldata)
cor(myfactordata)
Satisfaction SalesSupport Sales&Marketing PostSalesService QualityCheck
Satisfaction 1.00000000 0.49682808 0.49823377 0.07472834 0.44430972
Sales Support 0.49682808 1.00000000 0.01268190 -0.00169571 0.02848518
Sales & Marketing 0.49823377 0.01268190 1.00000000 0.01404754 -0.04695702
Post Sales Service0.07472834-0.00169571 0.01404754 1.00000000 0.05184746
Quality Check 0.44430972 0.02848518 -0.04695702 0.05184746 1.00000000
mlmodel = lm(Satisfaction~Sales.Support+Sales...Marketing+Post.Sales.Service+Quality.C
heck, data = mymldata)

summary(mlmodel)

Call:
lm(formula = Satisfaction ~ Sales.Support + Sales...Marketing +
Post.Sales.Service + Quality.Check, data = mymldata)
Residuals:
Min 1Q Median 3Q Max
-1.7125 -0.4708 0.1024 0.4158 1.3483

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.91800 0.06696 103.317 < 2e-16 ***
Sales.Support 0.57963 0.06857 8.453 3.32e-13 ***
Sales...Marketing 0.61978 0.06834 9.070 1.61e-14 ***
Post.Sales.Service 0.05692 0.07173 0.794 0.429
Quality.Check 0.61168 0.07656 7.990 3.16e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6696 on 95 degrees of freedom


Multiple R-squared: 0.6971, Adjusted R-squared: 0.6844
F-statistic: 54.66 on 4 and 95 DF, p-value: < 2.2e-16
vif(mlmodel)
Sales.Support Sales...Marketing Post.Sales.Service Quality.Check
1.001021 1.002683 1.002981 1.005848
For any given level of Sales Support, Sales & Marketing & Post Sales
Service, improving one point of quality check satisfaction will be
increased by 0.61. Correspondingly improving sales support, Post Sales
Service and quality check improving sales & marketing by one point
will increase the satisfaction by 0.61. Likewise Sales Support, Sales &
marketing & quality check, improving one point of Post Sales Service,
will increase satisfaction level by 0.05 which is very low. Also
improving Sales Support will increase satisfaction level by 0.57.
The adjusted R-squared is 0.68, which is considerably high. Also the
multiple R-squared is 0.69. Whereas F stat is 54.66 and P value is
highly significant
.

You might also like