Likert-Scale Survey Question Analysis Demonstration
Anthony Castellani
This is a demonstration of a basic analysis of the responses to the questions of a
hypothetical Likert-scale survey. It uses R, along with the R packages dplyr, ggplot2,
gridExtra, and RColorBrewer.
1. Load the packages.
library(dplyr)
library(ggplot2)
library(gridExtra)
library(RColorBrewer)
2. Use the set.seed function for repeatability.
set.seed(1)
3. Create a dummy data set. This data set will simulate 1,000 responses to five survey
questions, with one column per question holding Likert-scale responses (ranging from 1 to
5) and one column for a generic binary demographic attribute. The final product will be
an object of tbl class (from the package dplyr).
demographic <- round(runif(n = 1000, min = 0, max = 1))
question.1 <- round(runif(n = 1000, min = 1, max = 5))
question.2 <- round(runif(n = 1000, min = 1, max = 5))
question.3 <- round(runif(n = 1000, min = 1, max = 5))
question.4 <- round(runif(n = 1000, min = 1, max = 5))
question.5 <- round(runif(n = 1000, min = 1, max = 5))
my.df <- data.frame(matrix(data = c(demographic, question.1,
                                    question.2, question.3,
                                    question.4, question.5), ncol = 6))
colnames(my.df) <- c("dem", "Q1", "Q2", "Q3", "Q4", "Q5")
# as_tibble() replaces tbl_df(), which is deprecated in current dplyr
my.tbl <- as_tibble(my.df)
• The first few rows of the data set look like this:
Source: local data frame [1,000 x 6]
     dem    Q1    Q2    Q3    Q4    Q5
   (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1      0     4     4     4     2     2
2      1     1     1     3     5     4
3      1     1     2     2     5     2
4      1     4     3     2     4     5
5      1     2     2     1     3     5
6      1     3     4     2     3     3
7      0     3     4     5     3     1
8      1     2     2     4     5     5
9      1     2     5     5     5     5
10     0     4     5     5     5     2
..   ...   ...   ...   ...   ...   ...
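One property of the simulation is worth noting: rounding a uniform draw on [1, 5] gives the end points 1 and 5 only half the probability of the middle options, which is consistent with the roughly 12 to 13 percent shares for options 1 and 5 in the step 4 output. The sketch below is a hypothetical alternative, not the method used in this demonstration; the names n.resp, rounded, and uniform are illustrative. It compares the rounding approach with sample(), which draws each option with equal probability.

```r
# Illustrative comparison only; not part of the original demonstration.
set.seed(1)
n.resp <- 1000  # assumed sample size, matching the demonstration

# round(runif(min = 1, max = 5)) maps [1, 1.5) to 1 and (4.5, 5] to 5,
# so the end points receive half the probability of 2, 3, and 4.
rounded <- round(runif(n = n.resp, min = 1, max = 5))

# sample() gives each of the five options probability 1/5.
uniform <- sample(1:5, size = n.resp, replace = TRUE)

# Percentage of responses per option under each approach.
pct.rounded <- tabulate(rounded, nbins = 5) / n.resp * 100
pct.uniform <- tabulate(uniform, nbins = 5) / n.resp * 100
print(rbind(rounded = pct.rounded, uniform = pct.uniform))
```

Either approach is fine for a dummy data set; the choice only matters if the simulated responses are meant to be evenly spread across the five options.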
4. Using the package dplyr, isolate the survey responses to one question, reduce the data
to the number of responses per Likert-scale response option, and convert the data to a
percentage of total responses.
select(my.tbl, Q1) %>%
  count(Q1) %>%
  mutate(n = (n / length(my.df$Q1)) * 100)
• The result will look like this:
Source: local data frame [5 x 2]
     Q1     n
  (dbl) (dbl)
1     1  12.2
2     2  25.5
3     3  25.4
4     4  23.6
5     5  13.3
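As a cross-check on the percentage calculation, the same table can be produced in base R. The sketch below is a minimal example on stand-in data; toy, pct.dplyr, and pct.base are hypothetical names, and the data are not the demonstration's. It also divides by sum(n) rather than by the length of the original column, which gives the same result when no responses are missing.

```r
library(dplyr)

# Hypothetical stand-in for one survey column (not the demonstration's data).
set.seed(1)
toy <- data.frame(Q1 = sample(1:5, size = 200, replace = TRUE))

# dplyr version: count each option, then divide by the total of the counts.
pct.dplyr <- toy %>%
  count(Q1) %>%
  mutate(pct = n / sum(n) * 100)

# Base R version of the same calculation.
pct.base <- prop.table(table(toy$Q1)) * 100

print(pct.dplyr)
print(round(pct.base, 1))
```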
5. This outcome, produced with dplyr, dovetails nicely with ggplot2. By appending the
pipe operator (%>%) to the end of the dplyr code, one can continue directly with
plotting:
select(my.tbl, Q1) %>%
  count(Q1) %>%
  mutate(n = (n / length(my.df$Q1)) * 100) %>%
  ggplot(aes(x = Q1, y = n)) +
  geom_bar(stat = "identity")
6. While this does indeed produce a result, it’s a little lacking in style. Plus, some of the
annotations that carry over without modification don’t tell much of a story (e.g., what
does “n” mean?). With a few extra lines of code, it’s easy to add some color, modify the
background, and give the axes some meaning. Assuming that this particular survey
question attached specific meanings to the points of the Likert scale, the x-axis labels
can also be made meaningful.
select(my.tbl, Q1) %>%
  count(Q1) %>%
  mutate(n = (n / length(my.df$Q1)) * 100) %>%
  ggplot(aes(x = Q1, y = n, fill = Q1)) +
  scale_fill_gradientn(colors = brewer.pal(n = 5, name = "RdBu")) +
  guides(fill = "none") +
  geom_bar(stat = "identity") +
  theme_bw() +
  xlab(NULL) +
  ylab("Percent of Responses") +
  # Q1 is numeric, so a continuous x scale with explicit breaks is used
  scale_x_continuous(breaks = 1:5,
                     labels = c("Strongly disagree",
                                "Disagree",
                                "Neutral",
                                "Agree",
                                "Strongly agree"))
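Another way to label the axis, assuming the numeric codes map to the wording used in step 6, is to convert the responses to a labeled factor so that ggplot2 treats the axis as discrete. This is a sketch on stand-in data rather than the demonstration's own approach; toy, likert.labels, response, and pct are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Assumed wording for the five response options, as in step 6.
likert.labels <- c("Strongly disagree", "Disagree", "Neutral",
                   "Agree", "Strongly agree")

# Hypothetical stand-in data for a single question.
set.seed(1)
toy <- data.frame(Q1 = sample(1:5, size = 200, replace = TRUE))

p <- toy %>%
  count(Q1) %>%
  mutate(pct = n / sum(n) * 100,
         # Convert the numeric codes to a labeled factor in scale order.
         response = factor(Q1, levels = 1:5, labels = likert.labels)) %>%
  ggplot(aes(x = response, y = pct, fill = response)) +
  geom_col() +                           # same as geom_bar(stat = "identity")
  scale_fill_brewer(palette = "RdBu") +  # discrete RColorBrewer palette
  scale_x_discrete(drop = FALSE) +       # keep options with no responses
  guides(fill = "none") +
  theme_bw() +
  xlab(NULL) +
  ylab("Percent of Responses")

print(p)
```

With a factor, the labels travel with the data, so they do not need to be repeated in every plot.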
• For part two, let us assume that questions two and three are actually parts one and
two of a two-part question. They should really be visualized together. There are a few
ways of doing this, including using the faceting feature in ggplot2. Just to change
things up, though, let’s use the gridExtra package to arrange two separate
visualizations adjacent to each other.
7. Because the function grid.arrange will plot two independent graphs, a little extra
work is necessary to ensure they both display the same scale. The y-axis limits could be
set manually, but setting them programmatically is better. To do that, we first need an
object that holds the largest percentage across both charts, for use later.
my.max <- ceiling(max(
  select(my.tbl, Q2) %>%
    count(Q2) %>%
    mutate(n = (n / length(my.df$Q2)) * 100) %>%
    pull(n)   # keep only the percentage column
  ,
  select(my.tbl, Q3) %>%
    count(Q3) %>%
    mutate(n = (n / length(my.df$Q3)) * 100) %>%
    pull(n)
)) + 1
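Because the same count-and-convert pipeline is repeated for every question, it can be wrapped in a small function. The sketch below is one possible way to do that on stand-in data; likert_pct, toy, and toy.max are hypothetical names that do not appear in the original demonstration.

```r
library(dplyr)

# Hypothetical helper: percentage of responses for each value of a vector.
likert_pct <- function(x) {
  data.frame(response = x) %>%
    count(response) %>%
    mutate(pct = n / sum(n) * 100)
}

# Stand-in data for a two-part question (not the demonstration's data).
set.seed(1)
toy <- data.frame(Q2 = sample(1:5, size = 200, replace = TRUE),
                  Q3 = sample(1:5, size = 200, replace = TRUE))

# Shared upper limit for the y-axis: the largest percentage in either
# chart, rounded up, plus one unit of headroom.
toy.max <- ceiling(max(likert_pct(toy$Q2)$pct, likert_pct(toy$Q3)$pct)) + 1
print(toy.max)
```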
8. Plotting with grid.arrange is similar to the previous efforts, but with two blocks of
dplyr and ggplot2 code separated by a comma inside grid.arrange(). To show some
different color options, assume that the previous color scheme was found to be
inappropriate, so we’ll use a simple red and blue scheme here. Aside from that, only a
few small tweaks are needed, such as removing the x-axis labels from the upper chart,
before we arrive at a stacked comparison of the two-part question.
grid.arrange(
  select(my.tbl, Q2) %>%
    count(Q2) %>%
    mutate(n = (n / length(my.df$Q2)) * 100) %>%
    ggplot(aes(x = Q2, y = n)) +
    geom_bar(stat = "identity", fill = "red") +
    theme_bw() +
    xlab(NULL) +
    ylab("Percent of Responses\nto Question 2") +
    scale_y_continuous(limits = c(0, my.max)) +
    # Q2 is numeric, so a continuous x scale is used; NULL hides the labels
    scale_x_continuous(breaks = 1:5, labels = NULL)
  ,
  select(my.tbl, Q3) %>%
    count(Q3) %>%
    mutate(n = (n / length(my.df$Q3)) * 100) %>%
    ggplot(aes(x = Q3, y = n)) +
    geom_bar(stat = "identity", fill = "blue") +
    theme_bw() +
    xlab(NULL) +
    ylab("Percent of Responses\nto Question 3") +
    scale_y_continuous(limits = c(0, my.max)) +
    scale_x_continuous(breaks = 1:5,
                       labels = c("Strongly disagree",
                                  "Disagree",
                                  "Neutral",
                                  "Agree",
                                  "Strongly agree"))
)
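The faceting feature of ggplot2, mentioned at the start of part two, is another way to draw the two charts, and it shares the y-axis scale automatically. The sketch below shows one possible arrangement on stand-in data; toy, long, response, question, and pct are hypothetical names, and the response labels are assumed to be the same as in step 6.

```r
library(dplyr)
library(ggplot2)

# Assumed wording for the five response options.
likert.labels <- c("Strongly disagree", "Disagree", "Neutral",
                   "Agree", "Strongly agree")

# Stand-in data for a two-part question (not the demonstration's data).
set.seed(1)
toy <- data.frame(Q2 = sample(1:5, size = 200, replace = TRUE),
                  Q3 = sample(1:5, size = 200, replace = TRUE))

# Stack the two questions into one long table with a 'question' column,
# then convert the counts to percentages within each question.
long <- bind_rows(
  toy %>% count(Q2) %>% rename(response = Q2) %>% mutate(question = "Question 2"),
  toy %>% count(Q3) %>% rename(response = Q3) %>% mutate(question = "Question 3")
) %>%
  group_by(question) %>%
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()

p <- ggplot(long, aes(x = response, y = pct, fill = question)) +
  geom_col() +
  facet_wrap(~ question, ncol = 1) +  # one panel per question, same y scale
  scale_fill_manual(values = c("Question 2" = "red", "Question 3" = "blue")) +
  scale_x_continuous(breaks = 1:5, labels = likert.labels) +
  guides(fill = "none") +
  theme_bw() +
  xlab(NULL) +
  ylab("Percent of Responses")

print(p)
```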
9. For the last part of this demonstration, we will look at the effect that demographics
have on one of the questions. Looking at question 4, let’s find out the difference in
opinion between demographic group A (represented by zeros in the demographic column of
the data table) and demographic group B (represented by ones in the demographic column).
This effort will be almost identical to the previous dual-graph visualization. The main
difference is the addition of the dplyr filter function at the beginning of each block of
code inside grid.arrange. Also, the color scheme will be adjusted to make this graph
stand out from the previous ones.
# Percentages are computed within each demographic group (n / sum(n)), so the
# two charts stay comparable even if the groups differ in size.
my.max2 <- ceiling(max(
  filter(my.tbl, dem == 0) %>%
    select(Q4) %>%
    count(Q4) %>%
    mutate(n = (n / sum(n)) * 100) %>%
    pull(n)
  ,
  filter(my.tbl, dem == 1) %>%
    select(Q4) %>%
    count(Q4) %>%
    mutate(n = (n / sum(n)) * 100) %>%
    pull(n)
)) + 1
grid.arrange(
  filter(my.tbl, dem == 0) %>%
    select(Q4) %>%
    count(Q4) %>%
    mutate(n = (n / sum(n)) * 100) %>%
    ggplot(aes(x = Q4, y = n)) +
    geom_bar(stat = "identity", fill = "purple") +
    theme_bw() +
    xlab(NULL) +
    ylab("Percent of Responses\nfrom Demographic A") +
    scale_y_continuous(limits = c(0, my.max2)) +
    # Q4 is numeric, so a continuous x scale is used; NULL hides the labels
    scale_x_continuous(breaks = 1:5, labels = NULL)
  ,
  filter(my.tbl, dem == 1) %>%
    select(Q4) %>%
    count(Q4) %>%
    mutate(n = (n / sum(n)) * 100) %>%
    ggplot(aes(x = Q4, y = n)) +
    geom_bar(stat = "identity", fill = "orange") +
    theme_bw() +
    xlab(NULL) +
    ylab("Percent of Responses\nfrom Demographic B") +
    scale_y_continuous(limits = c(0, my.max2)) +
    scale_x_continuous(breaks = 1:5,
                       labels = c("Strongly disagree",
                                  "Disagree",
                                  "Neutral",
                                  "Agree",
                                  "Strongly agree"))
)
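To express each bar as a share of its own demographic group, rather than of all respondents, the counts can be normalized within each group in a single pass using group_by. The sketch below illustrates this on stand-in data; toy, by.group, and pct are hypothetical names, and the data are not the demonstration's.

```r
library(dplyr)

# Stand-in data: a binary demographic column and one question.
set.seed(1)
toy <- data.frame(dem = sample(0:1, size = 300, replace = TRUE),
                  Q4  = sample(1:5, size = 300, replace = TRUE))

by.group <- toy %>%
  count(dem, Q4) %>%                  # responses per group and option
  group_by(dem) %>%
  mutate(pct = n / sum(n) * 100) %>%  # share within each demographic group
  ungroup()

print(by.group)
```

The resulting table has one row per group and response option, so it could be passed to ggplot2 and split into panels by dem instead of being filtered twice.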
This concludes the demonstration of a simple analysis of Likert-scale survey questions,
both with and without a demographic breakdown.