SAMENA Data Visualization Training - Full Slide Deck (for Distribution)
Visualization Training
August 29th to 31st, 2023
Stephen Konah, Jack Hazerjian, & Shoba Ramachandran
Introductions
6
Results from Pre-Training Survey (1)
Survey Participation: 34 of 45 = 76%
Previous Instruction in Data Visualization
• Had previous instruction: 29%
• No previous instruction: 71%
Proficiency in Data Visualization: chart values 23, 6, and 5 (category labels not preserved)
Top 5 Charts Cohort is Most Comfortable Creating
1. Column or bar (33)
2. Pie (31)
3. Line (29)
4. Line and bar combo (16)
5. Stacked column or bar (16)
Purpose for Creating Data Visualizations
• Reports: 33
• Presentations: 32
• Explore data: 23
• Dashboards: 18
7
Results from Pre-Training Survey (2)
Software Most Used for Data Visualization
• Excel: 30
• Power BI: 10
• Google Charts: 8
• Tableau: 5
DHIS2 Experience
• DHIS2 experience: 24%
• No DHIS2 experience: 76%
Top 6 Charts Most Interested in Learning
1. Sunburst (8)
2. Cluster column or bar (8)
3. Area (7)
4. Sparklines or sparkbars (5)
5. Pyramid (5)
6. Funnel (5)
8
Content & Flow of Training
13
Powerful Data Visualization that Led to
Establishment of Causality
14
Other Effective Data Visualizations
15
Other Effective Data Visualizations
16
Other Effective Data Visualizations
17
Other Effective Data Visualizations
18
Other Effective Data Visualizations
19
Characteristics of Great Data Visualizations
20
Our Brain & Data
Visualizations
21
History of Data Visualization
22
Power of Visual Processing
23
Cognitive Science Considerations for
Visualizing Data
• Appeal to pre-attentive processing
• Reduce time to insight
• Optimize data-ink ratio
24
Memory in Data Visualization
Iconic
• Super fast
• Unconscious
• Tuned to a set of pre-attentive attributes
Short-term
• Limited capacity
• Able to process few pieces of information at the same time
25
Understanding
Your Audience
26
Audience Composition
• Technical experts?
• Donors?
• Executives?
• Ministry of Health officials?
• General public?
• Mix of all?
27
Questions about Audience
• How much time does my audience have?
• Is my audience interested in the full data story or just top headlines?
• Is my audience internal to my organization or external?
• Is my audience technical or non-technical?
• What decisions does my audience have agency over?
• What decisions is my audience looking to make?
• What kind of context does my audience need?
• What level of detail does my audience expect?
• Are decimals needed for this audience?
• Are there comparisons that are particularly useful for this audience?
• Should I present aggregated or disaggregated results?
• Which dissemination format best suits my audience?
28
Top Line or Details?
[Figure: two charts of percentage achievement by year, 2000 to 2020]
29
Useful Comparisons?
[Figure: two charts on a 0% to 100% scale, with a Target reference]
30
Broad or Singular Interest?
[Figure: two charts of Project A, Project B, and Project C by year, 2000 to 2020]
31
Ultimate Format?
32
Checklist for Audience Analysis
❑ List types of audience members and star the 2-3 highest-priority audience types
❑ Decide how many points in time need to be shared with the audience (strategic vs. operational information)
33
Choosing the
Right Chart
34
Cheat Sheets
Let’s Go On a
Chart Tour!
46
Exploration
Projects     1-4    5-9    10-14  15-19  20-24  25-29  30-34  35-39  40-44  45-49
             years  years  years  years  years  years  years  years  years  years
Project 1    50%    53%    53%    63%    65%    71%    75%    90%    95%    96%
Project 2    48%    52%    57%    63%    70%    73%    80%    91%    95%    98%
Project 3    58%    49%    58%    68%    74%    72%    84%    92%    96%    97%
Project 4    50%    59%    60%    65%    73%    76%    86%    94%    98%    99%
Project 5    60%    61%    60%    64%    73%    77%    84%    95%    98%    95%
Project 6    70%    72%    76%    78%    80%    83%    87%    90%    94%    94%
Project 7    76%    75%    78%    79%    85%    84%    86%    90%    93%    94%
Project 8    70%    72%    71%    76%    78%    82%    87%    93%    98%    99%
Project 9    71%    74%    73%    78%    75%    81%    88%    92%    99%    96%
Project 10   75%    76%    78%    77%    76%    79%    88%    96%    98%    97%
Project 11   70%    75%    78%    79%    79%    80%    81%    90%    93%    98%
Project 12   95%    93%    94%    95%    98%    98%    97%    98%    99%    99%
Project 13   73%    79%    80%    83%    81%    81%    84%    90%    98%    99%
Project 14   72%    72%    76%    78%    79%    83%    87%    88%    95%    94%
Project 15   73%    74%    71%    74%    76%    78%    83%    89%    95%    95%

Projects     Indicator
Project 1    40%
Project 2    48%
Project 3    30%
Project 4    50%
Project 5    99%
Project 6    20%
Project 7    30%
Project 8    60%
Project 9    40%
Project 10   55%
Project 11   50%
Project 12   95%
Project 13   40%
Project 14   33%
Project 15   22%

Projects     Aug    Sept   Oct    Nov    Sparkline
Project 1    10%    53%    79%    5%
Project 2    48%    70%    57%    63%
Project 3    58%    49%    58%    68%
Project 4    50%    80%    60%    65%
Project 5    100%   61%    60%    99%
Project 6    70%    72%    66%    77%

Sparkbars
Projects     Aug    Sept   Oct    Nov    Sparkbar
Project 1    10%    53%    79%    5%
Project 2    48%    70%    57%    63%
Project 3    58%    49%    58%    68%
Project 4    50%    80%    60%    65%
Project 5    100%   61%    60%    99%
Project 6    70%    72%    66%    77%
47
Parts of a Whole
Pie chart Donut chart Sunburst chart
48
Parts of a Whole
District A
District B
District C
District D
49
Distribution
50
Correlations
[Figure: correlation chart with a 50% to 95% y-axis and a 0 to 1,000 x-axis]
51
Progress Towards Targets
Column and line combo chart
Overlapping bar chart
[Figure: a column and line combo chart for Category 1 to Category 4 and an overlapping bar chart for Indicator 1 to Indicator 4, both on a 0 to 100 scale]
52
Before & After
53
Geospatial
Choropleth map Tile map Tile trendline map
[Figure: three example maps; the tile map arranges the abbreviations of Indian states and union territories (JK, PB, HP, ..., KL, TN, AN) in a grid]
54
Moving Through A Process
Counseled = 60
Tested = 40
Tested HIV+ = 20
On ART = 10
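The arithmetic behind a funnel like this one can be sketched in a few lines of Python. This is only an illustration using the four values shown on the slide; the function name `funnel_rates` is made up for the example.

```python
# Funnel stages as listed on the slide, in order.
stages = [
    ("Counseled", 60),
    ("Tested", 40),
    ("Tested HIV+", 20),
    ("On ART", 10),
]


def funnel_rates(stages):
    """Return (name, count, share of first stage, share of previous stage)."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        rows.append((name, count, count / first, count / previous))
        previous = count
    return rows


rates = funnel_rates(stages)
for name, count, of_first, of_previous in rates:
    print(f"{name:<12} {count:>3}  {of_first:>4.0%} of first stage  "
          f"{of_previous:>4.0%} of previous stage")
```

Showing both shares side by side helps decide which message the funnel should carry: overall drop-off from the first stage, or the loss at each individual step.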
55
Key Questions for Chart Choosing
56
What Charts Can We Create for this Data Set?
District A 81 83 80 77
District B 71 75 69 68
District C 45 51 75 103
District D 78 71 66 55
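One way to see which charts this data set supports is to reshape it for each comparison. The sketch below is illustrative only. The table has no header row, so the column labels 2020 to 2023 are an assumption taken from the chart axes on the following slide, and the variable names are invented for the example.

```python
# Values from the slide. ASSUMPTION: the four columns are the years
# 2020-2023; the slide's table itself has no header row.
years = [2020, 2021, 2022, 2023]
data = {
    "District A": [81, 83, 80, 77],
    "District B": [71, 75, 69, 68],
    "District C": [45, 51, 75, 103],
    "District D": [78, 71, 66, 55],
}

# Comparisons within districts: one series per district, read across years.
change_by_district = {name: values[-1] - values[0] for name, values in data.items()}

# Comparisons across districts: one group per year, districts side by side.
by_year = {
    year: {name: values[i] for name, values in data.items()}
    for i, year in enumerate(years)
}

# Cumulative comparisons across years: the total height of a stacked column.
totals_by_year = {year: sum(values.values()) for year, values in by_year.items()}

print(change_by_district)
print(totals_by_year)
```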
57
Different Chart Options for Same Data Set
[Figure: three charts of the same data set for District A to District D, 2020 to 2023]
• For comparisons within districts
• For comparisons across districts
• For cumulative comparisons across years
58
Day 1 Closing
... brain process ... visualizations
o Understanding our audience
o Choosing our charts wisely
c) What do you want the audience to do after they see this visualization?
d) What is your audience’s familiarity and knowledge level with the topic at hand?
e) Are there any comparisons that would be useful to include to enhance the audience’s understanding?
3. Choose the best chart that fits the data set and explain why this is the best chart.
4. Post your responses to the questions in the meeting chat.
59
SAMENA Data
Visualization Training
Day 2
Stephen Konah, Jack Hazerjian, & Shoba Ramachandran
Share 1 new learning (big or
small) from yesterday.
Day 2
65
Creating Sunburst Charts
66
Creating Heat Tables
1. Select all cells of your data table.
2. Go to the Home tab in the ribbon.
3. Select “Conditional Formatting.”
4. In the dropdown menu that pops up, select “Color Scales.”
5. Select the color scheme that makes the most sense for your visualization.
Projects     1-4    5-9    10-14  15-19  20-24  25-29  30-34  35-39  40-44  45-49
             years  years  years  years  years  years  years  years  years  years
Project 1    50%    53%    53%    63%    65%    71%    75%    90%    95%    96%
Project 2    48%    52%    57%    63%    70%    73%    80%    91%    95%    98%
Project 3    58%    49%    58%    68%    74%    72%    84%    92%    96%    97%
Project 4    50%    59%    60%    65%    73%    76%    86%    94%    98%    99%
Project 5    60%    61%    60%    64%    73%    77%    84%    95%    98%    95%
Project 6    70%    72%    76%    78%    80%    83%    87%    90%    94%    94%
Project 7    76%    75%    78%    79%    85%    84%    86%    90%    93%    94%
Project 8    70%    72%    71%    76%    78%    82%    87%    93%    98%    99%
Project 9    71%    74%    73%    78%    75%    81%    88%    92%    99%    96%
Project 10   75%    76%    78%    77%    76%    79%    88%    96%    98%    97%
Project 11   70%    75%    78%    79%    79%    80%    81%    90%    93%    98%
Project 12   95%    93%    94%    95%    98%    98%    97%    98%    99%    99%
Project 13   73%    79%    80%    83%    81%    81%    84%    90%    98%    99%
Project 14   72%    72%    76%    78%    79%    83%    87%    88%    95%    94%
Project 15   73%    74%    71%    74%    76%    78%    83%    89%    95%    95%
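For readers curious about what a color scale does, the idea can be sketched in Python: each value is mapped to a color between a start color and an end color according to where it falls between the lowest and highest values. This is an illustration of the general technique, not Excel's exact algorithm. The colors (white to dark green) and the function name `scale_color` are assumptions made for the example, and only two rows of the table are used.

```python
# Two rows copied from the heat table (percentages as whole numbers).
table = {
    "Project 1": [50, 53, 53, 63, 65, 71, 75, 90, 95, 96],
    "Project 12": [95, 93, 94, 95, 98, 98, 97, 98, 99, 99],
}


def scale_color(value, low, high, start=(255, 255, 255), end=(0, 100, 0)):
    """Linearly interpolate between two RGB colors and return a hex string."""
    t = 0.0 if high == low else (value - low) / (high - low)
    rgb = tuple(round(s + t * (e - s)) for s, e in zip(start, end))
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Use one shared range so the same value always gets the same color.
all_values = [value for row in table.values() for value in row]
low, high = min(all_values), max(all_values)

colored = {
    name: [scale_color(value, low, high) for value in row]
    for name, row in table.items()
}
for name, colors in colored.items():
    print(name, " ".join(colors))
```

Using one range for the whole table (rather than one per row) keeps colors comparable across projects, which is usually what a heat table is meant to show.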
67
Creating Sparklines & Sparkbars
1. Highlight a single line of data to be visualized.
2. Go to the Insert tab in the ribbon.
3. Under the Sparklines section, choose “Line” or “Column.”
4. In the dialog box that pops up, choose the cell in which you want the sparkline or sparkbars to show up.
5. Hit OK.
6. Repeat these steps for the rest of the lines of data, or drag the fill handle at the bottom-right corner of the sparkline/sparkbar cell to the bottom of your data set to replicate the process.
Sparklines
Projects     Aug    Sept   Oct    Nov    Sparkline
Project 1    10%    53%    79%    5%
Project 2    48%    70%    57%    63%
Project 3    58%    49%    58%    68%
Project 4    50%    80%    60%    65%
Project 5    100%   61%    60%    99%
Project 6    70%    72%    66%    77%
Sparkbars
Projects     Aug    Sept   Oct    Nov    Sparkbar
Project 1    10%    53%    79%    5%
Project 2    48%    70%    57%    63%
Project 3    58%    49%    58%    68%
Project 4    50%    80%    60%    65%
Project 5    100%   61%    60%    99%
Project 6    70%    72%    66%    77%
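Outside Excel, a rough text-only equivalent of a sparkbar can be produced with Unicode block characters. The sketch below uses the Aug to Nov values from the table; the function name `sparkline` and the choice to scale each row to its own minimum and maximum are assumptions made for this example.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest

rows = {
    "Project 1": [10, 53, 79, 5],
    "Project 2": [48, 70, 57, 63],
    "Project 3": [58, 49, 58, 68],
    "Project 4": [50, 80, 60, 65],
    "Project 5": [100, 61, 60, 99],
    "Project 6": [70, 72, 66, 77],
}


def sparkline(values):
    """Map each value to a block character, scaled to the row's own range."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * top)] for v in values)


for name, values in rows.items():
    print(f"{name:<10} {sparkline(values)}")
```

Because each row is scaled independently, the bars show the shape of each project's trend rather than allowing comparison of magnitudes between projects.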
68
Creating Funnel Charts
69
Creating Waterfall Charts
1. For those figures that represent losses in your
dataset, convert the numbers to be negative.
2. Select all cells of your data table.
3. Go to Insert tab in ribbon.
4. Select “Recommended Charts.”
5. In the dialog box that pops up, select the “All
Charts” tab.
6. Under the chart options given, select
“Waterfall.”
7. Right click on the first bar only and select “Set
as Total.” Do the same thing for the last bar.
8. Color your chart bars such that the gains are
shown in green and the losses are shown in red.
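The logic that a waterfall chart applies to the data in steps 1 to 8 can be sketched as follows: totals start at zero, and every gain or loss floats from the running total left by the previous bar. The numbers and labels below are hypothetical and are not taken from the training data set; `waterfall_bars` is a made-up helper name.

```python
def waterfall_bars(start_label, start_value, changes, end_label="End"):
    """Return (label, bottom, height, kind) for each bar of a waterfall."""
    bars = [(start_label, 0, start_value, "total")]
    running = start_value
    for label, delta in changes:
        new_running = running + delta
        bottom = min(running, new_running)  # floating bars start at the lower edge
        kind = "gain" if delta >= 0 else "loss"
        bars.append((label, bottom, abs(delta), kind))
        running = new_running
    bars.append((end_label, 0, running, "total"))
    return bars


# HYPOTHETICAL figures for illustration; losses are entered as negatives (step 1).
changes = [("New clients", 30), ("Lost to follow-up", -20), ("Transfers out", -10)]
bars = waterfall_bars("Start", 100, changes)

colors = {"gain": "green", "loss": "red", "total": "grey"}  # grey totals are an assumption
for label, bottom, height, kind in bars:
    print(f"{label:<18} bottom={bottom:>4} height={height:>4} color={colors[kind]}")
```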
70
Creating Choropleth Maps
71
Creating Overlapping Bars Charts
1. Select all cells of your data table.
2. Go to the Insert tab in the ribbon.
3. Select “Recommended Charts.”
4. In the dialog box that pops up, select the “All Charts” tab.
5. Under the chart options given, select “Combo.” On the right side, you should see some options for combo charts appear. Select the second option entitled “Clustered Column – Line on Secondary Axis.” At the bottom of the same window, ensure that clustered column is selected for both indicators for Chart Type. The secondary axis should already be selected for the second indicator.
6. In the chart that pops up, ensure that both y-axes are set to the same scale. You can do this by right clicking on one of the axes and setting the minimum and maximum values identical to the other axis.
7. Once you have made both axes identical, you can delete the second axis by clicking on it and hitting the “Delete” button.
8. Right click on the bars that represent the target and in the “Format Data Series” window that shows up, set the gap width to 80%. This will widen the target bars so that you can see them behind the achievement/results bars.
9. Color your target bars grey and your achievement/results bars in a bright, saturated color that provides a contrast against the grey.
[Figure: overlapping bar chart for Province 1 to Province 5 with two series, “Pregnant women in catchment area” and “Reached with ANC services”]
72
Creating Diverging Bars Charts
1. Select all cells of your data table and go to the Insert tab in the ribbon. Here, select “Recommended Charts.” In the dialog box that pops up, select the “All Charts” tab. Under the chart options given, select “Bar.” On the right side, you should see some options for bar charts appear. Select the second option entitled “Stacked Bar.”
2. For the data that you want to appear on the left side of your y-axis, convert the numbers to be negative. If you have a lot of data, you can do this easily by first putting a “-1” in a free cell on the side of your table. Copy the “-1.” Next, select all of the cells in your table that you want to convert to a negative value and go to the Home tab of the ribbon. In the far left corner of the ribbon on the Home tab, select the small arrow below “Paste.” This should open a drop-down menu with numerous pasting options. Select the very last option, “Paste Special.” In the Paste Special dialog box that opens, select the radio button next to “Multiply” in the Operation section and click “OK.” All of the numbers should now have been multiplied by -1 and converted to negative numbers.
3. To put the y-axis labels at the far left of the chart (as opposed to the middle), right click on the y-axis labels and select “Format Axis” in the pop-up menu. In the Format Axis window that opens on the side, ensure you are in the Axis Options section. Under the “Labels” category, flip the arrow down and select “Low” in the drop-down menu to the right of Label Position.
4. Next, convert the negative labels on the x-axis by right clicking on the axis and selecting “Format Axis” in the pop-up menu. In the Format Axis window that opens on the side, ensure that you are in the Axis Options section. Under the “Number” category, flip the arrow down and select “Custom” in the drop-down menu under Category. In the Format Code section, clear any pre-written text, type in “0;0;0;” and click on “Add.”
5. To ensure that your categories are showing up in the correct order within each bar, select any of the bar segments in the chart and go to the “Chart Design” tab in the ribbon. Click on “Select Data” and in the dialog box that pops up, click on the down arrow once in the “Legend Entries” section. Exit the dialog box by clicking “OK.”
6. To order your data, select your entire data table and go to the “Data” tab in the ribbon. Select “Sort” and choose the column that you want to sort on and the order that you want your data to be organized in.
7. Finally, change the colors of your chart by selecting dark red for the “Strongly Dissatisfied” category, light red for the “Dissatisfied” category, light green for the “Satisfied” category, and dark green for the “Strongly Satisfied” category.
[Figure: diverging stacked bar chart of satisfaction by service (Other SRH, Pregnancy, HIV, Other STI, Abortion, MNCH, Contraception) with the series Strongly dissatisfied, Dissatisfied, Satisfied, and Strongly Satisfied on an axis labeled 10, 5, 0, 5, 10]
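The two data tricks in these steps (multiplying one side by -1, then hiding the minus sign on the axis labels) can be sketched in Python. The survey counts below are hypothetical, since the slide does not list the underlying numbers, and the helper names `to_diverging` and `axis_label` are invented for the example.

```python
NEGATIVE_SIDE = ("Dissatisfied", "Strongly dissatisfied")

# HYPOTHETICAL counts for two of the services shown on the slide.
survey = {
    "Contraception": {
        "Strongly dissatisfied": 2, "Dissatisfied": 3,
        "Satisfied": 6, "Strongly Satisfied": 4,
    },
    "MNCH": {
        "Strongly dissatisfied": 1, "Dissatisfied": 4,
        "Satisfied": 5, "Strongly Satisfied": 3,
    },
}


def to_diverging(rows):
    """Multiply the dissatisfied counts by -1 so they plot on the opposite side of zero."""
    return {
        service: {
            answer: (-count if answer in NEGATIVE_SIDE else count)
            for answer, count in counts.items()
        }
        for service, counts in rows.items()
    }


def axis_label(value):
    """Show magnitudes only, in the spirit of the custom number format in step 4."""
    return str(abs(value))


diverging = to_diverging(survey)
print(diverging["Contraception"])
print([axis_label(tick) for tick in (-10, -5, 0, 5, 10)])
```

The data stay negative under the hood so the bars diverge from zero; only the displayed labels drop the sign.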
73
Decluttering
& Ordering
74
Remove Borders, Gridlines, & Background Fills
75
Outline Shapes in White
76
Get Rid of 3D Effects and Leader Lines
[Figure: example charts on a 0 to 100 scale, shown with and without 3D effects and leader lines]
77
Minimize Use of 2 Y-Axes, Horizontal or Diagonal
Text, & Unnecessary Decimals
[Figure: example charts for PY18 to PY23 and for Indicator 1 to Indicator 4, including a chart with a second percentage y-axis and axis values shown as 70.00 versus 70]
78
Remove Redundant Axes or Labels
[Figure: example charts with data labels, shown with and without the value axis]
79
Place Labels as Close to Data as Possible
80
Exploit Conventional Order
81
Clarifying with
Color & Text
82
Use Pathfinder’s Color Scheme
83
Ensure Legibility for Colorblindness
84
Ensure Legibility for Grayscale
85
Color Code to Enhance Meaning
86
Use Saturated Colors to Draw Attention
87
Use Pathfinder’s Font Scheme
88
Annotate Using Text and Color Together
89
Make Your Title Count
90
Day 2 Closing
91
SAMENA Data
Visualization Training
Day 3
Stephen Konah, Jack Hazerjian, & Shoba Ramachandran
Share 1 new
learning (big
or small) from
yesterday.
Day 3
97
Give this chart a makeover!
Details about the audience:
Multiple stakeholders are present in the ...
98
Give this chart a makeover!
[Figure: original chart, “Number of clients provided with FP services,” with a y-axis up to 6,000]
99
Focus on what the audience is interested in!
[Figure: clustered column chart, “Number of clients provided with FP services,” for District 1 to District 10, with the series Before COVID-19 and During COVID-19]
100
Don’t make the audience work for the message!
Difference between clients reached pre-COVID and intra-COVID
District 10    769
District 9      71
District 8     365
District 7     721
District 6     845
District 5     446
District 4    1358
District 3    1065
District 2     261
District 1    1765
[Figure: horizontal bar chart of these differences on a 0 to 2,000 axis]
102
Get rid of chart outlines!
[Figure: the same horizontal bar chart, District 10 (top) to District 1 (bottom), of the difference between clients reached pre-COVID and intra-COVID, with chart outlines removed]
103
Order the data!
District 1
District 4
District 3
District 6
District 10
District 7
District 5
District 8
District 2
District 9
0 200 400 600 800 1,000 1,200 1,400 1,600 1,800 2,000
Difference between clients reached pre-COVID and intra-COVID
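The ordering on this slide can be reproduced by sorting the districts by their difference values, largest first. The sketch below uses the differences shown on the earlier slide; the variable names are chosen for the example.

```python
# Difference between clients reached pre-COVID and intra-COVID, per district.
differences = {
    "District 1": 1765,
    "District 2": 261,
    "District 3": 1065,
    "District 4": 1358,
    "District 5": 446,
    "District 6": 845,
    "District 7": 721,
    "District 8": 365,
    "District 9": 71,
    "District 10": 769,
}

# Sort by value, largest first, so the biggest change reads at the top.
ordered = sorted(differences.items(), key=lambda item: item[1], reverse=True)

for district, value in ordered:
    print(f"{district:<12} {value:>5}")
```

Many charting tools draw the first row of a horizontal bar chart at the bottom, so the sorted list may need to be reversed before plotting to keep the largest bar on top.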
104
Use Pathfinder colors and font!
[Figure: the ordered bar chart (District 1, 4, 3, 6, 10, 7, 5, 8, 2, 9) restyled with Pathfinder colors and font]
105
Give a meaningful title and subtitle!
Pathfinder reached more clients with FP services during the COVID-19 pandemic
Pathfinder’s programs redoubled their efforts during the pandemic to provide a greater number of clients with FP services during the intra-pandemic period as
compared to the pre-pandemic period across all 10 districts. The largest difference was seen in District 1, which saw approximately 1,700 more clients have
access to FP services.
[Figure: the ordered bar chart (District 1, 4, 3, 6, 10, 7, 5, 8, 2, 9) on a 0 to 2,000 axis, shown under the title and subtitle above]
COVID-19 pandemic
Pathfinder’s programs redoubled their efforts to provide a greater number of clients with FP services during the intra-pandemic period as compared to the pre-pandemic period across all 10 districts. The largest difference was observed in District 1, which saw approximately 1,700 more clients have access to FP services.
[Figure: the original clustered column chart (Before COVID-19, During COVID-19, After COVID-19) next to the made-over bar chart of the difference between clients reached pre-COVID and intra-COVID, ordered District 1, 4, 3, 6, 10, 7, 5, 8, 2, 9]
107
Why alternative isn’t as good
[Figure: clustered column chart on a 0 to 5,000 axis with the series Before COVID-19 and During COVID-19, ordered District 9, 2, 8, 7, 10, 5, 1, 3, 6, 4]
108
A Tour of Data
Visualization
Tools
109
Introduction
110
Question to the Group
111
What to Look for When Choosing A Tool
So how do you choose the right one for you?
When selecting a data visualization solution for your project/assignment, check:
Pricing
✓ Can we afford the tool?
✓ Is the tool a good value for us?
✓ Is there a cheaper tool that can provide the same value?
Future-Compatibility
✓ Can the tool handle a higher volume of data?
✓ Can we use the tool to track our key performance indicator (KPI) results?
✓ Will we still use the tool one year from now?
112
Microsoft Excel
Brief about the tool
Microsoft Excel is the oldest (and likely best known) program on this list. First released by Microsoft in 1985 for the Macintosh, with the Windows version following in 1987, Excel allows you to create tables, charts, and roughly 20 other visualizations.
PROS
■ Quick visualization production
■ Simple to learn and use
■ Ideal for traditional charts
■ Relatively cheap
CONS
■ Lack of built-in automation features can prolong the process of creating reports, particularly when working with large data sets.
■ Not ideal for relational datasets
■ Not ideal for non-traditional charts
■ Can be slow when processing large datasets
113
Power BI
Brief about the tool
Microsoft Power BI is a collection of apps, software services, and connectors that come together to turn unrelated data into visually impressive and interactive insights. Power BI can work with simple data sources like Microsoft Excel and complicated ones like cloud-based or on-premises hybrid data warehouses. Power BI has the capabilities to easily connect to your data sources and to visualize, share, and publish your findings with anyone and everyone.
PROS
▪ Data Visualization: Power BI provides rich and interactive data visualization capabilities, allowing you to create compelling charts, graphs, maps, and dashboards to present data in a visually appealing manner.
▪ User-Friendly Interface: The user interface is intuitive and easy to use, making it accessible for both technical and non-technical users to create reports and dashboards without extensive coding knowledge.
▪ Integration: Power BI integrates seamlessly with other Microsoft products like Excel, Azure, and SQL Server, as well as a wide range of third-party data sources, databases, and cloud services.
▪ Real-Time Dashboards: You can create real-time dashboards that provide up-to-the-minute insights by connecting to live data sources and refreshing data at regular intervals.
▪ Customization: Power BI offers a high degree of customization, allowing you to customize visuals, themes, and layouts to match your organization's branding and reporting requirements.
CONS
▪ Cost: While Power BI offers a free version, advanced features and capabilities are available through paid licensing, which might be costly for some organizations.
▪ Learning Curve: While the interface is user-friendly, mastering the more advanced features and techniques might require some learning, especially for users who are new to data analysis.
▪ Limited Offline Access: Power BI reports heavily rely on live data connections, which can limit offline access to reports and dashboards.
▪ Data Security: Sharing data externally could pose security concerns if not properly managed, requiring careful consideration of data access and permissions.
▪ Licensing Complexity: The licensing model can be intricate, with different pricing tiers and options, making it important to choose the right plan for your organization's needs.
114
Tableau
Brief about the tool
Tableau is an excellent data visualization and business intelligence tool used for reporting and analyzing vast volumes of data. It helps users create different charts, graphs, maps, dashboards, and stories for visualizing and analyzing data, to help in making business decisions.
PROS
▪ Ease of Use: Tableau provides a user-friendly, drag-and-drop interface that makes it accessible to users with varying levels of technical expertise.
▪ Rapid Visualization: Tableau allows for quick creation of interactive visualizations and dashboards without requiring extensive coding.
▪ Wide Range of Data Sources: Tableau offers extensive data source connectivity, allowing users to connect to various databases, spreadsheets, cloud services, and other data repositories.
▪ Interactive Dashboards: Tableau's interactive dashboards enable users to explore data in real-time, allowing for better insights and decision-making.
▪ Data Blending: Tableau's data blending capabilities allow users to combine data from multiple sources without the need for complex data transformations.
CONS
▪ Cost: Tableau can be costly, especially for larger organizations or advanced features.
▪ Learning Advanced Features: While basic visualization creation is intuitive, mastering more advanced features and customizations might require additional learning.
▪ Customization Constraints: While Tableau offers customization, it might have limitations compared to coding-based solutions like R or Python.
▪ Dependency on GUI: Users might feel constrained by the GUI-driven approach, especially if they are accustomed to coding-based data analysis.
▪ Data Transformation Complexity: Complex data transformations might require data preparation outside of Tableau before visualization.
115
R
Brief about the tool
R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinformaticians, and statisticians for data analysis and developing statistical software.
PROS
● Statistical Analysis: R is widely known for its strong statistical analysis capabilities, making it a preferred choice for researchers, statisticians, and data analysts dealing with complex statistical models.
● Data Manipulation: The dplyr and tidyr packages in R provide intuitive and efficient tools for data manipulation and transformation.
● Data Visualization: R offers a wide range of visualization libraries like ggplot2, which allows for the creation of highly customizable and publication-quality plots.
● Community and Packages: R has a vibrant community and a vast ecosystem of packages for various data analysis and visualization tasks, enabling users to access specialized tools for their needs.
● Statistical Modeling: R provides extensive support for building and interpreting advanced statistical models, making it suitable for advanced data analysis.
CONS
● Learning Curve: R can have a steeper learning curve, particularly for users who are new to programming or statistical analysis.
● Performance: R can be slower than other programming languages for certain tasks, especially when dealing with large datasets or computationally intensive operations.
● Memory Management: Memory management in R can sometimes be challenging, particularly when working with larger datasets.
● Limited GUI for Visualization: While RStudio provides an interactive environment, it doesn't have a drag-and-drop GUI tool for visualization creation like Power BI/Tableau.
● Fragmented Ecosystem: The large number of packages available for R can lead to fragmentation, with various packages providing similar functionality but with different approaches.
116
Python
Brief about the tool
Python is a computer programming language often used to build websites and software, automate tasks, and conduct data analysis. Python is a general-purpose language, meaning it can be used to create a variety of different programs and isn’t specialized for any specific problems. This versatility, along with its beginner-friendliness, has made it one of the most-used programming languages today.
PROS
● Versatility: Python is a general-purpose programming language with a vast ecosystem of libraries and tools for data analysis, making it suitable for a wide range of tasks beyond just visualization.
● Open Source: Python is open-source and free to use, making it accessible to individuals and organizations with varying budgets.
● Rich Libraries: Python offers powerful libraries like pandas for data manipulation, NumPy for numerical computations, and Matplotlib/Seaborn for data visualization, along with libraries for machine learning and AI, providing a robust toolkit for data analysis.
● Customization: Python's visualization libraries allow for deep customization, enabling you to create highly tailored and specialized visualizations.
● Integration: Python can be easily integrated into existing data workflows, pipelines, and other tools, regardless of the technology stack.
CONS
● Visualization Learning Curve: While libraries like Matplotlib and Seaborn offer robust visualization capabilities, creating complex visualizations might require a deeper understanding of their intricacies.
● Limited GUI for Visualization: While Jupyter notebooks and libraries provide interactive visualizations, Python does not have a dedicated GUI tool like Power BI or Tableau for drag-and-drop visualization creation.
● Data Security: Proper data access controls and security measures need to be in place when using Python for data analysis to prevent unauthorized access to sensitive information.
● Fragmentation: The multitude of available libraries can lead to a fragmented ecosystem, with different libraries offering similar functionalities, making it challenging to choose the right one.
● Learning Curve: While Python is considered relatively easy to learn, mastering its more advanced features and libraries might take some time, especially for those new to programming.
117
Ethical
Considerations in
Data Visualizations
118
Honesty & Accuracy
[Figure: the same Year 1 to Year 4 data plotted twice, once on a 34% to 50% y-axis and once on a 0% to 100% y-axis]
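How much a shortened y-axis can exaggerate a difference is easy to quantify. The sketch below compares the drawn heights of two values when the axis starts at 0% and when it starts at 34%. The two values (40% and 48%) are hypothetical, since the slide's underlying data are not listed, and `drawn_ratio` is a made-up helper name.

```python
def drawn_ratio(first, second, axis_min=0.0):
    """Ratio of the drawn heights of two values when the axis starts at axis_min."""
    return (second - axis_min) / (first - axis_min)


# HYPOTHETICAL values for Year 1 and Year 4, in percent.
year_1, year_4 = 40, 48

true_ratio = drawn_ratio(year_1, year_4)                      # axis starts at 0%
truncated_ratio = drawn_ratio(year_1, year_4, axis_min=34)    # axis starts at 34%

print(f"Axis from 0%:  Year 4 looks {true_ratio:.2f}x as tall as Year 1")
print(f"Axis from 34%: Year 4 looks {truncated_ratio:.2f}x as tall as Year 1")
```

With these example numbers, a 20% relative increase is drawn as if the value had more than doubled once the axis starts at 34%.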
119
Proper Contextualization & Fairness
Contextualization: Providing proper context is essential. Failing to do so can lead to misinterpretation or misrepresentation of data. Clearly explain the data source, ...
[Figure: line graph titled “Average Number of Performance Deficiencies Noted during Monthly Supervisory Visits to 12 Clinicians Conducted July-December 2022,” plotted by month from Jul-22 to Dec-22 below a 0.0 axis]
This line graph uses a negative y-axis, and its monthly data on noted deficiencies are shown as negative values, which is unusual and may lead to confusion. Explanation is required to indicate that positive results are shown in the decrease of these monthly values, whereby a monthly value of “0” would be ideal.
120
Transparency, Privacy and Confidentiality
Bubbles should reflect differences in values by their comparative areas, not their diameters
Facilities Staffed by Number of Clinicians   Number of Facilities in Province   Average Number of Clients Seen per Month
Staffed by 5 Clinicians                      126                                820
Staffed by 8 Clinicians                      42                                 1,650
Staffed by 12 Clinicians                     18                                 2,560
[Figure: two bubble charts of Number of Facilities in Province X (0 to 160), with bubbles labeled by clients per month, e.g. “820, facilities staffed by 5 clinicians” and “1,650, facilities staffed by 8 clinicians”]
... about them.
Privacy and Confidentiality: Avoid disclosing sensitive or personally identifiable information through visualizations. Anonymize data and follow best practices for data privacy and protection.
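The point about bubble areas versus diameters can be checked numerically. In the sketch below, the bubble size is taken to be the average number of clients seen per month, which is how the labels on the slide appear to be used. The maximum radius of 50 and the helper names are assumptions made for the example.

```python
import math

clients_per_month = {
    "Staffed by 5 Clinicians": 820,
    "Staffed by 8 Clinicians": 1650,
    "Staffed by 12 Clinicians": 2560,
}


def radius_for_area(value, max_value, max_radius=50.0):
    """Radius chosen so that the bubble's AREA is proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


def radius_for_diameter(value, max_value, max_radius=50.0):
    """Misleading alternative: the radius (and diameter) grows linearly with the value."""
    return max_radius * value / max_value


def area(radius):
    return math.pi * radius ** 2


largest = max(clients_per_month.values())
smallest = min(clients_per_month.values())

value_ratio = largest / smallest
area_ratio_correct = (
    area(radius_for_area(largest, largest)) / area(radius_for_area(smallest, largest))
)
area_ratio_misleading = (
    area(radius_for_diameter(largest, largest)) / area(radius_for_diameter(smallest, largest))
)

print(f"Values differ by a factor of   {value_ratio:.1f}")
print(f"Area-scaled bubbles differ by  {area_ratio_correct:.1f}")
print(f"Diameter-scaled bubbles differ by {area_ratio_misleading:.1f}")
```

Scaling by diameter squares the apparent difference, so the largest bubble here would look almost ten times the size of the smallest instead of about three times.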
121
Conflicts of Interest
Conflict of Interest: Disclose any conflicts of interest that ... interests or affiliations with particular organizations.
To limit appearance of conflict of interest:
• admit when data results are inconclusive
• ... clearly show patterns
• present all available data, even if some are contrary to your interests
[Figure: two versions of a dual-axis chart titled “Monthly Number of Adolescent Women Reported by Facilities in Province X who Have Had an Abortion and those who were Seen at FP Free Day, from January 2021 through June 2023,” with one axis running to 55 and the other to 700]
122
Representation and Diversity
Representation and Diversity: Ensure that your visualizations fairly represent all relevant groups within the data, and avoid marginalizing or misrepresenting any group. This includes gender, race, ethnicity, socioeconomic status, etc.
[Figure: chart titled “Individuals visited at Home by Village Health Worker in XYZ District, July through December 2022,” with a 0 to 3000 axis and months Jul-22 to Dec-22]
Does the selection of these colors suggest that some age groups are given more visual priority than the others?
123
Cultural Sensitivity
Cultural Sensitivity: Be aware of cultural differences and sensitivities – such as with colors – that might affect how data is interpreted. Visualizations should be designed in a way that respects cultural norms and values.
124
Feedback & Correction
Feedback and Correction: Be open to feedback and corrections. If errors or issues are identified in your visualizations, take appropriate measures to rectify them.
125
Additional
Resources &
Closing
126
Additional Resources
127
Parting Thoughts
128
"Until you dig a hole, you plant a
tree, you water it and make it
survive, you haven't done a thing.
You are just talking."
― Wangari Maathai,
Nobel Peace Prize laureate,
environmentalist,
political activist
129