Data Analysis for Everyone
Martin Monkman 
• Provincial Statistician & Director, BC Stats 
• been getting paid to do data analysis in one 
form or another since the mid-1980s 
• B.Sc. and M.A. in Geography (UVic) 
• member of SABR 
• bayesball.blogspot.ca
Net2Vic: Effective Data Analysis for Everyone (October 23, 2014)
1. Start with a question 
ALWAYS! 
And don’t start with data! 
• Five Ws
Some examples of questions 
• What was the population of Victoria in 
1996? And what will the population of 
Victoria be in 2029? 
• What are the demographics of Victoria? 
• What do Victoria residents think about 
infrastructure investment?
2. Get some data 
Remember: after your research question has 
been asked! 
Two sources: 
• Third party data 
• Collect your own
Sources of third party data 
Open Data 
• Social data: Statistics Canada 
• The Census of Canada 
• National Household Survey 
• www.statcan.gc.ca 
• DataBC 
• www.data.gov.bc.ca
Collect your own data 
Administrative sources 
• Registration information 
• Transactions 
Original data collection 
• Survey
Surveys 
From the Twenty Questions: 
• Who is your population? 
• How are you going to reach them? 
• What do you already know about them?
3. Data Analysis 
• Differences 
• Distributions 
• Magnitude 
• Patterns 
• Proportions 
• Relationships 
• Trends
Data Analysis: How-to 
• MOOCs 
• google “Making Sense of Data” 
• Coursera 
• https://www.coursera.org/course/introstats 
• https://www.coursera.org/course/dataanalysis 
• https://www.class-central.com/mooc/388/coursera-computing-for-data-analysis
4. Data Visualization 
“Graphics are instruments for reasoning about 
quantitative information.” 
(Edward R. Tufte) 
Purposes 
• Exploratory Data Analysis 
• Narrative
“The greatest value of a picture is when it 
forces us to notice what we never 
expected to see.” – John Tukey
Anscombe’s Quartet 
STATISTICAL MEASURES OF 
EACH OF THE FOUR DATA SETS 
Mean of x = 9 (exact) 
Variance of x = 11 (exact) 
Mean of y = 7.50 
Variance of y = 4.122 or 4.127 
Correlation between x and y = 0.816 
Regression equation: 
y = 3.00 + 0.500x
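The summary statistics on this slide can be recomputed directly. The sketch below is in Python rather than the R used in the talk. The data values are the standard published Anscombe (1973) figures; the names `QUARTET`, `summarize`, and `SUMMARIES` are invented for this illustration.

```python
from math import sqrt

# Anscombe's (1973) four data sets; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, correlation and least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four rows of output agree to about two decimal places, which is why the numbers alone cannot tell the data sets apart and a plot is needed.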
Population pyramid
http://cran.r-project.org/
Capital Regional District, population by municipality, 2013 
Data source: Statistics Canada & BC Stats
Capital Regional District, population by municipality and region, 2013 
Data source: Statistics Canada & BC Stats
Capital Regional District population, 1996-2013 
Data source: Statistics Canada & BC Stats
Year-over-year population change, Capital Regional District 
Data source: Statistics Canada & BC Stats
Census tracts 
Data source: Statistics Canada & BC Stats
Victoria CMA – median after-tax income (2005), by Census Tract 
Data source: Statistics Canada & BC Stats
Data source: Statistics Canada
Mapping 
Source: Harvard Dialect Survey / Joshua Katz
How can I improve my data 
visualizations? 
• Work with data 
• Experiment 
• Get feedback from others 
• Look for good examples 
• Look for bad examples
Five Degrees of Obfuscation 
[Pie chart; slices: Debris, Garbage, Rubbish, Trash, Waste] 
Five Columns of Clarity 
[Bar chart of the same five categories, ordered Trash, Debris, Rubbish, Waste, Garbage; y-axis: Units, 0 to 25]
Foreshortened circles
An illusion of distance and volume
No 3D. Ever.
martin.monkman@gmail.com 
@monkmanmh 
bayesball.blogspot.ca


Editor's Notes

  • #2: Data are ubiquitous in our lives and workplaces, and more data can be easily collected. Those data can, if analyzed correctly, provide insights that lead to better decision making. Martin Monkman will present the key ideas for effective and meaningful data analysis, including sources of existing data, things to think about when collecting new data, analyzing results, and effective presentation and reporting of the data.
  • #4: As I started to think about this, I realized that this is not a topic for a single hour – one could spend a lifetime researching, thinking, and practicing this topic. I have been actively engaged in this topic – in one capacity or another – for (gads) 30 years. And I’m still learning! To use the metaphor of this library, we’ll visit a couple of the stacks and pull off four books. There’s a wealth of information on the topics we’ll skim … some links to further reading are in the notes. Image: Cincinnati Old Main Library, which was demolished in 1955. Source: https://www.flickr.com/photos/80868612@N03/11983800566/in/photolist-aiP4WA-aSvCV2-bu7Ctj-3AEWz-aSv5ZP-bjtFJV-bhkodF-bhkoBD-jfY7p9-bujoxS-bhknQt-4VJnag-4TZ8fJ-5Av8e4-5kfMPH-59oHHj-5GzcXQ-5CowXv Creative Commons
  • #5: Why should you care about data analysis? Because you want to ensure that your limited resources – if you’re running a company, a non-profit, a government program, a household, or a car – are being used in their most effective way possible. To get your car to go 140 km/hour, all of the car’s systems need to be running at their optimal levels. The speedometer isn’t the only dial on the dashboard, and the mechanics we rely on have other diagnostic tools at their disposal. And there are things outside the car that we consider when operating it – an extreme case would be the choices that are made about racing car tires, which are weather and temperature dependent. Let’s imagine for a moment a coffee shop. And you’re the manager! The volume of customers ebbs and flows through the day – a rush first thing in the morning, and then a slow period in the middle of the afternoon before the evening crowd shows up. And there might also be fluctuations through the week; Monday might be busier than Sunday. You don’t want to have the same number of staff on at all times … it would be much more efficient to have more staff on at peak times, and fewer when the shop is less busy. Quantifying those ebbs and flows can help you make more efficient decisions about staffing your coffee shop. So where do we start?
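As a rough illustration of quantifying those ebbs and flows, the Python sketch below counts sales by hour of day and by day of week. The timestamps are invented for the example, and `SALES`, `count_by`, `by_hour`, and `by_weekday` are hypothetical names; a real shop would export timestamps from its till or point-of-sale system.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sale timestamps (invented for illustration only).
SALES = [
    "2014-10-20 07:45", "2014-10-20 07:52", "2014-10-20 08:10",
    "2014-10-20 08:15", "2014-10-20 14:30", "2014-10-20 18:05",
    "2014-10-20 18:40", "2014-10-26 09:20", "2014-10-26 15:10",
]

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def count_by(sales, key):
    """Count sales after mapping each parsed timestamp through `key`."""
    stamps = (datetime.strptime(s, "%Y-%m-%d %H:%M") for s in sales)
    return Counter(key(stamp) for stamp in stamps)


# Sales per hour of day: shows the morning rush and the afternoon lull.
by_hour = count_by(SALES, lambda stamp: stamp.hour)

# Sales per day of week: datetime.weekday() returns 0 for Monday.
by_weekday = count_by(SALES, lambda stamp: DAYS[stamp.weekday()])

if __name__ == "__main__":
    for hour in sorted(by_hour):
        print(f"{hour:02d}:00  {'#' * by_hour[hour]}")
    print(dict(by_weekday))
```

With real data, the same two tallies would be a starting point for deciding how many staff to schedule at each time of day and on each day of the week.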
  • #6: So today I want to touch on four ideas for you that I hope will get you started. Image source: https://www.flickr.com/photos/duncan/3782076384 Creative Commons
  • #7: Five Ws - who, what, why, when, where
  • #9: So today I want to touch on four ideas for you that I hope will get you started. Image source: https://www.flickr.com/photos/holeymoon/2098042797/in/photostream/ Creative Commons
  • #10: Five Ws - who, what, why, when, where
  • #12: Administrative sources – this includes things such as registrations for programs, or transaction data. Much of what is part of “big data” comes from these sorts of administrative records. They are not collected for research purposes, but for the administration of business operations. Some examples: (1) the City of Victoria’s parking meter data provides insights into utilization across the system, and was used to modify the rates that are charged in different parts of the city; (2) grocery stores scan bar codes to keep track of their inventory, but these data points provide information about shopping patterns (what items are being purchased together, the time of day that certain items are selling – important for deli stock!). Or you can collect your own data, through a survey. Surveys – they sound easy, but they are fraught with peril!
  • #13: Who is your population? Who should you be contacting? Are you considering households or individuals? How will you ensure that your respondents are representative of the entire population of interest? How are you going to reach them? - Web surveys sound easy
  • #14: What NOT to do. The Johnson Street bridge, a.k.a. The Blue Bridge. The City of Victoria is in the process of replacing the bridge, and early on in the planning process had a “public consultation”. One element of the consultation process was what was described as a survey of the residents of the City, to gauge their preferences for three different bridge designs. The question was framed as “Do you like Design One, Design B, or Design Three”? There was no “None of the above”, nor was there a “Why don’t we just repair the existing bridge?” As well, there was no option for “Make sure it includes designated bike and pedestrian lanes”, and there wasn’t a “If there’s no rail line on the bridge, what does that mean for the idea of light rail commuter traffic into the downtown?” Another point: the questions were posed on an open website. THIS IS NOT A SURVEY. There were no controls over who could respond (i.e. the “population” was anyone on Earth with a web connection), nor were there controls over how many times one could respond (ballot box stuffing!). I took delight in responding to the survey three times, one for each design, and in the “Do you have any comments?” box, pointing out that a) I was responding multiple times to the survey and b) I don’t live in the City of Victoria.
    POST-PRESENTATION COMMENTARY:
    COMMENT #1: My lessons re: the Blue Bridge example -- 1. avoid explicit political opinions, and 2. when using complicated examples like this make sure you've memorized all the details! The Blue Bridge online survey I referenced ran in September 2009, at a time when Victoria City Council had not yet explicitly voted to move forward with replacement. Here's the summary from the Citizen Advisory Committee: http://www.johnsonstreetbridge.com/wp-content/uploads/2009/09/consulation_Backgrounder-Sept-24-09.pdf A few months later a much more reliable and robust Ipsos Reid survey was commissioned by the City (the results were tabled in May 2010). This survey included a variety of options, including rehabilitation (http://www.johnsonstreetbridge.com/wp-content/uploads/2010/05/JSB-Residential-Presentation.pdf). It was a full year after the online “survey” that the referendum on the bridge repair / replacement was held (November 2010; http://www.johnsonstreetbridge.com/public-involvement/get-informed-provide-input/). Even as late as November 2010, the notion of repair was still viewed as an option (see this article from Focus: http://focusonline.ca/?q=node/93). So I stand by my comment that other options beyond the three design choices could have been included in the initial questions.
    COMMENT #2: It was suggested that tracking IP addresses of the responses would be a way to limit ballot box stuffing. While this could be done, this approach won’t yield a reliable sample of the defined population. It doesn’t address the issue that a single respondent could use multiple IP addresses, and it would limit the opportunity for multiple individuals in the same household to legitimately submit responses. And finally, there is no reliable way to handle multiple responses from a single IP. Do you accept the first? The last? The average of them all?
    Image source: https://www.flickr.com/photos/wolfnowl/4976067226 Marcia and Mike Nelson Pedde, Creative Commons
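To make the closing question of Comment #2 concrete, here is a small Python sketch showing that "keep the first response per IP" and "keep the last response per IP" can produce different winners from the same responses. The addresses and votes are invented, and `RESPONSES`, `tally`, and `raw` are hypothetical names; this is not how the City's consultation data was processed.

```python
from collections import Counter

# Hypothetical web responses, in the order received: (ip_address, design).
# The addresses come from ranges reserved for documentation examples.
RESPONSES = [
    ("203.0.113.5", "1"),
    ("203.0.113.5", "2"),
    ("203.0.113.5", "3"),
    ("198.51.100.7", "1"),
    ("192.0.2.44", "2"),
    ("192.0.2.44", "3"),
]


def tally(responses, keep="first"):
    """Count one vote per IP address, keeping its first or last response."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen = {}
    for ip, design in responses:
        # "last" always overwrites; "first" only records an unseen address.
        if keep == "last" or ip not in chosen:
            chosen[ip] = design
    return Counter(chosen.values())


# No de-duplication at all: every submission counts.
raw = Counter(design for _, design in RESPONSES)

if __name__ == "__main__":
    print("raw:  ", dict(raw))
    print("first:", dict(tally(RESPONSES, "first")))
    print("last: ", dict(tally(RESPONSES, "last")))
```

Neither rule can tell one person stuffing the ballot box apart from three members of the same household each responding once, which is the point of the comment.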
  • #15: Data Analysis. Image source: https://www.flickr.com/photos/holeymoon/1813639142 Creative Commons
  • #16: Here’s everything that they teach you in statistics class. There are a lot of different statistical tests to measure the significance of differences, and Greek names for distributions, and some really elaborate ways to measure relationships … And although the title of the presentation is Data Analysis for Everyone, this is going to be a one-slide section. Although I will point out that relationships – correlations – can be spurious: http://www.tylervigen.com/ Because when it comes to a presentation, nothing beats some good graphics. Which is Idea #4 …
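One common route to a spurious correlation is two unrelated series that both happen to trend upward over time. The Python sketch below uses two invented yearly series (`SERIES_A` and `SERIES_B` are made-up numbers, and `pearson` and `changes` are hypothetical helper names). It compares the correlation of the levels with the correlation of the year-over-year changes, as one rough check among several.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


def changes(series):
    """Year-over-year differences of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Two invented series over ten years; both simply drift upward.
SERIES_A = [10, 12, 13, 15, 18, 19, 22, 24, 25, 28]
SERIES_B = [3.1, 3.3, 3.7, 3.8, 4.1, 4.4, 4.7, 5.2, 5.4, 5.6]

# Correlation of the raw levels is dominated by the shared upward trend.
r_levels = pearson(SERIES_A, SERIES_B)

# Correlation of the yearly changes removes most of that shared trend.
r_changes = pearson(changes(SERIES_A), changes(SERIES_B))

if __name__ == "__main__":
    print(f"levels:  r = {r_levels:.2f}")
    print(f"changes: r = {r_changes:.2f}")
```

For these made-up numbers the levels are almost perfectly correlated while the yearly changes are close to uncorrelated, so the strong relationship says little more than "both went up".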
  • #17: Here are some online courses for those interested in digging deeper into statistical analysis. The Google course is a great introduction, which leans toward practical applications and away from detailed methodological issues. The others are more “academic” in inclination.
  • #18: Communicating your analysis. There’s a whole dimension to writing about numbers that we won’t touch on – but let’s look at data visualization, the kind of communication that works for a presentation. Image source: https://www.flickr.com/photos/chrisinplymouth/7912514318/in/photolist-bewNF-abDZkN-7GXehN-89dz9F-6vP8ua-6urLeU-c8g4oj-6BCsT1-87zdmN-7R7FyL-aXen3-6NQvnd-cojYW1-bNQPLB-cEHfdh-dWn5PZ-amh4D3-7FcJ48-6BLKFB-6svVna-6sQUjk-6zbFq7-6Mk98X-6MSd2Q-7FWVn9-6iQgGQ-eksPZa-6BmTjJ-c9UW2f-c8g4QN-6FwgzD-6BPpzG-cR2sDo-d4cGQE-nGpLo4-pdVU4-4JTPJc-cKT99w-7V2djG-dWsuaS-9q1goR-czpP7w-7haSkT-a3Fw5r-aD4nkE-9yHvzR-8mTDhz-5bg7PR-6YTtnv-cpdv7Q Creative Commons
  • #19: Charts summarize data to make it more understandable. That summary is often used to compare different sets of data. Exploratory Data Analysis: graphics as a way to understand the data (Tukey, Cleveland). Narrative: “data journalism”, telling a story with the data, answering the 5 W questions (Alberto Cairo).
  • #21: http://en.wikipedia.org/wiki/Anscombe%27s_quartet (1973, Francis Anscombe). These four data sets of 11 x-y pairs have virtually identical summary statistics.
  • #22: When we look at the four series in Anscombe’s Quartet, we can see that in spite of their identical summary statistics they are VERY different. Plotted in R: http://www.r-bloggers.com/the-visual-difference-%e2%80%93-r-and-anscombe%e2%80%99s-quartet/ http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/anscombe.html http://www.inside-r.org/node/36103
  • #23: Minard’s (1869) famous map / data visualization shows Napoleon’s march to Moscow in the winter of 1812. Over 300,000 French troops set off – the thick brown line on the left side of the map. The brown line is much thinner by the time they got to Moscow ... And the black line, showing the size of the army on the return journey, is thinner still. The return trip: only 27,000 remained after the crossing of the Berezina River. An in-depth analysis of the map is found in Edward Tufte’s “The Visual Display of Quantitative Information” and here: http://www.csiss.org/classics/content/58 Image: http://news.bbc.co.uk/2/hi/in_pictures/8206064.stm
  • #24: Population pyramids are one of the demographer’s greatest tools, and allow us to quickly assess the population of a region. Here, the age and gender makeup of the Capital region is compared to the population of British Columbia. It’s easy to see that the Capital region has a smaller proportion of children, roughly the same proportion of people in their 20s, fewer in their 30s – 50s, and more in the 60+ ages. Link: http://population-pyramid.herokuapp.com/#/Capital%252C%2520BC/British%2520Columbia http://bcstats.gov.bc.ca/StatisticsBySubject/ExperimentalDataTools.aspx
  • #25: Even more dramatic is the population of Qualicum Beach – a town where 70-year-olds feel like teenagers!
  • #26: Tool for data analysis and data visualization: R. From http://cran.r-project.org/: R is ‘GNU S’, a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. Plus graphics, dynamic web display of data and charts, etc.
  • #27: This quick plot shows the population of the region by the 13 municipalities, and the unincorporated areas of the western communities and the Saanich peninsula and the Gulf Islands.
  • #28: This revision groups the areas by the “core” municipalities of Victoria, Saanich, Oak Bay, and Esquimalt, the Saanich Peninsula & the Gulf Islands, and the Western Communities.* *A member of the audience pointed out that View Royal is now officially part of “the core”.
  • #29: This chart shows the population of the three parts of the CRD since 1996
  • #30: The most striking feature of the CRD’s growth over the past 20 years is that most of the growth in the past decade has been in the Western Communities – in fact, in the past 3 years, the population of the core and of the Saanich Peninsula / Gulf Islands has fallen.
  • #31: There are 68 Census Tracts in the Victoria Census Metropolitan Area (basically, the urbanized areas of the CRD). Part A - Short definition: Area that is small and relatively stable. Census tracts usually have a population between 2,500 and 8,000 persons*. They are located in census metropolitan areas and in census agglomerations that have a core population of 50,000 or more. *average of 4,000 See: Statistics Canada http://www12.statcan.gc.ca/census-recensement/2011/ref/dict/geo013-eng.cfm
  • #32: There are 68 Census Tracts in the Victoria Census Metropolitan Area (basically, the urbanized areas of the CRD). Part A - Short definition: Area that is small and relatively stable. Census tracts usually have a population between 2,500 and 8,000 persons*. They are located in census metropolitan areas and in census agglomerations that have a core population of 50,000 or more. *average of 4,000 See: Statistics Canada http://www12.statcan.gc.ca/census-recensement/2011/ref/dict/geo013-eng.cfm
  • #33: Census Tract Profiles: http://www12.statcan.gc.ca/census-recensement/2006/dp-pd/prof/92-597/index.cfm?Lang=E For 0003.01: http://www12.statcan.gc.ca/census-recensement/2006/dp-pd/prof/92-597/P3.cfm?Lang=E&CTuid=9350003.01
  • #34: Another example of R used for data visualization. Dialect map: http://spark.rstudio.com/jkatz/SurveyMaps/ Carbonated beverage = Q105
  • #36: Image: http://s684.photobucket.com/user/TexasChief/media/pie-i-have-eaten-chart.jpg.html What’s the point of a pie chart? To compare the relationship of the parts out of a whole. Pie charts are actually bad at this: “Save the pies for dessert” (Stephen Few); “Pie is delicious but not nutritious” (Peter Flom). In general, circular depictions are discouraged: we have a hard time discerning differences in size and angle, and there is no fixed reference for comparison. Different colour slices can deceive our eyes, too – research has shown that “emphasis” can be judged as size as well.
  • #37: It’s a lot easier to see that the five categories are different sizes in the bar chart.
  • #38: Mark Kistler “Imagination Station”
  • #40: Distortion and deception. In this case, the area covered by “Taxes” and its 3rd dimension is 1.58 times bigger than “Crude Costs”. The two coloured components – the red and the green – are almost exactly the same area. Image: http://www.pumptalk.ca/2008/05/sneak-peak---ne.html Discussion: https://ece.uwaterloo.ca/~dwharder/Presentations/Guidelines/VisualAids/ChartsGraphs/PieCharts/
  • #41: Image: http://www.apple.com/ca/apps/iwork/keynote/ Apple Keynote example: pie, 3D, wood grain (what does wood grain have to do with the categories? And what are we trying to emphasize?). This from a company that, in all other things (products and marketing), uses minimalist principles. My point: even the most rigorously devoted can succumb to this. Further discussion: http://www.presentationzen.com/presentationzen/2006/01/2d_or_not_2d_th.html http://www.theguardian.com/technology/blog/2008/jan/21/liesdamnliesandstevejobs
  • #42: SOURCE: Bloomberg Business Week http://www.businessweek.com/articles/2013-04-05/who-works-and-who-doesnt-the-labor-force-by-the-numbers#r=hp-ls What’s wrong with this? Design: apparently all men, except for stay-at-home parents; part-time work = knitting?; colour scheme is grey scale for LF, colour for NitLF; inconsistent capitalization in captions. Data representation: text caption adds to 102% (notes acknowledge rounding) but there are 101 icons (21 in top row, 20 in rest); there are 5% disabled but only 4 icons. Function: hard to compare groups, e.g. unemployment is equal to disabled and one-third the size of retired; retired is greater than employed part time.
  • #43: Irony: The expression of one's meaning by using language that normally signifies the opposite, typically for humorous or emphatic effect. But we don’t want to interact with it, play with it, or interrogate it in this way. Source: http://wtfviz.net/post/60373947977/you-know-youre-in-trouble-when-your-design-firm