Chapter 3 of the document discusses data visualization, emphasizing the importance of effective design techniques such as the data-ink ratio for creating meaningful tables and charts. It covers the use of tables for precise numerical comparisons and introduces crosstabulation and PivotTables in Excel as tools for data analysis. Additionally, the chapter highlights various chart types, including scatter and line charts, for visually representing data relationships and trends.
Introduction
• Data visualization involves:
  • Creating a summary table for the data.
  • Generating charts to help interpret, analyze, and learn from the data.
• Uses of data visualization:
  • Helpful for identifying data errors.
  • Reduces the size of your data set by highlighting important relationships and trends in the data.
Overview of Data Visualization (Slide 1 of 5)
Effective Design Techniques:
• Data-ink ratio: Measures the proportion of what Tufte terms “data-ink” to the total amount of ink used in a table or chart.
  • Edward R. Tufte first described the data-ink ratio.
  • Helpful for creating effective tables and charts for data visualization.
  • Data-ink: Ink used in a table or chart that is necessary to convey the meaning of the data to the audience.
  • Non-data-ink: Ink used in a table or chart that serves no useful purpose in conveying the data to the audience.
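The ratio is easier to grasp with a toy measurement. The Python sketch below makes two illustrative assumptions: every non-whitespace character of a plain-text table counts as one unit of ink, and only the characters of labels and values count as data-ink. The helper names `data_ink_ratio` and `ink` are hypothetical, and the counts are the row totals from Table 3.7.

```python
# Minimal sketch: approximating the data-ink ratio for a plain-text table.
# ASSUMPTION (illustration only): each non-whitespace character is one unit of
# "ink", and only the characters of labels and values count as data-ink.

def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Data-ink ratio = data-ink / total ink used in the table or chart."""
    if total_ink <= 0 or not 0 <= data_ink <= total_ink:
        raise ValueError("need 0 <= data_ink <= total_ink and total_ink > 0")
    return data_ink / total_ink


def ink(text: str) -> int:
    """Crude ink measure: the number of non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())


# Row totals by quality rating, taken from Table 3.7.
rows = [("Good", "84"), ("Very Good", "150"), ("Excellent", "66")]
data_ink = sum(ink(label) + ink(value) for label, value in rows)

# Version 1: every cell boxed in with vertical and horizontal rules.
border = "+" + "-" * 12 + "+" + "-" * 6 + "+"
boxed = "\n".join(
    [border] + [f"| {label:<10} | {value:>4} |" for label, value in rows] + [border]
)

# Version 2: the same labels and values with no rules at all.
plain = "\n".join(f"{label:<10} {value:>4}" for label, value in rows)

ratios = {name: data_ink_ratio(data_ink, ink(text))
          for name, text in [("boxed", boxed), ("plain", plain)]}
print(ratios)  # the plain version has the higher data-ink ratio
```

Under this proxy, removing the box rules leaves every label and value intact and raises the ratio from about 0.35 to 1.0.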
Overview of Data Visualization (Slide 5 of 5)
Figure 3.4: Increasing the Data-Ink Ratio by Adding Labels to Axes and Removing Unnecessary Lines and Labels
Tables (1 of 18)
Tables should be used when:
1. The reader needs to refer to specific numerical values.
2. The reader needs to make precise comparisons between different values and not just relative comparisons.
3. The values being displayed have different units or very different magnitudes.
Tables (6 of 18)
Table Design Principles:
• Avoid using vertical lines in a table unless they are necessary for clarity.
• Horizontal lines are generally necessary only for separating column titles from data values or when indicating that a calculation has taken place.
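The two principles can be sketched as a small plain-text table renderer in Python. The function name `render_table`, the two-space column gap, and the dash rules are illustrative assumptions, not a method taken from the textbook or from Excel. The counts are the row totals from Table 3.7, and the Total row is the "calculation" that earns a second horizontal rule.

```python
# Minimal sketch of the table design principles: no vertical lines, and
# horizontal rules only under the column titles and above a calculated total.
# ASSUMPTIONS: render_table is a hypothetical helper; data are Table 3.7 totals.

def render_table(header, rows, total_row=None):
    """Left-align the first column, right-align the rest, and separate columns
    with spaces instead of vertical lines."""
    all_rows = [header] + rows + ([total_row] if total_row else [])
    widths = [max(len(str(r[i])) for r in all_rows) for i in range(len(header))]

    def fmt(row):
        first = str(row[0]).ljust(widths[0])
        rest = (str(cell).rjust(w) for cell, w in zip(row[1:], widths[1:]))
        return "  ".join([first, *rest])

    rule = "-" * len(fmt(header))
    lines = [fmt(header), rule] + [fmt(r) for r in rows]
    if total_row:
        lines += [rule, fmt(total_row)]  # the rule signals that a sum follows
    return "\n".join(lines)


rows = [["Good", 84], ["Very Good", 150], ["Excellent", 66]]
total = ["Total", sum(count for _, count in rows)]
table = render_table(["Quality Rating", "Restaurants"], rows, total)
print(table)
```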
Tables (11 of 18)
Table 3.6: Quality Rating and Meal Price for 300 Los Angeles Restaurants
Restaurant  Quality Rating  Meal Price ($)  Wait Time (min)
1           Good            18              5
2           Very Good       22              6
3           Good            28              1
4           Excellent       38              74
5           Very Good       33              6
6           Good            28              5
7           Very Good       19              11
8           Very Good       11              9
9           Very Good       23              13
10          Good            13              1
Tables (12 of 18)
Table 3.7: Crosstabulation of Quality Rating and Meal Price for 300 Los Angeles Restaurants
                Meal Price
Quality Rating  $10–19  $20–29  $30–39  $40–49   Total
Good                42      40       2       0      84
Very Good           34      64      46       6     150
Excellent            2      14      28      22      66
Total               78     118      76      28     300
• The greatest number of restaurants in the sample (64) have a very good rating and a meal price in the $20–29 range.
• Only two restaurants have an excellent rating and a meal price in the $10–19 range.
• The right and bottom margins of the crosstabulation give the frequencies of quality rating and meal price separately.
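A crosstabulation like Table 3.7 counts the observations in each pair of categories and adds row and column totals in the margins. The Python sketch below uses only the 10 restaurants listed in Table 3.6, not all 300, so its counts are much smaller than those in Table 3.7. The `price_bin` helper is hypothetical and uses ASCII hyphens in its labels.

```python
# Sketch: a crosstabulation (counts per category pair, plus margins) in plain Python.
# ASSUMPTION: uses only the 10 restaurants listed in Table 3.6, not all 300.
from collections import Counter

restaurants = [  # (quality rating, meal price in $), restaurants 1-10 of Table 3.6
    ("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
    ("Very Good", 33), ("Good", 28), ("Very Good", 19), ("Very Good", 11),
    ("Very Good", 23), ("Good", 13),
]
ratings = ["Good", "Very Good", "Excellent"]
price_bins = ["$10-19", "$20-29", "$30-39", "$40-49"]


def price_bin(price: int) -> str:
    """Hypothetical helper: map a price to its $10-wide class, e.g. 28 -> '$20-29'."""
    low = (price // 10) * 10
    return f"${low}-{low + 9}"


counts = Counter((rating, price_bin(price)) for rating, price in restaurants)

# Interior cells, then the right margin (row totals)...
crosstab = {r: {b: counts[(r, b)] for b in price_bins} for r in ratings}
for r in ratings:
    crosstab[r]["Total"] = sum(crosstab[r][b] for b in price_bins)
# ...and the bottom margin (column totals, including the grand total).
crosstab["Total"] = {c: sum(crosstab[r][c] for r in ratings) for c in price_bins + ["Total"]}

for label in ratings + ["Total"]:
    print(f"{label:<10}", *(f"{crosstab[label][c]:>4}" for c in price_bins + ["Total"]))
```

For these 10 rows, the sketch finds 2 Very Good restaurants in the $20-29 range and a grand total of 10.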
Tables (15 of 18)
Figure 3.10: Completed PivotTable Field List and a Portion of the PivotTable Report for the Restaurant Data (Columns H:AK Are Hidden)
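A PivotTable can summarize a value field with a statistic other than a count. The Python sketch below is a rough analogue, not Excel's implementation. It groups the 10 rows of Table 3.6 by quality rating and averages the wait time. The choice of fields is an assumption made for illustration.

```python
# Rough analogue of a PivotTable that summarizes a value field with an average.
# ASSUMPTIONS: fields (Average of Wait Time by Quality Rating) are illustrative,
# and only the 10 rows listed in Table 3.6 are used.
from collections import defaultdict
from statistics import mean

rows = [  # (quality rating, wait time in minutes), restaurants 1-10 of Table 3.6
    ("Good", 5), ("Very Good", 6), ("Good", 1), ("Excellent", 74),
    ("Very Good", 6), ("Good", 5), ("Very Good", 11), ("Very Good", 9),
    ("Very Good", 13), ("Good", 1),
]

groups = defaultdict(list)  # like the "Rows" area: one group per quality rating
for rating, wait in rows:
    groups[rating].append(wait)

# Like the "Values" area set to Average: mean wait time for each rating.
pivot = {rating: mean(waits) for rating, waits in groups.items()}
for rating in ["Good", "Very Good", "Excellent"]:
    print(f"{rating:<10} {pivot[rating]:5.1f}")
```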
Charts
• Scatter Charts
• Recommended Charts in Excel
• Line Charts
• Bar Charts and Column Charts
• A Note on Pie Charts and Three-Dimensional Charts
• Bubble Charts
• Heat Maps
• Additional Charts for Multiple Variables
• PivotCharts in Excel
Charts (1 of 26)
• Charts (or graphs): Visual methods of displaying data.
• Scatter chart: Graphical presentation of the relationship between two quantitative variables.
  • Trendline: A line that provides an approximation of the relationship between the variables.
• Line chart: Similar to a scatter chart, but a line connects the points in the chart.
  • Useful for time series data collected over a period of time (minutes, hours, days, years, etc.).
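A linear trendline is typically an ordinary least-squares line. The Python sketch below fits one to the monthly sales in Table 3.9, assuming the months are coded 1 through 12. `linear_trendline` is a hypothetical helper name.

```python
# Minimal sketch: fitting a linear trendline by ordinary least squares.
# ASSUMPTION: months coded x = 1..12; y = Kirkland monthly sales from Table 3.9.

def linear_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)


sales = [150, 145, 185, 195, 170, 125, 210, 175, 160, 120, 115, 120]  # $100s
months = list(range(1, 13))
slope, intercept = linear_trendline(months, sales)
print(f"trend: sales = {intercept:.2f} + ({slope:.2f}) * month")
```

On these data the slope is negative (about -3.95, in $100s per month), so the trendline slopes downward over the year.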
Charts (4 of 26)
Table 3.9: Monthly Sales Data of Air Compressors at Kirkland Industries
Month  Sales ($100s)
Jan    150
Feb    145
Mar    185
Apr    195
May    170
Jun    125
Jul    210
Aug    175
Sep    160
Oct    120
Nov    115
Dec    120
Charts (6 of 26)
Table 3.10: Regional Sales Data by Month for Air Compressors at Kirkland Industries (Kirkland Regional Data)
Month  North Sales ($100s)  South Sales ($100s)
Jan    95                   40
Feb    100                  45
Mar    120                  55
Apr    115                  65
May    100                  60
Jun    85                   50
Jul    135                  75
Aug    110                  65
Sep    100                  60
Oct    50                   70
Nov    40                   75
Dec    40                   80
Charts (8 of 26)
Sparkline (a special type of line chart):
• Minimalist line chart that can be placed directly into a cell in Excel.
• Contains no axes; it displays only the line for the data.
• Takes up very little space and can be used effectively to show overall trends in time series data.
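Sparklines are a built-in Excel feature, but the idea can be approximated in text. The Python sketch below maps each value onto one of eight Unicode block characters, scaled between the series minimum and maximum, and draws no axes. The block-character approach and the `sparkline` name are assumptions. The data are the North and South series from Table 3.10.

```python
# Minimal text "sparkline": no axes, only the shape of the series.
# ASSUMPTION: eight Unicode block characters stand in for line heights; this is
# a text approximation, not Excel's sparkline feature.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Scale each value between the series min and max onto one of 8 block heights."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BLOCKS[0] * len(values)  # a flat series gets the lowest block
    return "".join(BLOCKS[round((v - lo) / (hi - lo) * (len(BLOCKS) - 1))] for v in values)


# Kirkland regional sales ($100s), Jan-Dec, from Table 3.10.
north = [95, 100, 120, 115, 100, 85, 135, 110, 100, 50, 40, 40]
south = [40, 45, 55, 65, 60, 50, 75, 65, 60, 70, 75, 80]
print("North", sparkline(north))
print("South", sparkline(south))
```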
Charts (10 of 26)
• Bar Charts: Use horizontal bars to display the magnitude of the quantitative variable.
• Column Charts: Use vertical bars to display the magnitude of the quantitative variable.
• Bar and column charts are very helpful for comparing a quantitative variable across the categories of a categorical variable.
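In a bar chart, each bar's length is proportional to its value. The Python sketch below draws horizontal text bars for the quality-rating totals in Table 3.7. The 30-character maximum width and the `text_bar_chart` name are illustrative assumptions.

```python
# Minimal sketch of a horizontal bar chart in plain text: bar length is
# proportional to the value for each category.
# ASSUMPTIONS: 30-character maximum bar width; counts are Table 3.7 row totals.

def text_bar_chart(data, width=30):
    """One line per category, with bars scaled so the largest value fills `width`."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "█" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} {bar} {value}")
    return "\n".join(lines)


counts = {"Good": 84, "Very Good": 150, "Excellent": 66}
print(text_bar_chart(counts))
```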
Charts (14 of 26)
• Pie chart: Common form of chart used to compare categorical data.
• Bubble chart: Graphical means of visualizing three variables in a two-dimensional graph; it is sometimes preferred to a 3-D graph.
• Heat map: A two-dimensional graphical representation of data that uses different shades of color to indicate magnitude.
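A heat map replaces each number with a shade whose darkness reflects its magnitude. The Python sketch below shades the interior cells of Table 3.7 using five equal-width classes. The shade characters, the number of classes, and the `shade` helper are assumptions.

```python
# Minimal text heat map: darker shading marks larger counts.
# ASSUMPTIONS: five shade characters and equal-width classes; counts are the
# interior cells of Table 3.7 (totals excluded).
SHADES = "·░▒▓█"

prices = ["$10-19", "$20-29", "$30-39", "$40-49"]
counts = {
    "Good": [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}


def shade(value, lo, hi):
    """Map value in [lo, hi] to one of len(SHADES) equal-width classes."""
    if hi == lo:
        return SHADES[0]
    k = int((value - lo) / (hi - lo) * len(SHADES))
    return SHADES[min(k, len(SHADES) - 1)]  # the maximum falls in the top class


all_values = [v for row in counts.values() for v in row]
lo, hi = min(all_values), max(all_values)
heat = {rating: "".join(shade(v, lo, hi) for v in row) for rating, row in counts.items()}

print("Columns:", ", ".join(prices))
for rating, cells in heat.items():
    print(f"{rating:<10} {cells}")
```

The darkest cell falls at Very Good and $20–29, which matches the largest count (64) noted for Table 3.7.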
Charts (16 of 26)
Table 3.11: Sample Data on Billionaires per Country
Country        Billionaires per 10M Residents  Per Capita Income  No. of Billionaires
United States  54.7                            $54,600            1,764
China          1.5                             $12,880            213
Germany        12.5                            $45,888            103
India          0.7                             $5,855             90
Russia         6.2                             $24,850            88
Mexico         1.2                             $17,881            15
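The Python sketch below shows one way to encode Table 3.11 as a bubble chart. It assumes per capita income on the horizontal axis, billionaires per 10M residents on the vertical axis, and the number of billionaires as bubble size. Bubble area (not radius) is made proportional to the count, a common convention that keeps large values from looking exaggerated. The 50-unit maximum radius and the `bubble_radius` helper are hypothetical.

```python
# Minimal sketch of bubble-chart encoding for Table 3.11.
# ASSUMPTIONS: x = per capita income, y = billionaires per 10M residents,
# bubble size = number of billionaires, with bubble AREA proportional to count.
import math

countries = {  # name: (per capita income $, billionaires per 10M, no. of billionaires)
    "United States": (54_600, 54.7, 1_764),
    "China": (12_880, 1.5, 213),
    "Germany": (45_888, 12.5, 103),
    "India": (5_855, 0.7, 90),
    "Russia": (24_850, 6.2, 88),
    "Mexico": (17_881, 1.2, 15),
}


def bubble_radius(count, max_count, max_radius=50.0):
    """Radius chosen so area is proportional to count: r = r_max * sqrt(count / max_count)."""
    return max_radius * math.sqrt(count / max_count)


max_count = max(n for _, _, n in countries.values())
bubbles = {name: (x, y, bubble_radius(n, max_count))
           for name, (x, y, n) in countries.items()}
for name, (x, y, r) in bubbles.items():
    print(f"{name:<14} x={x:>6} y={y:>5} radius={r:5.1f}")
```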
Charts (19 of 26)
Additional Charts for Multiple Variables:
• Stacked-column chart: Allows the reader to compare the relative values of quantitative variables for the same category in a bar chart.
• Clustered-column (or bar) chart: An alternative to the stacked-column chart for comparing quantitative variables.
• Scatter-chart matrix: A grid of scatter charts, one for each pair of variables; useful for displaying relationships among multiple variables.
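In a stacked-column chart, each segment begins where the previous segment in the same column ends, so the full column shows the total. The Python sketch below computes segment bounds for the first three months of Table 3.10, assuming North is drawn below South. `stack_segments` is a hypothetical helper.

```python
# Minimal sketch of how a stacked-column chart positions its segments.
# ASSUMPTION: North is drawn at the bottom and South on top; data are the
# first three months of Table 3.10 ($100s).

def stack_segments(values):
    """Return (bottom, top) pairs for segments stacked in the given order."""
    segments, base = [], 0
    for v in values:
        segments.append((base, base + v))
        base += v  # the next segment starts where this one ends
    return segments


regional = {"Jan": (95, 40), "Feb": (100, 45), "Mar": (120, 55)}  # (North, South)
for month, parts in regional.items():
    (n_bottom, n_top), (s_bottom, s_top) = stack_segments(parts)
    print(f"{month}: North {n_bottom}-{n_top}, South {s_bottom}-{s_top}, total {s_top}")
```

A clustered-column chart would instead start every bar at zero and place the bars side by side.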
Charts (23 of 26)
Table 3.12: Data for New York City Subboroughs (cont.)
Area                          Median Monthly Rent ($)  Percentage College Graduates (%)  Poverty Rate (%)  Travel Time (min)
Brooklyn Heights/Fort Greene  1,086                    55.3                              17.4              34.5
Brownsville/Ocean Hill        714                      11.6                              36.0              40.3
Bushwick                      945                      13.3                              33.5              35.5
Central Harlem                665                      30.6                              27.1              25.0
Chelsea/Clinton/Midtown       1,624                    66.1                              12.7              43.7
Coney Island                  786                      27.2                              20.0              46.3
Charts (25 of 26)
PivotCharts in Excel:
• PivotChart: Excel pairs a PivotChart with a PivotTable so that data can be summarized and analyzed with both a crosstabulation and a chart.
Advanced Data Visualization (1 of 7)
Advanced Charts:
• Parallel-coordinates plot: Chart for examining data with more than two variables:
  • Includes a different vertical axis for each variable.
  • Each observation is represented by drawing a line on the parallel-coordinates plot connecting each vertical axis.
  • The height of the line on each vertical axis represents the value taken by that observation for the variable corresponding to the vertical axis.
• Treemap: Useful for visualizing hierarchical data along multiple dimensions.
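The data behind a parallel-coordinates plot can be sketched in Python as one list of heights per observation, with one height per vertical axis. This sketch scales each axis independently, placing 0 at the variable's minimum and 1 at its maximum. That is a common convention, not something the slide specifies. The data are the six subboroughs listed in Table 3.12.

```python
# Minimal sketch of the data behind a parallel-coordinates plot.
# ASSUMPTION: each vertical axis is min-max scaled on its own (0 = that
# variable's minimum, 1 = its maximum); data are the rows listed in Table 3.12.

variables = ["Median Monthly Rent ($)", "College Graduates (%)",
             "Poverty Rate (%)", "Travel Time (min)"]
areas = {
    "Brooklyn Heights/Fort Greene": (1086, 55.3, 17.4, 34.5),
    "Brownsville/Ocean Hill": (714, 11.6, 36.0, 40.3),
    "Bushwick": (945, 13.3, 33.5, 35.5),
    "Central Harlem": (665, 30.6, 27.1, 25.0),
    "Chelsea/Clinton/Midtown": (1624, 66.1, 12.7, 43.7),
    "Coney Island": (786, 27.2, 20.0, 46.3),
}


def parallel_coordinates(rows):
    """Map each observation to one height per axis, min-max scaled per variable."""
    columns = list(zip(*rows.values()))
    bounds = [(min(col), max(col)) for col in columns]
    return {
        name: [(v - lo) / (hi - lo) if hi > lo else 0.5
               for v, (lo, hi) in zip(values, bounds)]
        for name, values in rows.items()
    }


lines = parallel_coordinates(areas)
print(" | ".join(variables))
for name, heights in lines.items():
    # Each list is one polyline: its height on axis 1, axis 2, ... in order.
    print(f"{name:<30}", " ".join(f"{h:.2f}" for h in heights))
```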
Advanced Data Visualization (5 of 7)
Geographic Information Systems Charts:
• Geographic information system (GIS): A system that merges maps and statistics to present data collected over different geographic areas.
• Helps in interpreting data and observing patterns.
Data Dashboards (1 of 3)
• Data dashboard: Data-visualization tool that illustrates multiple metrics and automatically updates these metrics as new data become available.
Principles of Effective Data Dashboards:
• Key performance indicators (KPIs) in dashboards:
  • Automobile dashboard: current speed, fuel level, and oil pressure.
  • Business dashboard: financial position, inventory on hand, and customer service metrics.
Data Dashboards (2 of 3)
Principles of Effective Data Dashboards (continued):
• Should provide timely summary information on KPIs that are important to the user.
• Should present all KPIs on a single screen that a user can quickly scan to understand the business’s current state of operations.
• The KPIs displayed in the data dashboard should convey meaning to the user and be related to the decisions the user makes.
• A data dashboard should highlight unusual measures that may require attention.
• Color should be used to call attention to specific values or to differentiate categorical variables, but its use should be restrained.
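Highlighting unusual measures requires a rule for what counts as unusual. The Python sketch below applies one possible rule to the monthly sales in Table 3.9: flag any month more than 1.5 sample standard deviations from the mean. The threshold, the z-score rule, and the single text marker (standing in for a restrained use of color) are all assumptions. `flag_unusual` is a hypothetical helper.

```python
# Minimal sketch of calling attention to unusual measures on a dashboard.
# ASSUMPTIONS: the KPI is Kirkland's monthly sales (Table 3.9); "unusual" means
# |z| > 1.5 using the sample standard deviation (a hypothetical threshold); one
# text marker stands in for a single, restrained highlight color.
from statistics import mean, stdev

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales = [150, 145, 185, 195, 170, 125, 210, 175, 160, 120, 115, 120]  # $100s


def flag_unusual(values, threshold=1.5):
    """Return the indices whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


flagged = set(flag_unusual(sales))
for i, (month, value) in enumerate(zip(months, sales)):
    marker = "  <-- check" if i in flagged else ""
    print(f"{month} {value:>4}{marker}")
```

With these data, only July (210) is flagged.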