DV Unit-I

The document discusses different types of data visualization including numerical, categorical, univariate, bivariate and multivariate data. It explains that data visualization is useful for discovering trends, providing perspectives, putting data in context, saving time and telling data stories. Common techniques for different data types are also outlined.

Uploaded by

vanya798949

DATA VISUALIZATION

Unit-I – Introduction
What is Data Visualization?
Data visualization is the graphical representation of information and data in a pictorial or
graphical format (for example, charts, graphs, and maps). Data visualization tools provide an
accessible way to see and understand trends, patterns, and outliers in data. These tools and
technologies are essential for analyzing massive amounts of information and making
data-driven decisions. The idea of using pictures to understand data has been in use for
centuries.

Data visualization is one of the steps of the Data Science process, which states that after
data has been collected, processed, and modeled, it must be visualized for conclusions to be
drawn.

Categories of Data Visualization:


Data visualization is critical to market research, where both numerical and categorical data
can be visualized. This increases the impact of insights and reduces the risk of analysis
paralysis. Data visualization is divided into the following categories:

1. Numerical Data:
Numerical data is also known as Quantitative data. Numerical data is any data that
represents an amount, such as the height, weight, or age of a person. Numerical data
visualization is the easiest way to visualize data. It is generally used to help others digest
large data sets and raw numbers in a way that is easier to interpret and act on.
Numerical data is divided into two categories:
• Continuous Data – It can take any value within a range, so a measurement can always be
narrowed to finer precision (Example: Height measurements).
• Discrete Data – It takes only separate, countable values; it is not “continuous”
(Example: Number of cars or children a household has).

The visualization techniques used to represent numerical data are Charts and Numerical
Values. Examples are Pie Charts, Bar Charts, Averages, Scorecards, etc.

2. Categorical Data:
Categorical data is also known as Qualitative data. Categorical data is any data that
represents groups. It consists of categorical variables that are used to represent
characteristics such as a person’s ranking, a person’s gender, etc. Categorical data
visualization is all about depicting key themes, establishing connections, and lending context.
Categorical data is classified into three categories:
• Binary Data – In this, classification is based on one of two opposing positions (Example:
Agrees or Disagrees).
• Nominal Data – In this, classification is based on attributes (Example: Male or Female).
    • It is not measurable
    • It does not follow any order
    • Its values are not equidistant from each other
    • It has no meaningful zero
• Ordinal Data – In this, classification is based on ordering of information (Example:
Timeline or processes).
    • It is not measurable
    • It follows an order
    • Its values are not equidistant from each other
    • It has no meaningful zero

The visualization techniques used to represent categorical data are Graphics, Diagrams, and
Flowcharts. Examples are Word clouds, Sentiment Mapping, Venn Diagrams, etc.
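
As a rough illustration of the difference between nominal and ordinal data, the Python sketch below counts a nominal variable (where only group sizes matter) and lists an ordinal variable in rank order. The sample responses and the low/medium/high ranking are invented for this example and do not come from the notes.

```python
from collections import Counter

# Hypothetical survey responses (invented for illustration).
beverages = ["coffee", "tea", "coffee", "juice", "tea", "coffee"]    # nominal
satisfaction = ["high", "low", "medium", "high", "medium", "high"]  # ordinal

# Nominal data: only the count per group is meaningful; there is no order.
beverage_counts = Counter(beverages)

# Ordinal data: the categories follow an order, but the gaps between
# them are not measurable. The ranking below is an assumption.
RANK = {"low": 0, "medium": 1, "high": 2}
satisfaction_counts = Counter(satisfaction)
ordered_levels = sorted(satisfaction_counts, key=RANK.get)

print(beverage_counts.most_common())
for level in ordered_levels:
    print(level, satisfaction_counts[level])
```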

Why Visualization?
1. Data Visualization Discovers the Trends in Data:
The most important thing that data visualization does is reveal the trends in data. After
all, it is much easier to observe data trends when all the data is laid out in front of you in a
visual form than when it sits in a table. For example, the Tableau screenshot below shows the
sum of sales made by each customer in descending order, with red denoting a loss and grey
denoting a profit. It is very easy to observe from this visualization that even though some
customers may have huge sales, they are still at a loss. This would be very difficult to
observe from a table.

2. Data Visualization Provides a Perspective on the Data:


Data Visualization provides a perspective on data by showing its meaning in the larger scheme
of things. It demonstrates how particular data points stand with respect to the overall
data picture. In the data visualization below, plotting sales against profit gives a
perspective on these two measures. It also shows that there are very few sales above 12K
and that higher sales do not necessarily mean a higher profit.
3. Data Visualization Puts the Data into the Correct Context:
It is very difficult to understand the context of the data without data visualization. Since
context provides the whole circumstances of the data, it is very difficult to grasp by just
reading numbers in a table. In the Tableau visualization below, a TreeMap is used to show the
number of sales in each region of the United States. It is very easy to see from this
visualization that California has the largest number of sales out of the total, since the
rectangle for California is the largest. This information is not easy to understand out of
context, without data visualization.
4. Data Visualization Saves Time:
It is definitely faster to gather insights from the data using data visualization than by
just studying a table. In the Tableau screenshot below, it is very easy to identify the states
that have suffered a net loss rather than a profit. This is because all the cells with a loss
are colored red using a heat map, so it is obvious which states have suffered a loss. Compare
this to a normal table, where you would need to check each cell for a negative value to
determine a loss. Data visualization saves a lot of time in this situation.
5. Data Visualization Tells a Data Story:
Data visualization is also a medium to tell a data story to the viewers. The visualization can be
used to present the data facts in an easy-to-understand form while telling a story and leading
the viewers to an inevitable conclusion. This data story, like any other type of story, should
have a good beginning, a basic plot, and an ending that it is leading towards. For example, if
a data analyst has to craft a data visualization for company executives detailing the profits on
various products, then the data story can start with the profits and losses of various products
and move on to recommendations on how to tackle the losses.

What to plot (Univariate, Bivariate and Multivariate) –


1. Univariate data:
This type of data consists of only one variable. The analysis of univariate data is thus the
simplest form of analysis, since the information deals with only one quantity that changes. It
does not deal with causes or relationships, and the main purpose of the analysis is to describe
the data and find patterns that exist within it. An example of univariate data is height.

Suppose that the heights of seven students in a class are recorded. There is only one
variable, height, and it does not involve any cause or relationship. The patterns found in
this type of data can be described using measures of central tendency (mean, median, and
mode) and measures of dispersion or spread (range, minimum, maximum, quartiles, variance,
and standard deviation).
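
The measures named above can be computed with Python's standard `statistics` module. This is a minimal sketch: the seven heights are made-up values, since the notes do not list the actual measurements.

```python
import statistics

# Hypothetical heights (cm) of seven students; invented for illustration.
heights = [150, 152, 155, 155, 160, 163, 168]

# Central tendency
mean_height = statistics.mean(heights)
median_height = statistics.median(heights)
mode_height = statistics.mode(heights)

# Dispersion (spread)
height_range = max(heights) - min(heights)
q1, q2, q3 = statistics.quantiles(heights, n=4)  # the three quartile cut points
variance = statistics.variance(heights)          # sample variance
std_dev = statistics.stdev(heights)              # sample standard deviation

print(f"mean={mean_height:.1f} median={median_height} mode={mode_height}")
print(f"range={height_range} quartiles=({q1}, {q2}, {q3})")
print(f"variance={variance:.1f} std_dev={std_dev:.1f}")
```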

Univariate analysis is conducted in several ways, which are mostly descriptive in
nature –
• Frequency Distribution Tables
• Histograms
• Frequency Polygons
• Pie Charts
• Bar Charts
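
As a small sketch of the first two techniques in the list above, the code below builds a frequency distribution table and prints it as a text histogram. The heights and the class width of 5 cm are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical heights (cm); invented for illustration.
heights = [150, 152, 155, 155, 160, 163, 168]
BIN_WIDTH = 5  # assumed class width

# Map each value to the lower bound of its class interval, e.g. 152 -> 150.
bins = Counter((h // BIN_WIDTH) * BIN_WIDTH for h in heights)

# Print the frequency table, with one '#' per observation as a text histogram.
for lower in sorted(bins):
    upper = lower + BIN_WIDTH - 1
    print(f"{lower}-{upper}: {bins[lower]} {'#' * bins[lower]}")
```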
2. Bivariate data:
This type of data involves two different variables. The analysis of this type of data deals with
causes and relationships, and it is done to find out the relationship between the two
variables. An example of bivariate data is temperature and ice cream sales in the summer
season.
Suppose temperature and ice cream sales are the two variables of a bivariate data set. Here,
a table of the two variables would show that sales increase as the temperature increases, so
the two variables are positively related.
Thus, bivariate data analysis involves comparisons, relationships, causes, and explanations.
These variables are often plotted on the X and Y axes of a graph for a better understanding
of the data, where one variable is independent and the other is dependent.

A scatter plot is usually the best choice for bivariate data.

Bivariate analysis is conducted using –

• Correlation coefficients
• Regression analysis
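
To make the first technique concrete, here is a hedged sketch that computes the Pearson correlation coefficient for the temperature and ice cream example. The observations are invented, because the table in the notes is not available, and `pearson_r` is a helper written for this example rather than a library function.

```python
import math

# Hypothetical daily observations; invented for illustration.
temperature_c = [20, 22, 25, 27, 30, 32, 35]
ice_cream_sales = [120, 135, 160, 170, 200, 215, 240]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(temperature_c, ice_cream_sales)
print(f"r = {r:.3f}")  # a value close to +1 indicates a strong positive relationship
```

A coefficient near +1 supports the statement that sales rise with temperature, while a value near 0 would indicate no linear relationship.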

3. Multivariate data:
When the data involves three or more variables, it is categorized as multivariate. For
example, suppose an advertiser wants to compare the popularity of four advertisements on a
website. Their click rates could be measured for both men and women, and the relationships
between the variables can then be examined. Multivariate analysis is similar to bivariate
analysis but involves more than two variables. The way to analyze this data depends on the
goals to be achieved. Some of the techniques are regression analysis, path analysis, factor
analysis, and multivariate analysis of variance (MANOVA).

Commonly used multivariate analysis techniques include –

• Factor Analysis
• Cluster Analysis
• Variance Analysis
• Discriminant Analysis
• Multidimensional Scaling
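
Before applying any of these techniques, the advertisement example can be explored with a simple cross-tabulation of its three variables (advertisement, gender, click rate). The sketch below is only illustrative: every number in it is invented.

```python
# Hypothetical click data: (advertisement, gender, impressions, clicks).
records = [
    ("Ad A", "men", 1000, 50), ("Ad A", "women", 1000, 70),
    ("Ad B", "men", 1000, 80), ("Ad B", "women", 1000, 40),
    ("Ad C", "men", 1000, 30), ("Ad C", "women", 1000, 90),
    ("Ad D", "men", 1000, 60), ("Ad D", "women", 1000, 60),
]

# Click rate for every (advertisement, gender) combination.
click_rate = {(ad, gender): clicks / impressions
              for ad, gender, impressions, clicks in records}

# Most popular advertisement within each gender group.
best_ad = {}
for (ad, gender), rate in click_rate.items():
    if gender not in best_ad or rate > click_rate[(best_ad[gender], gender)]:
        best_ad[gender] = ad

for gender, ad in best_ad.items():
    print(f"{gender}: {ad} ({click_rate[(ad, gender)]:.1%})")
```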
Univariate | Bivariate | Multivariate
-----------|-----------|-------------
It summarizes only a single variable at a time. | It summarizes only two variables. | It summarizes more than two variables.
It does not deal with causes and relationships. | It deals with causes and relationships, and analysis is done. | It deals with relationships among the variables, and analysis is done.
It does not contain any dependent variable. | It contains only one dependent variable. | It is similar to bivariate but contains more than two variables.
The main purpose is to describe. | The main purpose is to explain. | The main purpose is to study the relationship among the variables.
An example of univariate data is height. | An example of bivariate data is temperature and ice cream sales in the summer season. | Example: Suppose an advertiser wants to compare the popularity of four advertisements on a website. Their click rates could be measured for both men and women, and the relationships between the variables can be examined.

Which graph to use for what purpose (Basic components of a good chart/graph)
Purpose of Charts and Graphs
• Charts and graphs are used in business to communicate and clarify spreadsheet
information.
• Charts and graphs emphasize and categorize spreadsheet information into a format
that can be quickly and easily analyzed.
Graph:
• A graph is a pictorial representation of data and the feature of a chart used to plot data.
• It includes the:
1. plot area
2. gridlines
3. values
• A graph is used in a chart.
Chart:
• A chart is the total package, which adds meaning to the graph. It includes:
1. titles
2. values
3. axis labels
4. legend information
5. colors
• A chart is an enhancement of a graph.

Basic components of a good chart/graph

• Y-axis – The left vertical side of the graph; it contains the numerical information.
• X-axis – The bottom horizontal side of the graph; it contains the category information.
• Data markers – These are used in a graph to represent data values.
• Data Series – A collection of related values from the worksheet; one row or column on
the spreadsheet.
• Gridline – A horizontal or vertical line that extends across the plot area of the graph
for the purpose of adding clarification to the data. It makes the values easier to read
and understand.
• Plot Area – The background portion of a graph; the rectangular area bounded by the
category and value axes.
• Tick marks – Tick marks are used to add clarification to the data categories.
• Legend – The object that explains the symbols, colors, or patterns used to
differentiate the data.
• Data Label – A single value or piece of data from the data series.
• Chart Title – Describes the purpose and content of the chart.
• Axis Titles – Describe the x-axis and y-axis data.
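
The components listed above can be pointed out on a simple column chart. The following is a sketch using the third-party matplotlib library, which is an assumption of this example (the notes themselves work in Excel and Tableau). The quarterly figures and the function name `draw_labeled_chart` are invented for illustration.

```python
# Hypothetical quarterly sales; invented only to label the chart parts.
CATEGORIES = ["Q1", "Q2", "Q3", "Q4"]
SALES = [120, 150, 90, 180]

def draw_labeled_chart(categories, values):
    """Draw a column chart and annotate the basic chart components."""
    # matplotlib is imported here so the data above is usable without it.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()                           # the axes hold the plot area
    bars = ax.bar(categories, values, label="Sales")   # data markers of one data series
    for bar, value in zip(bars, values):               # data labels above each column
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                str(value), ha="center", va="bottom")
    ax.set_title("Quarterly Sales")                    # chart title
    ax.set_xlabel("Quarter")                           # X-axis title (categories)
    ax.set_ylabel("Units sold")                        # Y-axis title (numerical values)
    ax.grid(True, axis="y")                            # horizontal gridlines
    ax.tick_params(axis="both", length=4)              # tick marks
    ax.legend()                                        # legend
    return fig
```

Calling `draw_labeled_chart(CATEGORIES, SALES).savefig("components.png")` would write the chart to a file; the file name is arbitrary.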

Creating Combination Charts


A combination chart is a chart that combines two or more chart types in a single chart.
To create a combination chart, execute the following steps.

1. Select the range A1:C13.


2. On the Insert tab, in the Charts group, click the Combo symbol.

3. Click Create Custom Combo Chart.

The Insert Chart dialog box appears.

4. For the Rainy Days series, choose Clustered Column as the chart type.

5. For the Profit series, choose Line as the chart type.

6. Plot the Profit series on the secondary axis.


7. Click OK.

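Result: [Figure: combination chart with Rainy Days as clustered columns and Profit as a line
on the secondary axis]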
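
Outside Excel, the same kind of combination chart can be sketched with the third-party matplotlib library (an assumption of this example). The monthly values below are invented, since the worksheet range A1:C13 is not reproduced in the notes, and `plot_combo` is a hypothetical helper name.

```python
# Hypothetical data for 12 months; the worksheet values are not given in the notes.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
RAINY_DAYS = [12, 11, 10, 9, 8, 6, 4, 6, 7, 9, 12, 14]
PROFIT = [3500, 3700, 4000, 4200, 4900, 5000,
          5300, 5100, 4800, 4400, 4100, 3800]

def plot_combo(months, rainy_days, profit):
    """Columns for Rainy Days, a line for Profit on a secondary axis."""
    # matplotlib is imported here so the data above is usable without it.
    import matplotlib.pyplot as plt

    fig, ax_days = plt.subplots()
    ax_days.bar(months, rainy_days, color="tab:blue", label="Rainy Days")
    ax_days.set_ylabel("Rainy Days")

    ax_profit = ax_days.twinx()  # secondary value axis sharing the same x-axis
    ax_profit.plot(months, profit, color="tab:orange", marker="o", label="Profit")
    ax_profit.set_ylabel("Profit")

    fig.legend(loc="upper right")  # one legend covering both axes
    return fig
```

The secondary axis matters because the two series have very different scales: on a single axis the Rainy Days columns would be almost invisible next to the Profit values.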
Discriminating Series and Category Axis
Suppose you want to plot the Actual Profits made in the years 2013-2016.

Create a clustered column chart for this data.

As you can see, the visualization is not effective because the years are not displayed on the
axis. You can overcome this by changing year to a category.
Remove the header Year in the data range.

Now, year is considered a category and not a series.
Alternatively, you can get the same result by editing the chart's data source:

1. Click on the chart.

2. The Chart Tools are enabled. Select Chart Design and click Select Data.

3. The Select Data Source dialog box appears. Uncheck Year in the series list, then click
the Edit option for the category axis labels.

4. The Axis Labels dialog box appears. Select the year data and click OK.

5. Now, year is considered a category and not a series. Your chart looks as follows:

[Figure: clustered column chart titled "Actual profits"; category axis: 2013, 2014, 2015, 2016;
value axis: 0 to 5000; legend: Actual profits]
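
The difference between treating the year as a series and treating it as a category can be shown in a few lines of Python. This is only a sketch of the idea; the profit values are invented, since the notes show the chart but not the worksheet.

```python
# Hypothetical worksheet rows: (year, actual profit). Values are invented.
rows = [(2013, 1900), (2014, 2700), (2015, 3400), (2016, 4100)]

# Ineffective reading: both columns are treated as numeric series, so
# "Year" would be drawn as a second set of columns next to the profits.
as_two_series = {
    "Year": [year for year, _ in rows],
    "Actual profits": [profit for _, profit in rows],
}

# Intended reading: the years become text labels on the category axis,
# and only the profit column remains as a plotted series.
categories = [str(year) for year, _ in rows]
series = {"Actual profits": [profit for _, profit in rows]}

print(categories)
print(series)
```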
Chart Elements and Chart Styles
Chart Elements give more descriptions to your charts, helping you visualize your data more
meaningfully.

• Click the chart.

Three buttons appear next to the upper-right corner of the chart:

• Chart Elements
• Chart Styles
• Chart Filters

Chart Elements
• Click Chart Elements.
• Add Axis Titles.
• Click Data Labels.
• Select Data Table.
• Select Trendline. You can use a trendline to graphically display trends in data. You can
extend a trendline in a chart beyond the actual data to predict future values.
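
A linear trendline is a least-squares line fitted through the data points, and extending it simply means evaluating that line beyond the last point. The sketch below assumes a linear trend and uses invented yearly profits; `linear_trend` is a helper written for this example.

```python
# Hypothetical yearly profits; invented for illustration.
years = [2013, 2014, 2015, 2016]
profits = [1900, 2700, 3400, 4100]

def linear_trend(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = linear_trend(years, profits)

# Extend the trendline one period beyond the actual data.
forecast_2017 = slope * 2017 + intercept
print(f"trend: {slope:.0f} per year, forecast for 2017: {forecast_2017:.0f}")
```

Such a forecast is only as good as the assumption that the trend continues, so it is best read as a rough projection.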
Chart Styles

• Click Chart Styles.
• Select a Style and Color that suit your data.

Data Labels

Data labels make a chart easier to understand because they show details about a data series
or its individual data points. For example, in the pie chart below, without the data labels it
would be difficult to tell that coffee was 38% of total sales. Depending on what you want to
highlight on a chart, you can add labels to one series, all the series (the whole chart), or one
data point.
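
The percentage shown in a pie chart data label is each slice's share of the total. As a minimal sketch, the sales figures below are invented so that coffee works out to the 38% mentioned above; the other products and their values are assumptions.

```python
# Hypothetical sales by product, chosen so that coffee is 38% of the total.
sales = {"Coffee": 380, "Tea": 250, "Juice": 220, "Water": 150}

total = sum(sales.values())

# Build one data label per slice: the category name plus its share of the total.
labels = {name: f"{name}: {value / total:.0%}" for name, value in sales.items()}

for label in labels.values():
    print(label)
```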
Add data labels to a chart

1. Click the data series or chart. To label one data point, after clicking the series, click
that data point.

2. In the upper right corner, next to the chart, click Add Chart Element > Data
Labels.

3. To change the location, click the arrow, and choose an option.


4. If you want to show your data label inside a text bubble shape, click Data Callout.

To make data labels easier to read, you can move them inside the data points or even outside
of the chart. To move a data label, drag it to the location you want.

If you decide the labels make your chart look too cluttered, you can remove any or all of them
by clicking the data labels and then pressing Delete.
Change the look of the data labels

1. Right-click the data series or data label to display more data for, and then click Format
Data Labels.
2. Click Label Options and under Label Contains, pick the options you want.

Use cell values as data labels

You can use cell values as data labels for your chart.

1. Right-click the data series or data label to display more data for, and then click Format
Data Labels.
2. Click Label Options and under Label Contains, select the Values From Cells checkbox.
3. When the Data Label Range dialog box appears, go back to the spreadsheet and select
the range for which you want the cell values to display as data labels. When you do that,
the selected range will appear in the Data Label Range dialog box. Then click OK.
The cell values will now display as data labels in your chart.

Change the text displayed in the data labels

1. Click the data label with the text to change and then click it again, so that it's the only
data label selected.
2. Select the existing text and then type the replacement text.
3. Click anywhere outside the data label.

Quick Layout
You can use Quick Layout to change the overall layout of the chart quickly by choosing one of
the predefined layout options.

• Click the chart.
• Click the DESIGN tab under CHART TOOLS.
• Click Quick Layout.
Different possible layouts will be displayed. As you move over the layout options, the chart
layout changes to that particular option.
Select the layout whose description matches what you want. The chart is then displayed
with that layout.

The same can be done for pie charts:
• First, select the chart.
• Then, in the Chart Tools, select Quick Layout.
• The chart is displayed according to the layout description you choose.

Using Pictures in Column Charts
You can create more emphasis in your data presentation by using a picture in place of
columns.
• Click on a column in the Column Chart.
• In Format Data Series, click Fill.
• Select Picture.
• Under Insert picture from, provide the filename, or choose Clipboard if you copied an
image earlier.
The picture you have chosen will appear in place of the columns in the chart.

You can also insert an image for a legend entry in a pie chart:

• First, create a pie chart.
• Double-click Coffee in the legend. The Format Legend options appear.
• Under the picture source, insert an image or paste one from the clipboard.

Only the Coffee entry is then replaced with the image.
