DV Unit-I
Unit-I – Introduction
What is Data Visualization?
Data visualization is the representation of information and data in a pictorial or
graphical format (Example: charts, graphs, and maps). Data visualization tools provide an
accessible way to see and understand trends, patterns, and outliers in data. Data
visualization tools and technologies are essential to analyzing massive amounts of
information and making data-driven decisions. The concept of using pictures to understand
data has been used for centuries.
Data visualization is one of the steps of the Data Science process, which states that after
data has been collected, processed and modeled, it must be visualized for conclusions to be
made.
1. Numerical Data:
Numerical data is also known as Quantitative data. It is any data that represents an
amount, such as the height, weight, or age of a person. Numerical data is generally the
easiest kind of data to visualize. Visualizing it helps others digest large data sets and
raw numbers in a way that is easier to interpret and act on.
Numerical data is categorized into two categories:
Continuous Data – It can take any value within a range, so a measurement can always be
narrowed further (Example: Height measurements).
Discrete Data – This type of data is not “continuous”; it takes only distinct, countable
values (Example: Number of cars or children a household has).
The visualization techniques used to represent numerical data are Charts and Numerical
Values. Examples are Pie Charts, Bar Charts, Averages, Scorecards, etc.
2. Categorical Data:
Categorical data is also known as Qualitative data. It is any data that represents groups.
It consists of categorical variables that are used to represent characteristics such as a
person’s ranking, a person’s gender, etc. Categorical data
visualization is all about depicting key themes, establishing connections, and lending context.
Categorical data is classified into three categories:
Binary Data – In this, classification is based on one of two opposing positions (Example:
Agrees or Disagrees).
Nominal Data – In this, classification is based on attributes (Example: Male or Female).
It is not measurable
It does not follow any order
Its values are not equidistant from one another
It has no meaningful zero
Ordinal Data – In this, classification is based on the ordering of information (Example:
Timeline or processes).
It is not measurable
It follows an order
Its values are not equidistant from one another
It has no meaningful zero
The visualization techniques used to represent categorical data are Graphics, Diagrams, and
Flowcharts. Examples are Word clouds, Sentiment Mapping, Venn Diagram, etc.
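As a rough illustration of the difference between nominal and ordinal data, the Python sketch
below counts a handful of survey responses. The response values, the RATING_ORDER list, and
all variable names are assumptions made up for this example; they do not come from these notes.

```python
from collections import Counter

# Hypothetical nominal data: categories with no inherent order.
beverages = ["Tea", "Coffee", "Coffee", "Juice", "Tea", "Coffee"]

# Hypothetical ordinal data: categories with a defined order,
# but no fixed distance between neighbouring levels.
RATING_ORDER = ["Low", "Medium", "High"]
ratings = ["High", "Low", "Medium", "High", "High", "Low"]

# Frequency counts are meaningful for both kinds of categorical data.
beverage_counts = Counter(beverages)
rating_counts = Counter(ratings)

# Nominal categories have no natural order, so list them by frequency.
for name, count in beverage_counts.most_common():
    print(name, count)

# Ordinal categories can also be listed in their defined order.
for level in RATING_ORDER:
    print(level, rating_counts[level])
```

Counting is the only arithmetic applied here, which matches the properties listed above:
neither kind of data is measurable, and only the ordinal levels can be sorted meaningfully.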
Why Visualization?
1. Data Visualization Discovers the Trends in Data:
The most important thing that data visualization does is discover the trends in data. After
all, it is much easier to observe data trends when all the data is laid out in front of you in a
visual form than when it is in a table. For example, the Tableau screenshot below shows the
sum of sales made by each customer in descending order, with the color red denoting loss and
grey denoting profit. It is easy to observe from this visualization that even though some
customers have huge sales, they are still at a loss. This would be very difficult to observe
from a table.
Suppose that the heights of seven students of a class are recorded. There is only one
variable, height, and the analysis does not deal with any cause or relationship. The patterns
found in this type of data can be described using measures of central tendency (mean, median
and mode) and measures of dispersion or spread (range, minimum, maximum, quartiles, variance
and standard deviation).
Univariate analysis is conducted in several ways, which are mostly descriptive in
nature –
Frequency Distribution Tables
Histograms
Frequency Polygons
Pie Charts
Bar Charts
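The descriptive measures named above can be sketched with Python's standard statistics
module. The seven height values are hypothetical (the notes do not give the actual figures),
and the sample (n - 1) versions of variance and standard deviation are an assumption.

```python
import statistics

# Hypothetical heights (in cm) of seven students; the values are made up.
heights = [150, 152, 155, 155, 160, 163, 171]

summary = {
    # Measures of central tendency.
    "mean": statistics.mean(heights),
    "median": statistics.median(heights),
    "mode": statistics.mode(heights),
    # Measures of dispersion or spread.
    "minimum": min(heights),
    "maximum": max(heights),
    "range": max(heights) - min(heights),
    # quantiles(n=4) returns the three cut points Q1, Q2 and Q3.
    "quartiles": statistics.quantiles(heights, n=4),
    # Sample variance and standard deviation (divide by n - 1).
    "variance": statistics.variance(heights),
    "std_dev": statistics.stdev(heights),
}

for name, value in summary.items():
    print(name, value)
```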
2. Bivariate data:
This type of data involves two different variables. The analysis of this type of data deals with
causes and relationships, and it is done to find out the relationship between the two
variables. An example of bivariate data is temperature and ice cream sales in the summer
season.
Suppose temperature and ice cream sales are the two variables of a bivariate data set. Here,
the relationship is visible from the table: temperature and sales are positively related,
because as the temperature increases, the sales also increase.
Thus, bivariate data analysis involves comparisons, relationships, causes and explanations.
These variables are often plotted on the X and Y axes of a graph for better understanding of
the data; one of these variables is independent while the other is dependent.
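One common way to quantify such a relationship is the Pearson correlation coefficient. The
sketch below computes it by hand for the temperature and ice cream example. The observations
and the function name pearson_correlation are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical daily observations; the numbers are made up for illustration.
temperature_c = [20, 24, 27, 30, 33, 36]          # independent variable (X)
ice_cream_sales = [110, 150, 190, 240, 300, 360]  # dependent variable (Y)


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)


r = pearson_correlation(temperature_c, ice_cream_sales)
# A value close to +1 indicates a strong positive linear relationship.
print(round(r, 3))
```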
3. Multivariate data:
When the data involves three or more variables, it is categorized as multivariate. For
example, suppose an advertiser wants to compare the popularity of four advertisements on a
website. Their click rates could be measured for both men and women, and the relationships
between the variables can then be examined. It is similar to bivariate data but contains more
than one dependent variable. The way to analyze this data depends on the goals to be
achieved. Some of the techniques are regression analysis, path analysis, factor analysis and
multivariate analysis of variance (MANOVA).
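The advertising example can be laid out as a small data set with three variables
(advertisement, gender, click rate). The sketch below only organizes and compares the data; it
is not one of the statistical techniques listed above. All counts and names are hypothetical.

```python
# Hypothetical click data: (advertisement, gender, impressions, clicks).
records = [
    ("Ad 1", "men", 1000, 50), ("Ad 1", "women", 1000, 80),
    ("Ad 2", "men", 1000, 70), ("Ad 2", "women", 1000, 40),
    ("Ad 3", "men", 1000, 30), ("Ad 3", "women", 1000, 90),
    ("Ad 4", "men", 1000, 60), ("Ad 4", "women", 1000, 60),
]

# Click rate for every (advertisement, gender) combination.
click_rate = {}
for ad, gender, impressions, clicks in records:
    click_rate[(ad, gender)] = clicks / impressions

# Find the most-clicked advertisement within each gender group.
best_by_gender = {}
for gender in ("men", "women"):
    ads = [ad for ad, g in click_rate if g == gender]
    best_by_gender[gender] = max(ads, key=lambda ad: click_rate[(ad, gender)])

print(best_by_gender)
```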
A Chart is the total package that includes:
1. titles
2. values
3. axis labels
4. legend information
5. colors
Together, these add meaning to the graph. A Chart is an enhancement of a graph.
Y-axis – the left vertical side of the graph; it contains the numerical information
X-axis – the bottom horizontal side of the graph; it contains the category information.
Data markers – These are used in a graph to indicate data values.
Data Series - A collection of related values from the worksheet; one row/column on
the spreadsheet.
Gridline – A Gridline is a horizontal or vertical line that extends across the plot area
of the graph for the purpose of adding clarification to the data. It makes it easier to
read and understand the values.
Plot Area - The Plot Area is the background portion of a graph: the rectangular area
bounded by the category and value axes.
Tick marks – Tick marks are used to add clarification of the data categories.
Legend – Legend is the object that explains the symbols, colors, or patterns used to
differentiate the data.
Data Label – A Data label is a single value or piece of data from the data series.
Chart Title – Describe the purpose and content.
Axis Titles – Describe the x and y axis data.
4. For the Rainy Days series, choose Clustered Column as the chart type.
Result:
Discriminating Series and Category Axis
Suppose you want to project the Actual Profits made in Years 2013-2016.
As you observe, the data visualization is not effective as the years are not displayed. You can
overcome this by changing year to category.
Remove the header year in the data range.
Now, year is considered as a category and not a series. Your chart looks as follows −
You can also do this another way, by customizing the data source.
3. The Select Data Source dialog box appears. Uncheck Year in the Series column and go to
the Edit option of the category axis.
5. Now, year is considered as a category and not a series. Your chart looks as follows –
[Chart: Actual profits by year (2013 to 2016); value axis from 0 to 5000]
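The difference between treating Year as a series and treating it as a category can be
sketched in Python. This is only an analogy for what the spreadsheet does, not a description
of how Excel works internally. The profit values and the helper name
split_category_and_series are hypothetical.

```python
# Hypothetical worksheet columns; the profit values are made up.
table = {
    "Year": [2013, 2014, 2015, 2016],
    "Actual profits": [1500, 2800, 3600, 4700],
}


def split_category_and_series(columns, category_column):
    """Use one column as category-axis labels and the rest as data series."""
    # Category labels are text, so they are never plotted as values.
    categories = [str(value) for value in columns[category_column]]
    series = {name: values for name, values in columns.items()
              if name != category_column}
    return categories, series


# Without this step, "Year" would be just another numeric series
# drawn next to "Actual profits" instead of labelling the axis.
categories, series = split_category_and_series(table, "Year")
print(categories)
print(series)
```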
Chart Elements and Chart Styles
Chart Elements give more descriptions to your charts, thus helping you visualize your data
more meaningfully.
Chart Elements
Chart Styles
Chart Filters
Chart Elements
Click Chart Elements.
Add Axis Titles
Select Trendline. You can use a Trendline to graphically display trends in data. You can
extend a Trendline in a chart beyond the actual data to predict future values.
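A linear trendline is one common type. As a minimal sketch of the idea, the code below fits a
least-squares straight line and extends it one year past the data. The profit values and the
function name fit_linear_trendline are hypothetical, and this is not a claim about how Excel
computes its trendlines.

```python
def fit_linear_trendline(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = [2013, 2014, 2015, 2016]
profits = [1200, 2100, 2900, 4000]  # hypothetical values

slope, intercept = fit_linear_trendline(years, profits)

# Extend the line beyond the actual data to estimate a future value.
forecast_2017 = slope * 2017 + intercept
print(round(slope, 2), round(forecast_2017, 2))
```

A forecast of this kind simply continues the fitted line, so it is only as reliable as the
assumption that the past trend holds.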
Chart Styles
Data Labels
Data labels make a chart easier to understand because they show details about a data series
or its individual data points. For example, in the pie chart below, without the data labels it
would be difficult to tell that coffee was 38% of total sales. Depending on what you want to
highlight on a chart, you can add labels to one series, all the series (the whole chart), or one
data point.
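Percentage data labels such as the 38% for coffee are each slice's share of the total. The
sketch below shows the calculation. Only the 38% figure for coffee comes from the text; the
sales amounts, the other product names, and the function name percentage_labels are
hypothetical.

```python
# Hypothetical sales figures, chosen so that Coffee is 38% of the total.
sales = {"Coffee": 1900, "Tea": 1100, "Juice": 1250, "Water": 750}


def percentage_labels(values):
    """Return a percentage label for each item, as a share of the total."""
    total = sum(values.values())
    return {name: f"{value / total:.0%}" for name, value in values.items()}


labels = percentage_labels(sales)
for name, label in labels.items():
    print(name, label)
```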
Add data labels to a chart
1. Click the data series or chart. To label one data point, after clicking the series, click
that data point.
2. In the upper right corner, next to the chart, click Add Chart Element > Data
Labels.
To make data labels easier to read, you can move them inside the data points or even outside
of the chart. To move a data label, drag it to the location you want.
If you decide the labels make your chart look too cluttered, you can remove any or all of them
by clicking the data labels and then pressing Delete.
Change the look of the data labels
1. Right-click the data series or data label to display more data for, and then click Format
Data Labels.
2. Click Label Options and under Label Contains, pick the options you want.
You can use cell values as data labels for your chart.
1. Right-click the data series or data label to display more data for, and then click Format
Data Labels.
2. Click Label Options and under Label Contains, select the Values From Cells checkbox.
3. When the Data Label Range dialog box appears, go back to the spreadsheet and select
the range for which you want the cell values to display as data labels. When you do that,
the selected range will appear in the Data Label Range dialog box. Then click OK.
The cell values will now display as data labels in your chart.
1. Click the data label with the text to change and then click it again, so that it's the only
data label selected.
2. Select the existing text and then type the replacement text.
3. Click anywhere outside the data label.
Quick Layout
You can use Quick Layout to change the overall layout of the chart quickly by choosing one of
the predefined layout options.
Double-click Coffee in the legend; the Format Legend pane appears.
Under Picture source, insert an image or paste one from the clipboard.