Unit-4 DS

Data visualization is the graphical representation of information and data using visual elements like charts, graphs, and maps to see trends, outliers, and patterns. It provides an accessible way to present data to non-technical audiences. The main types of data visualization include charts, graphs, and maps such as line charts, bar graphs, and heat maps which organize large quantities of information visually. Data encoding is the process of converting data into a format for information processing like data transmission, storage, and application processing. Visual encodings map data to visual structures like position on an x and y axis or color, size, and shape known as retinal variables that humans are sensitive to in order to visually represent differences, similarities, and relationships in data.

Uploaded by

rajkumarmtech


1. Data Visualization Introduction

Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to
see and understand trends, outliers, and patterns in data. Additionally, it provides an excellent
way for employees or business owners to present data to non-technical audiences without
confusion.
In the world of Big Data, data visualization tools and technologies are essential to analyze
massive amounts of information and make data-driven decisions.
Types of Data Visualization

• Chart: Information presented in tabular or graphical form, with data displayed along
two axes. Can take the form of a graph, diagram, or map.
• Table: A set of figures displayed in rows and columns.
• Graph: A diagram of points, lines, segments, curves, or areas that represents certain
variables in comparison to each other, usually along two axes at a right angle.
• Geospatial: A visualization that shows data in map form using different shapes and
colors to show the relationship between pieces of data and specific locations.
• Infographic: A combination of visuals and words that represent data. Usually uses
charts or diagrams.
• Dashboards: A collection of visualizations and data displayed in one place to help
with analyzing and presenting data.

2. Data for Visualization

What are the main types of data visualization?
The main types of data visualization include charts, graphs and maps in the form of line
charts, bar graphs, tree charts, dual-axis charts, mind maps, funnel charts and heatmaps.
While the main types of data visualization each offer a different approach to organizing large
quantities of complex information into visuals, all of them are designed to make large
datasets easier to present, understand and interpret.
3. Data Types

• The data type to be visualized:

  ◦ One-dimensional data, such as temporal (time-series) data
  ◦ Two-dimensional data, such as geographical maps
  ◦ Multi-dimensional data, such as relational tables
  ◦ Text and hypertext, such as news articles and Web documents
  ◦ Hierarchies and graphs, such as telephone calls and Web documents
  ◦ Algorithms and software, such as debugging operations

Types of data visualization

There are two basic types of data visualization: static and interactive.

Static visualizations: something like an infographic, a single keyhole view of a
particular data story.

Interactive visualizations: allow you to customize your story by moving a slider or clicking
a button to enable various views of the dataset.

Examples of the main types of data visualization include:

Bar graph: also called a column graph, these types of data visualization offer numerical
values expressed in bars or rectangles of equal width. Bar graphs are used to expose large
changes over time and easily summarize large data sets.

Line charts: these types of data visualization involve connecting plotted data points with
lines to show trends over time and compare different data points. Line charts are useful
whenever you're continuously tracking data and need to visually demonstrate trends detected
in large datasets, for example over the course of a marketing campaign.

Dual-axis charts: these types of data visualization are used to show comparisons and offer
an easy way to see the relationships or trends between datasets. Dual-axis charts combine
visual elements such as those of a bar graph and line chart to compare sets of data accurately
and efficiently, without needing two separate data visualizations to show trends or draw
connections.

4. Data Encoding
What do you mean by data encoding?
Encoding is the process of converting data into the format required for a number of
information processing needs, including:
• Program compiling and execution.
• Data transmission, storage and compression/decompression.
• Application data processing, such as file conversion.
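The uses of encoding listed above can be illustrated with a minimal sketch using only the Python standard library. The sample string and the choice of UTF-8, Base64 and zlib are assumptions made for illustration; the notes do not name any particular scheme.

```python
import base64
import zlib

text = "Data visualization"  # hypothetical sample data

# Storage / processing: text -> bytes using the UTF-8 character encoding
raw = text.encode("utf-8")

# Transmission: bytes -> ASCII-safe Base64 text
b64 = base64.b64encode(raw).decode("ascii")

# Compression and decompression round trip
packed = zlib.compress(raw)
restored = zlib.decompress(packed).decode("utf-8")

print(raw)
print(b64)
print(restored == text)
```

Each step is reversible: decoding the Base64 text or decompressing the packed bytes gives back the original data.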

5. Retinal Variables

Visual implantations need retinal variables to be encoded, and retinal variables take visual
parameters. For example, a point visual implantation can be encoded using the shape of a
hollow circle and the colour blue. A line can be encoded using a solid pattern of thick size
and green color. An area can be encoded using a 20% transparent red colour and thin line
borders.

Bertin (1967) describes visual implantations as dimensionless elements with underlying
coordinates that necessitate the encoding of retinal variables in order to become visually
informative. He identifies six fundamental retinal variables:

1. Colour hue (e.g. blue, green, magenta).
2. Colour value (lightness vs. darkness).
3. Size (e.g. large, small, thick, thin).
4. Shape (e.g. circle, rectangle, diamond).
5. Orientation (e.g. angle, degrees).
6. Texture (e.g. dashed lines, polka dots).

The eye is independently sensitive to these retinal variables, which means that more than one
retinal variable can be deployed at the same time in order to encode different variations in
the data.

The following is a figure from Bertin (1967) describing the implementation of retinal
variables in conjunction with visual implantations:

Retinal variables encode visual implantations (points, lines, areas) and can be used to
represent differences (≠), similarities (≡), a quantified order (Q), or a qualitative order (O).
6. Mapping Variables to Encodings

One-Hot Encoding:

In One-Hot Encoding, each category of any categorical variable gets a new variable. It maps
each category with binary numbers (0 or 1). This type of encoding is used when the data is
nominal. Newly created binary features can be considered dummy variables. After one hot
encoding, the number of dummy variables depends on the number of categories presented in
the data.

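The way to achieve this in Python, using pandas and the category_encoders package, is
illustrated below.

import pandas as pd
import category_encoders as ce

# Sample data: a single nominal column
df = pd.DataFrame({'name': ['rahul', 'ashok', 'ankit', 'aditya', 'yash', 'vipin', 'amit']})

# One dummy column per category; categories unseen during fit become NaN
encoder = ce.OneHotEncoder(cols=['name'], handle_unknown='return_nan',
                           return_df=True, use_cat_names=True)

# Original data
print(df)

# Fit and transform data
df_encoded = encoder.fit_transform(df)
print(df_encoded)

In the encoded output, there is one dummy variable for every category.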
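To make the mechanics of one-hot encoding visible without any library, here is a minimal pure-Python sketch. The helper `one_hot` and the `name_<category>` column naming are hypothetical choices for illustration, not part of category_encoders.

```python
def one_hot(values, prefix):
    """Return (column_names, rows) with one 0/1 column per distinct value."""
    categories = list(dict.fromkeys(values))  # distinct values, first-seen order
    columns = [f"{prefix}_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows


names = ['rahul', 'ashok', 'ankit', 'aditya', 'yash', 'vipin', 'amit']
columns, rows = one_hot(names, prefix="name")

print(columns)
for name, row in zip(names, rows):
    print(name, row)
```

Seven distinct names give seven dummy columns, and every row contains exactly one 1, in the column of its own category.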

8. Visual Encodings

Visual encoding is the way in which data is mapped into visual structures, upon which we
build the images on a screen.
There are two types of visual encoding variables: planar and retinal. Humans are sensitive
to the retinal variables: they easily differentiate between various colors, shapes, sizes and
other properties. Retinal variables were introduced by Bertin in 1967, and the concept has
become quite popular recently. While there is some critique of the effectiveness of retinal
variables, most specialists find them useful.
The goal of this article is to provide an engaging introduction to visual encoding, and to give
some hands-on examples of how it helps to present data in a meaningful way.

Data types
We'll start with some complex things: data types. There are three basic types of data:
something you can count, something you can order and something you can just differentiate.
As is often the case, these types come down to three unintuitive terms:
Quantitative
Anything that has exact numbers.
For example, Effort in points: 0, 1, 2, 3, 5, 8, 13.
Duration in days: 1, 4, 666.
Ordered / Qualitative
Anything that can be compared and ordered.
User Story Priority: Must Have, Great, Good, Not Sure. Bug Severity: Blocking, Average,
Who Cares.
Categorical
Everything else.
Entity types: Bugs, Stories, Features, Test Cases.
Fruits: Apples, Oranges, Plums.
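The three data types differ in which operations make sense on them. A minimal sketch, assuming the priority list above runs from most to least important (the notes do not state the direction) and using made-up sample lists:

```python
from collections import Counter

# Quantitative: arithmetic is meaningful
effort_points = [0, 1, 2, 3, 5, 8, 13]
total_effort = sum(effort_points)

# Ordered: only the rank is meaningful, so keep an explicit order (low -> high, assumed)
priority_order = ["Not Sure", "Good", "Great", "Must Have"]
stories = ["Good", "Must Have", "Not Sure", "Great"]  # hypothetical sample
by_priority = sorted(stories, key=priority_order.index, reverse=True)

# Categorical: values can only be told apart (equality, counting)
entities = ["Bugs", "Stories", "Bugs", "Features", "Test Cases", "Bugs"]  # hypothetical sample
counts = Counter(entities)

print(total_effort)
print(by_priority)
print(counts.most_common(1))
```

Summing priorities or sorting fruit names by "size" would run without error in code, but the result would carry no meaning. The same distinction drives the choice of visual encoding.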

Planar and Retinal Variables

OK, we’ve got some data. Now how do we present it? We have several visual encoding
variables.
X and Y
Planar variables are known to everybody. If you've studied maths (which I'm sure you
have), you've been drawing graphs across the X- and Y-axis. Planar variables work for any
data type. They work great to present any quantitative data. It's a pity that we have to deal
with flat screens and just two planar variables. Well, we can try to use a Z-axis, but 3D
charts are hard to read on a flat screen in the vast majority of cases.
So what should we do then to present three or more variables? We can use the retinal
variables!

Size
We know that size does matter. You can see the difference right away. Small is innocuous,
large is dangerous perhaps. Size is a good visualizer for the quantitative data.

Texture
Texture is less common. You can’t touch it on screen, and it’s usually less catchy than color.
So, in theory texture can be used for soft encoding, but in practice it’s better to pass on it.
Shape
Round circles ○, edgy stars ☆, solid rectangles █. We can easily distinguish dozens of
shapes. They do work well sometimes for the visual encoding of categories.

Orientation

Orientation is tricky.

While we’re able to clearly identify vertical vs. horizontal lines, it is harder to use it properly
for visual encoding.

Color Value

Any color value can be moved over a scale. Greyscale is a good example. While it is hard to
tell by eye that the color #999 is lighter than #888, it's still a helpful technique to
visualize ordered data.
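As a sketch of color value applied to ordered data, the ranks of a category list can be spread evenly over a greyscale. The function name `grey_hex` and the light and dark end points (#dddddd and #222222) are assumptions chosen for illustration.

```python
def grey_hex(rank, levels):
    """Map rank 0..levels-1 to a grey running from light (#dddddd) to dark (#222222)."""
    light, dark = 0xDD, 0x22
    if levels == 1:
        channel = dark
    else:
        channel = round(light + (dark - light) * rank / (levels - 1))
    return f"#{channel:02x}{channel:02x}{channel:02x}"


severities = ["Who Cares", "Average", "Blocking"]  # low -> high, from the Bug Severity example
shades = {s: grey_hex(i, len(severities)) for i, s in enumerate(severities)}

for severity, shade in shades.items():
    print(severity, shade)
```

Keeping the number of levels small leaves the steps far enough apart to be distinguished, which avoids the #999 versus #888 problem.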

Color Hue
Red color is alarming. Green color is calm. Blue color is peaceful. Colors are great to
separate categories.

Color in More Detail
Color is the most interesting variable, so let's dig into some details here. There are three
different scales that we can use with color. We've already mentioned two of them: the
categorical scale (color hue) and the sequential scale (color value).
The diverging scale is the third. It encodes positive and negative values, e.g. temperatures
in the range of -50 to +50 °C. It would be a mistake to use any other color scale for that.
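A diverging scale can be sketched as a function that fades from one hue through a neutral midpoint to another hue. The blue-white-red choice, the function name `diverging_color` and the clamping behaviour are assumptions for illustration, not something the notes prescribe.

```python
def diverging_color(value, limit=50.0):
    """Blue for negative values, white at zero, red for positive; clamps to +/- limit."""
    t = max(-1.0, min(1.0, value / limit))  # normalise to [-1, 1]
    fade = round(255 * (1 - abs(t)))        # 255 at zero, 0 at the extremes
    if t < 0:
        r, g, b = fade, fade, 255           # towards blue
    else:
        r, g, b = 255, fade, fade           # towards red
    return f"#{r:02x}{g:02x}{b:02x}"


for temperature in (-50, -25, 0, 25, 50):
    print(temperature, diverging_color(temperature))
```

The neutral midpoint is what separates this from a sequential scale: the sign of the value selects the hue and the magnitude selects the intensity.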

There are six primary colors:

The general rule of thumb is that you can use no more than a dozen colors to encode
categories effectively. If there’s more, it’d be hard to differentiate between categories
quickly. These are the most commonly used colors:
“Avoiding catastrophe becomes the first principle in bringing color to information: Above all,
do no harm.” (Tufte)

The next obvious question is:

How to Apply the Retinal Variables to Data?

It is quite clear that we can't use every variable to present every data type. For example, it
is wrong to use color hue to represent numbers (1, 2, 3), because hues have no natural order.
And it is bad to use size to represent various currencies (€, £, ¥). Why on Earth should small
circles stand for euros, and large circles for pounds?

Here’s the retinal variables usage summary:

Note that planar variables can be applied to all the data types. Indeed, we can use the X-axis
for categories, ordered variables or numbers.

The Basic Example
OK, now let's try some techniques to visualize real data. The sample data is very simple: we
just want to visualize the quantity of items:

Item Type       Quantity
Features        3
Bugs            5
User Stories    6

We have just two variables: Item Type (Categorical) and Item Quantity (well, Quantitative).
All the possible choices are based on the table above:

Variable        Possible encodings
Item Type       Orientation, Color, Shape, Texture, X (or Y)
Item Quantity   Orientation, Size, Value, X (or Y)

In theory, you can mix these variables as you wish. I’m going to try four combinations.
Shape + Value

Hmm, looks like a puzzle. Value doesn’t work for the quantitative data, it seems. Let’s try
something else!
Color + Size

Well, slightly better. The color coding works for entity types. For example, in TargetProcess
we've got green Features, red Bugs and blue User Stories. Still not very good.
A very simple rule in visualization is never to map scalar data to circle radii. Humans do
better at comparing relative areas, so if you want to map data to a shape, you have to map it
to its area.
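The area rule can be made concrete with a short sketch: to make a circle's area proportional to a value, the radius must grow with the square root of that value. The function name `radius_for` and the maximum radius of 40 units are assumptions for illustration.

```python
import math


def radius_for(value, max_value, max_radius=40.0):
    """Radius chosen so that the circle's area is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


quantities = {"Features": 3, "Bugs": 5, "User Stories": 6}
largest = max(quantities.values())

radii = {k: radius_for(v, largest) for k, v in quantities.items()}
areas = {k: math.pi * r ** 2 for k, r in radii.items()}

for item in quantities:
    print(item, round(radii[item], 1), round(areas[item] / areas["User Stories"], 2))
```

Mapping the value straight to the radius would instead make a value twice as large appear with four times the area, which exaggerates the difference.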
Texture + Y

Almost great. But why this legend with texture? Can we just remove it? Yes! Let’s use the X
and Y planar variables.
X+Y

Now we have the best result! It turns out that X+Y works great for a simple data set with
just two variables, so there's no need to use retinal variables at all.
Retinal variables should be used if you need to present three or more variables.
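The X+Y choice for the sample data can be sketched even without a plotting library: one planar axis carries the item type and the other carries the quantity as bar length. The text-based rendering and the helper `bar_chart` are assumptions for illustration.

```python
quantities = {"Features": 3, "Bugs": 5, "User Stories": 6}


def bar_chart(data, mark="#"):
    """Return one text line per category; bar length encodes the quantity."""
    width = max(len(label) for label in data)
    return [f"{label:<{width}} | {mark * value} {value}" for label, value in data.items()]


lines = bar_chart(quantities)
for line in lines:
    print(line)
```

Only position and length are used here, so no legend is needed to read the chart.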

The Four Variables Example

Three is quite trivial, so we'll take four variables. Say we have features, bugs, and user
stories, and we want to visualize some properties of these entities:
• Type
• Priority
• Average Effort in Points
• Average Cycle Time in Days
Here is our data:

Type            Priority        Average Effort    Average Cycle Time
Features        Must Have       30                40
Features        Good            20                40
Features        Nice to Have    15                20
Bugs            Fix ASAP        2                 2
Bugs            Fix             2                 8
Bugs            Fix if Time     5                 12
User Stories    Must Have       8                 10
User Stories    Good            5                 7
User Stories    Nice to Have    8                 7

We need to pick four visual variables. Surely there are other choices, but here's what I've
selected:

Variable                      Type            Encoding
Entity Type                   Categorical     Color Hue
Priority                      Ordered         Color Value
Average Effort in Points      Quantitative    X
Average Cycle Time in Days    Quantitative    Y

Now it's easy to draw the chart. The important bugs are shown in deep red, the unimportant
ones in light red. The same pattern applies to features and user stories.
What can we say about this chart? Here are some useful observations:
• Bugs are usually smaller than user stories, and features are the largest entities.
• Important bugs are small and get fixed quickly.
• Important features are the largest, and it takes more time to release them (interesting
information, by the way!).
• Unimportant bugs are the largest of the bugs, and it takes longer to fix them.
• There's a good correlation between effort and cycle time: it takes more time to deliver
large entities.
Of course, you can get the same info from the plain table above, but the chart is much more
fun to explore.
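The mapping in this example, and the correlation mentioned in the last observation, can be checked with a short sketch. The hue names follow the green/red/blue convention mentioned earlier. The three value labels, the rank numbering and the helper names `encode` and `pearson` are assumptions for illustration.

```python
import math

# (type, priority rank: 0 = most important, effort in points, cycle time in days)
rows = [
    ("Features", 0, 30, 40), ("Features", 1, 20, 40), ("Features", 2, 15, 20),
    ("Bugs", 0, 2, 2), ("Bugs", 1, 2, 8), ("Bugs", 2, 5, 12),
    ("User Stories", 0, 8, 10), ("User Stories", 1, 5, 7), ("User Stories", 2, 8, 7),
]

HUES = {"Features": "green", "Bugs": "red", "User Stories": "blue"}  # color hue per type
VALUES = ["deep", "medium", "light"]                                # color value per rank


def encode(row):
    """Map one data row to its four visual channels."""
    kind, rank, effort, cycle = row
    return {"hue": HUES[kind], "value": VALUES[rank], "x": effort, "y": cycle}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


marks = [encode(row) for row in rows]
r = pearson([m["x"] for m in marks], [m["y"] for m in marks])

print(marks[3])
print(round(r, 2))
```

For this table the coefficient comes out at about 0.94, which matches the observation that larger entities take longer to deliver.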
