Unit-4 DS
Chart: Information presented in tabular or graphical form, with data displayed along
two axes. Can be in the form of a graph, diagram, or map.
Table: A set of figures displayed in rows and columns.
Graph: A diagram of points, lines, segments, curves, or areas that represents certain
variables in comparison to each other, usually along two axes at a right angle.
Geospatial: A visualization that shows data in map form using different shapes and
colors to show the relationship between pieces of data and specific locations.
Infographic: A combination of visuals and words that represent data. Usually uses
charts or diagrams.
Dashboard: A collection of visualizations and data displayed in one place to help
with analyzing and presenting data.
There are two basic types of data visualization: static and interactive.
Static visualizations: present a single, fixed view of the dataset.
Interactive visualizations: allow you to customize your story by moving a slider or clicking
a button to enable various views of the dataset.
Bar graphs: also called column graphs, these types of data visualization show numerical
values as bars or rectangles of equal width. Bar graphs are used to expose large
changes over time and to summarize large data sets.
Line charts: these types of data visualization involve connecting plotted data points with
lines to show trends over time and compare different data points. Line charts are useful
whenever you’re continuously tracking data and need to visually demonstrate trends detected
in large datasets over the course of a marketing campaign.
Dual-axis charts: these types of data visualization are used to show comparisons and offer
an easy way to see the relationships or trends between datasets. Dual-axis charts combine
visual elements such as those of a bar graph and line chart to compare sets of data accurately,
efficiently and without needing to use two separate data visualizations to show trends or draw
connections.
4. Data Encoding
What do you mean by data encoding?
Encoding is the process of converting data into a format required for a number of information
processing needs, including program compiling and execution; data transmission, storage,
and compression/decompression; and application data processing, such as file conversion.
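As a small illustration of encoding for storage, transmission, and compression, the sketch below uses only Python's standard library. The sample string and the variable names are made up for this example.

```python
import base64
import zlib

text = "Unit-4 DS: data encoding " * 4   # hypothetical sample text

# Character encoding: text -> bytes, as needed for storage or transmission
raw = text.encode("utf-8")

# Compression and decompression: fewer bytes, fully reversible
packed = zlib.compress(raw)
unpacked = zlib.decompress(packed)

# Transfer encoding: arbitrary bytes -> ASCII-safe text
b64 = base64.b64encode(packed).decode("ascii")

print(len(raw), len(packed))
print(unpacked.decode("utf-8") == text)
```

Each step is reversible, which is the point: encoding changes the format of the data, not its content.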
5. Retinal Variables
Visual implantations (points, lines, and areas) are encoded with retinal variables, and each
retinal variable takes a visual parameter. For example, a point can be encoded using the shape
of a hollow circle and the color blue. A line can be encoded using a solid pattern, a thick size,
and the color green. An area can be encoded using a 20% transparent red color and thin line
borders.
The following is a figure from Bertin (1967) describing the implementation of retinal
variables in conjunction with visual implantations:
Retinal variables encode visual implantations (points, lines, areas) and can be used to
represent differences (≠), similarities (≡), a quantified order (Q), or a qualitative order (O).
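The three examples from the text can be written down as a small data structure. This is only a sketch: the class name `Mark` and the field names are hypothetical and do not come from Bertin.

```python
from dataclasses import dataclass, field

@dataclass
class Mark:
    implantation: str                 # "point", "line" or "area"
    retinal: dict = field(default_factory=dict)   # retinal variable -> parameter

marks = [
    Mark("point", {"shape": "hollow circle", "color": "blue"}),
    Mark("line", {"pattern": "solid", "size": "thick", "color": "green"}),
    Mark("area", {"color": "red", "transparency": 0.20, "border": "thin line"}),
]

for m in marks:
    print(m.implantation, "->", m.retinal)
```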
6. Mapping Variables to Encodings
One-Hot Encoding:
In one-hot encoding, each category of a categorical variable gets its own new variable, which
takes a binary value (0 or 1). This type of encoding is used when the data is
nominal. The newly created binary features can be considered dummy variables. After one-hot
encoding, the number of dummy variables matches the number of categories present in
the data.
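import pandas as pd
import category_encoders as ce
df = pd.DataFrame({'name': ['rahul', 'ashok', 'ankit', 'aditya', 'yash', 'vipin', 'amit']})
encoder = ce.OneHotEncoder(cols=['name'], handle_unknown='return_nan', return_df=True, use_cat_names=True)
# Original data
print(df)
# Fit the encoder and print one dummy column per name
print(encoder.fit_transform(df))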
Here in the above output, we can see dummy variables for every category.
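To see what the encoder does under the hood, here is a minimal pure-Python sketch of the same idea. The helper name `one_hot` is made up for illustration, and its column naming and ordering are assumptions that may not match the library's output exactly.

```python
def one_hot(values):
    """Return one dict per value, with a 0/1 entry for every category."""
    categories = sorted(set(values))          # one dummy variable per category
    return [
        {f"name_{cat}": int(value == cat) for cat in categories}
        for value in values
    ]

names = ['rahul', 'ashok', 'ankit', 'aditya', 'yash', 'vipin', 'amit']
rows = one_hot(names)

print(rows[0])
```

Every row contains exactly one 1, in the column that corresponds to its own category.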
8. Visual Encodings
The visual encoding is the way in which data is mapped into visual structures, upon which we
build the images on a screen.
There are two types of visual encoding variables: planar and retinal. Humans are sensitive
to the retinal variables. They easily differentiate between various colors, shapes, sizes and
other properties. Retinal variables were introduced by Bertin in 1967, and this
concept has become quite popular recently. While there’s some critique about the
effectiveness of retinal variables, most specialists find them useful.
The goal of this article is to provide an engaging introduction to visual encoding, and to give
some hands-on examples of how it helps to present data in a meaningful way.
Data types
We’ll start with some complex things: data types. There are three basic types of data:
something you can count, something you can order and something you can just differentiate.
As is often the case, these types boil down to three un-intuitive terms:
Quantitative
Anything that has exact numbers.
For example, Effort in points: 0, 1, 2, 3, 5, 8, 13.
Duration in days: 1, 4, 666.
Ordered / Qualitative
Anything that can be compared and ordered.
User Story Priority: Must Have, Great, Good, Not Sure.
Bug Severity: Blocking, Average, Who Cares.
Categorical
Everything else.
Entity types: Bugs, Stories, Features, Test Cases.
Fruits: Apples, Oranges, Plums.
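One way to keep the three types apart is by the operations that make sense for each. The sketch below reuses the examples above; the variable names and the low-to-high ranking of the priorities are assumptions made for illustration.

```python
from collections import Counter

# Quantitative: arithmetic is meaningful
effort_points = [0, 1, 2, 3, 5, 8, 13]
total_effort = sum(effort_points)

# Ordered: only comparisons are meaningful, so store an explicit ranking
priority_order = ["Not Sure", "Good", "Great", "Must Have"]   # assumed low -> high
rank = {label: i for i, label in enumerate(priority_order)}
stories = ["Good", "Must Have", "Not Sure", "Great"]
by_priority = sorted(stories, key=rank.get, reverse=True)

# Categorical: only equality is meaningful, so we can group and count
entities = ["Bugs", "Stories", "Bugs", "Features", "Bugs"]
counts = Counter(entities)

print(total_effort, by_priority, counts["Bugs"])
```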
Size
We know that size does matter. You can see the difference right away. Small is innocuous,
large is dangerous perhaps. Size is a good visualizer for the quantitative data.
Texture
Texture is less common. You can’t touch it on screen, and it’s usually less catchy than color.
So, in theory texture can be used for soft encoding, but in practice it’s better to pass on it.
Shape
Round circles ○, edgy stars ☆, solid rectangles █. We can easily distinguish dozens of
shapes. They do work well sometimes for the visual encoding of categories.
Orientation
Orientation is tricky.
While we’re able to clearly identify vertical vs. horizontal lines, it is harder to use it properly
for visual encoding.
Color Value
Any color value can be moved over a scale. Greyscale is a good example. While we can’t be
certain at a glance that #999 is lighter than #888, it’s still a helpful technique for
visualizing ordered data.
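A sequential greyscale can be generated by mapping each ordered level to a lightness step. The function below is a sketch: its name and the light-for-low, dark-for-high direction are assumptions.

```python
def grey_for_level(level, n_levels):
    """Map level 0..n_levels-1 to a hex grey, light for low and dark for high."""
    if n_levels < 2:
        raise ValueError("need at least two levels")
    lightness = 1.0 - level / (n_levels - 1)   # 1.0 = white, 0.0 = black
    channel = round(lightness * 255)
    return "#{0:02x}{0:02x}{0:02x}".format(channel)

severities = ["Who Cares", "Average", "Blocking"]   # low -> high
for i, name in enumerate(severities):
    print(name, grey_for_level(i, len(severities)))
```

Keeping the number of levels small leaves wide gaps between neighbouring greys, which avoids the #999 versus #888 problem.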
Color Hue
Red color is alarming. Green color is calm. Blue color is peaceful. Colors are great to
separate categories.
Color in More Detail
Color is the most interesting variable, so let’s dig into some details here. There are three
different scales that we can use with color. We’ve already mentioned two of them: the
categorical scale (color hue) and the sequential scale (color value).
The diverging scale is the new one here. It encodes values on both sides of a midpoint, such as
positive and negative temperatures in the range of -50 to +50 °C. It would be a mistake to use
any other color scale for that.
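A diverging scale has a neutral midpoint and two hues that grow more intense towards each extreme. The sketch below maps a temperature to a blue-white-red color; the function name and the specific blue and red endpoints are assumptions for illustration.

```python
def diverging_color(value, lo=-50.0, hi=50.0):
    """Return an (r, g, b) tuple: blue at lo, white at the midpoint, red at hi."""
    value = max(lo, min(hi, value))            # clamp to the scale's range
    mid = (lo + hi) / 2
    if value < mid:
        t = (mid - value) / (mid - lo)         # 0 at midpoint, 1 at lo
        fade = round(255 * (1 - t))
        return (fade, fade, 255)               # white fading to blue
    t = (value - mid) / (hi - mid)             # 0 at midpoint, 1 at hi
    fade = round(255 * (1 - t))
    return (255, fade, fade)                   # white fading to red

for temp in (-50, -25, 0, 25, 50):
    print(temp, diverging_color(temp))
```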
The general rule of thumb is that you can use no more than a dozen colors to encode
categories effectively. If there’s more, it’d be hard to differentiate between categories
quickly. These are the most commonly used colors:
“Avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no
harm.”—Tufte
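The dozen-color rule of thumb can be enforced in code. The sketch below spreads hues evenly around the color wheel with the standard `colorsys` module. The limit of 12 comes from the text, while the function name and the saturation and value settings are assumptions.

```python
import colorsys

MAX_CATEGORIES = 12   # rule of thumb from the text

def category_colors(categories, saturation=0.65, value=0.85):
    """Assign each category a distinct hue as a hex color string."""
    if len(categories) > MAX_CATEGORIES:
        raise ValueError("too many categories to tell apart by hue")
    colors = {}
    for i, cat in enumerate(categories):
        hue = i / len(categories)              # evenly spaced around the wheel
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors[cat] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return colors

print(category_colors(["Features", "Bugs", "User Stories"]))
```

Evenly spaced hues are only a starting point; a hand-picked palette usually separates categories better.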
Item Type       Quantity
Features        3
Bugs            5
User Stories    6
The data set has two variables: Item Type (categorical) and Item Quantity (well, quantitative).
All the possible choices are based on the table above:
Item        Orientation
Quantity    Size, Value, X (or Y)
In theory, you can mix these variables as you wish. I’m going to try four combinations.
Shape + Value
Hmm, looks like a puzzle. Value doesn’t work for the quantitative data, it seems. Let’s try
something else!
Color + Size
Well, slightly better. The color coding works for entity types. For example, in TargetProcess
we’ve got green Features, red Bugs and blue User Stories. Still not very good.
A very simple rule in visualizations is to never map scalar data to circle radii. Humans do
better at comparing relative areas, so if you want to map data to a shape, you have to map it
to its area.
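Mapping a value to area rather than radius means the radius must grow with the square root of the value. A minimal sketch, using the item quantities from the table above; the function name and the `area_per_unit` parameter are assumptions.

```python
import math

def radius_for_value(value, area_per_unit=1.0):
    """Radius of a circle whose area is value * area_per_unit."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return math.sqrt(value * area_per_unit / math.pi)

quantities = {"Features": 3, "Bugs": 5, "User Stories": 6}
for name, qty in quantities.items():
    r = radius_for_value(qty)
    print(f"{name}: radius {r:.2f}, area {math.pi * r * r:.2f}")
```

With this scaling, a value four times larger gets a circle with twice the radius and four times the area.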
Texture + Y
Almost great. But why this legend with texture? Can we just remove it? Yes! Let’s use the X
and Y planar variables.
X+Y
Now we have the best result! It turned out that X+Y works great for a simple data set with
just two variables. So, there’s no need to use retinal variables at all.
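The X+Y encoding needs nothing more than position and length, which even a plain-text chart can show. The sketch below draws the same three-row data set with one row per item type (Y) and bar length for quantity (X); the variable names are only for illustration.

```python
data = [("Features", 3), ("Bugs", 5), ("User Stories", 6)]

label_width = max(len(name) for name, _ in data)
lines = []
for name, qty in data:
    # Y position = one row per item type, X length = quantity
    lines.append(f"{name:<{label_width}} | {'#' * qty} {qty}")

print("\n".join(lines))
```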
Retinal variables should be used if you need to present three or more variables.
Features        Good    20    40
Bugs            Fix      2     8
User Stories    Good     5     7
We need to pick four variables. Surely, there are other choices, but here’s what I’ve selected:
Now it’s easy to draw the chart. The important bugs are shown in deep red, the unimportant
ones in light red. The same pattern applies to features and user stories.
What can we say about this chart? Here are some useful observations:
Bugs are usually smaller than user stories, and features are the largest entities.
Important bugs are small and get fixed quickly.
Important features are the largest, and it takes more time to release them (interesting
information, by the way!).
Among bugs, the unimportant ones are the largest, and it takes longer to fix them.
There’s a good correlation between effort and cycle time: it takes more time to deliver
large entities.
Of course, you can get the same info from the plain table above, but the chart is much more
fun to explore.