Analysing Patterns and Story Telling - Shahrukh
Session Agendas
1. Unknown Result
- The number itself was previously unknown, and simply knowing it is significant or
useful. Eg: Zimbabwe has a population of this much
2. Surprising Extreme
- Extreme values that were not expected
- Largest and smallest values
Eg: The Aam Aadmi Party has the highest number of criminal cases against it
upgrad.com
3. Surprising Comparison
- The comparison is unusual or surprising
Eg: Lucknow’s mortality rate is five times higher than the national mortality rate
4. Significant Outliers
- Unusual form of extreme; where the result is far more than what you
would normally expect
Eg: Mumbai’s call drop rate is five times higher than any city nearby
5. Abnormal Distribution
- An unusual pattern in a distribution that we did not expect
Eg: the number of students scoring 50,60,70 etc is much higher than students
scoring 49,59,69 etc (Teacher could be rounding off the scores)
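The abnormal-distribution example can be checked with a quick count. The sketch below is only an illustration: the function name `rounding_spike_ratio` and the marks are invented, and it assumes whole-number scores.

```python
from collections import Counter


def rounding_spike_ratio(scores):
    """Compare how many scores end in 0 with how many end in 9."""
    last_digits = Counter(score % 10 for score in scores)
    ends_in_0 = last_digits[0]
    ends_in_9 = last_digits[9]
    # Avoid dividing by zero when no score ends in 9.
    return ends_in_0 / ends_in_9 if ends_in_9 else float("inf")


# Hypothetical marks, invented for illustration only.
scores = [50, 50, 60, 60, 60, 70, 49, 59, 63, 71, 70, 50]
ratio = rounding_spike_ratio(scores)
print(f"Scores ending in 0 are {ratio:.1f}x as common as scores ending in 9")
```

A ratio far above 1 is only a hint that scores may have been rounded up; it is worth confirming on the real data before drawing a conclusion.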
Analysis Approach
The five patterns are sufficient for categorising insights. Now we need a few
techniques to analyse your data so that you can apply the five patterns to it
and start generating insights.
These analyses can be categorised into two types: exploratory data analysis and
hypothesis-driven analysis.
Statistical models often take a sample of the original data and infer from it the behaviour of the
entire population
Eg: The population of a country in a dataset could be made available in a separate sheet
and you can perform a VLOOKUP to create that column in the original dataset.
2. Calculations: You can perform a variety of calculations using the numeric columns in
your dataset.
Eg: If we have a dataset on suicide information, we can calculate the Suicide Rate (%)
from the Suicides and the Population columns.
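As a rough sketch of the lookup and calculation steps, the snippet below uses a plain dictionary in place of the separate sheet and VLOOKUP. The country names, figures and column names are all invented for illustration.

```python
# Hypothetical figures, invented for illustration only.
population_by_country = {"A": 1_000_000, "B": 250_000}  # the "separate sheet"

rows = [
    {"country": "A", "suicides": 120},
    {"country": "B", "suicides": 50},
]

for row in rows:
    # Lookup: the dictionary plays the role of VLOOKUP.
    row["population"] = population_by_country.get(row["country"])
    # Calculation: derive the rate as a percentage of the population.
    if row["population"]:
        row["suicide_rate_pct"] = 100 * row["suicides"] / row["population"]
    else:
        row["suicide_rate_pct"] = None  # no population found for this country

for row in rows:
    print(row)
```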
3. Binning: This process groups the values of a numeric column into specific categories.
Eg: We can convert the Suicide Rate into bins and categorise them as High, Medium or Low.
4. Business-Specific Metrics: This part would be specific to your domain and hence the metrics or
KPIs that you might be using would be a useful additional column.
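The binning step (item 3) can be sketched as a small function. The cutoffs `low_cutoff` and `high_cutoff` are arbitrary assumptions; in practice they would come from the domain or from the distribution of the data.

```python
def bin_rate(rate_pct, low_cutoff=0.01, high_cutoff=0.02):
    """Map a numeric rate to Low / Medium / High; the cutoffs are assumptions."""
    if rate_pct < low_cutoff:
        return "Low"
    if rate_pct < high_cutoff:
        return "Medium"
    return "High"


# Hypothetical rates (in %), invented for illustration only.
rates = [0.005, 0.012, 0.03]
labels = [bin_rate(r) for r in rates]
print(labels)
```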
In machine learning, you run an algorithm on a predefined set of data called "train data" to
build the model, and then run the model on another set of data called "test data" to test its
accuracy and precision.
1. Classification
2. Clustering
3. Time Series Analysis
4. Feature Extraction
5. Sentiment Analysis
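The train/test idea can be illustrated with a toy classifier. This is a minimal sketch, not a real machine learning workflow: the data, the threshold "model" and all function names are invented, and a real project would normally use a dedicated library.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them as test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Toy 'model': the midpoint between the mean of each class."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Share of positive predictions that are actually positive."""
    actual = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in actual) / len(actual) if actual else 0.0


# Hypothetical (value, label) pairs, invented for illustration only.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
train, test = train_test_split(data)
threshold = fit_threshold(train)            # formulate the model on train data
y_true = [y for _, y in test]
y_pred = [int(x > threshold) for x, _ in test]  # evaluate it on test data
print(accuracy(y_true, y_pred), precision(y_true, y_pred))
```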
Summarising rows:
1. Summarising the Numeric Columns
2. Grouping
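Both ways of summarising rows can be sketched with the standard library. The regions and sales figures below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows, invented for illustration only.
rows = [
    {"region": "North", "sales": 100},
    {"region": "North", "sales": 140},
    {"region": "South", "sales": 90},
    {"region": "South", "sales": 110},
    {"region": "South", "sales": 130},
]

# 1. Summarise a numeric column across all rows.
sales = [r["sales"] for r in rows]
summary = {
    "min": min(sales),
    "max": max(sales),
    "mean": mean(sales),
    "median": median(sales),
}

# 2. Group the rows by a category, then summarise each group.
groups = defaultdict(list)
for r in rows:
    groups[r["region"]].append(r["sales"])
mean_by_region = {region: mean(values) for region, values in groups.items()}

print(summary)
print(mean_by_region)
```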
4. Machine Learning
- Apply a model and the model itself gives us some additional information
Eg: Predict the sales of one product on the basis of sales of three other products and
calculate their importance.
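As a very rough stand-in for the importance a fitted model would report, the sketch below ranks three predictors by the absolute Pearson correlation of each with the target. This is a swapped-in, simpler technique and not a machine learning model; the product names and sales figures are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical sales figures, invented for illustration only.
target_sales = [10, 20, 30, 40, 50]
predictors = {
    "product_a": [1, 2, 3, 4, 5],
    "product_b": [5, 3, 4, 1, 2],
    "product_c": [2, 2, 3, 2, 3],
}

# Crude proxy for importance: strength of the linear relationship.
importance = {
    name: abs(pearson(values, target_sales))
    for name, values in predictors.items()
}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

Correlation looks at each predictor in isolation, so it can mislead when predictors overlap; a fitted model's importance scores account for the predictors jointly.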
Derive new columns
Summarise the rows
1. You save time: You get to the point immediately and do not waste people’s time. This
is crucial because executives have limited time to devote to anyone, so it is
important to give them the information they need to know. If required, drill down
to the finer details.
2. Your message is concise: As with newspaper headlines, the insight should be
structured so that it conveys the most useful information in the most economical
way possible, and the pyramid principle helps you achieve just that.
● TIME
● PLACE
● THREE ASPECTS OF THE SAME THING
● BENEFITS
● SCALE
● COUNTER-ARGUMENTS
Word choice is critical in communicating insights. The three pillars of using
words effectively are:
● Being concise
● Using active voice
● Using a positive tone to deliver the message
Make sure that you have the following information with you before you send out any form of
business communication:
Data Storytelling
Data storytelling is the practice of building a narrative around a set of data and its
accompanying visualizations to help convey the meaning of that data in a powerful and
compelling fashion.
1. Statistics show that only 5% of people remember numbers, whereas 63% of people
remember the story.
2. Storytelling is more persuasive than statistics alone.
● Message
● Visuals
● Structured flow
● Narrative.
Types of Variables
1. Qualitative Variables
Nominal: Used for labelling variables without any scale. There is no overlap or
order. Eg: Gender of a person
2. Quantitative Variables
Interval: Numeric measures where both the order and the difference between values are known.
Eg: Temperature
Ratio: Numeric measures that are interval-scale variables with an absolute zero,
allowing several forms of statistical analysis to be performed.
Eg: Height
Scatter plot enables you to analyse the data in the following ways:
2. Line Chart
Histograms are useful for visualising the distribution of numerical data and making
inferences from it.
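The counting behind a histogram can be sketched in a few lines. The function name `histogram`, the bin width and the scores are assumptions made for illustration, and the sketch assumes whole-number values.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each [start, start + bin_width) bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical scores, invented for illustration only.
scores = [42, 47, 51, 55, 58, 63, 64, 66, 71, 88]
bin_width = 10
bins = histogram(scores, bin_width)

# Print a simple text histogram, one row per bin.
for start, count in bins.items():
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")
```

A charting tool would draw the same counts as bars; the choice of bin width changes how the distribution looks, so it is worth trying more than one.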
In the chart, each category would be represented by a different colour. The sector
covered by that category would represent the percentage share it occupies with respect
to the data that is being analysed.
● When we further divide the rectangles in a bar graph into specific categories, the
visualization is referred to as a stacked bar chart. This is useful when the given bar
chart is unable to offer insight at a sub-category level.
● Trade-off between Accuracy and Precision: This is done to reduce the complexity of the
figures and numbers that are being presented and thereby reduce the amount of
attention needed on the part of the audience to grasp the information.
● Drawing attention: This can be done in several ways: by highlighting the significant
figures, using different fonts, bolding the key values, etc.
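The trade-off between accuracy and precision can be illustrated by rounding a large figure into a short, readable form. The function name `humanise` and the one-decimal rounding are assumptions made for this sketch.

```python
def humanise(n):
    """Round a large number to a short, readable form for presentation."""
    units = (
        (1_000_000_000, "billion"),
        (1_000_000, "million"),
        (1_000, "thousand"),
    )
    for threshold, suffix in units:
        if abs(n) >= threshold:
            # Keep one decimal place: less precise, but easier to absorb.
            return f"{n / threshold:.1f} {suffix}"
    return str(n)


print(humanise(1_234_567))
print(humanise(48_900))
print(humanise(950))
```

The exact figure can still be kept in an appendix or a tooltip for readers who need it.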
Any Queries?
Thank You!