Data Blending For Dummies
Data Blending
Alteryx Special Edition
These materials are © 2015 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Publisher's Acknowledgments
Some of the people who helped bring this book to market include the following:
Senior Project Editor: Zoë Wykes
Acquisitions Editor: Amy Fandrei
Editorial Manager: Rev Mengle
Business Development Representative: Kimberley Schumacker
Table of Contents
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here
Introduction
Foolish Assumptions
Assumptions often cause problems, but still they have their
place, and if they weren't useful, people wouldn't make
them. I sometimes have to make assumptions too, and writing this book is a prime example. Mainly, I assume that you
know something about data, perhaps have heard of Big Data
and the cloud, and also know something about data analysis.
Hopefully, you also know what data analysts are, the tools
they sometimes use, and the important work they do. As
such, this book is written primarily for data analysts and the
business folks they support.
This icon points out things you'll be glad I mentioned later on.
This is the stuff you want to remember when you start using the
material on your own.
This icon points out pieces of sage wisdom that I wish someone had told me when I was learning this subject.
Beyond the Book
Alteryx Analytics provides data analysts with an intuitive
workflow for data blending and advanced analytics that leads
to deeper business insights in hours rather than the days or
weeks required of most analytics solutions. To learn more
about Alteryx Analytics, visit www.alteryx.com.
Chapter 1
Supporting the Business Decision Maker
Questions from an organization's decision makers are what
drive data analyst requirements. The business decision
makers are the executives and senior managers who determine what products and services to develop and sell, which
Utilizing reports and analytics that present data in a business context rather than from an IT perspective
Having the ability to share their insights with other business colleagues and run their queries again with different
input values
The demands of business decision makers are high because
fast-moving markets, fleeting opportunities, tight profit margins, and fierce competition don't allow for delays or incorrect
results. Data analysts must continue to evolve their tools and
processes to meet these increasing demands.
IT staff know technology, but their knowledge of business rules is relatively weak. This necessitates long and
complex Scope of Work (SOW) documents, and if input
from data analysts is lacking, mistakes can be made.
Chapter 2
Understanding Data Blending
In This Chapter
Discussing the forms of data in the real world and how it naturally
exists
Making the right business decisions is crucial to success for any profitable company. The best decisions
are arrived at by identifying all the right data, applying the
correct analytical methodology to that data, and correctly
interpreting the results to form the decision. In business, this
is a collaborative effort of data analysts and business decision
makers working together.
Basing your decision on the right data is critical. Without the
right data, you may have incomplete results on which to make
your decisions. In a worst-case scenario where you have the
wrong data, you get incorrect results that send you down the
path of a disastrous decision. Knowing where the right data
exists and how to bring all that data from different sources
into your decision-making process is at the core of good decision making in business.
This chapter looks at what data blending is and how it is used
in decision making.
Exploring the Wide World of Data
In the real world, data comes in all different shapes, sizes, and
formats. Increasingly, data comes from everywhere, ranging
from corporate databases that are decades old to information
generated on smartphones during the time it takes to read
this sentence.
Data exists in several different formats:
Unstructured: Data that is not text based, such as pictures, images, or sound files generated by devices or
posted on social media. Unstructured data is a challenge
to manage because it is large in size, difficult to catalog
and index, and problematic to store in databases.
In addition to characterizing data based on its format, a useful
way to view data is based on its nature. Data naturally falls
into these three categories:
Traditional: Data that lives in existing or legacy databases and is often in a well-structured format. Rows in
an Excel spreadsheet, records in an accounting database
table, or account information in an insurance mainframe
database are examples of traditional data. More modern
examples include information in data warehouses and
cloud applications.
Enrichment: Data that is industry specific or special purpose and used to supplement (or enrich) existing data.
For example, spatial grid coordinates identifying where
customers like to shop would enrich sales information, or
Accessing Multiple
Data Sources
Bringing data together from different sources is how data
adds value to the decision-making process. In the real world,
the data you need usually won't sit neatly in a predefined
database inside a fully prepared data table just waiting for you
to access it. In most cases, data must be obtained from different sources to add the depth and wide scope necessary for
the best possible analysis and decision making.
The simple fact is that the best decisions will be made only
when all the relevant data is available for analysis. The key
data almost always comes from multiple data sources and
often comes in different formats.
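As a rough illustration of what "multiple data sources in different formats" can mean in practice, the following minimal sketch reads one set of records from a relational database and another from a CSV export. It is not how Alteryx does this. It uses only the Python standard library, and the table name, column names, and sample values are all hypothetical. An in-memory SQLite database stands in for a corporate database.

```python
import csv
import io
import sqlite3

# Source 1: a relational database. An in-memory SQLite database stands in
# for a corporate system; the "orders" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 75.5)])
db_rows = [
    dict(row)
    for row in conn.execute(
        "SELECT customer_id, amount FROM orders ORDER BY customer_id"
    )
]
conn.close()

# Source 2: a flat-file export. A string stands in for a CSV file on disk.
csv_export = "customer_id,region\n1,West\n2,East\n"
file_rows = [
    # CSV values arrive as text, so the key is converted to match the database type.
    {"customer_id": int(row["customer_id"]), "region": row["region"]}
    for row in csv.DictReader(io.StringIO(csv_export))
]

print(db_rows)
print(file_rows)
```

Even in this tiny case the two sources disagree on data types (the CSV delivers every value as text), which hints at why preparation is needed before the data can be combined.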
Data analysts face the following challenges when attempting
to access data from different data sources:
Once data is obtained, differing data formats, characteristics, and amounts of irrelevant data greatly add to the
time and complexity required to cleanse and prepare
the data for analytical processing. This data cleanup and
preparation is one of the most time-consuming and complex steps performed by data analysts.
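To make the cleanup step concrete, here is a minimal sketch of cleansing records that arrive with inconsistent formats and irrelevant fields. It is an illustration under stated assumptions, not the book's or Alteryx's method: the field names (`customer`, `order_date`, `amount`), the two accepted date formats, and the sample rows are all hypothetical.

```python
from datetime import datetime

def parse_date(text):
    """Accept either of two assumed date formats and return a date object."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def clean_record(raw):
    """Return a cleaned copy with only the relevant fields, or None if unusable."""
    try:
        return {
            "customer": raw["customer"].strip().title(),
            "order_date": parse_date(raw["order_date"]),
            # Remove currency symbols and thousands separators before converting.
            "amount": float(str(raw["amount"]).replace("$", "").replace(",", "")),
        }
    except (KeyError, ValueError):
        return None

# Hypothetical raw rows: differing formats, extra fields, and one bad record.
raw_rows = [
    {"customer": "  acme corp ", "order_date": "2015-03-01", "amount": "$1,200.50", "fax": "n/a"},
    {"customer": "Globex", "order_date": "03/15/2015", "amount": 300, "notes": ""},
    {"customer": "Initech", "order_date": "sometime", "amount": "12"},
]

cleaned = [rec for rec in map(clean_record, raw_rows) if rec is not None]
for rec in cleaned:
    print(rec)
```

The record with the unparseable date is dropped here for simplicity. A real workflow might instead route such rows to a review queue.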
Despite these challenges, the rewards of accessing multiple
data sources far outweigh the difficulties. Solutions that improve the situation for data analysts
include the following:
Data cleansing and preparation tools that are powerful yet intuitive and easily usable by the data analyst.
Because these steps have historically been complex and
time consuming, special emphasis should be placed on
tools that reduce the complexity while speeding up
the process for the analyst.
The more that data analysts are empowered and become self-reliant to find, access, prepare, and ultimately analyze and
process the relevant data, the more effective the data analysts
That's MY data!
Organizations can become very
territorial and protective of their
data, often for good reason. The
most sensitive details of a company,
its customers and employees, and its
sales information are within its data,
typically in databases managed by
the IT department. Legal, professional, and ethical requirements to
safeguard specific data elements
are very strong and must be adhered
to. Understandably, access to data
by anyone, especially those outside
the immediate group of data owners
and users, is scrutinized and often
challenging to obtain. No company
wants to make the news by having
its customers' data stolen, as has
recently happened with several big-name companies. The damage to
those companies' reputations and
their legal liabilities are staggering.
Unfortunately, in some companies
(and even worse in many government agencies), access to the data,
even for reasons that benefit the
business, can be difficult to obtain.
Database administrators (DBAs) and
Data analysts in line-of-business departments are empowered by platforms such as Alteryx to perform data blending and don't have to rely on their IT departments.
Figure 2-1: Multiple data approaches of data blending.
Blending results in a dataset with the purpose of supporting analysis for a specific business question and is
created by business and data analysts.
Both data integration and data blending have a valid place in
business, but given the increasing need for delivering better
information and results, faster and with less effort and complexity, more and more organizations are turning to data analysts who use Alteryx to deliver these deeper business insights
in hours, not weeks, as other analytics processes often require.
Chapter 3
makers
Identifying the Components of Data Analysis
Analysts follow a structured methodology when performing
their analysis. Using modern toolsets, the process of data
blending is composed of three fundamental phases:
Joining data
Joining data is the phase where the prepared data from various
data sources is combined to build the working dataset that will be the subject of the advanced analytics phase.
At this point, the data is already prepared so that it is free
from errors, is in usable formats, and is relevant for the specific analysis being performed; but the data is still in separate
buckets from each data source. The analyst must join that
data together to create a dataset so that analysis can begin.
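The idea of joining prepared data from separate "buckets" can be sketched in a few lines. The example below is a generic inner join on a shared key, written with the Python standard library only. It is not an Alteryx workflow, and the `inner_join` helper, the `customer_id` key, and the sample rows are hypothetical.

```python
from collections import defaultdict

def inner_join(left, right, key):
    """Combine rows from two prepared sources that share the same key value."""
    # Index the right-hand source by key so each lookup is fast.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        # Rows without a match on the other side are left out (inner join).
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Hypothetical, already-prepared data from two separate sources.
sales = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
    {"customer_id": 3, "amount": 40.0},
]
customers = [
    {"customer_id": 1, "segment": "Retail"},
    {"customer_id": 2, "segment": "Wholesale"},
]

dataset = inner_join(sales, customers, "customer_id")
for row in dataset:
    print(row)
```

An inner join is only one choice. If unmatched sales rows must be kept for the analysis, a left join that fills in missing attributes would be the more appropriate variant.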
Historically, the data scientist or data analyst used the following processes to join data into a workable dataset:
Figure 3-2: Access, prepare, and join data in an Alteryx workflow.
Figure 3-3: Alteryx workflow allows data blending steps to be automated.
Figure 3-4: Workflow automation and resulting outputs.
Empowering the Analyst and Decision Makers
The greatest benefit of data blending is the capability to put
accurate, actionable data in the hands of the analysts and
decision makers in a timely manner. Diving deeper into the
benefits, you can see that data blending:
Puts the most critical data in front of analysts and decision makers at the right times
Opens up the access to large amounts of data with minimal complexity to promote a more complete level of
analysis with better data and results
Alteryx empowers line-of-business analysts with an intuitive
workflow that can easily access, prepare, cleanse, and join all
sources of data and deliver critical insight in the fastest time
possible.
Chapter 4
Intuitive workflows
Data analysts are both creative and logical people; their tools
should work in harmony with their thought processes. Using
intuitive workflows that flow with the natural processes of the
data analyst ensures that analysts' work and creativity are not
disrupted. Essentially, you want tools that work with your analysts, not against them.
Workflows provide the benefit of documenting what is occurring during the data preparation; this functionality makes the
job of the data analyst easier and more efficient. Unlike cumbersome spreadsheets that often lead to confusion and don't
support retracing the steps through a process, analytic tools
such as Alteryx give analysts the ability to easily understand
processes using graphical workflows (see Figure 4-1).
Figure 4-1: Alteryx's intuitive workflow for data blending and advanced analytics.
Once the analytical processing is completed, many different output presentation formats are available to the data
analyst. As shown in Figure 4-3, Alteryx can also create the
native file formats used by Tableau and QlikView, which are
especially effective when senior business decision makers
perform data discovery.
In Figure 4-3, you see output in the form of a Tableau workbook displaying sales results by categories for stores in a city.
The visualization aspect allows a clear understanding of sales
data per store based on customer segment. Additionally, a
spatial aspect is present with data relating to the customers'
distance from their stores, zip codes, and territory assignment data.
Figure 4-3: Output directly to a visualization tool. Alteryx supports direct output to Tableau and QlikView.
Chapter 5
apps
Incorporate Visualization
Many analytical tools perform number crunching well, but the
results are output as raw data or in a format that is difficult
for nontechnical people to understand. Business decision
makers are often visual people, so the capability to display
output in a graphical format is important. In the past, data
analysts have imported that data into presentation tools such
as Microsoft Excel to render more user-friendly graphics.
This process works, but the time has come to evolve past this
tedious process.
Alteryx Designer supports generating output into a wide
variety of formats, including Excel, Extensible Markup
Language (XML), and Portable Document Format (PDF) files.
Furthermore, the data analyst can generate output to Tableau
and QlikView visualization products.
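The general idea of rendering one result set into several output formats can be sketched generically. The example below is not Alteryx Designer and does not produce Tableau or QlikView files. It is a minimal standard-library Python sketch that writes the same hypothetical results as CSV (which Excel can open) and as XML. The store names, field names, and XML element names are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical analysis results: sales by store and customer segment.
results = [
    {"store": "Downtown", "segment": "Retail", "sales": 1200.5},
    {"store": "Uptown", "segment": "Wholesale", "sales": 830.0},
]

def to_csv(rows):
    """Render rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_xml(rows):
    """Render rows as an XML document with one <row> element per record."""
    root = ET.Element("results")
    for row in rows:
        item = ET.SubElement(root, "row")
        for name, value in row.items():
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

csv_text = to_csv(results)
xml_text = to_xml(results)
print(csv_text)
print(xml_text)
```

Keeping the results in one neutral structure and converting at the last step means a new output format only needs one more small function.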
Visualization products allow data to be more easily understood
and relationships to be identified and explored, thus allowing business decision makers to more quickly identify new
opportunities. However, visualization tools are not as strong
with advanced data blending and preparation, and they lack
predictive and spatial analytics capabilities. Therefore, using
a combination of Alteryx to perform the data blending and
advanced analytics with the output displayed within visualization tools leverages the individual strengths of the tools to
deliver a successful combined outcome.