Dresner Data Preparation Market Study 2021
2021 Edition
Licensed to Trifacta
Disclaimer:
This report is for informational purposes only. You should make vendor and product selections based on
multiple information sources, face-to-face meetings, customer reference checking, product demonstrations,
and proof-of-concept applications.
The information contained in this Wisdom of Crowds® market study report is a summary of the opinions
expressed in the online responses of individuals that chose to respond to our online questionnaire and does
not represent a scientific sampling of any kind. Dresner Advisory Services, LLC shall not be liable for the
content of this report, the study results, or for any damages incurred or alleged to be incurred by any of the
companies included in the report as a result of the report’s content.
Reproduction and distribution of this publication in any form without prior written permission is forbidden.
Definitions
Business Intelligence tools and technologies include query and reporting, OLAP (online
analytical processing), data mining and advanced analytics, end-user tools for ad hoc
query and analysis, and dashboards for performance monitoring.
Introduction
In 2021, we mark the 14th anniversary of Dresner Advisory Services and the seventh
edition of this report. Our thanks to all of you for your continued support and ongoing
encouragement. Since our founding in 2007, we have worked hard to set the “bar” high,
challenging ourselves to innovate and lead the market and offering ever greater value
with each successive year.
At the time of publication of this report, the COVID-19 pandemic continues to affect
millions worldwide and impacts businesses and how they leverage data and business
intelligence. As our data collection took place during Q3 and Q4 of 2020, the data and
resulting analyses reflect the pandemic’s impact.
Through this period, we separately conducted specific COVID-19 research, which is not
reflected in this report but is available on our blog at no cost. Additionally, we will
continue to collect this data and will continue to publish research through the duration of
the pandemic.
An important step towards the ongoing trend of user empowerment and self-service
business intelligence, data preparation drives an increasing amount of investment on
both demand and supply sides of the equation.
Best,
Contents
Definitions (p. 3)
Business Intelligence Defined (p. 3)
Data Preparation Defined (p. 3)
Introduction (p. 4)
Benefits of the Study (p. 7)
Consumer Guide (p. 7)
Supplier Tool (p. 7)
External Awareness (p. 7)
Internal Planning (p. 7)
About Howard Dresner and Dresner Advisory Services (p. 8)
About Jim Ericson (p. 9)
The Dresner Team (p. 10)
About Elizabeth Espinoza (p. 10)
About Kathleen Goolsby (p. 10)
About Danielle Guinebertiere (p. 10)
About Michelle Whitson-Lorenzi (p. 10)
Survey Method and Data Collection (p. 11)
Data Quality (p. 11)
Findings and Analysis (p. 11)
Focus of Research (p. 11)
Executive Summary (p. 12)
Study Demographics (p. 13)
Geography (p. 13)
Functions (p. 14)
Vertical Industries (p. 15)
Organization Size (p. 16)
Analysis of Findings (p. 17)
Importance of Data Preparation (p. 18)
Effectiveness of Current Approach to Data Preparation (p. 25)
Frequency of Data Preparation (p. 32)
COPYRIGHT 2021 DRESNER ADVISORY SERVICES, LLC
Consumer Guide
Consumers use the Dresner Advisory Services Data Preparation Market Study, an
objective source of industry research, to understand how their peers leverage and
invest in data preparation and related technologies.
Using our unique vendor performance measurement system, users glean key insights
into BI software supplier performance, which enables:
Supplier Tool
Vendor licensees use the Dresner Advisory Services Data Preparation Market Study in
several important ways:
External Awareness
- Build awareness for business intelligence markets and supplier brands, citing the
Dresner Advisory Services Data Preparation Market Study trends and vendor
performance
- Gain lead and demand generation for supplier offerings through association with
the Dresner Advisory Services Data Preparation Market Study brand, findings,
webinars, etc.
Internal Planning
- Refine internal product plans and align with market priorities and realities as
identified in the Dresner Advisory Services Data Preparation Market Study
- Better understand customer priorities, concerns, and issues
- Identify competitive pressures and opportunities
Howard Dresner is one of the foremost thought leaders in business intelligence and
performance management, having coined the term “Business Intelligence” in 1989. He
has published two books on the subject, The Performance
Management Revolution – Business Results through Insight
and Action (John Wiley & Sons, Nov. 2007) and Profiles in
Performance – Business Intelligence Journeys and the
Roadmap for Change (John Wiley & Sons, Nov. 2009). He
lectures at forums around the world and is often cited by the
business and trade press.
Howard has conducted and directed numerous in-depth primary research studies over
the past two decades and is an expert in analyzing these markets.
Howard conducts a weekly Twitter “tweetchat” on Fridays at 1:00 p.m. ET. During these
live events the #BIWisdom “tribe” discusses a wide range of business intelligence
topics.
Jim Ericson has served as a consultant and journalist who studies end-user management
practices and industry trending in the data and information management fields.
From 2004 to 2013 he was the editorial director at Information Management magazine
(formerly DM Review), where he created architectures for user and
industry coverage for hundreds of contributors across the breadth of
the data and information management industry.
Data Quality
We carefully scrutinized and verified all respondent entries to ensure that only qualified
participants are included in the study.
Focus of Research
In this study, we address key data preparation issues including:
Executive Summary
- Data preparation ranks 5th among technologies and initiatives strategic to
business intelligence. Sixty-three percent say data preparation is critical or very
important. Importance declines slowly over time (p. 18-24). Industry importance
scores are flat over the last three years (p. 80).
- Almost three quarters say their current data-preparation approach is “highly” or
“somewhat” effective (p. 25-31). Industry support for usability is far ahead of user
requirements (p. 81).
- Sixty-five percent of respondents "constantly" or "frequently" make use of data
preparation. Year-over-year frequency increases slightly (p. 32-37).
- Thirty-one percent of respondents “constantly” or “frequently” enrich data
preparation with third-party data; 69 percent do so “occasionally,” “rarely,” or
“never” (p. 38-43).
- There is strong interest in many data-preparation usability features, led by
“save/preview” and “automated detection of anomalies” (p. 44-49).
- Demand for data-preparation integration features is very strong, led by "ability to
combine data across multiple data sets and sources," "access to file formats,"
and “access to traditional databases” (p. 50-55). Robust industry support
answers all current user expectations for integration features (p. 82).
- Interest in data manipulation features is high; at least five are considered “very
important” to the majority of users (p. 56-61). Industry support for manipulation
features easily accommodates top user priorities (p. 84).
- Among support for outputs, “Excel” and “traditional database” are most required
by users (p. 62-67). Industry support is strong and addresses growing uptake of
Azure, Amazon Redshift, Google BigQuery, and other outputs (p. 83).
- Data-preparation deployment features for scheduling, monitoring, and processing
are of importance to users (p. 68-73). Vendors expect future investment to keep
feature availability well ahead of user requirements (p. 85).
- Respondents most prefer on-premises deployment, followed by private cloud and
public cloud. Over time, interest shifts to public cloud (p. 74-79). Industry support
is strong for cloud and on-premises deployment and shifting to cloud (p. 86-87).
- Data-preparation vendor rankings are shown on p. 88.
Study Demographics
Our sample includes a cross-section of data across geographies, functions,
organization sizes, and vertical industries. We believe that, unlike other industry
research, we offer a more characteristic sample and better indicator of true market
dynamics.
Geography
Survey respondents represent a mix of global geographies. Forty-eight percent
represent North America (including five Canadian provinces and the majority of U.S.
states). Thirty-one percent work in EMEA; the remainder represent Asia Pacific and
Latin America (fig. 1).
Geographies Represented
North America 48.0%; Europe, Middle East and Africa 31.4%; Asia Pacific 13.7%; Latin America 7.0%
Figure 1 – Geographies represented
Functions
Information Technology accounts for the largest group of respondents by function (37
percent). About 19 percent come from the Business Intelligence Competency Center
(BICC). Executive Management and R&D are the next most represented (fig. 2).
Tabulating results by function enables us to compare and contrast the plans and
priorities of different departments within organizations.
[Figure 2: Functions Represented. Chart values: 37.3%, 19.2%, 10.7%, 7.8%, 6.6%, 7.0%]
Vertical Industries
Survey participants represent a wide range of vertical industries, led by Business
Services (about 21 percent), Financial Services (17 percent), and Manufacturing (16
percent) (fig. 3). Technology, Consumer Services, and Healthcare are the next most
represented.
[Figure 3: Industries Represented. Chart values: 20.7%, 16.6%, 15.9%, 14.8%, 8.1%, 7.0%, 5.9%, 4.1%, 3.0%, 2.6%]
Organization Size
Our survey sample includes a mix of small, medium, and large organizations (fig. 4). In
2020, small organizations (1-100 employees) account for about 25 percent of the
sample, and mid-sized organizations (101-1,000 employees) account for 26 percent of
the sample. Large organizations (>1,000 employees) account for the remaining 48
percent of respondents, with very large organizations (>10,000 employees) accounting
for 23 percent.
[Figure 4: organization size. Categories: 1-100, 101-1,000, 1,001-10,000, More than 10,000 employees. Chart values: 25.8%, 24.7%, 24.4%, 23.3%]
Analysis of Findings
In 2021, our seventh annual Data Preparation Market Study examines the nature of
data preparation, exploring user sentiment and perceptions, the nature of current
implementations, and plans for the future.
Our latest study sample reports very high perceived importance of data preparation,
which in turn reflects the ongoing importance of self-service business intelligence and
user autonomy (fig. 6). Sixty-three percent of all respondents say data preparation is
either critical or very important. About 82 percent of respondents say data preparation
is, at minimum, important. Just 7 percent say data preparation is “not important.”
[Figure 6: importance of data preparation. Categories: Critical, Very Important, Important, Somewhat Important, Not Important. Chart values: 29.5%, 19.2%, 11.8%, 7.0%]
[Figure: chart by year, 2015-2021 (percentage of responses and weighted mean)]
[Figure: chart by function: Marketing & Sales, Strategic Planning, BICC, R&D, Information Technology (IT), Executive Management, Finance]
The perceived importance of data preparation varies little by weighted mean
across all geographic regions in our 2021 study, with importance scores between 3.6
and 3.8 (near “very important”) (fig. 9). Asia Pacific notably posts the greatest share of
“critical” importance scores (38 percent), compared to 31 percent in EMEA, 28 percent
in North America, and 16 percent in Latin America. Skepticism is low across all regions
(fewer than 6 percent “not important” scores) except EMEA, where 12 percent say data
preparation is “not important” and another 12 percent say it is only “somewhat
important.”
[Figure 9: importance of data preparation by geography: Asia Pacific, Latin America, North America, Europe, Middle East and Africa]
The importance of data preparation clearly increases with organization headcount (fig.
10). This effect is most pronounced in very large organizations (>10,000 employees),
where mean importance rises to 4.0, compared to 3.5-3.6 for all smaller peers. Overall
scores are healthy across all organizations, falling between “important” and “very
important” in smaller organizations and reaching “very important” in the largest. The
percentage of “critical” importance scores also increases visibly with organization size.
[Figure 10: importance of data preparation by organization size: 1-100, 101-1,000, 1,001-10,000, More than 10,000 employees]
The importance of data preparation varies from “important” to “very important” across
different industries in our 2021 study (fig. 11). This year, the strongest weighted-mean
sentiment is among respondents in Consumer Services (4.0), Government (3.9),
Healthcare (3.8), and Business Services (3.8). Importance thereafter declines slightly in
Financial Services (3.7), Technology (3.6), and Manufacturing (3.6). The lowest
importance is among respondents in Retail/Wholesale (3.1) and Higher Education (3.3).
[Figure 11: importance of data preparation by industry]
[Figure: effectiveness of current approach to data preparation. Highly Effective 25.0%; Somewhat Effective 49.6%; Somewhat Ineffective 20.9%; Totally Ineffective 4.5%]
The effectiveness of the current approach to data preparation is largely steady across
seven years of data, particularly in the years 2017-2021 (fig. 13). In the current-year
study, the current approach to data preparation reaches an all-time weighted-mean high
of 2.95 (“somewhat effective”), up from 2.91 in 2020. Also, the number of "highly
effective" responses grows over the history of the survey and reaches an all-time high of
25 percent in 2021, though combined “highly effective” and “somewhat effective” scores
are slightly below a 2019 peak. We believe this steady effectiveness reflects satisfaction
but leaves considerable room for improvement in the opinions of data-preparation
users.
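As a minimal sketch of how a weighted mean such as the 2.95 cited above can be reproduced from a response distribution, the Python below scores the four effectiveness categories from 4 down to 1 and weights each score by its 2021 share. The 4-to-1 scoring and the assignment of the charted values (25.0, 49.6, 20.9, and 4.5 percent) to categories are assumptions chosen to agree with the figures quoted in the text; the study does not publish its formula.

```python
# Assumed scoring: 4 = Highly Effective ... 1 = Totally Ineffective.
# Shares are the 2021 chart values, assigned to categories by assumption.
EFFECTIVENESS_2021 = {
    "Highly Effective": (4, 25.0),
    "Somewhat Effective": (3, 49.6),
    "Somewhat Ineffective": (2, 20.9),
    "Totally Ineffective": (1, 4.5),
}


def weighted_mean(distribution):
    """Return the share-weighted average score of a response distribution."""
    total_share = sum(share for _, share in distribution.values())
    weighted_sum = sum(score * share for score, share in distribution.values())
    return weighted_sum / total_share


def combined_share(distribution, categories):
    """Return the summed percentage share of the named categories."""
    return sum(distribution[name][1] for name in categories)


if __name__ == "__main__":
    print(round(weighted_mean(EFFECTIVENESS_2021), 2))  # 2.95
    top_two = ["Highly Effective", "Somewhat Effective"]
    print(round(combined_share(EFFECTIVENESS_2021, top_two), 1))  # 74.6
```

Under these assumptions the result matches the reported 2.95, and the combined “highly effective” and “somewhat effective” share of 74.6 percent matches the “almost three quarters” cited in the Executive Summary.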
[Figure 13: effectiveness of current approach to data preparation by year, 2015-2021]
[Figure: chart by function: BICC, Marketing & Sales, Executive Management, R&D, Information Technology (IT), Finance]
[Figure: chart by geography: Asia Pacific, North America, Latin America, Europe, Middle East and Africa]
The perceived effectiveness of data preparation in 2021 is very steady (weighted mean
2.9-3.0) across organizations of different sizes (fig. 16). Within these findings, however,
we find small organizations (1-100 employees) and very large organizations (> 10,000
employees) report slightly higher measures of data-preparation effectiveness. Also, very
large organizations report the highest combined percentage of “highly effective” and
“somewhat effective” data preparation satisfaction (80 percent). By these measures, we
can say effectiveness increases in the largest organizations, but by a smaller margin
than in earlier years of our study.
[Figure 16: effectiveness of data preparation by organization size: 1-100, 101-1,000, 1,001-10,000, More than 10,000 employees]
Success with data preparation mildly correlates with success with business intelligence
(fig. 18). This is best seen in measures of “highly effective” data preparation, which
declines among organizations that are “somewhat successful” (25 percent), and
“unsuccessful and somewhat unsuccessful” (17 percent).
[Figure 18: data-preparation effectiveness by success with business intelligence: Successful, Somewhat Successful, Unsuccessful & Somewhat Unsuccessful]
[Figure: frequency of data preparation. Constantly 23.9%; Frequently 40.5%; Occasionally 22.7%; Rarely 10.2%]
Across the seven years of our focused data-preparation study, respondents report
weighted-mean frequency of use in a band between 3.6 and 3.9 (fig. 20). This includes
a high mark of 3.9 in 2017 and the current-year mark of 3.7, which is slightly up from
2020. Likewise, “constant” users increase to 24 percent in 2021 compared to 21 percent
in our 2020 study. The two lowest levels of use also improve in our 2021 study
compared to the year prior. It is possible, but not certain, that developments including
pre-formatting and automation play a part in frequency of data preparation over time.
[Figure 20: frequency of data preparation by year, 2015-2021]
The frequency of data preparation is robust across multiple functions, from a
weighted-mean high of 4.3 in Strategic Planning to a low of 3.5 in Finance (fig. 21).
Marketing/Sales is the next most frequent data-preparation user group by function and,
with Strategic Planning, accounts for the most numerous combined “constant” and
“frequent” users. A second tier of Executive Management, R&D, and BICC respondents
are the next most likely users, with about 70 percent either “constant” or “frequent”
users. Finance, the least frequent user group by function, nonetheless accounts for
close to 90 percent who are, at minimum, “occasional” users of data preparation.
[Figure 21: frequency of data preparation by function: Strategic Planning, Marketing & Sales, Executive Management, R&D, BICC, Information Technology (IT), Finance]
Weighted-mean scores for frequency of data preparation are highest among
Asia-Pacific respondents (3.9), followed by North America (3.7), EMEA (3.7), and Latin
America (3.6) (fig. 22). Combined “constant” and “frequent” users are also most
common in Asia Pacific. North America trails all regions by this measure due to larger
numbers of “occasional” or “rarely” responses. Again, frequency is robust across
geographies, with 84-95 percent of all respondents at least “occasional” users.
[Figure 22: frequency of data preparation by geography: Asia Pacific, North America, Europe, Middle East and Africa, Latin America]
[Figure: chart by organization size: 1-100, 101-1,000, 1,001-10,000, More than 10,000 employees]
[Figure: chart of response frequency. Categories: Constantly, Frequently, Occasionally, Rarely, Never. Chart values: 29.1%, 25.7%, 23.8%, 14.3%, 7.2%]
Between 2015 and 2021, respondents report a fairly constrained range of frequency of
third-party data use in conjunction with data preparation (fig. 26). By weighted mean,
third-party data enrichment remains in a narrow band between 2.74 and 2.94.
Year-over-year third-party weighted-mean frequency ticks up from 2.77 to 2.84 in 2021
but is lower than the peak of 2.94 in 2018. We might well expect that third-party data
enrichment use will grow over time, though the latest three-year trend shows only small
increases, indicating that organizations most often continue to grapple with internal data.
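The combined shares quoted for third-party enrichment (31 percent “constantly” or “frequently,” 69 percent otherwise) are cumulative sums over an ordered response scale. The sketch below illustrates that tally with the 2021 chart values. The assignment of values to categories is an assumption: the chart residue does not pair them, so 7.2 and 23.8 percent are taken as “constantly” and “frequently” to agree with the 31 percent in the Executive Summary. The order given to the remaining three values is likewise assumed, though on a 5-to-1 scoring it is consistent with the 2.84 weighted mean.

```python
# Ordered from most to least frequent use.
# Assumption: the pairing of chart values with categories is inferred from
# the text, not stated by the study.
THIRD_PARTY_2021 = [
    ("Constantly", 7.2),
    ("Frequently", 23.8),
    ("Occasionally", 29.1),
    ("Rarely", 25.7),
    ("Never", 14.3),
]


def share_at_least(distribution, threshold):
    """Sum shares from the top category down to and including `threshold`."""
    running_total = 0.0
    for category, share in distribution:
        running_total += share
        if category == threshold:
            return running_total
    raise ValueError(f"unknown category: {threshold}")


def share_below(distribution, threshold):
    """Sum shares of every category ranked strictly below `threshold`."""
    total = sum(share for _, share in distribution)
    return total - share_at_least(distribution, threshold)


if __name__ == "__main__":
    print(round(share_at_least(THIRD_PARTY_2021, "Frequently"), 1))  # 31.0
    print(round(share_below(THIRD_PARTY_2021, "Frequently"), 1))     # 69.1
```

The charted values sum to 100.1 percent because of rounding in the source, which is why the two shares do not add to exactly 100.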
[Figure 26: frequency of third-party data enrichment by year, 2015-2021]
By function, Marketing/Sales is by far most likely to enrich data preparation with
third-party data in 2021 (fig. 27). Combined “constant” and “frequent” users account for 63
percent of Marketing/Sales respondents, compared to the 33-38 percent of similar users
in Strategic Planning, Executive Management, Finance, and the BICC. Weighted-mean
frequency of third-party data enrichment is also by far highest in Marketing/Sales (3.73),
particularly when compared to the lowest scores in R&D (2.47) and IT (2.66).
[Figure 27: frequency of third-party data enrichment by function: Marketing & Sales, Strategic Planning, BICC, Executive Management, Finance, Information Technology (IT), R&D]
Interest in third-party data enrichment of data preparation is greatest in Asia Pacific and
Latin America, with the latter accounting for the most combined “constant” and
“frequent” users (42 percent) (fig. 28). By comparison, EMEA respondents report the
fewest combined “constant” and “frequent” third-party data users (27 percent), followed
by North America (31 percent). Close to half of EMEA respondents “rarely” or “never”
use third-party data-preparation enrichment.
[Figure 28: frequency of third-party data enrichment by geography: Asia Pacific, Latin America, North America, Europe, Middle East and Africa]
[Figure: chart by organization size: 1-100, 101-1,000, 1,001-10,000, More than 10,000 employees]
In 2021, Higher Education industry respondents are the most likely users of third-party
data enrichment in data preparation by weighted mean (fig. 30). More than 90 percent
of this group are at least “occasional” users. Healthcare respondents are the
next most likely users; like Government, Business Services, and Technology industry
respondents, Healthcare respondents report the greatest concentrations of
“constant” users of third-party enrichment. Excluding Manufacturing, at least 60 percent
of respondents, and often far more, in all industries are, at minimum, “occasional” users
of third-party enrichment data.
[Figure 30: frequency of third-party data enrichment by industry]
[Figure: chart on a 0-100% scale. Label: Collaboration capabilities/integration]
Across the last six years of our study, attitudes toward data-preparation features
fluctuate somewhat by rank and ebb and flow unevenly over time (fig. 32). In our 2021
study, “automated detection of anomalies” retains the top ranking, followed by
“immediate preview and feedback” and “visual interface.” While “automated detection”
gains importance in our latest study, the second and third choices lose user interest
compared to earlier years. Many other usability features also decline in long-term
importance but are at higher levels than in 2020. These include “automated
recommendations,” “visual highlighting,” “support for entire data transformation
process,” and more. “Machine learning” actually loses a small amount of momentum
compared to our 2020 study.
[Figure 32: data-preparation usability features over time. Labels: Automated recommendations for data relationships and keys for combining data across multiple data sets and sources; Automatically generate data transformation code/scripts for execution; Support for entire data transformation process in a single application/user interface; Visual highlighting of relationships between columns, attributes and datasets]
Peak interest in usability features varies rather widely by function (fig. 33).
In our latest sample, respondents in Strategic Planning lead with “ability to save
preparation steps” and report an outsized interest in several features including
“immediate preview and feedback,” “automated recommendations,” “visual highlighting,”
“collaboration capabilities,” and “technical expertise not required.” R&D respondents
more narrowly give high marks to “automated detection of anomalies.” BICC
respondents give the top scores to “mask or redact sensitive data” and narrowly to
“machine learning and recommendations.”
[Figure: usability features chart. Labels: View/audit all prep tasks (and changes) and annotate for knowledge-sharing; Automated recommendations for data relationships and keys for combining data across multiple…]
Interest in data-preparation usability features varies widely by industry in our 2021 study
(fig. 36). For example, Consumer Services respondents give high marks to “ability to
save preparation steps” and “collaboration,” while Business Services most prefers
“automated detection of anomalies.” Healthcare gives the top score to “immediate
preview and feedback.” Education respondents lead interest in “visual interface,” and
Government leads “visual highlighting of relationships” and “support for entire data
transformation process.” Financial Services gives the top score for “mask or redact
sensitive data.”
Peak interest in the top two data-integration features is mostly sustained in the last five
years of our seven-year focused study (fig. 38). In our latest sample, “ability to combine
data” and “ability to access semi-structured data” are at peak interest in 2021 compared
to previous years. Scores for “access to traditional databases,” “ability to infer
metadata,” and “ability to extract data from documents” are below historic highs.
“Access to big data” falls below the level of “important” in 2021, a historic low for our
study.
[Figure: chart by geography (Latin America, Asia Pacific, North America, Europe, Middle East and Africa). Labels: Ability to normalize, standardize, and enrich data; Custom user defined functions; Ability to un-nest data (e.g., JSON / XML parsing); Support for cutting, merging, and replacing values]
Amid results that are for the most part tightly clustered, large organizations (1,001-
10,000 employees) and very large organizations (> 10,000 employees) lead overall
interest in data-preparation manipulation features by organization size (fig. 47).
Thereafter, interest in unique features declines unevenly among respondents in mid-
sized (101-1,000 employees) and small (1-100 employees) organizations, though
overall differences are minimal.
Excel and CSV continue their run as the top choice among data-preparation supported
outputs across five years of data (fig. 50).
Among other output formats, the second and third choices (traditional relational
database and JSON) decline dramatically in importance compared to earlier years of
study. Interest in popular third-party business intelligence tool formats falls to an all-time
low of less than 40 percent. However, lower-ranked choices Azure, Amazon Redshift,
and Google BigQuery all tick up noticeably in the 2021 study. We expect that these
latter choices will continue to experience the greatest near-term growth by percentage
going forward.
[Figure 50: supported outputs over time. Labels: Parquet; Azure]
All functional roles in organizations choose Excel and CSV as the most-preferred
data-preparation supported output (fig. 51). For this and other features, sentiment for support
of data-preparation outputs varies widely across functions in our 2021 study.
Respondents in Strategic Planning unanimously require Excel and CSV support and post
the top marks (about 90 percent) for traditional relational database support, both figures
well ahead of other requirements by function. The next highest support requirement
comes from Executive Management for JSON (about 70 percent). Requirements for
trailing output support endpoints (Azure, third-party BI platform, Amazon Redshift,
Google BigQuery, etc.) fall quickly to below 50 percent across all industries.
[Figure 51: supported outputs by function (Information Technology (IT), BICC, Executive Management, R&D, Finance, Marketing & Sales, Strategic Planning). Labels: Popular (third-party) business intelligence tool formats; Parquet; Google BigQuery]
Preference for flat file Excel and CSV outputs for data prep is uniformly highest across
all geographic regions (about 90 percent), though lower (about 83 percent) among Asia-
Pacific respondents (fig. 52). EMEA respondents most require traditional database
output support (80 percent). Requirements thereafter fall to 60 percent or lower for all
other supported outputs across all geographies.
[Figure 52: supported outputs by geography (North America, Europe, Middle East and Africa, Asia Pacific, Latin America). Labels: Parquet; Azure; Google BigQuery]
A great majority of organizations of any size (86-92 percent) share the highest
preference for Excel and CSV output support for data preparation (fig. 53). Traditional
relational database is a strong second choice, particularly in about 80 percent of very
large organizations (> 10,000 employees). Support requirements thereafter decrease
quickly to less than 60 percent in mid-sized organizations (101-1,000 employees) for
JSON and lower still for Azure, Amazon Redshift, and Google BigQuery.
[Figure 53: supported outputs by organization size. Labels: Hadoop; Azure; Google BigQuery]
All industries we sampled in our 2021 study share the greatest preference (in the range
of 88-95 percent) for Excel and CSV output, and nearly all industries (led by
Retail/Wholesale and Technology) make traditional relational databases their second
choice (fig. 54). Technology organizations are also most likely by
far (81 percent) to require JSON output support for data preparation. To a lesser
degree, Consumer Services and Business Services most often require traditional
third-party BI tool formats. Interestingly, our 2021 study finds Government respondents
reporting a relatively high preference for Azure, Amazon Redshift, and Google
BigQuery.
[Figure 54: supported outputs by industry. Labels: Parquet; Azure; Google BigQuery]
[Figure: deployment features chart. Labels: Push-down processing of data transformations into the native data source for script execution (e.g., SQL, Pig, etc.); Ability to schedule execution/replay of data transformation processing]
Sentiment toward the top three data-preparation deployment features is high across
functions, with mean interest mostly ranging from "important" toward "very important"
(fig. 57). Interest in the top feature, “schedule a process,” is highest in Operations
(which also reports very high interest in “ability to schedule execution of data
transformation” and “API support”). Thereafter, R&D leads interest in multiple
deployment features, including “ability to monitor,” “API support,” “push-down
processing,” and “support for multiple execution environments.”
Interest in data preparation deployment features often tightly clusters but generally
increases with organization global headcount (fig. 59). In 2021, very large organizations
(> 10,000 employees) lead interest in all but one of seven features sampled, with top
weighted-mean scores of 4.0 and 3.9 for “schedule a process to run,” “ability to monitor
ongoing data transformation,” and “ability to schedule execution/replay of data
transformation process.” “API support” is the lone exception to very large organization
leadership, but both very large and large organizations (1,001-10,000 employees)
provide scores of about 3.7, close to “very important.”
Across five years of data, we see an expected growth in interest in the use of public
cloud access to data-preparation capabilities and, to a lesser extent, a decrease in
on-premises location of data-preparation capabilities (fig. 62). Most notable is the
upward slope of public cloud over time, with five consecutive yearly increases up to a
high in 2021. The use of cloud-hosted data preparation still trails both on-premises and
private cloud, though the three options have never been closer by user preference
(weighted mean 3.27, 3.42, and 3.66, respectively). All three of these values fall in the
range of somewhat more than “important.”
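The weighted-mean scores quoted above can be illustrated with a short calculation. This is a minimal sketch, not the study's actual method: it assumes a 1-5 scale (inferred from the report's remarks that 4.0 corresponds to "very important" and that values a little above 3 are "somewhat more than important"), and both the response counts and the name `weighted_mean` are hypothetical.

```python
# Assumed mapping of survey labels to scores; the 1-5 scale is inferred
# from the report's text, not stated explicitly in it.
SCALE = {
    "Critical": 5,
    "Very important": 4,
    "Important": 3,
    "Somewhat important": 2,
    "Not important": 1,
}


def weighted_mean(counts):
    """Return the mean score, weighting each label's score by its response count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to average")
    return sum(SCALE[label] * n for label, n in counts.items()) / total


# Hypothetical response counts for one deployment option (not study data).
example_counts = {
    "Critical": 10,
    "Very important": 30,
    "Important": 40,
    "Somewhat important": 15,
    "Not important": 5,
}

score = weighted_mean(example_counts)
print(f"weighted mean: {score:.2f}")
```

Under these assumptions, a score between 3 and 4 sits between "important" and "very important," which is how the report reads its own values.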
[Figure 62: weighted-mean interest by deployment model, 2017-2021. Series: Public cloud (SaaS), Private cloud, On-premises, with linear trend lines for Public cloud (SaaS) and On-premises]
While the hierarchy of on-premises, private cloud, and public cloud deployment
generally holds across functions, not every function ranks on-premises deployment first
(fig. 63). In our latest sample, Executive Management has a clear preference for public
cloud (3.96) versus private cloud (3.71) and on-premises (3.29), which likely foreshadows
a trend for future enterprise adoption. Executives and R&D respondents also show a
preference for private cloud over on-premises deployment. Marketing/Sales, BICC,
Strategic Planning, Finance, and IT are the stalwart supporters of on-premises
deployment in our 2021 study, with scores close to 4.0, or “very important.”
[Figure 63: deployment model preference by function: Marketing & Sales, Executive Management, R&D, BICC, Finance, Strategic Planning, Information Technology (IT)]
The hierarchical preference for on-premises capabilities for data preparation is not quite
consistent across all geographic regions in our 2021 study (fig. 64). EMEA respondents
are the most steadfast supporters of on-premises deployment, with “very important”
scores. North and Latin America also keep to the preference for on-premises versus
private and public cloud. But, notably, we see Asia-Pacific respondents with the highest
of all scores for public cloud (> “very important”).
[Figure 64: deployment model preference by geography: Asia Pacific; Europe, Middle East and Africa; North America; Latin America]
[Figure: deployment model preference by organization size: 1-100; 101-1,000; 1,001-10,000; more than 10,000 employees]
With the exception of Technology and Business Services, all vertical industries rank on-
premises location of data preparation higher than all other models, sometimes by a
large margin (fig. 66). In our 2021 study, Government, Healthcare, and Financial
Services are the most likely to require on-premises hosting, perhaps in part due to
regulatory or proprietary reasons. Technology organizations, and to a much lesser
extent Business Services, are the interesting exceptions; they rank preferences in
reverse order from public cloud to private cloud and, finally, on-premises deployment.
Fig. 74 provides another instructive view of industry support growth for cloud-based
data preparation deployment over time. Our 2021 study shows a continuation of the
upward linear trend in cloud-based deployment options for data preparation. We also
observe a declining linear slope in support for on-premises deployment. Indeed, on-
premises support diminishes by a full 10 percent in 2021 from a high of 92 percent in
2017. As cloud deployment of data-preparation capabilities continues to grow in
organizations, we expect both cloud and on-premises models to remain well supported.
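The linear trends described above can be approximated with an ordinary least-squares fit. The sketch below is illustrative only: the yearly percentages are hypothetical placeholders, not values from the study, and `fit_line` is a name introduced here for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = [2017, 2018, 2019, 2020, 2021]
# Hypothetical on-premises support percentages (NOT taken from the study).
on_premises_pct = [90.0, 88.0, 87.0, 85.0, 83.0]

slope, intercept = fit_line(years, on_premises_pct)
print(f"trend: {slope:+.2f} percentage points per year")
```

A negative slope corresponds to the declining trend line the text describes for on-premises support; a positive slope would correspond to the upward trend for cloud-based deployment.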
[Figure: vendor chart labels: Dundas, Qlik, Dimensional Insight, Domo, Tableau, Microsoft, Oracle, RapidMiner, Infor, Sisense, Matillion, SAS, Ataccama, Talend, Fishtown Analytics]
Name*: _________________________________________________
Address 1: _________________________________________________
Address 2: _________________________________________________
City: _________________________________________________
State: _________________________________________________
Zip: _________________________________________________
Country: _________________________________________________
Major Geography
( ) Asia/Pacific
( ) Latin America
( ) North America
_________________________________________________
( ) Executive management
( ) Finance
( ) Manufacturing
( ) Marketing
( ) Sales
( ) Advertising
( ) Aerospace
( ) Agriculture
( ) Automotive
( ) Aviation
( ) Biotechnology
( ) Broadcasting
( ) Business services
( ) Chemical
( ) Construction
( ) Consulting
( ) Consumer products
( ) Defense
( ) Education
( ) Energy
( ) Executive search
( ) Federal government
( ) Financial services
( ) Healthcare
( ) Hospitality
( ) Gaming
( ) Insurance
( ) Legal
( ) Manufacturing
( ) Mining
( ) Pharmaceuticals
( ) Publishing
( ) Real estate
( ) Sports
( ) Technology
( ) Telecommunications
( ) Transportation
COPYRIGHT 2021 DRESNER ADVISORY SERVICES, LLC
( ) Utilities
( ) 1 - 100
( ) 101 - 1000
( ) 1001 - 5000
How important is it for users to be able to prepare data (e.g., combine, clean, shape
datasets) prior to analysis?*
( ) Critical
( ) Very important
( ) Important
( ) Somewhat important
( ) Not important
____________________________________________
____________________________________________
____________________________________________
____________________________________________
How effective is the current approach to Data preparation for Business Intelligence/user
analysis today?
( ) Highly effective
( ) Somewhat effective
( ) Somewhat ineffective
( ) Totally ineffective
How often do users have to prepare data (e.g., combine, clean and shape datasets) to
get it in a format that can be used for analysis?
( ) Constantly
( ) Frequently
( ) Occasionally
( ) Rarely
( ) Never
How often do users enrich internal data with third party data (e.g., Dun & Bradstreet, US
Census)?
( ) Constantly
( ) Frequently
( ) Occasionally
( ) Rarely
( ) Never
( ) Standalone
Please indicate the importance of the following usability features for Data preparation
software:
Technical expertise/programming is *NOT* required to build/execute data transformation scripts () () () () ()
Immediate preview and feedback for end user () () () () ()
Automated recommendations for data relationships & keys for combining data across multiple data sets and sources () () () () ()
Visual highlighting of relationships between columns, attributes & datasets () () () () ()
Automated detection of anomalies, outliers, & duplicates () () () () ()
Automatically generate data transformation code/scripts for execution () () () () ()
Please indicate the importance of the following data integration features for Data
preparation software:
Access to traditional databases (e.g., RDBMS) () () () () ()
Access to big data (e.g., Hadoop) () () () () ()
Access to NoSQL sources () () () () ()
Access to file formats (e.g., log files, CSV, Excel) () () () () ()
Ability to infer metadata by introspecting the data elements () () () () ()
Ability to combine data across multiple data sets and sources through joins and merging data () () () () ()
[ ] Excel, CSV
[ ] Hadoop
[ ] Redshift
[ ] Azure
[ ] Avro
[ ] Parquet
[ ] Bzip/gzip
Please indicate the importance of the following data manipulation features for Data
preparation software:
Simple interface for imposing structure on raw data () () () () ()
Ability to un-nest data (e.g., JSON / XML parsing) () () () () ()
Ability to normalize, standardize & enrich data () () () () ()
Support for cutting, merging & replacing of values () () () () ()
Ability to aggregate & group data () () () () ()
Ability to pivot (convert table to matrix) & reshape (convert matrix to table) data () () () () ()
Ability to derive new data features from existing data (text extraction, math expressions, date expressions, etc.) () () () () ()
Ability to manipulate the order of data transformation steps () () () () ()
Session-ize log or event data () () () () ()
Window and time series functions () () () () ()
Custom user defined functions () () () () ()
Please indicate the importance of the following deployment features for Data
preparation software:
Ability to iteratively sample data to provide an interactive testing of transformation logic () () () () ()
Push-down processing of data transformations into the native data source for script execution (SQL, Pig, etc.) () () () () ()
Ability to schedule the execution/replay of data transformation processing () () () () ()
Ability to monitor ongoing data transformation processing to alert on anomalies or changes in the structure () () () () ()
Support for multiple execution environments (e.g., MapReduce, Spark, Hive) based on volume and scale of data sets () () () () ()
API support (e.g., REST) () () () () ()
On-premises () () () () ()
Private cloud () () () () ()
Public cloud (SaaS) () () () () ()