Dresner Data Preparation Market Study 2021


February 25, 2021

Dresner Advisory Services, LLC

2021 Edition

Data Preparation Market Study

Wisdom of Crowds® Series

Licensed to Trifacta
Data Preparation Market Study 2021

Disclaimer:

This report is for informational purposes only. You should make vendor and product selections based on
multiple information sources, face-to-face meetings, customer reference checking, product demonstrations,
and proof-of-concept applications.

The information contained in this Wisdom of Crowds® market study report is a summary of the opinions
expressed in the online responses of individuals that chose to respond to our online questionnaire and does
not represent a scientific sampling of any kind. Dresner Advisory Services, LLC shall not be liable for the
content of this report, the study results, or for any damages incurred or alleged to be incurred by any of the
companies included in the report as a result of the report’s content.

Reproduction and distribution of this publication in any form without prior written permission is forbidden.

COPYRIGHT 2021 DRESNER ADVISORY SERVICES, LLC



Definitions

Business Intelligence Defined


Business intelligence (BI) is “knowledge gained through the access and analysis of
business information.”

Business Intelligence tools and technologies include query and reporting, OLAP (online
analytical processing), data mining and advanced analytics, end-user tools for ad hoc
query and analysis, and dashboards for performance monitoring.

Howard Dresner, The Performance Management Revolution: Business Results Through Insight and Action (John Wiley & Sons, 2007)

Data Preparation Defined


Data Preparation is a capability for a variety of users—both business and IT—to model,
prepare, and combine data prior to analysis.


Introduction
In 2021, we mark the 14th anniversary of Dresner Advisory Services and the seventh
edition of this report. Our thanks to all of you for your continued support and ongoing
encouragement. Since our founding in 2007, we have worked hard to set the “bar” high,
challenging ourselves to innovate and lead the market and to offer ever greater value
with each successive year.

At the time of publication of this report, the COVID-19 pandemic continues to affect
millions worldwide and impacts businesses and how they leverage data and business
intelligence. As our data collection took place during Q3 and Q4 of 2020, the data and
resulting analyses reflect the pandemic’s impact.

Through this period, we separately conducted COVID-19 research, which is not
reflected in this report but is available at no cost on our blog. We will continue to
collect this data and publish research for the duration of the pandemic.

Data preparation is a topic that resonates strongly with organizations, and especially
with power users and analysts who historically were relegated to using whatever tools
were available for the purpose, regardless of limitations.

As an important step in the ongoing trend toward user empowerment and self-service
business intelligence, data preparation drives increasing investment on both the
demand and supply sides of the equation.

We hope you enjoy this report!

Best,

Chief Research Officer
Dresner Advisory Services


Contents
Definitions (p. 3)
  Business Intelligence Defined (p. 3)
  Data Preparation Defined (p. 3)
Introduction (p. 4)
Benefits of the Study (p. 7)
  Consumer Guide (p. 7)
  Supplier Tool (p. 7)
    External Awareness (p. 7)
    Internal Planning (p. 7)
About Howard Dresner and Dresner Advisory Services (p. 8)
About Jim Ericson (p. 9)
The Dresner Team (p. 10)
  About Elizabeth Espinoza (p. 10)
  About Kathleen Goolsby (p. 10)
  About Danielle Guinebertiere (p. 10)
  About Michelle Whitson-Lorenzi (p. 10)
Survey Method and Data Collection (p. 11)
  Data Quality (p. 11)
  Findings and Analysis (p. 11)
Focus of Research (p. 11)
Executive Summary (p. 12)
Study Demographics (p. 13)
  Geography (p. 13)
  Functions (p. 14)
  Vertical Industries (p. 15)
  Organization Size (p. 16)
Analysis of Findings (p. 17)
  Importance of Data Preparation (p. 18)
  Effectiveness of Current Approach to Data Preparation (p. 25)
  Frequency of Data Preparation (p. 32)
  Frequency of Data-Preparation Enrichment with Third-Party Data (p. 38)
  Data-Preparation Usability Features (p. 44)
  Data-Preparation Data-Integration Features (p. 50)
  Data-Preparation Manipulation Features (p. 56)
  Data-Preparation Supported Outputs (p. 62)
  Data-Preparation Deployment Features (p. 68)
  Location of Data-Preparation Capabilities (p. 74)
Industry Support for Data Preparation (p. 80)
  Industry Support for Data-Preparation Usability Features (p. 81)
  Industry Support for Data-Preparation Integration (p. 82)
  Industry Support for Data-Preparation Output Options (p. 83)
  Industry Support for Data Preparation Data Manipulation Features (p. 84)
  Industry Support for Data-Preparation Deployment Features (p. 85)
  Industry Support for Data Preparation: Cloud versus On-Premises (p. 86)
Data Preparation Vendor Ratings (p. 88)
Other Dresner Advisory Services Research Reports (p. 89)
Appendix: Data Preparation Survey Instrument (p. 90)


Benefits of the Study


This Dresner Advisory Services Data Preparation Market Study provides a wealth of
information and analysis, offering value to both consumers and producers of business
intelligence technology and services.

Consumer Guide
Consumers use the Dresner Advisory Services Data Preparation Market Study as an
objective source of industry research to understand how their peers leverage and
invest in data preparation and related technologies.

Using our unique vendor performance measurement system, users glean key insights
into BI software supplier performance, which enables:

• Comparisons of current vendor performance to industry norms
• Identification and selection of new vendors

Supplier Tool
Vendor licensees use the Dresner Advisory Services Data Preparation Market Study in
several important ways:

External Awareness
• Build awareness for business intelligence markets and supplier brands, citing the
Dresner Advisory Services Data Preparation Market Study trends and vendor
performance
• Generate leads and demand for supplier offerings through association with
the Dresner Advisory Services Data Preparation Market Study brand, findings,
webinars, etc.

Internal Planning
• Refine internal product plans and align with market priorities and realities as
identified in the Dresner Advisory Services Data Preparation Market Study
• Better understand customer priorities, concerns, and issues
• Identify competitive pressures and opportunities


About Howard Dresner and Dresner Advisory Services


The Dresner Advisory Services Data Preparation Market Study was conceived,
designed and executed by Dresner Advisory Services, LLC—an independent advisory
firm—and Howard Dresner, its President, Founder and Chief Research Officer.

Howard Dresner is one of the foremost thought leaders in business intelligence and
performance management, having coined the term “Business Intelligence” in 1989. He
has published two books on the subject, The Performance
Management Revolution – Business Results through Insight
and Action (John Wiley & Sons, Nov. 2007) and Profiles in
Performance – Business Intelligence Journeys and the
Roadmap for Change (John Wiley & Sons, Nov. 2009). He
lectures at forums around the world and is often cited by the
business and trade press.

Prior to Dresner Advisory Services, founded in 2007, Howard served as chief strategy
officer at Hyperion Solutions and was a research fellow at Gartner, where he led its
business intelligence research practice for 13 years.

Howard has conducted and directed numerous in-depth primary research studies over
the past two decades and is an expert in analyzing these markets.

Through the Wisdom of Crowds® Business Intelligence market research reports, we
engage with a global community to redefine how research is created and shared. Other
research reports include:

• Wisdom of Crowds® Flagship BI Market Study
• Analytical Data Infrastructure
• Cloud Computing and Business Intelligence
• Data Science and Machine Learning
• Enterprise Performance Management
• Natural Language Analytics
• Self-Service BI

Howard conducts a weekly Twitter “tweetchat” on Fridays at 1:00 p.m. ET. During these
live events the #BIWisdom “tribe” discusses a wide range of business intelligence
topics.

You can find more information about Dresner Advisory Services at
www.dresneradvisory.com.


About Jim Ericson


Jim Ericson is a Research Director with Dresner Advisory Services.

Jim has served as a consultant and journalist who studies end-user management
practices and industry trending in the data and information management fields.

From 2004 to 2013 he was the editorial director at Information Management magazine
(formerly DM Review), where he created architectures for user and
industry coverage for hundreds of contributors across the breadth of
the data and information management industry.

As lead writer he interviewed and profiled more than 100 CIOs, CTOs, and program
directors in a program called “25 Top Information Managers.” His related feature
articles earned ASBPE national bronze and multiple Mid-Atlantic region gold and silver
awards for Technical Article and for Case History feature writing.

A panelist, interviewer, blogger, community liaison, conference co-chair, and speaker in
the data-management community, he also sponsored and co-hosted a weekly podcast
in continuous production for more than five years.

Jim’s earlier background as senior morning news producer at NBC/Mutual Radio
Networks and as managing editor of MSNBC’s first Washington, D.C. online news
bureau cemented his understanding of fact-finding, topical reporting, and serving broad
audiences.


The Dresner Team


About Elizabeth Espinoza
Elizabeth is Research Director at Dresner Advisory and is responsible for the data
preparation, analysis, and creation of charts for Dresner Advisory reports.

About Kathleen Goolsby


Kathleen is Senior Editor at Dresner Advisory and ensures the quality and consistency
of all research publications.

About Danielle Guinebertiere


Danielle is the Director of Client Services at Dresner Advisory. She supports the
ongoing research process through her work with executives at companies included in
Dresner market reports.

About Michelle Whitson-Lorenzi


Michelle is Client Services Manager and is responsible for managing software company
survey activity and our internal market research data.


Survey Method and Data Collection


As with all our Wisdom of Crowds® Market Studies, we constructed a survey instrument
to collect data and used social media and crowdsourcing techniques to recruit
participants.

We draw on our research community of more than 6,000 organizations, as well as
crowdsourcing and vendors’ customer communities.

Data Quality
We carefully scrutinized and verified all respondent entries to ensure that only qualified
participants are included in the study.

Findings and Analysis


In this 2021 report, we present the deliverables for our Data Preparation Market Study
based upon data collection from July through October 2020.

Focus of Research
In this study, we address key data preparation issues including:

• Perceptions and intentions surrounding data preparation
• End-user requirements and features:
  o Usability features
  o Integration features
  o Manipulation features
  o Output options
  o Deployment options
• Industry support for data preparation
• User requirements versus industry capabilities
• Vendor ratings


Executive Summary
- Data preparation ranks 5th among technologies and initiatives strategic to
business intelligence. Sixty-three percent say data preparation is critical or very
important. Importance has declined slowly over time (p. 18-24). Industry importance
scores are flat over the last three years (p. 80).
- Almost three quarters say their current data-preparation approach is “highly” or
“somewhat” effective (p. 25-31). Industry support for usability is far ahead of user
requirements (p. 81).
- Sixty-five percent of respondents “constantly” or “frequently” make use of data
preparation. Year-over-year frequency increases slightly (p. 32-37).
- Thirty-one percent of respondents “constantly” or “frequently” enrich data
preparation with third-party data; 69 percent do so “occasionally,” “rarely,” or
“never” (p. 38-43).
- There is strong interest in many data-preparation usability features, led by
“save/preview” and “automated detection of anomalies” (p. 44-49).
- Demand for data-preparation integration features is very strong, led by “ability to
combine data across multiple data sets and sources,” “access to file formats,”
and “access to traditional databases” (p. 50-55). Robust industry support
answers all current user expectations for integration features (p. 82).
- Interest in data manipulation features is high; at least five are considered “very
important” to the majority of users (p. 56-61). Industry support for manipulation
features easily accommodates top user priorities (p. 84).
- Among supported outputs, “Excel” and “traditional database” are most required
by users (p. 62-67). Industry support is strong and addresses growing uptake of
Azure, Amazon Redshift, Google BigQuery, and other outputs (p. 83).
- Data-preparation deployment features for scheduling, monitoring, and processing
are important to users (p. 68-73). Vendors expect future investment to keep
feature availability well ahead of user requirements (p. 85).
- Respondents most prefer on-premises deployment, followed by private cloud and
public cloud. Over time, interest shifts to public cloud (p. 74-79). Industry support
is strong for both cloud and on-premises deployment and is shifting toward cloud (p. 86-87).
- Data-preparation vendor rankings are shown on p. 88.


Study Demographics
Our sample includes a cross-section of respondents across geographies, functions,
organization sizes, and vertical industries. We believe that, unlike other industry
research, we offer a more representative sample and a better indicator of true market
dynamics.

Geography
Survey respondents represent a mix of global geographies. Forty-eight percent
represent North America (including five Canadian provinces and the majority of U.S.
states). Thirty-one percent work in EMEA; the remainder represent Asia Pacific and
Latin America (fig. 1).

Figure 1 – Geographies represented (North America 48.0%; Europe, Middle East and Africa 31.4%; Asia Pacific 13.7%; Latin America 7.0%)


Functions
Information Technology accounts for the largest group of respondents by function (37
percent). About 19 percent come from the Business Intelligence Competency Center
(BICC). Executive Management and R&D are the next most represented (fig. 2).

Tabulating results by function enables us to compare and contrast the plans and
priorities of different departments within organizations.

Figure 2 – Functions represented (largest groups: Information Technology 37.3%, BICC 19.2%)


Vertical Industries
Survey participants represent a wide range of vertical industries, led by Business
Services (about 21 percent), Financial Services (17 percent), and Manufacturing (16
percent) (fig. 3). Technology, Consumer Services, and Healthcare are the next most
represented.

Figure 3 – Industries represented (largest groups: Business Services 20.7%, Financial Services 16.6%, Manufacturing 15.9%)


Organization Size
Our survey sample includes a mix of small, medium, and large organizations (fig. 4). In
2020, small organizations (1-100 employees) account for about 25 percent of the
sample, and mid-sized organizations (101-1,000 employees) account for 26 percent of
the sample. Large organizations (>1,000 employees) account for the remaining 48
percent of respondents, with very large organizations (>10,000 employees) accounting
for 23 percent.

Segmenting respondents by organization size helps us identify differences in behavior,
attitudes, and planning often related to headcount.

Figure 4 – Organization sizes represented (1-100; 101-1,000; 1,001-10,000; more than 10,000 employees)


Analysis of Findings
In 2021, our seventh annual Data Preparation Market Study examines data
preparation, exploring user sentiment and perceptions, the state of current
implementations, and plans for the future.


Importance of Data Preparation


Among the 41 technologies and initiatives strategic to business intelligence that we
study, data preparation ranks 5th (fig. 5). This is a steep increase from the previous
year, when it ranked 19th out of 37. Data preparation now enters the top five, alongside
Reporting, Dashboards, Data Integration, and Data Warehousing.
Though we measure 6th-ranked “end-user self-service” as a separate category, data
preparation falls within and underscores the value of user empowerment and self-service
generally.

Figure 5 – Technologies and initiatives strategic to business intelligence (ranked; response scale: Critical, Very Important, Important, Somewhat Important, Not Important)

1. Reporting
2. Dashboards
3. Data Integration
4. Data Warehousing
5. Data Preparation and Blending
6. End-User "Self-Service"
7. Advanced Visualization
8. Data Discovery
9. Data Storytelling
10. Cloud (Software-as-a-Service)
11. Enterprise Planning / Budgeting
12. Governance
13. GDPR (General Data Protection Regulation)
14. Integration with Operational Processes
15. Data Catalog
16. Mobile Device Support
17. Marketing Analytics
18. Embedded BI (contained within an application, portal, etc.)
19. Sales Planning
20. Machine Learning, Data Mining, Advanced Algorithms, Predictive
21. Collaborative Support for Group-based Analysis
22. Ability to Write to Transactional Applications
23. IT Analytics
24. Location Intelligence / Analytics
25. In-Memory Analysis
26. Big Data (e.g., Hadoop)
27. Cognitive BI (e.g., Artificial Intelligence-based BI)
28. Search-based Interface
29. HCM / People Analytics
30. Text Analytics
31. Streaming Data Analysis
32. Natural Language Analytics (natural language query/ natural…
33. Prepackaged Vertical / Functional Analytical Applications
34. Open Source Software
35. Social Media Analysis (Social BI)
36. Internet of Things (IoT)
37. Complex Event Processing (CEP)
38. Edge Computing
39. Robotic Process Automation (RPA) & Analysis
40. Video Analytics
41. Voice Analytics


Our latest study sample reports very high perceived importance of data preparation,
which in turn reflects the ongoing importance of self-service business intelligence and
user autonomy (fig. 6). Sixty-three percent of all respondents say data preparation is
either critical or very important. About 81 percent of respondents say data preparation
is, at minimum, important. Just 7 percent say data preparation is “not important.”

Figure 6 – Importance of data preparation (Critical 29.5%, Very Important 32.5%, Important 19.2%, Somewhat Important 11.8%, Not Important 7.0%)


Across seven years of data, respondents’ perceived importance of data preparation
remains consistently high, with weighted-mean values between 3.6 and 4.0 (from well
above “important” to “very important”) (fig. 7). The 2021 weighted mean of 3.66
reverses a downtrend from a peak of 3.97 in 2018 to a low of 3.61 in 2020. “Critical”
ratings stand at 30 percent in 2021, down from a high of 39 percent in 2018 but steady
to recovering in the three years since. Eighty-one percent of respondents say data
preparation is, at minimum, “important” in 2021, down from 87 percent in 2019 but
steady year over year.
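The weighted means quoted in this section can be reproduced from a response distribution. Below is a minimal Python sketch, assuming the 2021 shares read from Figure 6 (Critical 29.5, Very Important 32.5, Important 19.2, Somewhat Important 11.8, Not Important 7.0 percent) and a 1-5 coding with Critical = 5. The report does not state its coding; this assumption is used because it reproduces the published 3.66. The names `weighted_mean` and `share_at_least` are illustrative, not part of any Dresner tool.

```python
# Minimal sketch: weighted mean and "at least" share of a Likert-style distribution.
# Assumption: scores 5 (Critical) .. 1 (Not Important); shares in percent read from Figure 6.
IMPORTANCE_2021: dict[str, tuple[int, float]] = {
    "Critical": (5, 29.5),
    "Very Important": (4, 32.5),
    "Important": (3, 19.2),
    "Somewhat Important": (2, 11.8),
    "Not Important": (1, 7.0),
}


def weighted_mean(distribution: dict[str, tuple[int, float]]) -> float:
    """Return the share-weighted mean score, normalized by the total share."""
    total = sum(share for _, share in distribution.values())
    return sum(score * share for score, share in distribution.values()) / total


def share_at_least(distribution: dict[str, tuple[int, float]], min_score: int) -> float:
    """Return the percentage of responses scoring min_score or higher."""
    return sum(share for score, share in distribution.values() if score >= min_score)


print(round(weighted_mean(IMPORTANCE_2021), 2))       # 3.66
print(round(share_at_least(IMPORTANCE_2021, 3), 1))   # 81.2 ("at minimum, important")
```

Dividing by the total share, rather than by 100, keeps the mean correct even when rounded bar values do not sum exactly to 100.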

Figure 7 – Importance of data preparation 2015-2021 (share of responses from Critical to Not important by year, with weighted mean)


Respondents in Marketing/Sales report the highest weighted-mean importance score
for data preparation (4.45, between “very important” and “critical”). This is among the
higher scores by function that we have recorded in our studies and underscores the
importance of data preparation in the front office. Strategic Planning respondents also
rate data preparation above “very important” (4.22) (fig. 8). Importance thereafter falls
to 3.75 among BICC respondents and only slightly lower among respondents in R&D,
IT, and Executive Management before falling to 3.3 in Finance, which is still above the
level of 3.0, or “important.”
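Reading a weighted mean against the verbal scale, as in “4.45, between very important and critical,” amounts to finding the two scale points that bracket the score. A small Python sketch, assuming the same hypothetical 1-5 coding (1 = not important, 5 = critical) and using the function-level means quoted in the paragraph above; the `describe` helper is illustrative only.

```python
# Sketch: place a weighted-mean score between the verbal labels of a 1-5 scale.
# Assumption: 1 = not important ... 5 = critical (coding not stated in the report).
import math

LABELS = {
    1: "not important",
    2: "somewhat important",
    3: "important",
    4: "very important",
    5: "critical",
}


def describe(score: float) -> str:
    """Describe where a mean score falls relative to the scale labels."""
    if not 1 <= score <= 5:
        raise ValueError("score must lie on the 1-5 scale")
    lower, upper = math.floor(score), math.ceil(score)
    if lower == upper:
        return f"exactly {LABELS[lower]}"
    return f"between {LABELS[lower]} and {LABELS[upper]}"


# Weighted means by function, as quoted in the text.
for function, mean in [("Marketing/Sales", 4.45), ("Strategic Planning", 4.22),
                       ("BICC", 3.75), ("Finance", 3.3)]:
    print(f"{function}: {mean} -> {describe(mean)}")
```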

Figure 8 – Importance of data preparation by function (Marketing & Sales, Strategic Planning, BICC, R&D, Information Technology (IT), Executive Management, Finance)


Weighted-mean importance of data preparation varies little across geographic regions
in our 2021 study, ranging between 3.6 and 3.8 (near “very important”) (fig. 9). Asia
Pacific notably posts the highest share of “critical” importance scores (38 percent),
compared to 31 percent in EMEA, 28 percent in North America, and 16 percent in Latin
America. Skepticism is low across all regions (fewer than 6 percent “not important”
scores) except EMEA, where 12 percent say data preparation is “not important” and
another 12 percent say it is only “somewhat important.”

Figure 9 – Importance of data preparation by geography (Asia Pacific, Latin America, North America, Europe, Middle East and Africa)


The importance of data preparation increases with organization headcount (fig. 10).
The effect is most pronounced in very large organizations (>10,000 employees), where
mean importance rises to 4.0, compared to 3.5-3.6 for all smaller peers. Overall scores
are healthy across all organization sizes, ranging between “important” and “very
important,” and reaching “very important” in the largest organizations. The percentage
of “critical” importance scores also increases visibly with organization size.

Figure 10 – Importance of data preparation by organization size (1-100; 101-1,000; 1,001-10,000; more than 10,000 employees)


The importance of data preparation varies from “important” to “very important” across
different industries in our 2021 study (fig. 11). This year, the strongest weighted-mean
sentiment is among respondents in Consumer Services (4.0), Government (3.9),
Healthcare (3.8), and Business Services (3.8). Importance thereafter declines slightly in
Financial Services (3.7), Technology (3.6), and Manufacturing (3.6). The lowest
importance is among respondents in Retail/Wholesale (3.1) and Higher Education (3.3).

Figure 11 – Importance of data preparation by industry


Effectiveness of Current Approach to Data Preparation


Nearly half of organizations (49.6 percent) say their current approach to data
preparation is “somewhat effective” (fig. 12). Combined with those that report a “highly
effective” approach (25.0 percent), the total nears three-quarters (74.6 percent) of
respondents. The remaining 25.4 percent report only a “somewhat ineffective” or
“totally ineffective” approach to data preparation. This overall response implies very
positive interactions and growing maturity of data preparation, likely in the context of
increasing self-service and user autonomy.
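The combined figures in this paragraph are top-two-box and bottom-two-box sums over the four response options. A minimal Python sketch, assuming the 2021 shares read from Figure 12 in best-to-worst order (Highly Effective 25.0, Somewhat Effective 49.6, Somewhat Ineffective 20.9, Totally Ineffective 4.5 percent); the `box_shares` helper is a hypothetical name used for illustration.

```python
# Sketch: top-two-box and bottom-two-box shares for a 4-option effectiveness question.
# Assumption: category shares (percent) read from Figure 12, ordered best to worst.
EFFECTIVENESS_2021: list[tuple[str, float]] = [
    ("Highly Effective", 25.0),
    ("Somewhat Effective", 49.6),
    ("Somewhat Ineffective", 20.9),
    ("Totally Ineffective", 4.5),
]


def box_shares(distribution: list[tuple[str, float]], top_n: int = 2) -> tuple[float, float]:
    """Split a best-to-worst ordered distribution into top-N and remaining shares."""
    top = sum(share for _, share in distribution[:top_n])
    rest = sum(share for _, share in distribution[top_n:])
    return top, rest


effective, ineffective = box_shares(EFFECTIVENESS_2021)
print(f"effective: {effective:.1f}%, ineffective: {ineffective:.1f}%")
# effective: 74.6%, ineffective: 25.4%
```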

Figure 12 – Current approach to data preparation (Highly Effective 25.0%, Somewhat Effective 49.6%, Somewhat Ineffective 20.9%, Totally Ineffective 4.5%)


The effectiveness of the current approach to data preparation is largely steady across
seven years of data, particularly in 2017-2021 (fig. 13). In the current-year study, the
current approach to data preparation reaches an all-time weighted-mean high of 2.95
(“somewhat effective”), up from 2.91 in 2020. The share of “highly effective” responses
has also grown over the history of the survey, reaching an all-time high of 25 percent in
2021, though combined “highly effective” and “somewhat effective” scores are slightly
below a 2019 peak. We believe this steady effectiveness reflects satisfaction but leaves
considerable room for improvement in the eyes of data-preparation
users.
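The 2021 weighted mean of 2.95 can likewise be recomputed from the Figure 12 distribution. A sketch in Python, assuming a 1-4 coding (Highly Effective = 4 down to Totally Ineffective = 1); the report does not state its coding, but this assumption reproduces the published value.

```python
# Sketch: weighted mean of the 2021 effectiveness distribution on a 1-4 scale.
# Assumption: 4 = Highly Effective ... 1 = Totally Ineffective; shares in percent from Figure 12.
SCORES_AND_SHARES: list[tuple[int, float]] = [
    (4, 25.0),  # Highly Effective
    (3, 49.6),  # Somewhat Effective
    (2, 20.9),  # Somewhat Ineffective
    (1, 4.5),   # Totally Ineffective
]

# Normalize by the total share in case rounded bar values do not sum exactly to 100.
total_share = sum(share for _, share in SCORES_AND_SHARES)
mean = sum(score * share for score, share in SCORES_AND_SHARES) / total_share
print(f"{mean:.2f}")  # 2.95
```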

Figure 13 – Current approach to data preparation 2015-2021 (share of responses from Highly effective to Totally ineffective by year, with weighted mean)


Weighted-mean effectiveness of the current approach to data preparation varies only
slightly (2.8-3.1) across functions in 2021 (fig. 14). BICC respondents, whom we might
expect to be most proficient with data preparation tools, post the highest share of
“highly effective” scores (33 percent). Finance, which we would expect to be proficient
with data manipulation, posts the second-highest share of “highly effective” scores
(24 percent). Among the broader populations of data-preparation users, respondents in
BICC, Marketing/Sales, and R&D are equally likely (about 80 percent) to say the current
approach is either “highly effective” or “somewhat effective.”

[Chart: Current Approach to Data Preparation by Function (BICC, Marketing & Sales, Executive Management, R&D, Information Technology (IT), Finance). Share of responses (Highly Effective, Somewhat Effective, Somewhat Ineffective, Totally Ineffective) with weighted mean on a 0-5 scale]

Figure 14 – Current approach to data preparation by function


Though individual perceptions vary considerably, weighted-mean satisfaction with
data-preparation approaches is very consistent (between 2.9 and 3.1) across geographies in
2021 (fig. 15). Asia-Pacific respondents are considerably more likely to report a “highly
effective” approach (35 percent), while EMEA respondents report the fewest “highly
effective” scores (22 percent). EMEA also reports the highest combined share of “somewhat
ineffective” and “totally ineffective” responses (30 percent).

[Chart: Current Approach to Data Preparation by Geography (Asia Pacific, North America, Latin America, Europe, Middle East and Africa). Share of responses (Highly Effective, Somewhat Effective, Somewhat Ineffective, Totally Ineffective) with weighted mean on a 0-5 scale]

Figure 15 – Current approach to data preparation by geography


The perceived effectiveness of data preparation in 2021 is very steady (weighted mean
2.9-3.0) across organizations of different sizes (fig. 16). Within these findings, however,
small organizations (1-100 employees) and very large organizations (more than 10,000
employees) report slightly higher data-preparation effectiveness. Very large organizations
also report the highest combined percentage of “highly effective” and “somewhat effective”
ratings (80 percent). By these measures, effectiveness is highest in the largest
organizations, though by a smaller margin than in earlier years of our study.

[Chart: Current Approach to Data Preparation by Organization Size (1-100, 101-1,000, 1,001-10,000, More than 10,000 employees). Share of responses (Highly Effective, Somewhat Effective, Somewhat Ineffective, Totally Ineffective) with weighted mean on a 0-5 scale]

Figure 16 – Current approach to data preparation by organization size


By industry, the perceived weighted-mean effectiveness of data preparation ranges from a
high of 3.21 in Healthcare to a low of 2.86 in Manufacturing (fig. 17). Combined “highly
effective” and “somewhat effective” ratings are highest in Healthcare (90 percent) and
Education (82 percent), while “highly effective” scores alone are highest in
Retail/Wholesale (35 percent) and Healthcare (32 percent). Conversely, the highest combined
“somewhat ineffective” and “totally ineffective” scores are in Retail/Wholesale (41 percent),
followed by Manufacturing and Education (30 percent each).

[Chart: Current Approach to Data Preparation by Industry. Share of responses (Highly Effective, Somewhat Effective, Somewhat Ineffective, Totally Ineffective) with weighted mean on a 0-5 scale]

Figure 17 – Current approach to data preparation by industry


Success with data preparation correlates mildly with success with business intelligence
(fig. 18). This is clearest in the share of “highly effective” data preparation, which
declines to 25 percent among organizations that are “somewhat successful” with BI and to
17 percent among those that are “unsuccessful” or “somewhat unsuccessful.”

[Chart: Current Approach to Data Preparation by Success with BI (Successful, Somewhat Successful, Unsuccessful & Somewhat Unsuccessful). Share of responses (Highly Effective, Somewhat Effective, Somewhat Ineffective, Totally Ineffective)]

Figure 18 – Current approach to data preparation by success with BI


Frequency of Data Preparation


Nearly two-thirds of respondents (64.4 percent) say they "constantly" or "frequently" use
data preparation in 2021, up about 3 percentage points from 2020 (fig. 19). We cannot tell
whether these end-user efforts are one-off or repeated manipulations, but overall usage of
data preparation appears to be high, with 87 percent reporting at least "occasional"
data-preparation activity. Only about 13 percent of respondents "rarely" or "never"
perform data preparation (down from 16 percent in 2020).
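The combined shares in this paragraph can be checked against the unrounded Figure 19 values. This minimal sketch sums the relevant answer categories. It also computes a weighted mean of about 3.7, consistent with the 2021 mark cited for Figure 20. That calculation assumes a 5-point coding (5 = "constantly" down to 1 = "never"), which the report does not state. The helper names are hypothetical.

```python
# Illustrative sketch: summarize a 5-point frequency distribution with combined
# shares and a weighted mean.
# Assumption: answers are coded 5 = "Constantly" down to 1 = "Never".
FREQUENCY_SCALE = {"Constantly": 5, "Frequently": 4, "Occasionally": 3, "Rarely": 2, "Never": 1}

# 2021 response shares (percent) read from Figure 19.
FIG19_2021 = {
    "Constantly": 23.9,
    "Frequently": 40.5,
    "Occasionally": 22.7,
    "Rarely": 10.2,
    "Never": 2.7,
}


def share_of(shares: dict[str, float], labels: list[str]) -> float:
    """Sum the percentages for the given answer labels."""
    return sum(shares[label] for label in labels)


def weighted_mean(shares: dict[str, float], scale: dict[str, int]) -> float:
    """Average answer code, weighting each code by its share of respondents."""
    total = sum(shares.values())
    return sum(scale[label] * pct for label, pct in shares.items()) / total


constant_or_frequent = share_of(FIG19_2021, ["Constantly", "Frequently"])
at_least_occasional = share_of(FIG19_2021, ["Constantly", "Frequently", "Occasionally"])
rarely_or_never = share_of(FIG19_2021, ["Rarely", "Never"])
mean_frequency = weighted_mean(FIG19_2021, FREQUENCY_SCALE)

print(f"Constantly or frequently: {constant_or_frequent:.1f}%")  # 64.4%
print(f"At least occasionally: {at_least_occasional:.1f}%")  # 87.1%
print(f"Rarely or never: {rarely_or_never:.1f}%")  # 12.9%
print(f"Weighted mean: {mean_frequency:.1f}")  # 3.7
```

Summing unrounded shares can give slightly different totals than adding individually rounded ones. For example, 24 + 41 = 65, while 23.9 + 40.5 = 64.4.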

[Chart: Frequency of Data Preparation. Constantly 23.9%, Frequently 40.5%, Occasionally 22.7%, Rarely 10.2%, Never 2.7%]

Figure 19 – Frequency of data preparation


Across the seven years of our focused data-preparation study, respondents report
weighted-mean frequency of use in a band between 3.6 and 3.9 (fig. 20). This band includes
a high mark of 3.9 in 2017 and the current-year mark of 3.7, which is slightly up from
2020. Likewise, “constant” users rise to 24 percent in 2021 from 21 percent in our 2020
study, and the share of respondents at the two lowest levels of use (“rarely” and “never”)
falls compared with the prior year. It is possible, but not certain, that developments
such as pre-formatting and automation influence how often users perform data preparation.

[Chart: Frequency of Data Preparation, 2015-2021. Share of responses by year (Constantly, Frequently, Occasionally, Rarely, Never) with mean on a 0-5 scale]

Figure 20 – Frequency of data preparation 2015-2021


The frequency of data preparation is robust across functions, from a weighted-mean high of
4.3 in Strategic Planning to a low of 3.5 in Finance (fig. 21). Marketing/Sales is the next
most frequent data-preparation user group by function and, together with Strategic
Planning, has the largest shares of combined “constant” and “frequent” users. A second tier
of Executive Management, R&D, and BICC respondents follows, with about 70 percent either
“constant” or “frequent” users. Finance, the least frequent user group by function,
nonetheless has close to 90 percent of respondents who are at least “occasional” users of
data preparation.

[Chart: Frequency of Data Preparation by Function (Strategic Planning, Marketing & Sales, Executive Management, R&D, BICC, Information Technology (IT), Finance). Share of responses (Constantly, Frequently, Occasionally, Rarely, Never) with weighted mean on a 0-5 scale]

Figure 21 – Frequency of data preparation by function


Weighted-mean scores for frequency of data preparation are highest among Asia-Pacific
respondents (3.9), followed by North America (3.7), EMEA (3.7), and Latin America (3.6)
(fig. 22). Combined “constant” and “frequent” users are also most common in Asia Pacific.
North America trails all regions by this measure because of larger shares of “occasional”
or “rarely” responses. Again, frequency is robust across geographies, with 84 to 95 percent
of respondents in each region at least “occasional” users.

[Chart: Frequency of Data Preparation by Geography (Asia Pacific, North America, Europe, Middle East and Africa, Latin America). Share of responses (Constantly, Frequently, Occasionally, Rarely, Never) with weighted mean on a 0-5 scale]

Figure 22 – Frequency of data preparation by geography


Mean frequency of data preparation in 2021 rises notably in very large organizations of
more than 10,000 employees (fig. 23). Very large organizations have the largest share of
“constant” users (35 percent) and of combined “constant” and “frequent” users (80 percent),
a full 20 points ahead of small, mid-sized, and large peer organizations. Their
weighted-mean frequency (4.0) is also well ahead of the 3.6-3.7 range of all smaller peers.
These high weighted-mean scores also reflect that 80 percent or more of respondents in
every size category are at least “occasional” users of data preparation.

[Chart: Frequency of Data Preparation by Organization Size (1-100, 101-1,000, 1,001-10,000, More than 10,000 employees). Share of responses (Constantly, Frequently, Occasionally, Rarely, Never) with weighted mean on a 0-5 scale]

Figure 23 – Frequency of data preparation by organization size


Weighted-mean data preparation frequency is highest in Business Services and Consumer
Services organizations (3.9) and lowest in Retail/Wholesale (3.3) (fig. 24). Government and
Consumer Services organizations report the highest combined “constant” and “frequent” use
(76-77 percent), followed by Business Services and Healthcare. The highest share of
“constant” users is in Business Services (34 percent).

[Chart: Frequency of Data Preparation by Industry. Share of responses (Constantly, Frequently, Occasionally, Rarely, Never) with weighted mean on a 0-5 scale]

Figure 24 – Frequency of data preparation by industry


Frequency of Data-Preparation Enrichment with Third-Party Data


Thirty-one percent of respondents indicate they "constantly" or "frequently" enrich data
preparation with third-party data (fig. 25), up from 26 percent in 2020 and 24 percent in
2019. Most respondents (69 percent) do so only "occasionally," "rarely," or "never." This
might suggest that the select 7.2 percent of "constant" users of third-party data are in
unique roles and industries with specific use cases.
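The same arithmetic can be applied to third-party enrichment. This minimal sketch makes two assumptions. First, it uses a 5-point coding (5 = "constantly" down to 1 = "never"). Second, it assigns Figure 25's values to answers as shown below. The 29.1, 25.7, and 14.3 percent values were assigned to "occasionally," "rarely," and "never" because that mapping reproduces the 2.84 weighted mean reported for 2021. The helper name is hypothetical.

```python
# Illustrative sketch: check the third-party enrichment split against Figure 25.
# Assumptions: answers coded 5 = "Constantly" down to 1 = "Never", and the
# chart's bar values mapped to answers as listed below.
FREQUENCY_SCALE = {"Constantly": 5, "Frequently": 4, "Occasionally": 3, "Rarely": 2, "Never": 1}

FIG25_2021 = {
    "Constantly": 7.2,
    "Frequently": 23.8,
    "Occasionally": 29.1,
    "Rarely": 25.7,
    "Never": 14.3,
}


def weighted_mean(shares: dict[str, float], scale: dict[str, int]) -> float:
    """Average answer code, weighting each code by its share of respondents."""
    total = sum(shares.values())  # 100.1 here because of rounding in the chart
    return sum(scale[label] * pct for label, pct in shares.items()) / total


constant_or_frequent = FIG25_2021["Constantly"] + FIG25_2021["Frequently"]
occasional_or_less = sum(FIG25_2021[k] for k in ("Occasionally", "Rarely", "Never"))
mean_enrichment = weighted_mean(FIG25_2021, FREQUENCY_SCALE)

print(f"Constantly or frequently: {constant_or_frequent:.1f}%")  # 31.0%
print(f"Occasionally, rarely, or never: {occasional_or_less:.1f}%")  # 69.1%
print(f"Weighted mean: {mean_enrichment:.2f}")  # 2.84
```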

[Chart: Frequency of Data-Preparation Enrichment with Third-Party Data. Constantly 7.2%, Frequently 23.8%, Occasionally 29.1%, Rarely 25.7%, Never 14.3%]

Figure 25 – Frequency of data-preparation enrichment with third-party data


Between 2015 and 2021, respondents report a fairly constrained range of frequency of
third-party data use in conjunction with data preparation (fig. 26). By weighted mean,
third-party data enrichment remains in a narrow band between 2.74 and 2.94. Year over year,
weighted-mean frequency ticks up from 2.77 in 2020 to 2.84 in 2021 but remains below the
2018 peak of 2.94. We might expect third-party data enrichment to grow over time, but the
latest three-year trend shows only small increases, suggesting that organizations are still
mostly grappling with internal data.

[Chart: Frequency of Data-Preparation Enrichment with Third-Party Data, 2015-2021. Share of responses by year (Constantly, Frequently, Occasionally, Rarely, Never) with mean on a 0-5 scale]

Figure 26 – Frequency of data-preparation enrichment with third-party data 2015-2021


By function, Marketing/Sales is by far the most likely to enrich data preparation with
third-party data in 2021 (fig. 27). Combined “constant” and “frequent” users account for 63
percent of Marketing/Sales respondents, compared with 33-38 percent in Strategic Planning,
Executive Management, Finance, and the BICC. Weighted-mean frequency of third-party
enrichment is also by far highest in Marketing/Sales (3.73), well above the lowest scores
in R&D (2.47) and IT (2.66).

[Chart: Frequency of Data-Preparation Enrichment with Third-Party Data by Function (Marketing & Sales, Strategic Planning, BICC, Executive Management, Finance, Information Technology (IT), R&D). Share of responses (Constantly, Frequently, Occasionally, Rarely, Never) with weighted mean on a 0-5 scale]

Figure 27 – Frequency of data-preparation enrichment with third-party data by function


Use of third-party data to enrich data preparation is greatest in Asia Pacific and Latin
America, with the latter accounting for the most combined “constant” and “frequent” users
(42 percent) (fig. 28). By comparison, EMEA respondents report the fewest combined
“constant” and “frequent” third-party data users (27 percent), followed by North America
(31 percent). Close to half of EMEA respondents “rarely” or “never” use third-party data to
enrich data preparation.

[Chart: Frequency of Data-Preparation Enrichment with Third-Party Data by Geography (Asia Pacific, Latin America, North America, Europe, Middle East and Africa). Share of responses (Constantly, Frequently, Occasionally, Rarely, Never) with weighted mean on a 0-5 scale]

Figure 28 – Frequency of data-preparation enrichment with third-party data by geography


As we observed in earlier measures, the frequency of data preparation with third-party
data jumps noticeably at very large organizations with more than 10,000 employees
(fig. 29). These very large organizations have the largest shares of “constant” users
(13 percent), combined “constant” and “frequent” users (40 percent), and at least
“occasional” users (69 percent) among all enterprises. Even so, 55-60 percent of
respondents at each smaller organization size use third-party enrichment at least
“occasionally.”

[Chart: Frequency of Data-Preparation Enrichment with Third-Party Data by Organization Size (1-100, 101-1,000, 1,001-10,000, More than 10,000 employees). Share of responses (Constantly, Frequently, Occasionally, Rarely, Never) with weighted mean on a 0-5 scale]

Figure 29 – Frequency of data-preparation enrichment with third-party data by organization size


In 2021, Higher Education respondents are the most likely users of third-party data
enrichment in data preparation by weighted mean (fig. 30), and more than 90 percent of this
group are at least “occasional” users. Healthcare respondents are the next most likely
users and, along with Government, Business Services, and Technology respondents, report the
highest concentrations of “constant” users of third-party enrichment. Excluding
Manufacturing, 60 percent or more of respondents in every industry are at least
“occasional” users of third-party enrichment data.

[Chart: Frequency of Data-Preparation Enrichment with Third-Party Data by Industry. Share of responses (Constantly, Frequently, Occasionally, Rarely, Never) with weighted mean on a 0-5 scale]

Figure 30 – Frequency of data-preparation enrichment with third-party data by industry


Data-Preparation Usability Features


There is strong interest across 14 sampled data-preparation usability features, 12 of
which are at least "important" to roughly half or more of respondents (fig. 31). We
believe this reflects a good understanding of needs and high expectations for basic to
advanced data-preparation features. The most important feature is the utilitarian ability
to save preparation steps and apply them to similar datasets. It is closely followed by
automated detection of anomalies, outliers, and duplicates and by immediate preview and
feedback. We note that machine learning, highly touted by the vendor community, is
currently the least-required usability feature for data preparation.

[Chart: Data-Preparation Usability Features. Share of respondents rating each feature Critical, Very Important, Important, Somewhat Important, or Not Important. Features in chart order: Ability to save preparation steps and apply to similar datasets; Automated detection of anomalies, outliers, and duplicates; Immediate preview and feedback for end user; Visual interface for users to view and explore in-process data…; Automated recommendations for data relationships and keys…; Mask or redact sensitive data; Visual highlighting of relationships between columns, attributes…; Support for entire data transformation process in a single…; Collaboration capabilities/integration; Automatically generate data transformation code/scripts for…; View/audit all prep tasks (and changes) and annotate for…; Technical expertise/programming is *NOT* required to…; Less than two-second response time for design features; Machine learning and recommendations based on usage data…]

Figure 31 – Data-preparation usability features


Across the five years of our study shown here (2017-2021), the rankings of data-preparation
features fluctuate somewhat, and their importance ebbs and flows unevenly over time
(fig. 32). In our 2021 study, “automated detection of anomalies” retains the top ranking,
followed by “immediate preview and feedback” and “visual interface.” While “automated
detection” gains importance in our latest study, the second and third choices have lost
some user interest compared with earlier years. Many other usability features have also
declined in long-term importance but rebound from their 2020 levels. These include
“automated recommendations,” “visual highlighting,” “support for entire data
transformation process,” and more. “Machine learning” loses a small amount of momentum
compared with our 2020 study.

[Radar chart: Data-Preparation Usability Features, 2017-2021. Importance scores for each usability feature tracked since 2017, with one series per year (2017, 2018, 2019, 2020, 2021)]

Figure 32 – Data-preparation usability features 2017-2021


Peak interest in usability features varies rather widely by function (fig. 33). In our
latest sample, Strategic Planning respondents lead with “ability to save preparation steps”
and report outsized interest in several other features, including “immediate preview and
feedback,” “automated recommendations,” “visual highlighting,” “collaboration
capabilities,” and “technical expertise not required.” R&D respondents focus their high
marks more narrowly, on “automated detection of anomalies.” BICC respondents give the top
scores to “mask or redact sensitive data” and, narrowly, to “machine learning and
recommendations.”

[Radar chart: Data-Preparation Usability Features by Function. Importance scores for each usability feature, with one series per function (Strategic Planning Function, BICC, Executive Management, Marketing & Sales, R&D, Finance, Information Technology (IT))]

Figure 33 – Data-preparation usability features by function


Interest in data-preparation features varies by geographical region and is most often
highest among respondents in Asia Pacific or Latin America (fig. 34). Features on which all
regions cluster closely include “ability to save preparation steps,” “visual
highlighting,” and “support for entire data transformation process.” Asia-Pacific interest
is notably highest in “ability to save preparation steps,” “immediate preview and
feedback,” “automatically generate data transformation,” and “technical expertise not
required.” North America and EMEA most often rank third and fourth, respectively, in
usability-feature interest.

[Radar chart: Data-Preparation Usability Features by Geography. Importance scores for each usability feature, with one series per region (Asia Pacific, Latin America, North America, Europe, Middle East and Africa)]

Figure 34 – Data-preparation usability features by geography


Compared with other breakdowns, interest in data-preparation usability features clusters
most tightly across organizations of different sizes (fig. 35). The features whose ratings
vary most with organization size include “mask or redact sensitive data” and “support for
entire data transformation process.” Interest in usability features generally increases
with organization headcount. Very large organizations (more than 10,000 employees) lead
interest in every usability feature we sample except the three that get top scores from
large organizations (1,001-10,000 employees).

[Radar chart: Data-Preparation Usability Features by Organization Size. Importance scores for each usability feature, with one series per size band (1-100, 101-1,000, 1,001-10,000, More than 10,000 employees)]

Figure 35 – Data-preparation usability features by organization size


Interest in data-preparation usability features varies widely by industry in our 2021 study
(fig. 36). For example, Consumer Services respondents give high marks to “ability to
save preparation steps” and “collaboration,” while Business Services most prefers
“automated detection of anomalies.” Healthcare gives the top score to “immediate
preview and feedback.” Education respondents lead interest in “visual interface,” and
Government leads “visual highlighting of relationships” and “support for entire data
transformation process.” Financial Services gives the top score to “mask or redact
sensitive data.”

[Radar chart: Data-Preparation Usability Features by Industry. Importance scores for each usability feature, with one series per industry (Business Services, Healthcare, Government, Financial Services, Consumer Services, Retail and Wholesale, Manufacturing, Technology, Education)]

Figure 36 – Data-preparation usability features by industry


Data-Preparation Data-Integration Features


Demand for data-preparation data-integration features is very strong for several of the
eight choices we sampled in our 2021 study (fig. 37). The top three features are “ability
to combine data across multiple data sets and sources,” “access to flat file formats,” and
“access to traditional databases.” They are “critical” to 48-54 percent of respondents and
at least “very important” to 70-82 percent of our sample. “Ability to access
semi-structured data” is also high on the list, with more than 60 percent “critical” or
“very important” scores. “Access to big data” and “access to NoSQL sources” remain the
least important data-preparation integration features among respondents.

[Chart: Data-Preparation Data-Integration Features. Share of respondents rating each feature Critical, Very Important, Important, Somewhat Important, or Not Important. Features in chart order: Ability to combine data across multiple data sets and sources through joins and merging data; Access to flat file formats (e.g., log files, CSV, Excel); Access to traditional databases (e.g., RDBMS); Ability to access semi-structured data (e.g., XML, JSON, HTML, PDF); Ability to infer metadata by introspecting the data elements; Ability to extract data from documents; Access to NoSQL sources; Access to big data (e.g., Hadoop)]

Figure 37 – Data-preparation data-integration features


Peak interest in the top two data-integration features is mostly sustained over the last
five years of our seven-year focused study (fig. 38). In our latest sample, “ability to
combine data” and “ability to access semi-structured data” reach their highest levels of
interest in 2021. Scores for “access to traditional databases,” “ability to infer
metadata,” and “ability to extract data from documents” are below historic highs. “Access
to big data” falls below the level of “important” in 2021, a historic low for our study.

[Radar chart: Data-Preparation Data-Integration Features, 2017-2021. Importance scores for each data-integration feature, with one series per year (2017, 2018, 2019, 2020, 2021)]

Figure 38 – Data-preparation data-integration features 2017-2021


Interest in data-preparation data-integration features is high (above “very important”
levels) and often varies by function in 2021 (fig. 39). “Ability to combine data across
multiple data sets” is of greatest interest to BICC and Strategic Planning respondents.
Marketing/Sales gives top scores to “access to flat file formats” and “ability to access
semi-structured data.” “Access to flat file formats” and, to a lesser extent, “access to
NoSQL sources” and “access to big data” are also of particular interest to R&D. IT
respondents lead interest in “access to traditional databases,” while Strategic Planning
gives high scores to “ability to infer metadata.” Finance is most interested in “ability
to extract data from documents.”

[Radar chart: Data-Preparation Data-Integration Features by Function. Importance scores for each data-integration feature, with one series per function (Marketing & Sales, Strategic Planning Function, R&D, BICC, Information Technology (IT), Executive Management, Finance)]

Figure 39 – Data-preparation data-integration features by function


Viewed by geographic region, interest in data-preparation data-integration features is most often highest among respondents in EMEA and Asia Pacific (fig. 40). EMEA leads interest in the three top choices: “ability to combine data across multiple data sets,” “access to flat file formats,” and “access to traditional databases.” Asia-Pacific respondents give the highest score to “access to semi-structured data” and outsized interest to “ability to infer metadata.” North American respondents most often give the lowest importance scores to data-preparation integration features.

Data-Preparation Data-Integration Features by Geography

[Radar chart: mean importance score for each data-integration feature; series: Latin America; Asia Pacific; Europe, Middle East and Africa; North America]

Figure 40 – Data-preparation data-integration features by geography


Interest in data-integration features is tightly clustered across organizations of different sizes and generally, though not always, increases with organization size (fig. 41). Very large organizations (> 10,000 employees) lead, or share the lead with the next largest organizations (1,001-10,000 employees), in interest toward all but two data-preparation integration features, with very high interest (near 4.5, well above “very important”) in “ability to combine data,” “access to flat file formats,” and “access to traditional databases.” The two exceptions, “ability to access semi-structured data” and “ability to extract data from documents,” are narrowly of greatest interest to small organizations (1-100 employees). Mid-sized organizations (101-1,000 employees) often show the lowest interest in integration features for data preparation.

Data-Preparation Data-Integration Features by Organization Size

[Radar chart: mean importance score for each data-integration feature; series: 1-100; 101-1,000; 1,001-10,000; More than 10,000 employees]

Figure 41 – Data-preparation data-integration features by organization size


Interest in data-preparation data-integration features varies across industries, with some clear leaders in our 2021 study (fig. 42). Business Services and Healthcare give the highest marks to “ability to combine data across multiple data sets.” Healthcare also narrowly leads “access to flat file formats” and “access to traditional databases.” Technology (and, again, Healthcare) shows the most support for “access to semi-structured data.” Government respondents give the top scores to “ability to infer metadata” and “ability to extract data from documents.”

Data-Preparation Data-Integration Features by Industry

[Radar chart: mean importance score for each data-integration feature; series: Business Services; Healthcare; Government; Technology; Retail and Wholesale; Financial Services; Manufacturing; Consumer Services; Education]

Figure 42 – Data-preparation data-integration features by industry


Data-Preparation Manipulation Features


We asked organizations to score their interest in specific data-manipulation features and, once again, find a very high and broad level of interest (fig. 43). The top five features (“ability to aggregate and group,” “ability to pivot,” “ability to normalize,” “support for cutting, merging, and replacing,” and “ability to derive new data features”) are at least “very important” to 70 percent or more of respondents. The next two most important features, “simple interface for imposing structure on raw data” and “ability to manipulate the order of data transformation steps,” are nearly as important. All but the lowest-ranked feature (“session-ize log or event data”) are, at minimum, “very important” to half or more of respondents.
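The percentages above count respondents whose rating is “very important” or higher. As a minimal sketch, assuming per-feature response counts for the five rating levels shown in the chart legend (the helper name `share_at_least` and the example counts are hypothetical, not survey data), the calculation looks like this:

```python
# Minimal sketch (hypothetical helper, not the study's published method):
# the five rating labels match the chart legends in this report; the example
# counts are invented for illustration and are not survey data.
LEVELS = ["Critical", "Very Important", "Important", "Somewhat Important", "Not Important"]


def share_at_least(counts: dict, threshold: str = "Very Important") -> float:
    """Percentage of respondents who rated a feature at or above `threshold`."""
    cutoff = LEVELS.index(threshold)  # 0 = Critical, 1 = Very Important, ...
    total = sum(counts.get(level, 0) for level in LEVELS)
    if total == 0:
        raise ValueError("no responses recorded")
    top = sum(counts.get(level, 0) for level in LEVELS[: cutoff + 1])
    return 100.0 * top / total


# Hypothetical response counts for a single feature.
example = {"Critical": 40, "Very Important": 35, "Important": 15,
           "Somewhat Important": 7, "Not Important": 3}
print(f"{share_at_least(example):.1f}%")  # 75.0%
```

Passing `threshold="Important"` gives the wider “at minimum, important” share that the report also cites for some features.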

Data-Preparation Manipulation Features

[Stacked bar chart: percentage of respondents (0-100%) rating each feature Critical, Very Important, Important, Somewhat Important, or Not Important. Features, in chart order:
Ability to aggregate and group data
Ability to pivot (convert table to matrix) and reshape (convert matrix to table) data
Ability to normalize, standardize, and enrich data
Support for cutting, merging, and replacing values
Ability to derive new data features from existing data (text extraction, math expressions, date expressions, etc.)
Simple interface for imposing structure on raw data
Ability to manipulate the order of data transformation steps
Window and time series functions
Ability to un-nest data (e.g., JSON / XML parsing)
Custom user defined functions
Session-ize log or event data]

Figure 43 – Data-preparation manipulation features


Year-over-year interest in data-manipulation features is at or near peak levels for all features sampled (fig. 44). Over the five years of our dedicated data-preparation study, importance scores remain tightly clustered at uniformly high levels. Year-over-year interest peaks most noticeably for “ability to normalize and standardize,” “support for cutting, merging, and replacing values,” and “ability to un-nest data.” Overall, the priority ranking of these features is mostly unchanged over time.

Data-Preparation Manipulation Features 2017-2021

[Radar chart: mean importance score for each data-manipulation feature; series: 2017; 2018; 2019; 2020; 2021]

Figure 44 – Data-preparation manipulation features 2017-2021


Interest in discrete data-manipulation features for data preparation varies by function but is often highest among respondents in Strategic Planning and BICC (fig. 45). Strategic Planning interest is highest in four areas: “ability to aggregate,” “ability to pivot,” “simple interface for imposing structure on raw data,” and “window and time series functions.” BICC respondents give the greatest importance to “ability to normalize” and “support for cutting, merging, and replacing values.” Exceptions include high R&D interest in “ability to manipulate the order of data transformation steps” and high Marketing/Sales interest in “ability to un-nest data.”

Data-Preparation Manipulation Features by Function

[Radar chart: mean importance score for each data-manipulation feature; series: R&D; BICC; Strategic Planning Function; Executive Management; Marketing & Sales; Information Technology (IT); Finance]

Figure 45 – Data-preparation manipulation features by function


With minor variations, interest in data-preparation manipulation features is tightly clustered across geographic regions (fig. 46). EMEA respondents show high interest in the leading four features: “ability to aggregate and group,” “ability to pivot,” “ability to normalize,” and “support for cutting, merging, and replacing values.” North American and Asia-Pacific respondents also rate “ability to normalize” highly, along with “ability to derive new data features,” “simple interface for imposing structure on raw data,” “ability to manipulate the order of data transformation steps,” and “window and time series functions.”

Data-Preparation Manipulation Features by Geography

[Radar chart: mean importance score for each data-manipulation feature; series: Latin America; Asia Pacific; North America; Europe, Middle East and Africa]

Figure 46 – Data-preparation manipulation features by geography


Amid results that are mostly tightly clustered, large organizations (1,001-10,000 employees) and very large organizations (> 10,000 employees) lead overall interest in data-preparation manipulation features by organization size (fig. 47). Interest in individual features then declines unevenly among respondents in mid-sized (101-1,000 employees) and small (1-100 employees) organizations, though overall differences are minimal.

Data-Preparation Manipulation Features by Organization Size

[Radar chart: mean importance score for each data-manipulation feature; series: 1-100; 101-1,000; 1,001-10,000; More than 10,000 employees]

Figure 47 – Data-preparation manipulation features by organization size


Interest in data-preparation manipulation features varies widely by industry (fig. 48). In our 2021 study, the top feature, “ability to aggregate and group data,” shows the most consensus, with scores above 4.0 (“very important”) in all industries except Education. (Education respondents trail other industries in interest in most manipulation features in the current study.) Among other examples, Business Services, Healthcare, Consumer Services, and Financial Services rate the top manipulation features highly. Technology and Retail/Wholesale narrowly lead interest in “simple interface for imposing structure on raw data.”

Data-Preparation Manipulation Features by Industry

[Radar chart: mean importance score for each data-manipulation feature; series: Business Services; Technology; Healthcare; Retail and Wholesale; Consumer Services; Financial Services; Manufacturing; Government; Education]

Figure 48 – Data-preparation manipulation features by industry


Data-Preparation Supported Outputs


Respondents say the most important data-prep output is Excel and CSV (89 percent), followed by traditional relational databases (69 percent) (fig. 49). The next most popular outputs are JSON (51 percent), popular third-party business intelligence tool formats (36 percent), and Azure (35 percent). After Azure, interest drops for Amazon Redshift (30 percent) and Google BigQuery (27 percent). All remaining output options are named by about 20 percent of respondents or fewer. (The predominance of Excel, shown over time in the following chart, provides another useful perspective.)

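Data-Preparation Supported Outputs

[Bar chart: percentage of respondents requiring each output. Excel, CSV 89.2%; traditional relational database 68.6%; JSON 51.2%; popular third-party BI tool formats 36.1%; Azure 35.3%; Amazon Redshift 29.8%; Google BigQuery 27.1%; remaining four options 20.2%, 20.2%, 17.8%, and 13.2%]

Figure 49 – Data-preparation supported outputs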


Excel and CSV have remained the top choice among data-preparation supported outputs across all five years of data (fig. 50). Among other output formats, the second and third choices (traditional relational database and JSON) decline dramatically in importance compared to earlier years of study. Interest in popular third-party business intelligence tool formats falls to an all-time low of less than 40 percent. However, the lower-ranked choices Azure, Amazon Redshift, and Google BigQuery all tick up noticeably in the 2021 study. We expect these three outputs to see the greatest near-term growth in percentage terms.

Data-Preparation Supported Outputs 2017-2021

[Radar chart: percentage of respondents requiring each output (Excel, CSV; traditional relational database, e.g., SQL Server; JSON; popular third-party business intelligence tool formats; Azure; Amazon Redshift; Google BigQuery; Parquet; Hadoop; Spark; Bzip/gzip); series: 2017; 2018; 2019; 2020; 2021]

Figure 50 – Data-preparation supported outputs 2017-2021


All functions choose Excel and CSV as the most-preferred data-preparation supported output (fig. 51). Beyond that, requirements for output support vary widely across functions in our 2021 study. Strategic Planning respondents unanimously require Excel/CSV support and post the top mark (about 90 percent) for traditional relational database support, both well ahead of other functions. The next highest requirement comes from Executive Management for JSON (about 70 percent). Requirements for the trailing output endpoints (Azure, third-party BI platforms, Amazon Redshift, Google BigQuery, etc.) fall quickly to below 50 percent across all functions.

Data-Preparation Supported Outputs by Function

[Radar chart: percentage of respondents requiring each supported output format; series: Information Technology (IT); BICC; Executive Management; R&D; Finance; Marketing & Sales; Strategic Planning Function]

Figure 51 – Data-preparation supported outputs by function


Excel and CSV flat-file output is the most-required data-prep output in every geographic region, at about 90 percent, though somewhat lower (about 83 percent) among Asia-Pacific respondents (fig. 52). EMEA respondents are the most likely to require traditional database output support (80 percent). Requirements then fall to 60 percent or lower for all other supported outputs across all geographies.

Data-Preparation Supported Outputs by Geography

[Radar chart: percentage of respondents requiring each supported output format; series: North America; Europe, Middle East and Africa; Asia Pacific; Latin America]

Figure 52 – Data-preparation supported outputs by geography


A great majority of organizations of every size (86-92 percent) share the highest preference for Excel and CSV output support for data preparation (fig. 53). Traditional relational database is a strong second choice, particularly among very large organizations (> 10,000 employees), where about 80 percent require it. Requirements then decrease quickly, to less than 60 percent for JSON in mid-sized organizations (101-1,000 employees) and lower still for Azure, Amazon Redshift, and Google BigQuery.

Data-Preparation Supported Outputs by Organization Size

[Radar chart: percentage of respondents requiring each supported output format; series by organization size]

Figure 53 – Data-preparation supported outputs by organization size


All industries in our 2021 study share the greatest preference (88-95 percent) for Excel and CSV output, and nearly all industries (led by Retail/Wholesale and Technology) make traditional relational databases their second choice (fig. 54). Technology respondents are also by far the most likely (81 percent) to require JSON output support for data preparation. To a lesser degree, Consumer Services and Business Services most often require popular third-party BI tool formats. Interestingly, our 2021 study finds Government respondents reporting a relatively high preference for Azure, Amazon Redshift, and Google BigQuery.

Data-Preparation Supported Outputs by Industry

[Radar chart: percentage of respondents requiring each supported output format; series: Business Services; Financial Services; Manufacturing; Technology; Consumer Services; Healthcare; Retail and Wholesale; Education; Government]

Figure 54 – Data-preparation supported outputs by industry


Data-Preparation Deployment Features


We asked respondents about their preferences for scheduling, monitoring, and processing features that make data preparation part of a more formal, ongoing process (fig. 55). While such deployment features resonate slightly less than other data-preparation capabilities, the three most popular features (“schedule a process to run on a time-based or trigger-based event,” “ability to monitor ongoing data transformation,” and “ability to schedule execution/replay of data transformation processing”) are either “critical” or “very important” to 64 to 81 percent of respondents. Interestingly, “API support” ranks fourth of seven features, up one place from our 2020 study.

Data-Preparation Deployment Features

[Stacked bar chart: percentage of respondents (0-100%) rating each feature Critical, Very Important, Important, Somewhat Important, or Not Important. Features, in chart order:
Schedule a process to run on a time-based or trigger-based event
Ability to monitor ongoing data transformation processing to alert on anomalies or changes in the structure
Ability to schedule execution/replay of data transformation processing
API support (e.g., REST)
Ability to iteratively sample data to provide an interactive testing of transformation logic
Push-down processing of data transformations into the native data source for script execution (e.g., SQL, Pig, etc.)
Support for multiple execution environments (e.g., MapReduce, Spark, Hive) based on volume and scale of data sets]

Figure 55 – Data-preparation deployment features


Across our most recent studies, interest in data-preparation deployment features remains tightly clustered, with mostly minor changes year over year (fig. 56). The top feature, “schedule a process,” holds top importance with a weighted-mean score of 3.9 (near “very important”), and importance rankings are mostly unchanged over time. Six of the seven deployment features we sampled reach peak interest in our 2021 study. The exception, “support for multiple execution environments,” falls below the peak level seen in 2018. All seven features score at or well above 3.0 (“important”).
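The weighted-mean scores cited in this section condense the five-point ratings into one number per feature. A minimal sketch follows, assuming a scoring of Critical=5 down to Not Important=1; that mapping is our inference from how the text reads 4.0 as “very important” and 3.0 as “important,” not a rule stated in this section, and the function name and counts are hypothetical.

```python
# Minimal sketch, assuming a 5-to-1 scoring of the rating labels (an assumption:
# the mapping is inferred from how the text reads 4.0 as "very important" and
# 3.0 as "important"). Example counts are invented, not survey data.
SCORES = {"Critical": 5, "Very Important": 4, "Important": 3,
          "Somewhat Important": 2, "Not Important": 1}


def weighted_mean(counts: dict) -> float:
    """Mean importance score, weighting each rating level by its response count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    return sum(SCORES[level] * n for level, n in counts.items()) / total


# Hypothetical response counts for a single deployment feature.
example = {"Critical": 30, "Very Important": 40, "Important": 20,
           "Somewhat Important": 7, "Not Important": 3}
print(round(weighted_mean(example), 2))  # 3.87
```

On this assumed scale, the hypothetical feature's 3.87 would sit just under “very important,” in the same range as the 3.9 reported for “schedule a process.”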

Data-Preparation Deployment Features 2017-2021

[Radar chart: weighted-mean importance score for each deployment feature; series: 2017; 2018; 2019; 2020; 2021]

Figure 56 – Data-preparation deployment features 2017-2021


Sentiment toward the top three data-preparation deployment features is high across functions, with mean interest mostly ranging from “important” toward “very important” (fig. 57). Interest in the top feature, “schedule a process,” is highest in Operations (which also reports very high interest in “ability to schedule execution of data transformation” and “API support”). R&D then leads interest in multiple deployment features, including “ability to monitor,” “API support,” “push-down processing,” and “support for multiple execution environments.”

Data-Preparation Deployment Features by Function

[Radar chart: mean importance score for each deployment feature; series: BICC; Executive Management; Information Technology (IT); R&D; Strategic Planning Function; Marketing & Sales; Finance]

Figure 57 – Data-preparation deployment features by function


Viewed by geography, interest in data-preparation scheduling, monitoring, and processing features is most often highest among respondents in Latin America and Asia Pacific (fig. 58). North American respondents nonetheless give nearly the highest scores to “schedule a process to run on a time-based or trigger-based event.” In all cases, EMEA respondents are the least likely to require the data-preparation deployment features sampled in our 2021 study.

Data-Preparation Deployment Features by Geography

[Radar chart: mean importance score for each deployment feature; series: Latin America; Asia Pacific; North America; Europe, Middle East and Africa]

Figure 58 – Data-preparation deployment features by geography


Interest in data-preparation deployment features is often tightly clustered but generally increases with organization headcount (fig. 59). In 2021, very large organizations (> 10,000 employees) lead interest in six of the seven features sampled, with weighted-mean scores of 3.9 to 4.0 for “schedule a process to run,” “ability to monitor ongoing data transformation,” and “ability to schedule execution/replay of data transformation processing.” “API support” is the lone exception to very large organizations’ leadership, though both very large and large organizations (1,001-10,000 employees) give it scores of about 3.7, close to “very important.”

Data-Preparation Deployment Features by Organization Size

[Radar chart: mean importance score for each deployment feature; series: 1-100; 101-1,000; 1,001-10,000; More than 10,000 employees]

Figure 59 – Data-preparation deployment features by organization size


Interest in data-preparation deployment features varies somewhat by industry, though some leaders emerge (fig. 60). In our latest study, Healthcare respondents lead interest in the top three deployment features: “schedule a process to run,” “ability to monitor ongoing data transformation processing,” and “ability to schedule execution/replay.” Government and Consumer Services respondents report the most interest in “API support.” Retail/Wholesale respondents narrowly lead interest in “ability to iteratively sample.” Education respondents trail other industries in interest in deployment features.

Data-Preparation Deployment Features by Industry

[Radar chart: mean importance score for each deployment feature; series: Healthcare; Business Services; Technology; Retail and Wholesale; Financial Services; Government; Consumer Services; Manufacturing; Education]

Figure 60 – Data-preparation deployment features by industry


Location of Data-Preparation Capabilities


We gave respondents three choices to describe their preferred deployment location for data-preparation capabilities (fig. 61). In our 2021 study, respondents most prefer on-premises deployment (which might include desktop, LAN, or other configurations inside the firewall). Compared to on-premises deployments, which are “critical” or “very important” to nearly 60 percent of respondents, private cloud deployments are “critical” or “very important” to about 53 percent and public cloud to about 48 percent (well up from 33 percent in 2020). All options are, at minimum, “important” to 69 to 82 percent of respondents.

Location of Data-Preparation Capabilities

[Stacked bar chart: percentage of respondents (0-100%) rating On Premises, Private Cloud, and Public Cloud (SaaS) as Critical, Very Important, Important, Somewhat Important, or Not Important]

Figure 61 – Location of data-preparation capabilities


Across five years of data, we see the expected growth in interest in public cloud access to data-preparation capabilities and, to a lesser extent, a decline in on-premises location of data-preparation capabilities (fig. 62). Most notable is the upward slope of public cloud over time, rising each year to a new high in 2021. Cloud-hosted data preparation still trails both on-premises and private cloud, though the three options have never been closer in user preference (weighted means of 3.27 for public cloud, 3.42 for private cloud, and 3.66 for on-premises). All three values fall somewhat above “important” (3.0).

Location of Data-Preparation Capabilities 2017-2021

[Line chart: weighted-mean importance score (0 to 4.5) by year, 2017-2021; series: Public cloud (SaaS); Private cloud; On-premises; with linear trend lines for public cloud and on-premises]

Figure 62 – Location of data-preparation capabilities 2017-2021


While the hierarchy of on-premises, private cloud, and public cloud deployment generally holds across functions, not every function most prefers on-premises deployment (fig. 63). In our latest sample, Executive Management has a clear preference for public cloud (3.96) over private cloud (3.71) and on-premises (3.29), which may signal a trend in future enterprise adoption. Executive Management and R&D respondents also prefer private cloud over on-premises deployment. Marketing/Sales, BICC, Strategic Planning, Finance, and IT are the stalwart supporters of on-premises deployment in our 2021 study, with scores close to 4.0, or “very important.”

Location of Data-Preparation Capabilities by Function

[Bar chart: mean importance score by function (Marketing & Sales; Executive Management; R&D; BICC; Finance; Strategic Planning Function; Information Technology (IT)); series: On Premises; Private Cloud; Public Cloud (SaaS)]

Figure 63 – Location of data-preparation capabilities by function


The hierarchical preference for on-premises data-preparation capabilities is not entirely consistent across geographic regions in our 2021 study (fig. 64). EMEA respondents are the most steadfast supporters of on-premises deployment, with "very important" scores. North America and Latin America also prefer on-premises to private and public cloud. Notably, however, Asia-Pacific respondents give public cloud the highest score of any region (above "very important").

Location of Data-Preparation Capabilities by Geography
[Chart: y-axis 0-4.5; regions: Asia Pacific; Europe, Middle East and Africa; North America; Latin America. Series: On Premises, Private Cloud, Public Cloud (SaaS).]

Figure 64 – Location of data-preparation capabilities by geography


The preference for on-premises location of data-preparation capabilities increases with organization size and is predictably highest at very large organizations, which have historically tended to maintain in-house infrastructure (fig. 65). The ranking of private cloud is less linear: small organizations, followed by very large organizations, assign it the highest priority. As we would expect, small organizations are the strongest advocates for public cloud, followed by very large organizations, where support likely reflects a departmental approach to cloud-hosted data preparation.

Location of Data-Preparation Capabilities by Organization Size
[Chart: y-axis 0-4.5; organization sizes: 1-100; 101-1,000; 1,001-10,000; More than 10,000. Series: On Premises, Private Cloud, Public Cloud (SaaS).]

Figure 65 – Location of data-preparation capabilities by organization size


With the exception of Technology and Business Services, all vertical industries rank on-premises location of data preparation higher than the other models, sometimes by a large margin (fig. 66). In our 2021 study, Government, Healthcare, and Financial Services are the most likely to require on-premises hosting, perhaps partly for regulatory or proprietary reasons. Technology organizations, and to a much lesser extent Business Services, are the interesting exceptions; they reverse the order, ranking public cloud first, then private cloud, and finally on-premises deployment.

Location of Data-Preparation Capabilities by Industry
[Chart: y-axis 0-4.5. Series: On Premises, Private Cloud, Public Cloud (SaaS); linear trend lines for On Premises and Public Cloud (SaaS).]

Figure 66 – Location of data-preparation capabilities by industry


Industry Support for Data Preparation


Like the end-user respondent community, solution providers attach high importance to data preparation (fig. 67). Even so, parts of the industry may feel that data preparation has reached maturity, given that criticality and mean levels of importance are mostly flat over the last five years of our study. "Critical" scores for data preparation are at their lowest level in that period, at 59 percent. That is offset by diminishing skepticism: "somewhat important" scores have declined, and "not important" scores disappeared in our last two studies. Current industry mean importance is slightly below, but within the range of, user importance (fig. 7, p. 20). While other BI imperatives may come more to the fore, we are confident that data preparation will be a common, if not ubiquitous, component or feature of BI solutions going forward.

Industry Importance of Data Preparation 2017-2021
[Chart: share of responses (0%-100%, left axis) and mean importance (1-5, right axis) by year, 2017-2021. Series: Critically Important, Very Important, Somewhat Important, Not Important, Mean.]

Figure 67 – Industry importance of data preparation 2017-2021


Industry Support for Data-Preparation Usability Features


We asked vendors to describe their current and future support for 13 usability features associated with data preparation (fig. 68). Generally, the industry provides very high levels of support for usability. Some of the most-supported features today, such as "technical expertise not required" and "support for entire data transformation process," do not align neatly with user priorities but are well ahead of current user importance scores (fig. 31, p. 44). Even the least-supported feature, machine learning, is far ahead of user requirements in 2021.

Industry Support for Usability Features
[Chart: percentage of vendors (0%-100%). Series: Today, 12 months, 24 months, No plans. Features: Machine learning and recommendations based on…; Automated recommendations for data…; Automated detection of anomalies, outliers, &…; View/audit all prep tasks (and changes) and…; Ability to save preparation steps and apply to…; Visual highlighting of relationships between…; Visual interface for users to view and explore in-…; Automatically generate data transformation…; Collaborative capabilities; Mask or redact sensitive data; Support for entire data transformation process in…; Technical expertise/programming is *NOT*…; Immediate preview and feedback for end user; Less than 2 second response time for design…]

Figure 68 – Industry support for usability features


Industry Support for Data-Preparation Integration


Industry investment in and support for data-preparation integration features is very strong and mature, with current support near or above 90 percent for five of eight functions under study (fig. 69). Support for "access to flat file formats" is near universal, and close to 95 percent of vendors support the "ability to combine data across multiple data sets and sources." Vendors expect all integration features except "access to big data" to reach close to 95 percent or greater support in coming time frames. Such robust support appears to answer all current user expectations for integration features (fig. 37, p. 50).

Industry Support for Integration Features
[Chart: percentage of vendors (0%-100%). Series: Today, 12 months, 24 months, No plans. Features: Access to cloud-native database sources (e.g., Snowflake); Access to flat file formats (e.g., log files, CSV, Excel); Ability to combine data across multiple data sets and sources through joins and merging data; Ability to access semi-structured data (XML, JSON, HTML, PDF); Access to traditional databases (e.g., RDBMS); Access to NoSQL sources; Ability to infer metadata by introspecting the data elements; Access to big data (e.g., Hadoop); Ability to extract data from documents]

Figure 69 – Industry support for integration features


Industry Support for Data-Preparation Output Options


In our 2021 study, industry support for output options is somewhat less robust than for integration and usability features but appears ready to address all current user demands (fig. 70). The six outputs most often required by users (fig. 49, p. 62) are among those with the highest levels of vendor support in 2021. Current industry support also appears to address the accelerating user uptake of Azure, Amazon Redshift, Google BigQuery, and other outputs (fig. 50, p. 63).

Industry Support for Output Options
[Chart: percentage of vendors (0%-100%). Series: Today, 12 months, 24 months, No plans. Outputs: Excel, CSV; Cloud-native database sources (e.g., Snowflake); Traditional relational database (e.g., SQL Server); JSON; Amazon Redshift; Own proprietary BI tool format; Azure; Google BigQuery; Hadoop; Popular (third-party) business intelligence tool…; Parquet; Avro; Spark; Bzip/gzip]

Figure 70 – Industry support for output options


Industry Support for Data-Preparation Data-Manipulation Features


Industry support for data-manipulation features is strong “across the board” in our 2021
study (fig. 71). The top six features all currently enjoy 90 percent or greater support, and
all but “window and time series functions” have at least 80 percent support. Vendors
expect all data manipulation features we sampled to reach 90 percent support in coming
time frames. The top industry-supported manipulation features mostly match and easily
accommodate the top user priorities (fig. 43, p. 56).

Industry Support for Data-Manipulation Features
[Chart: percentage of vendors (0%-100%). Series: Today, 12 months, 24 months, No plans. Features: Ability to aggregate & group data; Support for cutting, merging & replacing of values; Ability to normalize, standardize & enrich data; Simple interface for imposing structure on raw data; Ability to pivot (convert table to matrix) & reshape (convert matrix to table) data; Ability to derive new data features from existing data (text extraction, math expressions, date…; Ability to unnest data (e.g., JSON/XML parsing); Custom user-defined functions; Ability to manipulate the order of data transformation steps; Session-ize log or event data; Window and time series functions]

Figure 71 – Industry support for data-manipulation features


Industry Support for Data-Preparation Deployment Features


Industry support for data-preparation deployment features is robust, and vendors expect to extend investment in capabilities in 2021 and beyond (fig. 72). Top user priorities (fig. 55, p. 68) are largely aligned with current industry support, and current support appears to easily surpass user requirements. The third-ranked user requirement, "ability to monitor," is among the less-supported features today, but vendors expect future investment to keep availability well ahead of user requirements.

Industry Support for Deployment and Performance Features
[Chart: percentage of vendors (0%-100%). Series: Today, 12 months, 24 months, No plans. Features: Schedule a process to run on a time-based or trigger-based event; Ability to iteratively sample data to provide an interactive testing of transformation logic; Ability to schedule the execution/replay of data transformation processing; API support (e.g., REST); Push-down processing of data transformations into the native data source for script execution (SQL,…; Ability to monitor ongoing data transformation processing to alert on anomalies or changes in the…; Support for multiple execution environments (e.g., MapReduce, Spark, Hive) based on volume and…]

Figure 72 – Industry support for deployment and performance features


Industry Support for Data Preparation—Cloud versus On-Premises


Industry support for data-preparation deployment options nears maturity in 2021, with the remaining future investment (12 percent) targeted at cloud deployment (fig. 73). Currently, 82 percent of vendors support on-premises deployment, and about 84 percent support cloud deployment. As we found in 2020, more industry products are available for cloud than for on-premises deployment of data preparation. We also observe no planned future investment in on-premises support. As noted earlier (fig. 61, p. 74), user demand still leans toward on-premises over public or private cloud deployment. If users shift toward greater cloud deployment (fig. 62, p. 75), industry support will already be in place.

Industry Support for Cloud and On-premises Deployment
[Chart: percentage of vendors (0%-90%) by time frame: Today, 12 months, No plans. Series: On-Premises, Cloud Based.]

Figure 73 – Industry support for cloud and on-premises deployment


Fig. 74 provides another instructive view of the growth in industry support for cloud-based data-preparation deployment over time. Our 2021 study shows a continuation of the upward linear trend in cloud-based deployment options for data preparation. We also observe a declining linear slope in support for on-premises deployment. Indeed, on-premises support has fallen a full 10 percentage points, from a high of 92 percent in 2017 to 82 percent in 2021. As cloud deployment of data-preparation capabilities continues to grow in organizations, we expect both cloud and on-premises models to remain well supported.

Industry Support for Cloud and On-premises Deployment 2015-2021
[Chart: percentage of vendors (0%-100%) by year, 2015-2021. Series: On-premises, Cloud Based; linear trend lines for On-premises and Cloud Based.]

Figure 74 – Industry support for cloud and on-premises deployment 2015-2021


Data Preparation Vendor Ratings


We include 24 vendors in our data preparation ratings (fig. 75). For each vendor, we
consider usability, integration, output, data manipulation, and deployment features. Only
vendors that score 50 percent or greater are included in this report. Top-rated vendors
include Trifacta (1st), Datameer (2nd), Alteryx and Pyramid Analytics tied for 3rd, Qlik
(4th), and Domo (5th).

Data Preparation Vendor Ratings
[Chart: axis ticks 1, 2, 4, 8, 16, 32, 64. Vendors: Trifacta, Datameer, Alteryx, Pyramid Analytics, Qlik, Domo, Microsoft, Informatica, RapidMiner, Sisense, SAS, Talend, Fishtown Analytics, Ataccama, Matillion, Infor, Oracle, Sigma Computing, Tableau, Dimensional Insight, Dundas, MicroStrategy, Google Cloud, ThoughtSpot. Series: Usability, Integration, Output Options, Data Manipulation, Deployment, Total Score.]

Figure 75 – Data preparation vendor ratings


Other Dresner Advisory Services Research Reports

- Wisdom of Crowds® “Flagship” Business Intelligence Market Study


- Analytical Data Infrastructure
- BI Competency Center
- Big Data Analytics
- Cloud Computing and Business Intelligence
- Data Catalog
- Data Pipelines and Integration
- Data Science and Machine Learning
- Embedded Business Intelligence
- Enterprise Performance Management
- Natural Language Analytics
- Sales Performance Management
- Self-Service BI
- Small and Mid-Sized Enterprise Business Intelligence
- Small and Mid-Sized Enterprise Performance Management


Appendix: Data Preparation Survey Instrument

Name*: _________________________________________________

Company Name: _________________________________________________

Address 1: _________________________________________________

Address 2: _________________________________________________

City: _________________________________________________

State: _________________________________________________

Zip: _________________________________________________

Country: _________________________________________________

Email Address*: _________________________________________________

Phone Number: _________________________________________________

Major Geography

( ) Asia/Pacific

( ) Europe, Middle East and Africa

( ) Latin America

( ) North America

What is your current title?

_________________________________________________

What function are you a part of?

( ) Business intelligence competency center

( ) Executive management


( ) Finance

( ) Information Technology (IT)

( ) Manufacturing

( ) Marketing

( ) Project/program management office

( ) Sales

( ) Research and development (R&D)

( ) Other - Write In: _________________________________________________

Please select an industry

( ) Advertising

( ) Aerospace

( ) Agriculture

( ) Apparel and accessories

( ) Automotive

( ) Aviation

( ) Biotechnology

( ) Broadcasting

( ) Business services

( ) Chemical

( ) Construction

( ) Consulting

( ) Consumer products

( ) Defense

( ) Distribution & logistics



( ) Education

( ) Energy

( ) Entertainment and leisure

( ) Executive search

( ) Federal government

( ) Financial services

( ) Food, beverage and tobacco

( ) Healthcare

( ) Hospitality

( ) Gaming

( ) Insurance

( ) Legal

( ) Manufacturing

( ) Mining

( ) Motion picture and video

( ) Not for profit

( ) Pharmaceuticals

( ) Publishing

( ) Real estate

( ) Retail and wholesale

( ) Sports

( ) State and local government

( ) Technology

( ) Telecommunications

( ) Transportation

( ) Utilities

( ) Other - Write In: _________________________________________________

How many employees does your company employ worldwide?

( ) 1 - 100

( ) 101 - 1000

( ) 1001 - 5000

( ) More than 5000

How important is it for users to be able to prepare data (e.g., combine, clean, shape
datasets) prior to analysis?*

( ) Critical

( ) Very important

( ) Important

( ) Somewhat important

( ) Not important

What tool(s) do users currently use to prepare data for analysis?

____________________________________________

____________________________________________

____________________________________________

____________________________________________

How effective is the current approach to data preparation for business intelligence/user analysis today?

( ) Highly effective

( ) Somewhat effective

( ) Somewhat ineffective

( ) Totally ineffective

How often do users have to prepare data (e.g., combine, clean and shape datasets) to
get it in a format that can be used for analysis?

( ) Constantly

( ) Frequently

( ) Occasionally

( ) Rarely

( ) Never

How often do users enrich internal data with third-party data (e.g., Dun & Bradstreet, US Census)?

( ) Constantly

( ) Frequently

( ) Occasionally

( ) Rarely

( ) Never

Should data preparation be a standalone capability or part of another tool?

( ) Standalone

( ) Part of business intelligence tools

( ) Part of existing data quality/data integration tools


Please indicate the importance of the following usability features for data preparation software:

Columns: Critical / Very important / Important / Somewhat important / Not important

Technical expertise/programming is *NOT* required to build/execute data transformation scripts: ( ) ( ) ( ) ( ) ( )

Immediate preview and feedback for end user: ( ) ( ) ( ) ( ) ( )

Automated recommendations for data relationships & keys for combining data across multiple data sets and sources: ( ) ( ) ( ) ( ) ( )

Visual interface for users to view and explore in-process data sets, interactively profile and refine data transformations prior to execution: ( ) ( ) ( ) ( ) ( )

Visual highlighting of relationships between columns, attributes & datasets: ( ) ( ) ( ) ( ) ( )

Automated detection of anomalies, outliers, & duplicates: ( ) ( ) ( ) ( ) ( )

Automatically generate data transformation code/scripts for execution: ( ) ( ) ( ) ( ) ( )

Support for entire data transformation process in a single application/user interface: ( ) ( ) ( ) ( ) ( )

Machine learning and recommendations based on usage data gathered across users, groups, or organizations: ( ) ( ) ( ) ( ) ( )

Please indicate the importance of the following data integration features for data preparation software:

Columns: Critical / Very important / Important / Somewhat important / Not important

Access to traditional databases (e.g., RDBMS): ( ) ( ) ( ) ( ) ( )

Access to big data (e.g., Hadoop): ( ) ( ) ( ) ( ) ( )

Access to NoSQL sources: ( ) ( ) ( ) ( ) ( )

Access to file formats (e.g., log files, CSV, Excel): ( ) ( ) ( ) ( ) ( )

Ability to infer metadata by introspecting the data elements: ( ) ( ) ( ) ( ) ( )

Ability to combine data across multiple data sets and sources through joins and merging data: ( ) ( ) ( ) ( ) ( )

What output formats should a data preparation solution support?

[ ] Traditional relational database (e.g., SQL Server)

[ ] Excel, CSV

[ ] Popular (third-party) business intelligence tool formats

[ ] Hadoop

[ ] Redshift

[ ] Azure

[ ] Avro

[ ] Parquet

[ ] Bzip/gzip

[ ] Other - Write In: _________________________________________________


Please indicate the importance of the following data manipulation features for data preparation software:

Columns: Critical / Very important / Important / Somewhat important / Not important

Simple interface for imposing structure on raw data: ( ) ( ) ( ) ( ) ( )

Ability to un-nest data (e.g., JSON / XML parsing): ( ) ( ) ( ) ( ) ( )

Ability to normalize, standardize & enrich data: ( ) ( ) ( ) ( ) ( )

Support for cutting, merging & replacing of values: ( ) ( ) ( ) ( ) ( )

Ability to aggregate & group data: ( ) ( ) ( ) ( ) ( )

Ability to pivot (convert table to matrix) & reshape (convert matrix to table) data: ( ) ( ) ( ) ( ) ( )

Ability to derive new data features from existing data (text extraction, math expressions, date expressions, etc.): ( ) ( ) ( ) ( ) ( )

Ability to manipulate the order of data transformation steps: ( ) ( ) ( ) ( ) ( )

Session-ize log or event data: ( ) ( ) ( ) ( ) ( )

Window and time series functions: ( ) ( ) ( ) ( ) ( )

Custom user defined functions: ( ) ( ) ( ) ( ) ( )

Please indicate the importance of the following deployment features for data preparation software:

Columns: Critical / Very important / Important / Somewhat important / Not important

Ability to iteratively sample data to provide an interactive testing of transformation logic: ( ) ( ) ( ) ( ) ( )

Push-down processing of data transformations into the native data source for script execution (SQL, Pig, etc.): ( ) ( ) ( ) ( ) ( )

Ability to schedule the execution/replay of data transformation processing: ( ) ( ) ( ) ( ) ( )

Ability to monitor ongoing data transformation processing to alert on anomalies or changes in the structure: ( ) ( ) ( ) ( ) ( )

Support for multiple execution environments (e.g., MapReduce, Spark, Hive) based on volume and scale of data sets: ( ) ( ) ( ) ( ) ( )

API support (e.g., REST): ( ) ( ) ( ) ( ) ( )


Where should data preparation functionality reside?

Columns: Critical / Very important / Important / Somewhat important / Not important

On-premises: ( ) ( ) ( ) ( ) ( )

Private cloud: ( ) ( ) ( ) ( ) ( )

Public cloud (SaaS): ( ) ( ) ( ) ( ) ( )

