5 Essential Components of Data Strategy
Contents
Data Strategy: What Problem Does It Solve?
Store
Provision
Process
Govern
Learn More
As our corporate data stores have grown in both size and subject area diversity, it has
become clear that a strategy to address data is necessary. Yet some still struggle with
the idea that corporate data needs a comprehensive strategy.
Let’s see how this played out in real life at one organization that set out to develop a
data strategy.
The bank was already successful. Its revenue and costs were well-managed, and the
individual business units and technology groups were good at delivering against their
commitments. To the bank’s credit, it wasn’t complacent. Management was always
looking for ways to increase staff members’ productivity and reduce ongoing costs.
There were all kinds of metrics and key performance indicators (KPIs) to measure IT
performance, business benefits and total cost of ownership. The idea of building yet
another road map to address a problem that wasn’t well-understood met with pushback.
With the bank doing so many things right, the bank executive needed to understand why and how a
data strategy would make a difference. To answer these questions, we first have to
consider how data was created and used in the past compared to how it’s created and
used today.
Today, business is very different. The value of data is accepted; the results of reporting
and analytics have made data the secret sauce of many new business initiatives. It’s
common for application data to be shared with as many as 10 other systems.
While the value of data has evolved tremendously over the past 20 years – and business
users recognize it – few companies have adjusted their approaches to capturing,
sharing and managing corporate data assets. Their behavior reflects an outdated,
underlying belief that data is simply an application byproduct.
Organizations need to create data strategies that match today’s realities. To build such a
comprehensive data strategy, they need to account for current business and technology
commitments while also addressing new goals and objectives.
We weren’t challenging the premise or value of any individual project. The problem
was the approach that each individual project and activity took. Each activity addressed
data needs independently from one another without any awareness of the overlapping
efforts and costs.
• Most projects required access to the same data content. Unfortunately, there was no
coordination to prevent overlapping (and wasted) work.
• There was no data sharing, no data reuse, or any economies-of-scale activities to
simplify or reduce the cost of data movement and development.
• Business users accessed common data across separate applications. Data value
names and formatting varied across applications.
• Users found inconsistencies across reports because source data wasn’t documented,
and it varied across individual reports.
The result was duplicate data, processing overlaps and little awareness that individual
projects were replicating work. There wasn’t anything in place to support communicating,
collaborating or sharing data methods and practices across projects and systems.
The problem: Every project at the bank addressed data issues as one-off, built-from-scratch
activities.
• Each project required access to customer data, and each had overlapping
tasks and resources.
• Every project included a source data inventory and analysis activity
because there was no way to know where specific data resided.
• New data extracts (subsets of the application’s data copied for use by
other systems) had to be built because IT had no way of determining
if the data was already available.
• No two teams shared their source extract data. Each had their own
copies to support their integration and database build activities
(which tied up storage for this transient content).
• Each team’s integration logic was custom built and individually maintained,
because the logic and rules weren’t identified or documented
to be shared.
The idea behind developing a data strategy is to make sure all data resources are positioned
in such a way that they can be used, shared and moved easily and efficiently.
Data is no longer a byproduct of business processing – it’s a critical asset that enables
processing and decision making. A data strategy helps by ensuring that data is
managed and used like an asset. It provides a common set of goals and objectives
across projects to ensure data is used both effectively and efficiently. A data strategy
establishes common methods, practices and processes to manage, manipulate and
share data across the enterprise in a repeatable manner.
A data strategy is a plan designed to improve all of the ways you acquire, store, manage,
share and use data.
While most companies have multiple data management initiatives underway
(metadata, master data management, data governance, data migration, modernization,
data integration, data quality, etc.), most efforts are focused on point solutions that
address specific project or organizational needs. A data strategy establishes a road map
for aligning these activities across each data management discipline in such a way that
they complement and build on one another to deliver greater benefits.
A data strategy must address data storage, but it must also take into account the way
data is identified, accessed, shared, understood and used. To be successful, a data
strategy has to include each of the different disciplines within data management. Only
then will it address all of the issues related to making data accessible and usable so that
it can support today’s multitude of processing and decision-making activities.
There are five core components of a data strategy that work together as building blocks
to comprehensively support data management across an organization: identify, store,
provision, process and govern.
Figure: The Core Components (Identify, Store, Provision, Process, Govern)
Identify
Identify data and understand its meaning regardless of structure, origin or
location
One of the most basic constructs for using and sharing data within a company is establishing
a means to identify and represent the content. Whether it’s structured or unstructured
content, manipulating and processing data isn’t feasible unless the data value has
a name, a defined format and value representation (even unstructured data has these
details). Establishing consistent data element naming and value conventions is core to
using and sharing data. These details should be independent of how the data is
stored (in a database, file, etc.) or the physical system where it resides.
It’s also important to have a means of referencing and accessing metadata associated
with your data (definition, origin, location, domain values, etc.). In much the same way that
having an accurate card catalog supports an individual’s success in using a library to
retrieve a book, successful data usage depends on the existence of metadata (to help
retrieve specific data elements). Consolidating business terminology and meaning into a
business data glossary is a common means of addressing part of the challenge.
Libraries have card catalogs because it’s impractical to remember the location of every
book. Metadata is critical for business data usage because it’s impossible to know the
location and meaning of all of the company’s business data – thousands of data
elements across numerous data sources. Without data identification details, you would
be forced to undertake a data inventory and analysis effort every time you wanted to
include new data in your processing or analysis activities.
Without a data glossary and metadata (i.e., the “data card catalog”), companies are
likely to ignore some of their most prized data assets because they won’t know they
exist. If data is truly a corporate asset, a data strategy has to ensure that all of the data
can be identified.
Subject area: Customer (other subject areas shown: Product, Location)

Attribute        Source       Definition                   Type        ...   ...   Steward
Customer ID      SalesCRM     Value uniquely identifying   Integer     ...   ...   Susan Craff
First Name       CapBilling   Customer’s first name        Character   ...   ...   Susan Craff
Last Name        CapBilling   Customer’s last name         Character   ...   ...   Susan Craff
Middle Initial   CapBilling   Customer’s middle initial    Character   ...   ...   Susan Craff
Home Street      ServCont     Home street address          Character   ...   ...   Susan Craff
Home City        ServCont     Home residence city          Character   ...   ...   Susan Craff
...              ...          ...                          ...         ...   ...   ...
...              ...          ...                          ...         ...   ...   ...
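To make the "data card catalog" idea concrete, here is a minimal sketch of a glossary that can be searched by attribute name or by source system. The rows are copied from the sample glossary above; the class and function names (`GlossaryEntry`, `find_attribute`, `attributes_from_source`) are hypothetical and chosen only for illustration, not taken from any product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GlossaryEntry:
    """One row of the business data glossary (illustrative structure)."""
    attribute: str
    source: str
    definition: str
    data_type: str
    steward: str


# Rows copied from the sample glossary; definitions are kept as printed there.
GLOSSARY: List[GlossaryEntry] = [
    GlossaryEntry("Customer ID", "SalesCRM", "Value uniquely identifying", "Integer", "Susan Craff"),
    GlossaryEntry("First Name", "CapBilling", "Customer's first name", "Character", "Susan Craff"),
    GlossaryEntry("Last Name", "CapBilling", "Customer's last name", "Character", "Susan Craff"),
    GlossaryEntry("Middle Initial", "CapBilling", "Customer's middle initial", "Character", "Susan Craff"),
    GlossaryEntry("Home Street", "ServCont", "Home street address", "Character", "Susan Craff"),
    GlossaryEntry("Home City", "ServCont", "Home residence city", "Character", "Susan Craff"),
]


def find_attribute(name: str) -> Optional[GlossaryEntry]:
    """Look up an attribute by name, ignoring case and surrounding spaces."""
    wanted = name.strip().casefold()
    for entry in GLOSSARY:
        if entry.attribute.casefold() == wanted:
            return entry
    return None


def attributes_from_source(source: str) -> List[GlossaryEntry]:
    """List every glossary attribute that originates in the given system."""
    return [entry for entry in GLOSSARY if entry.source == source]
```

With a lookup like this, a project can ask where "Home City" lives instead of repeating a source data inventory and analysis effort.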
Store
Persist data in a structure and location that supports easy, shared access and
processing
Data storage is one of the basic capabilities in a company’s technology portfolio – yet it
is a complex discipline. Most IT organizations have mature methods for identifying and
managing the storage needs of individual application systems; each system receives
sufficient storage to support its own processing and storage requirements. Whether
dealing with transactional processing applications, analytical systems or even general
purpose data storage (files, email, pictures, etc.), most organizations use sophisticated
methods to plan capacity and allocate storage to the various systems. Unfortunately,
this approach only reflects a “data creation” perspective. It does not encompass data
sharing and usage.
The gap in this approach is that there’s rarely a plan for efficiently managing the storage
required to share and move data between systems. The reason is simple; the most
visible data sharing in the IT world is transactional in nature. Transactional details
between applications are moved and shared to complete a specific business process.
Bulk data sharing isn’t well-understood and is often perceived as a one-off or infrequent
occurrence.
With the popularity of big data, the growth of business analytics and increased information
sharing between companies, it’s much more common to share large volumes of (or
bulk) data. Most of this shared content falls into two categories: internally created data
(customer details, purchase details, etc.) and externally created content (cloud applications,
third-party data, syndicated content, etc.). The lack of a centrally managed data
sharing process typically forces all systems to manage this space individually, so
everyone creates their own copy of the source.
Forbes magazine¹ identified a medical research facility generating 100 terabytes of
data that was ultimately copied and retained by 18 different teams and required more
than 10 petabytes of storage.
As organizations have evolved and data assets have grown, it has become clear that
storing all data in a single location isn’t feasible. It’s not that we can’t build a system
large enough to hold the content. The problem is that the size and distributed nature of
our organizations – and the diversity of our data sources – make loading data into a
single platform impractical. Everyone doesn’t need access to all of the company’s data;
they need access to specific data to support their individual needs.
The key is to make sure there’s a practical means of storing all the data that’s created in
a way that allows it to be easily accessed and shared. You don’t have to store all the data
in one place; you need to store the data once and provide a way for people to find and
access it.
¹ Best Practices for Managing Big Data, by Ash Ashutosh. Forbes.com
We know that once data is created, it will be shared with numerous other systems; it’s
critical to address storage efficiently, in a way that simplifies access. A good data strategy
will ensure that any data created is available for future access without requiring
everyone to create their own copies.
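One way to picture "store the data once and provide a way for people to find and access it" is a shared registry that maps each data set to a single storage location. The sketch below is only an illustration under that assumption; `DatasetRegistry`, the data set name and the location strings are all hypothetical.

```python
from typing import Dict


class DatasetRegistry:
    """Maps each data set name to the one place where it is stored."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, name: str, location: str) -> None:
        """Record where a data set lives; reject a second copy of the same data set."""
        if name in self._locations:
            raise ValueError(
                f"{name!r} is already stored at {self._locations[name]!r}"
            )
        self._locations[name] = location

    def locate(self, name: str) -> str:
        """Return the stored location so consumers read it instead of copying it."""
        return self._locations[name]


registry = DatasetRegistry()
registry.register("customer_details", "shared/customer_details")
print(registry.locate("customer_details"))
```

A team that needs customer details asks the registry for the location rather than building and keeping its own extract.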
Figure 3: Each system creating its own data copies causes a fourfold increase in storage and processing.
Provision
Package data so it can be reused and shared, and provide rules and access
guidelines for the data
In the early days of IT, most application systems were built as individual, independent
data processing engines that contained all of the data necessary to perform their
defined duties. There was little or no thought given to sharing data across applications.
Data was organized and stored for the convenience of the application that collected,
created and stored the content.
When the occasional request for data came up, an application developer created an
extract by either dumping that data into a file or building a one-off program to support
another application’s request. The developer didn’t think about ongoing data provisioning
needs, or data reuse or sharing. At that time, data sharing was infrequent.
Today, data sharing is definitely not a specialized need or an infrequent occurrence –
data is often used by 10 other systems to support additional business processes and
decision making.
But most application systems were not designed to share data. The logic and rules
required to decode data for use by others are rarely documented or even known outside
of the application development team. Most IT organizations don’t provide budget or staff
resources to address nontransactional data sharing. Instead, it’s handled as a courtesy
or convenience – and often addressed as a personal favor between staff members.
When data is shared, it’s usually packaged at the convenience of the application developer,
not the data user. Such an approach might have been acceptable in years past,
when just a few systems and a couple of teams needed access. But it’s completely
impractical in today’s world where IT manages dozens of systems that rely on data from
multiple sources to support individual business processes. Packaging and sharing data
at the convenience of a single source developer – instead of the individuals
managing 10 downstream systems that require the data – is ridiculous. And expecting
individuals to learn the idiosyncrasies of dozens of source application systems just so
they can use the data is an incredible waste of time.
Figure 4: Customer details stored and referenced differently in each operational application.
If a company’s data is truly a corporate asset, then all data must be packaged and
prepared for sharing. To treat data as an asset instead of a burden of doing business,
a data strategy has to address data provisioning as a standard business process.
Process
Move and combine data residing in disparate systems, and provide a unified,
consistent data view
Data generated from applications is a treasure trove of knowledge – but data is a raw
commodity at the time of creation. It hasn’t been prepared, transformed or corrected to
make it “ready to use.” Process is the component of data strategy that addresses the
activities required to evolve data from a raw ingredient into a finished good.
Source system data is much like a raw ingredient in a manufacturing process. For a
manufacturer to construct a product (let’s say a box of cereal), it must acquire a large
quantity of raw ingredients (flour, fruit, nuts, cardboard, printing ink, etc.) and develop a
manufacturing process to build and deliver a box of cereal to the grocer’s shelf. A box
filled with flour, nuts and ink isn’t ready to use; baking, processing, packing and
shipping are required to make a product that’s ready to use and available on the
grocer’s shelf.
Data generated from an application is very much a raw ingredient. At most companies,
data originates from both internal and external sources. Internal data is generated from
dozens (if not hundreds) of application systems. External data may be delivered from a
variety of different sources (cloud applications, business partners, data providers,
government agencies, etc.). While this data is often rich with information, it wasn’t
packaged in a manner to be integrated with the unique combination of sources that
exist within each individual company. To make the data ready to use, a series of steps
are necessary to transform, correct and format the data. The result of this process is a
small set of homogeneous data sets that can be merged or integrated by a data user
with a set of data preparation tasks specific to their individual needs (analytics, transaction
processing, data sharing, etc.).
It’s common for companies to establish a centralized team to address data cleansing,
standardization, transformation and integration for the data warehouse. Unfortunately,
many have learned that this type of processing isn’t unique to a data warehouse. Most
data users (applications, analytics users, developers, etc.) require ready-to-use data – so
these users end up taking on the development effort themselves. Developing code to
identify and match records across these individual sources can be quite complex,
particularly when some systems require data from 20 or more sources.
Developers spend enormous time building logic to match and link values across a
multitude of sources. Unfortunately, as each new development team requires
access to individual data sources, they reconstruct or reinvent the logic needed to link
values across the same data sources. The tragedy of data integration is that this rework
happens with each new project because the learnings of the past are never captured
for reuse.
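As a rough illustration of capturing match-and-link logic once so that later projects can reuse it, the sketch below keeps the matching rule in a single shared function. It assumes a deliberately simple rule (names compared after trimming spaces and ignoring case); real record matching is usually far more involved. The function names, field names and sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def match_key(first_name: str, last_name: str) -> Tuple[str, str]:
    """Build a comparison key: collapse whitespace and ignore letter case."""
    def clean(value: str) -> str:
        return " ".join(value.split()).casefold()
    return clean(first_name), clean(last_name)


def link_records(left: List[Record], right: List[Record]) -> List[Tuple[Record, Record]]:
    """Pair every record in `left` with the records in `right` that share its key."""
    index: Dict[Tuple[str, str], List[Record]] = {}
    for rec in right:
        key = match_key(rec["first_name"], rec["last_name"])
        index.setdefault(key, []).append(rec)
    pairs = []
    for rec in left:
        key = match_key(rec["first_name"], rec["last_name"])
        for other in index.get(key, []):
            pairs.append((rec, other))
    return pairs


# Hypothetical records from two source systems with inconsistent formatting.
billing = [{"id": "1001", "first_name": " Ann ", "last_name": "LEE"}]
service = [
    {"id": "A-9", "first_name": "ann", "last_name": "Lee"},
    {"id": "B-2", "first_name": "Bob", "last_name": "Ray"},
]
links = link_records(billing, service)
```

Because every project calls the same `match_key`, a correction to the rule is made once and reaches every consumer, instead of being reinvented per project.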
While most organizations have initiatives to address code reuse and collaboration for
application development, they have not focused this effort on delivering data that is
ready to use and promotes sharing and reuse. It’s not practical (nor is it appropriate) for
data users to become developers. Making data ready to use is about offering tools and
establishing processes to produce data that individuals can use – without IT
involvement.
Govern
Establish, manage and communicate information policies and mechanisms for
effective data usage
Since data is still often perceived as a byproduct of application processing, few organizations
have fully developed the methods and processes needed to manage data
outside the context of an application and across the enterprise. While many have
begun investing in data governance initiatives, many are still in the infancy stage of their
respective initiatives.
Figure 5: Each data source contains unique data (colored boxes). Since each application creates its own integration logic, the
data values may differ across each application.
Most data governance initiatives start by addressing specific tactical issues (e.g., data
accuracy, business rule definition or terminology standards) and are confined to
specific organizations or project efforts. As governance awareness grows, and as data
sharing and usage issues gain visibility, governance initiatives often broaden in scope.
As those initiatives expand, organizations may establish a set of information policies,
rules and methods to ensure uniform data usage, manipulation and management.
But all too often data governance is perceived as a rigor specific only to users and the
analytics environment. In fact, data governance applies to all applications, systems and
staff members. The biggest challenge with data governance is adoption – because data
governance is an overarching set of information policies and rules that everyone must
respect and follow.
The reason for establishing a strong governance process is to ensure that once data is
decoupled from the application that created it, the rules and details of the data are
known and respected by all other data constituents. The role governance plays within
an overall data strategy is to ensure that data is managed consistently across the
company.
Whether it is for determining security details, data correction logic, data naming standards
or even establishing new data rules, effective data governance makes sure
data is consistently managed, manipulated and accessed. Decisions about how data is
processed, manipulated or shared aren’t made by an individual developer; they’re
established by the rules and policies of data governance.
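To show what "established by the rules and policies of data governance" could look like in practice, here is a small sketch in which a naming standard is defined once and checked the same way by every team. The standard itself (lowercase words joined by underscores) is an assumption made up for this example, as are the names `NAMING_RULE` and `naming_violations`.

```python
import re
from typing import Iterable, List

# Hypothetical governed standard: lowercase words separated by single underscores.
NAMING_RULE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def naming_violations(names: Iterable[str]) -> List[str]:
    """Return the data element names that do not follow the shared standard."""
    return [name for name in names if not NAMING_RULE.fullmatch(name)]


proposed = ["customer_id", "FirstName", "home_city", "last-name"]
print(naming_violations(proposed))  # prints ['FirstName', 'last-name']
```

The point of the design is that the rule lives in one governed place, so an individual developer applies it rather than inventing a new convention.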
The purpose of data governance isn’t to limit data access or insert a harsh, unusable
level of rigor that interferes with usage. Its premise is simply to ensure that data
becomes easier to access, use and share. The rigor introduced by a data governance
effort shouldn’t be overwhelming or burdensome. While data governance may initially
affect developers’ productivity (because of the new processes and work activities), the
benefits to downstream data constituents and dramatic improvements in productivity
should more than counteract the initial impact.
It should be no surprise that a data strategy has to include data governance. It’s simply
impractical to move forward – without an integrated governance effort – in establishing
a plan and road map to address all the ways you capture, store, manage and use information.
Data governance provides the necessary rigor over the data content as
changes occur to the technology, processing and methodology areas associated with
the data strategy effort.
While most companies have invested millions of dollars to improve data management,
most activities are point solutions addressing individual problems and issues. Few
people are aware of the impact a single investment may have in strengthening or
(unfortunately) weakening other projects or data initiatives. The challenge most organizations
have is realizing that data access and usage stretch across every organization
and skill level at their company.
The risk of investing in a point solution is that its focused nature prevents it from
addressing issues that cross organizational and project boundaries – and data issues by
nature are not specific to a single application or organization. Efforts to deliver new data
and/or analytics to a business won’t succeed unless all of the other data-related components
have been addressed: identify, store, provision, process and govern.
The strength of the data strategy components is that they help you identify focused,
tangible goals within each individual discipline area. Every company has a unique
combination of skills and a different set of strengths and weaknesses. Moving forward
with a data strategy starts with identifying the strengths and weaknesses that exist within
your data environment (within each component area) – and identifying an achievable
and measurable set of goals that will improve data access and sharing. The components’
purpose is not to identify every potential activity within a data strategy; the
components offer visibility into the different disciplines that contribute to a data
strategy.
A data strategy initiative isn’t a once-and-done effort; by its very nature, a strategy is a
long-term set of goals. It’s common to identify a multiyear set of goals and identify a
shorter-term set of delivery milestones (e.g., quarterly or yearly). This allows the strategy
to undergo review and measurement on an ongoing basis to prevent the types of challenges
the bank executive mentioned. The components provide a means of categorizing
activities and identifying shorter-term deliverables.
Most companies have already invested in data management activities across the
different component areas; unfortunately, the different areas are not typically coordinated
or aligned with one another. The bank’s data management challenges illustrate
how the lack of a data strategy (and aligned activities) can cause significant tribulations
for data access and usage. A data strategy gives visibility into the relationship each of
the components (or disciplines) has with the others. If you don’t coordinate the
different component activities, you risk delivering a series of point solutions that can’t
work together.
The idea behind a data strategy isn’t to build a perfect world that can address any
unforeseen data need. The power of a data strategy is that it positions you to deliver the
best possible solution as your organization’s needs grow and evolve. When new
requirements arise and gaps become visible, the component framework provides a
method for identifying the changes needed across your company’s various data
management capability and technology areas. Your data strategy is a road map and
means for addressing both existing and future data management needs.
Learn More
Find out how SAS® Data Management can help you build a successful data strategy by
visiting SAS Data Management Consulting.
To discover how SAS Data Management solutions can help you make decisions you can
trust, visit sas.com/data.
To contact your local SAS office, please visit: sas.com/offices
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2018, SAS Institute Inc.
All rights reserved. 108109_G62437.0118