Data Strategy
A data strategy states how we want to manage and apply data. It sets out a vision, principles and the target state
we want to achieve, and it describes how we are going to get there. It also defines the role our data will
play in helping the business achieve its objectives.
A data strategy is a common reference of methods, services, architectures, usage patterns and
procedures for acquiring, integrating, storing, securing, managing, monitoring, analyzing, consuming and
operationalizing data. It is, in effect, a checklist for developing a roadmap toward the transformation
journey. This includes clarifying the target vision and practical guidance for achieving that vision, with
clearly articulated success criteria and key performance indicators that can be used to evaluate and
rationalize all subsequent data initiatives.
A data strategy does not contain a detailed solution to use cases and specific technical problems. Nor is
it limited to high-level constructs intended only for senior leadership. Sustaining a successful data
strategy requires executive sponsorship and governance for alignment with corporate objectives and
enforced adherence. As corporate objectives evolve, so should the data strategy — keeping up not only
with how the business is operating but also with how supporting technologies and related innovations
are maturing.
If a data strategy isn’t in place, it is harder to use data analytics to make evidence-based decisions; those
decisions can be less accurate, and it can take more time (and cost more) to reach them.
A data strategy results in improved accuracy and timeliness, thus allowing organizations to make better
and faster business decisions.
Identify:
“Identify data and understand its meaning regardless of structure, origin or location”
One of the most basic constructs for using and sharing data within a company is establishing a means to
identify and represent the content. Whether it’s structured or unstructured content, manipulating and
processing data isn’t feasible unless the data value has a name, a defined format and value
representation (even unstructured data has these details). Establishing consistent data element naming
and value conventions is core to using and sharing data. These details should be independent of how the
data is stored (in a database, file, etc.) or the physical system where it resides.
It’s also important to have a means of referencing and accessing metadata associated with your data
(definition, origin, location, domain values, etc.). In much the same way that having an accurate card
catalog supports an individual’s success in using a library to retrieve a book, successful data usage
depends on the existence of metadata (to help retrieve specific data elements). Consolidating business
terminology and meaning into a business data glossary is a common means of addressing part of the
challenge.
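As a loose illustration of what such a glossary entry might capture, the sketch below records a term together with its definition, origin, location and domain values; the field names and the example term are assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    """One business term in the data glossary (hypothetical schema)."""
    term: str            # business name, independent of any physical system
    definition: str      # agreed business meaning
    origin: str          # system or process that creates the value
    location: str        # where the content currently resides
    domain_values: list = field(default_factory=list)  # allowed values, if enumerated

# A small in-memory "data card catalog" keyed by business term
glossary = {
    "customer_status": GlossaryEntry(
        term="customer_status",
        definition="Current lifecycle state of a customer relationship",
        origin="CRM onboarding process",
        location="crm.customers.status",
        domain_values=["prospect", "active", "lapsed", "closed"],
    )
}

# A consumer can look up meaning and location without knowing the source system
entry = glossary["customer_status"]
print(entry.definition, "->", entry.location)
```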
Metadata is critical for business data usage because it’s impossible to know the location and meaning of
all of the company’s business data – thousands of data elements across numerous data sources.
Without data identification details, we would be forced to undertake a data inventory and analysis effort
every time we wanted to include new data in our processing or analysis activities.
Without a data glossary and metadata (i.e., the “data card catalog”), we are likely to overlook some of
our most prized data assets because we won’t know they exist. If data is truly a corporate asset, a data
strategy has to ensure that all of the data can be identified.
Store:
“Persist data in a structure and location that supports easy, shared access and processing”
Data storage is one of the basic capabilities in an organization’s technology portfolio – yet it is a complex
discipline. Most organizations have mature methods for identifying and managing the storage needs of
individual application systems; each system receives sufficient storage to support its own processing and
storage requirements. Whether dealing with transactional processing applications, analytical systems or
even general purpose data storage (files, email, pictures, etc.), most organizations use sophisticated
methods to plan capacity and allocate storage to the various systems. Unfortunately, this approach only
reflects a “data creation” perspective. It does not encompass data sharing and usage.
The gap in this approach is that there’s rarely a plan for efficiently managing the storage required to
share and move data between systems. The reason is simple; the most visible data sharing in the world
is transactional in nature. Transactional details between applications are moved and shared to complete
a specific business process. Bulk data sharing isn’t well-understood and is often perceived as a one-off or
infrequent occurrence.
As our organization has evolved and data assets have grown, it has become clear that storing all data in
a single location isn’t feasible. It’s not that we can’t build a system large enough to hold the content. The
problem is that the size and distributed nature of our organization – and the diversity of our data
sources – make loading data into a single platform impractical. Not everyone needs access to all of
the company’s data; people need access to the specific data that supports their individual needs. For this
we will require a data lake, so that all of the data we hold can be easily accessed and shared.
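A minimal sketch of how a data lake layout can keep raw and curated data separately addressable, so that specific data sets can be shared without everyone loading everything; the zone names and paths are illustrative assumptions, not a mandated design.

```python
from pathlib import Path

# Illustrative root; in practice this is often object storage rather than a local path
LAKE_ROOT = Path("/data/lake")

def raw_path(source: str, dataset: str, load_date: str) -> Path:
    """Raw zone: data kept as delivered by the source system, partitioned by load date."""
    return LAKE_ROOT / "raw" / source / dataset / f"load_date={load_date}"

def curated_path(domain: str, dataset: str) -> Path:
    """Curated zone: cleansed, documented data packaged for shared access."""
    return LAKE_ROOT / "curated" / domain / dataset

print(raw_path("crm", "customers", "2024-01-31"))
print(curated_path("customer", "customer_master"))
```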
Provision:
“Package data so it can be reused and shared, and provide rules and access guidelines for the data”
Most application systems were built as individual, independent data processing engines that contained
all of the data necessary to perform their defined duties. There was little or no thought given to sharing
data across applications. Data was organized and stored for the convenience of the application that
collected, created and stored the content.
When the occasional request for data came up, an application developer created an extract by either
dumping that data into a file or building a one-off program to support another application’s request. The
developer didn’t think about ongoing data provisioning needs, or data reuse or sharing.
The logic and rules required to decode data for use by others are rarely documented or even known
outside of the application development team. When data is shared, it’s usually packaged at the
convenience of the application developer, not the data user. Such an approach might have been
acceptable in years past, when just a few systems and a couple of teams needed access. But it’s
completely impractical in today’s world, where organizations manage dozens of systems that rely on data
from multiple sources to support individual business processes. Packaging and sharing data at the
convenience of a single source developer – instead of the individuals managing 10 downstream systems
that require the data – is ridiculous. And expecting individuals to learn the dozens of source application
systems just so they can use the data is an incredible waste of time.
All data must be packaged and prepared for sharing. To treat data as an asset instead of a burden of
doing business, a data strategy has to address data provisioning as a standard business process.
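One way to make provisioning a repeatable business process rather than a one-off extract is to publish each shared data set with its schema and access guidelines attached. The sketch below is a hypothetical illustration of that idea; the function name, manifest layout and rule fields are assumptions, not a prescribed interface.

```python
import csv
import json
from pathlib import Path

def provision_dataset(rows, name, schema, access_rules, out_dir="shared"):
    """Package data for reuse: write the data, its schema, and its usage rules together."""
    out = Path(out_dir) / name
    out.mkdir(parents=True, exist_ok=True)

    # Data is packaged for the consumer, not in the producing application's internal layout
    with open(out / "data.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(schema))
        writer.writeheader()
        writer.writerows(rows)

    # Schema and access guidelines travel with the data
    (out / "manifest.json").write_text(json.dumps(
        {"dataset": name, "schema": schema, "access_rules": access_rules}, indent=2))
    return out

provision_dataset(
    rows=[{"customer_id": "C001", "status": "active"}],
    name="customer_master",
    schema={"customer_id": "string", "status": "string"},
    access_rules={"pii": False, "allowed_roles": ["analyst", "report_consumer"]},
)
```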
Integrate:
“Move and combine data residing in disparate systems and provide a unified, consistent data view”
It’s no secret that data integration is one of the more costly IT activities; nearly 40 percent of the cost of
new development is consumed by data integration activities. Integration isn’t just about traditional data
extract, transform and load processes associated with data warehousing; it includes all data (structured,
semi-structured, unstructured, etc.) and its movement across and between all systems (operational and
analytical). The challenge of data integration is to match data across multiple sources without having to
use an explicit key or unique identifier.
Developers spend enormous time building logic to match and link values across a multitude of sources.
Unfortunately, as each new development team requires access to individual data sources, they each
reconstruct or reinvent the logic needed to link values across the same data sources. The tragedy of data
integration is that this rework happens with each new project because what was learned in the past is
never captured for reuse.
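A minimal sketch of capturing matching logic once so later projects reuse it rather than reinvent it; the normalization rule, similarity measure and threshold below are deliberately simple assumptions, not a production matching algorithm.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Shared normalization rule, defined once and reused by every consuming project."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())

def match_score(a: str, b: str) -> float:
    """Similarity (0.0 to 1.0) between two values that lack a common key."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# The same customer described in two systems, with no shared identifier
crm_name, billing_name = "ACME Corp.", "Acme Corp Inc"
score = match_score(crm_name, billing_name)
if score > 0.8:
    print(f"Linking records (similarity {score:.2f}) in the unified view")
```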
What complicates matters even more is that most development teams operate in silos with little
awareness of what other teams are doing. The lack of data collaboration tools and methods often
prevents teams from realizing that code they could reuse already exists. Integration should
ensure that data is distilled and merged into resulting data sets in a consistent and repeatable manner
for all consuming systems.
If data is truly a corporate asset, and if accuracy and consistency are critical, a data strategy must include
integration as a core component.
Govern:
“Establish, manage and communicate information policies and mechanisms for effective data usage”
Most data governance initiatives start by addressing specific tactical issues (e.g., data accuracy, business
rule definition or terminology standards) and are confined to specific organizations or project efforts. As
governance awareness grows, and as data sharing and usage issues gain visibility, governance initiatives
often broaden in scope. As those initiatives expand, organizations may establish a set of information
policies, rules and methods to ensure uniform data usage, manipulation and management.
But all too often data governance is perceived as a discipline that applies only to analytics users and the analytics
environment. In fact, data governance applies to all applications, systems and staff members. The
biggest challenge with data governance is adoption – because data governance is an overarching set of
information policies and rules that everyone must respect and follow.
The reason for establishing a strong governance process is to ensure that once data is decoupled from
the application that created it, the rules and details of the data are known and respected by all other
data constituents. The role governance plays within an overall data strategy is to ensure that data is
managed consistently across the company.
Whether it is for determining security details, data correction logic, data naming standards or even
establishing new data rules, effective data governance makes sure data is consistently managed,
manipulated and accessed. Decisions about how data is processed, manipulated or shared aren’t made
by an individual developer; they’re established by the rules and policies of data governance.
The purpose of data governance isn’t to limit data access or insert a harsh, unusable level of rigor that
interferes with usage. Its premise is simply to ensure that data becomes easier to access, use and share.
The rigor introduced by a data governance effort shouldn’t be overwhelming or burdensome. While data
governance may initially affect developers’ productivity (because of the new processes and work
activities), the benefits to downstream data constituents and dramatic improvements in productivity
should more than counteract the initial impact.
It should be no surprise that a data strategy has to include data governance. It’s simply impractical to
move forward – without an integrated governance effort – in establishing a plan and road map to
address all the ways you capture, store, manage and use information. Data governance provides the
necessary rigor over the data content as changes occur to the technology, processing and methodology
areas associated with the data strategy effort.
Access:
Data access has two complementary goals:
1) To ensure accessibility: allowing authorized individuals to obtain and use data when and where
necessary, and
2) To provide security: protecting privacy and preventing unauthorized use of sensitive
information.
Moreover, a comprehensive data strategy will tailor accessibility and security requirements to each data
asset, although establishment of an overarching framework to classify data assets for access and
security protocols is helpful for streamlining purposes.
Data access in a closed paradigm is often conceptualized as user authentication to ensure that only
authorized individuals have access to sensitive or restricted data; this principle is “Security”. Conversely,
accessibility in an open paradigm extends data out to these authorized users so that they have data
when and where they need it. Considerations include the devices and networks on which data may be
accessed, the applications that may be used to work with the data, and the timeliness of data. The data
strategy should balance the data vision and organizational goals against costs and potential return.
Further, since security restrictions may place additional limits on some data assets but not others,
accessibility may be asset specific.
While the acquisition component of the data strategy covers how to get data into organizational systems,
the extraction and reporting component formalizes how to query and retrieve data from storage and
deliver it to users through both regular and ad hoc reporting to support day-to-day operations. Methods
for querying and extracting data from storage should be identified, along with user types associated
with each extraction method. The data strategy should establish roles for users who access raw data,
build reports, or simply access reports.
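A hedged sketch of the kind of role definitions such a strategy might formalize; the role names and permissions are assumptions for illustration only.

```python
# Illustrative roles mapped to what each may do with data and reports
ROLE_PERMISSIONS = {
    "data_engineer":   {"access_raw_data", "build_reports", "view_reports"},
    "report_builder":  {"build_reports", "view_reports"},
    "report_consumer": {"view_reports"},
}

def is_allowed(role: str, action: str) -> bool:
    """Check an action against the access rules defined by the data strategy."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("report_consumer", "access_raw_data"))  # False: security restriction
print(is_allowed("report_consumer", "view_reports"))     # True: accessibility goal
```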
Additionally, the data strategy should establish a searchable inventory of reports and their intended use,
and the inventory should be maintained in an accessible area. Reports should be automated based on
a return-on-investment assessment that includes some forecasting about the stability of the environment or of reporting
needs. Finally, the data strategy should institute some metrics for report usage, such as how often each
report is run and how many distinct users run a report in a given time period.
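As a simple illustration of how such metrics could be derived, the sketch below aggregates a run log of report executions; the log format and report names are assumptions, not a defined source.

```python
from collections import defaultdict

# Hypothetical run log: (report_name, user, run_date)
run_log = [
    ("monthly_sales", "alice", "2024-01-05"),
    ("monthly_sales", "bob",   "2024-01-06"),
    ("monthly_sales", "alice", "2024-01-20"),
    ("churn_summary", "carol", "2024-01-11"),
]

runs = defaultdict(int)
users = defaultdict(set)
for report, user, _date in run_log:
    runs[report] += 1          # how often each report is run
    users[report].add(user)    # how many distinct users run it

for report in runs:
    print(report, "runs:", runs[report], "distinct users:", len(users[report]))
```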
Analytics describe the past (descriptive analytics), explain the present (exploratory analytics), forecast
the future (predictive analytics), and propose future courses of action (prescriptive analytics). The data
strategy should acknowledge that a successful analytics system requires maturity in other aspects of the
data strategy, including data acquisition, governance, quality, access, usage, and extraction.
Budget:
While it is understood that developing a data strategy will require a budget, a common
mistake is failing to plan for the lag between the development and endorsement of the strategy and
receipt of the actual budget required to start implementing it. It is recommended to think about the
budget approval process long before the data strategy project commences and to have some
budget available to get the implementation going, or at least enough to tide the project over for a short period.
The development and implementation of a data strategy needs momentum, and we may lose it if we
don’t have the funding to at least commence the project.
Engagement:
In order to truly make data strategy successful, it is imperative that we engage with a wide variety of
departments and individuals across the organization. We need to find out what the pain points are, and
also the opportunities. It is recommended to visit other offices to see first-hand what actually happens,
how staff work, the issues they have and the opportunities for improvement. By interviewing a wide
cross-section of staff, we will have a diverse representation of current business practices and the
impacts our data strategy will have.
Business strategy:
Another mistake organizations can make is that they develop a data strategy which is technically correct
and lays out how the organization should manage its data, yet it doesn’t align with the broader
business strategy. In order for a data strategy to work, it needs to attach to the business drivers and
needs to demonstrate how it will help support the broader business strategy.
It is recommended that we assess what’s in it for the organization and the teams to have a new data
strategy in place. If there is going to be considerable effort to develop and implement a new data
strategy, the teams involved need to understand why we’re doing it. If we are asking staff to work in new ways
that they may view as being an increased burden, at least in the short term, then we need to make sure
that they understand how it helps them and how it will impact the organization as a whole.
Approval process:
Before we commence our data strategy implementation project, it is important that we know what our
approval process is and who needs to be involved. A common mistake that slows down a project team
is failing to factor in the approval process or to align it with the timing of the organization’s sign-off for
new projects.
Lack of expertise:
It is imperative that the team working on the data strategy deeply understands data management. They need to
have experience engaging with a wide range of stakeholders, both internal and external, and they need
the broader insight into how the strategy will help the organization achieve its goals. They also
need to be skilled at communication and selling the concept of a data strategy so that it has the best
chance of being accepted by the Board, senior management team and the staff.
Viable:
A data strategy should be visionary. It should take us from our current state to a more optimized future
state. It should improve the way we manage and apply our data and continue to improve business
processes. Our data strategy needs to be viable and practical while also allowing for growth. We may
need to break our implementation into phases so other areas of the business can implement the
necessary changes that are required to take the organization forward. This delivers value incrementally
and shows everyone in the organization what can be achieved and the benefits that can be realized. By
taking a practical, realistic approach to the data strategy, we are more likely to get it off the ground and
have a lasting positive impact on the organization.