Data Vault ReConnect Speed Presenting AM Part Two – Hans Hultgren
The document discusses using a Data Vault approach to warehouse large, complex datasets that may be unstructured or streaming. It describes how Data Vault separates data storage from metadata and schemas to allow for flexibility. It also emphasizes that becoming an agile organization requires changes to both tools and company culture or "DNA". Finally, it compares Data Vault to other data modeling techniques and emphasizes learning from experience.
This document discusses Data Vault fundamentals and best practices. It introduces Data Vault modeling, which involves modeling hubs, links, and satellites to create an enterprise data warehouse that can integrate data sources, provide traceability and history, and adapt incrementally. The document recommends using data virtualization rather than physical data marts to distribute data from the Data Vault. It also provides recommendations for further reading on Data Vault, Ensemble modeling, data virtualization, and certification programs.
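For readers new to the hub, link, and satellite vocabulary, here is a minimal sketch of the three core table shapes, written as generic SQL DDL held in Python strings; the customer/order example and every name in it are illustrative assumptions, not taken from the deck.

    # Minimal sketch of the three core Data Vault shapes; names are hypothetical.
    HUB_CUSTOMER = """
    CREATE TABLE hub_customer (
        customer_hk    CHAR(32)     NOT NULL PRIMARY KEY,  -- hash of business key
        customer_id    VARCHAR(50)  NOT NULL,              -- the business key itself
        load_dts       TIMESTAMP    NOT NULL,              -- traceability: when loaded
        record_source  VARCHAR(100) NOT NULL               -- traceability: from where
    );
    """

    LINK_CUSTOMER_ORDER = """
    CREATE TABLE link_customer_order (
        customer_order_hk CHAR(32)     NOT NULL PRIMARY KEY,  -- hash of both keys
        customer_hk       CHAR(32)     NOT NULL,              -- FK to hub_customer
        order_hk          CHAR(32)     NOT NULL,              -- FK to hub_order (not shown)
        load_dts          TIMESTAMP    NOT NULL,
        record_source     VARCHAR(100) NOT NULL
    );
    """

    SAT_CUSTOMER = """
    CREATE TABLE sat_customer (
        customer_hk    CHAR(32)     NOT NULL,   -- FK to hub_customer
        load_dts       TIMESTAMP    NOT NULL,   -- history: one row per detected change
        hash_diff      CHAR(32)     NOT NULL,   -- change detection across attributes
        customer_name  VARCHAR(200),
        customer_city  VARCHAR(100),
        record_source  VARCHAR(100) NOT NULL,
        PRIMARY KEY (customer_hk, load_dts)
    );
    """

Hubs hold only business keys, links only relationships, and satellites all descriptive history, which is what lets the model absorb new sources incrementally.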
Data Lakehouse, Data Mesh, and Data Fabric (r2) – James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I'll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I'll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I'll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I'll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
This document provides an introduction and overview of implementing Data Vault 2.0 on Snowflake. It begins with an agenda and the presenter's background. It then discusses why customers are asking for Data Vault and provides an overview of the Data Vault methodology, including its core components of hubs, links, and satellites. It then shows how Snowflake features like separation of workloads and agile warehouse scaling support Data Vault implementations. It also addresses modeling semi-structured data and building virtual information marts using views.
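As a rough sketch of the two Snowflake ideas named above (semi-structured data and view-based information marts), the statement below, held in a Python string, exposes a satellite with a JSON payload in a VARIANT column as a virtual dimension; all object names are assumptions for illustration.

    # Hedged sketch: a virtual information mart as a view over a satellite
    # whose payload is semi-structured (Snowflake VARIANT). Names hypothetical.
    DIM_CUSTOMER_VIEW = """
    CREATE OR REPLACE VIEW dim_customer AS
    SELECT
        h.customer_id,
        s.payload:name::STRING AS customer_name,  -- path into the JSON payload
        s.payload:city::STRING AS customer_city,
        s.load_dts
    FROM hub_customer h
    JOIN sat_customer_json s
      ON s.customer_hk = h.customer_hk
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY h.customer_hk
        ORDER BY s.load_dts DESC) = 1             -- keep only the current version
    """

Because the mart is a view, no data is copied out of the Vault; consumers always see the latest satellite rows.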
Big data architectures and the data lake – James Serra
The document provides an overview of big data architectures and the data lake concept. It discusses why organizations are adopting data lakes to handle increasing data volumes and varieties. The key aspects covered include:
- Defining top-down and bottom-up approaches to data management
- Explaining what a data lake is and how Hadoop can function as the data lake
- Describing how a modern data warehouse combines features of a traditional data warehouse and data lake
- Discussing how federated querying allows data to be accessed across multiple sources
- Highlighting benefits of implementing big data solutions in the cloud
- Comparing shared-nothing, massively parallel processing (MPP) architectures to symmetric multi-processing (SMP) architectures
Data Vault Modeling and Methodology introduction that I provided to a Montreal event in September 2011. It covers an introduction and overview of the Data Vault components for Business Intelligence and Data Warehousing. I am Dan Linstedt, the author and inventor of Data Vault Modeling and methodology.
If you use the images anywhere in your presentations, please credit http://LearnDataVault.com as the source (me).
Thank you kindly,
Daniel Linstedt
This document discusses data mesh, a distributed data management approach for microservices. It outlines the challenges of implementing microservice architecture including data decoupling, sharing data across domains, and data consistency. It then introduces data mesh as a solution, describing how to build the necessary infrastructure using technologies like Kubernetes and YAML to quickly deploy data pipelines and provision data across services and applications in a distributed manner. The document provides examples of how data mesh can be used to improve legacy system integration, batch processing efficiency, multi-source data aggregation, and cross-cloud/environment integration.
Building an Effective Data Warehouse Architecture – James Serra
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
Given at Oracle Open World 2011: Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It has been in use globally for over 10 years now but is not widely known. The purpose of this presentation is to provide an overview of the features of a Data Vault modeled EDW that distinguish it from the more traditional third normal form (3NF) or dimensional (i.e., star schema) modeling approaches used in most shops today. Topics will include dealing with evolving data requirements in an EDW (i.e., model agility), partitioning of data elements based on rate of change (and how that affects load speed and storage requirements), and where it fits in a typical Oracle EDW architecture. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
Enterprise Data Management Framework Overview – John Bao Vuu
A solid data management foundation to support big data analytics and, more importantly, a data-driven culture is necessary for today's organizations.
A mature Data Management Program can reduce operational costs and enable rapid business growth and development. A Data Management program must evolve to monetize data assets, deliver breakthrough innovation, and help drive business strategies in new markets.
This document discusses Snowflake's data governance capabilities including challenges around data silos, complexity of data management, and balancing security and governance with data utilization. It provides an overview of Snowflake's platform for ingesting and sharing data across various sources and consumers. Key governance capabilities in Snowflake like object tagging, classification, anonymization, access history and row/column level policies are described. The document also previews upcoming conditional masking policies and provides examples of implementing object tagging and access policies in Snowflake.
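To make two of the listed capabilities concrete, here is a small sketch (Snowflake SQL held in a Python string) of object tagging plus a column masking policy; the table, column, and role names are assumptions for illustration, while the CREATE TAG and CREATE MASKING POLICY statements follow Snowflake's documented forms.

    # Hedged sketch of object tagging and column-level masking in Snowflake.
    GOVERNANCE_DDL = """
    CREATE TAG IF NOT EXISTS pii_level;

    ALTER TABLE customers MODIFY COLUMN email
        SET TAG pii_level = 'high';                      -- classify the column

    CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
        CASE WHEN CURRENT_ROLE() IN ('DATA_STEWARD') THEN val
             ELSE '***MASKED***'
        END;

    ALTER TABLE customers MODIFY COLUMN email
        SET MASKING POLICY email_mask;                   -- enforced at query time
    """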
Data Management, Metadata Management, and Data Governance – Working Together – DATAVERSITY
The data disciplines listed in the title must work together. The key to success requires understanding the boundaries and overlaps between the disciplines. Wouldn't it be great to be able to present the relationships between the disciplines in a simple all-in-one diagram? At the end of this webinar, you will be able to do just that.
This new RWDG webinar with Bob Seiner will outline how Data Management, Metadata Management, and Data Governance can be optimized to work together. Bob will share a diagram that has successfully communicated the relationship between these disciplines to leadership, resulting in the disciplines working in harmony and delivering success.
Bob will share the following in this webinar:
- Categories of disciplines focused on managing data as an asset
- A definition of Data Management that embraces numerous data disciplines
- The importance of Metadata Management to all data disciplines
- Why data and metadata require formal governance
- A graphic that effectively exhibits the relationship between the disciplines
A data warehouse is a central repository of historical data from an organization's various sources designed for analysis and reporting. It contains integrated data from multiple systems optimized for querying and analysis rather than transactions. Data is extracted, cleaned, and loaded from operational sources into the data warehouse periodically. The data warehouse uses a dimensional model to organize data into facts and dimensions for intuitive analysis and is optimized for reporting rather than transaction processing like operational databases. Data warehousing emerged to meet the growing demand for analysis that operational systems could not support due to impacts on performance and limitations in reporting capabilities.
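To ground the facts-and-dimensions idea, a one-query star-schema sketch follows (generic SQL held in a Python string); the sales example and all names in it are illustrative assumptions.

    # Minimal star-schema sketch: one fact table joined to two dimensions and
    # aggregated the way a typical report would be. All names are hypothetical.
    SALES_BY_MONTH_AND_REGION = """
    SELECT
        d.year,
        d.month,
        s.region,
        SUM(f.sales_amount) AS total_sales            -- additive measure
    FROM fact_sales f
    JOIN dim_date  d ON d.date_key  = f.date_key      -- dimension: when
    JOIN dim_store s ON s.store_key = f.store_key     -- dimension: where
    GROUP BY d.year, d.month, s.region
    ORDER BY d.year, d.month, s.region
    """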
Agile Data Engineering - Intro to Data Vault Modeling (2016) – Kent Graziano
The document provides an introduction to Data Vault data modeling and discusses how it enables agile data warehousing. It describes the core structures of a Data Vault model including hubs, links, and satellites. It explains how the Data Vault approach provides benefits such as model agility, productivity, and extensibility. The document also summarizes the key changes in the Data Vault 2.0 methodology.
This document discusses how big data analytics is used in the telecom business. It defines big data and analytics, and explains why big data analytics is needed. It provides examples of how major telecom companies like Reliance Jio, Vodafone, and Globe Telecom are using big data analytics for applications like fraud prevention, targeted marketing, customer segmentation, and network optimization. Case studies show how these companies are gaining business benefits and competitive advantages from analyzing the large amounts of customer data in their possession.
The document discusses data mesh vs data fabric architectures. It defines data mesh as a decentralized data processing architecture with microservices and event-driven integration of enterprise data assets across multi-cloud environments. The key aspects of data mesh are that it is decentralized, processes data at the edge, uses immutable event logs and streams for integration, and can move all types of data reliably. The document then provides an overview of how data mesh architectures have evolved from hub-and-spoke models to more distributed designs using techniques like kappa architecture and describes some use cases for event streaming and complex event processing.
Data Mesh is a new socio-technical approach to data architecture, first described by Zhamak Dehghani and popularised through a guest blog post on Martin Fowler's site.
Since then, community interest has grown, due to Data Mesh's ability to explain and address the frustrations that many organisations are experiencing as they try to get value from their data. The 2022 publication of Zhamak's book on Data Mesh further provoked conversation, as have the growing number of experience reports from companies that have put Data Mesh into practice.
So what's all the fuss about?
On one hand, Data Mesh is a new approach in the field of big data. On the other hand, Data Mesh is an application of the lessons we have learned from domain-driven design and microservices to a data context.
In this talk, Chris and Pablo will explain how Data Mesh relates to current thinking in software architecture and the historical development of data architecture philosophies. They will outline what benefits Data Mesh brings, what trade-offs it comes with and when organisations should and should not consider adopting it.
Emerging Trends in Data Architecture – What's the Next Big Thing? – DATAVERSITY
With technological innovation and change occurring at an ever-increasing rate, it's hard to keep track of what's hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Data governance Program PowerPoint Presentation Slides – SlideTeam
The document discusses the need for data governance programs in companies. It outlines why companies suffer without effective data governance, such as applications being unable to communicate and inconsistencies in data leading to increased costs. The document then compares manual and automated approaches to data governance. It provides details on key aspects of building a data governance program, including establishing a framework, defining roles and responsibilities, and outlining a roadmap for improving data governance over time.
At Polestar, we hope to bring the power of data to organizations across industries, helping them analyze billions of data points and data sets to provide real-time insights and enabling them to make critical decisions to grow their business.
The document discusses modern data architectures. It presents conceptual models for data ingestion, storage, processing, and insights/actions. It compares traditional vs modern architectures. The modern architecture uses a data lake for storage and allows for on-demand analysis. It provides an example of how this could be implemented on Microsoft Azure using services like Azure Data Lake Storage, Azure Databricks, and Azure Data Warehouse. It also outlines common data management functions such as data governance, architecture, development, operations, and security.
Introduction to Data Warehouse. Summarized from the first chapter of 'The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses' by Ralph Kimball.
Data Architecture Strategies: Data Architecture for Digital Transformation – DATAVERSITY
MDM, data quality, data architecture, and more: combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
How a Semantic Layer Makes Data Mesh Work at Scale – DATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn't address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component for supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture; a minimal sketch of the shared-definitions idea follows the list below.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
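Below is a deliberately simple Python sketch of that shared-definitions idea; it is illustrative only and not any vendor's actual semantic-layer API.

    # Illustrative-only sketch: a hub-owned registry of metric definitions
    # that spoke teams reuse instead of re-deriving their own SQL.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Metric:
        name: str
        sql: str    # the one canonical expression every team reuses
        owner: str  # the hub team that governs this definition

    REGISTRY = {
        "revenue": Metric("revenue", "SUM(order_total)", "finance-hub"),
        "active_users": Metric("active_users",
                               "COUNT(DISTINCT user_id)", "product-hub"),
    }

    def metric_sql(name: str) -> str:
        """Spoke teams look definitions up rather than redefining them."""
        return REGISTRY[name].sql

Keeping the definitions in one governed place is what preserves consistency while product creation itself stays decentralized.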
Data Vault ReConnect Speed Presenting AM Part One – Hans Hultgren
First set of 5x5 Speed Presenting Updates:
1) Core Business Concept
2) Ensemble Modeling
3) Re-Defining the Link
4) Re-Defining the Satellite
5) Architectural Layers
Data Vault ReConnect Speed Presenting PM Part Four – Hans Hultgren
Third set of 5x5 Speed Presenting Updates:
1) Selling Data Vault - Elwyn Lloyd Jones
2) DV as Leverage for Data Migration - Antoine Stelma
3) QUIPU - Juan-José van der Linden
4) Volvo Data Vault Automation - Frederik Naessens
5) Are we ready for a DV Conversion Standard - Frederik Naessens, Stijn Roelens, Kristof Vanduren
- A strong relationship with the founder of Data Vault for over 3 years now.
- Supporting your business with 40+ certified consultants.
- Incorporated as the preferred Enterprise Data Warehouse modelling paradigm in the Logica BI Framework.
- Satisfied customers in many countries and industry sectors.
Data Vault ReConnect Speed Presenting PM Part Three – Hans Hultgren
The document discusses how Data Vault can be implemented efficiently in SAP HANA. Thanks to HANA's columnar architecture, a single broad satellite per hub performs well, though splitting satellites by rate of change can still be efficient for storage, and multiple satellites are preferable when data comes from multiple sources, to improve write efficiency. It also recommends creating one processed information table per hub, rather than SQL views, to allow efficient referential joins in HANA.
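As a generic sketch of that rate-of-change split (not the deck's actual HANA DDL; all names are hypothetical), slowly and rapidly changing attributes land in separate satellites so frequent updates rewrite only narrow rows:

    # Hedged sketch: one broad satellite split into two by rate of change.
    SAT_CUSTOMER_SLOW = """
    CREATE TABLE sat_customer_slow (     -- rarely-changing attributes
        customer_hk   CHAR(32)  NOT NULL,
        load_dts      TIMESTAMP NOT NULL,
        customer_name VARCHAR(200),
        birth_date    DATE,
        PRIMARY KEY (customer_hk, load_dts)
    );
    """

    SAT_CUSTOMER_FAST = """
    CREATE TABLE sat_customer_fast (     -- volatile attributes
        customer_hk    CHAR(32)  NOT NULL,
        load_dts       TIMESTAMP NOT NULL,
        loyalty_points INTEGER,
        last_login_dts TIMESTAMP,
        PRIMARY KEY (customer_hk, load_dts)
    );
    """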
Data vault seminar May 5-6 Dommel - The factory and the workshop – johannesvdb
This document discusses an open-source, metadata-driven data warehousing project at a Dutch water board. It describes the basic architecture of using open-source tools like Pentaho and Quipu to build a source data vault and a business data vault, and to generate data marts. It also covers lessons learned, including that it is possible to quickly build an EDW with open-source software, though some challenges around automation and performance still remain. The goal is to deliver added business value in a cost-effective way.
A Lean Data Warehouse, compared with a traditional one (with a dimensional or 3rd normal form model), is faster to deliver, freer of waste, and inherently more adaptable to change. From my experience in the trenches, each of these benefits fits squarely in the 'must have' category. Data Vault is an excellent logical architecture with which to design a Lean Data Warehouse. This article describes the priorities of a Lean Data Warehouse, and compares the two traditional modeling methods with Data Vault, concluding that Data Vault is more suited to deliver on those Lean priorities.
Presentation given at the DOAG conference.
Metadata is an often-neglected topic, as it is either considered boring or simply goes unnoticed. The rather abstract descriptions such as "metadata is data about data" are not exactly helpful either.
The presentation covers the different kinds of metadata (business, technical, process) and explains how they were used in a Data Vault project, for example to define standards or to generate code.
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW) – Andreas Buckenhofer
Part 4 of 4
The slides contain a DWH lecture given to students in their 5th semester. Content:
- Introduction DWH and Business Intelligence
- DWH architecture
- DWH project phases
- Logical DWH Data Model
- Multidimensional data modeling
- Data import strategies / data integration / ETL
- Frontend: Reporting and analysis, information design
- OLAP
This is a presentation I gave in 2006 for Bill Inmon. The presentation covers Data Vault and how it integrates with Bill Inmon's DW2.0 vision. This is focused on the business intelligence side of the house.
If you want to use these slides, please put (C) Dan Linstedt, all rights reserved, http://LearnDataVault.com
Agile Methods and Data Warehousing (2016 update) – Kent Graziano
This presentation takes a look at the Agile Manifesto and the 12 Principles of Agile Development and discusses how these apply to Data Warehousing and Business Intelligence projects. Several examples and details from my past experience are included. Includes more details on using Data Vault as well. (I gave this presentation at OUGF14 in Helsinki, Finland and again in 2016 for TDWI Nashville.)
Agile BI via Data Vault and Modelstorming – Daniel Upton
Audience: Business Intelligence Architects, Project Managers and Sponsors. This slideshow accompanies a video presentation of the same name, available at http://youtu.be/e0cHFdeGEeE.
DAMA, Oregon Chapter, 2012 presentation - an introduction to Data Vault modeling. I will be covering parts of the methodology and a comparison and contrast of issues in general for the EDW space, followed by a brief technical introduction to the Data Vault modeling method.
After the presentation I will be providing a demonstration of the ETL loading layers, LIVE!
You can find more on-line training at: http://LearnDataVault.com/training
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW) – Andreas Buckenhofer
The document provides information about Andreas Buckenhofer and Daimler TSS. It discusses Daimler TSS's locations, what attendees will learn about data warehouse data modeling and OLAP, and an overview of data modeling for OLTP applications, Codd's normal forms, and dimensional modeling for data marts.
Data Vault: Data Warehouse Design Goes Agile – Daniel Upton
Data Warehouse (especially EDW) design needs to get Agile. This whitepaper introduces Data Vault to newcomers, and describes how it adds agility to DW best practices.
Agile Data Warehouse Design for Big Data Presentation – Vishal Kumar
Synopsis:
[Video link: http://www.youtube.com/watch?v=ZNrTxSU5IQ0 ]
Jim Stagnitto and John DiPietro of consulting firm a2c will discuss Agile Data Warehouse Design - a step-by-step method for data warehousing / business intelligence (DW/BI) professionals to better collect and translate business intelligence requirements into successful dimensional data warehouse designs.
The method utilizes BEAM✲ (Business Event Analysis and Modeling) - an agile approach to dimensional data modeling that can be used throughout analysis and design to improve productivity and communication between DW designers and BI stakeholders. BEAM✲ builds upon the body of mature "best practice" dimensional DW design techniques, and collects "just enough" non-technical business process information from BI stakeholders to allow the modeler to slot their business needs directly and simply into proven DW design patterns.
BEAM✲ encourages DW/BI designers to move away from the keyboard and their entity relationship modeling tools and begin "white board" modeling interactively with BI stakeholders. With the right guidance, BI stakeholders can and should model their own BI data requirements, so that they can fully understand and govern what they will be able to report on and analyze.
The BEAM✲ method is fully described in Agile Data Warehouse Design, a text co-written by Lawrence Corr and Jim Stagnitto.
About the speakers:
Jim Stagnitto, Director of a2c's Data Services Practice
Data Warehouse Architect: specializing in powerful designs that extract the maximum business benefit from Intelligence and Insight investments.
Master Data Management (MDM) and Customer Data Integration (CDI) strategist and architect.
Data Warehousing, Data Quality, and Data Integration thought-leader: co-author with Lawrence Corr of "Agile Data Warehouse Design", guest author of Ralph Kimball's "Data Warehouse Designer" column, and contributing author to Ralph and Joe Caserta's latest book: "The DW ETL Toolkit".
John DiPietro, Chief Technology Officer at A2C IT Consulting
John DiPietro is the Chief Technology Officer for a2c. Mr. DiPietro is responsible for setting the vision, strategy, delivery, and methodologies for a2c's Solution Practice Offerings for all national accounts. The a2c CTO brings with him an expansive depth and breadth of specialized skills in his field.
Sponsor Note:
Thanks to:
Microsoft NERD for providing an awesome venue for the event.
http://A2C.com IT Consulting for providing the food/drinks.
http://Cognizeus.com for providing a book to give away as a raffle prize.
The recent focus on Big Data in the data management community brings with it a paradigm shift – from the more traditional top-down, "design then build" approach to data warehousing and business intelligence, to the more bottom-up, "discover and analyze" approach to analytics with Big Data. Where does data modeling fit in this new world of Big Data? Does it go away, or can it evolve to meet the emerging needs of these exciting new technologies? Join this webinar to discuss:
Big Data – A Technical & Cultural Paradigm Shift
Big Data in the Larger Information Management Landscape
Modeling & Technology Considerations
Organizational Considerations
The Role of the Data Architect in the World of Big Data
This document discusses key performance indicators (KPIs) for measuring agile projects. It begins by defining metrics and KPIs, noting that KPIs should be tied to strategic objectives and have defined targets. It then discusses characteristics of good KPIs and provides examples of both traditional and agile KPIs related to time, effort, scope, and quality. The document cautions that too many KPIs can be useless and advocates keeping metrics simple. It also discusses challenges like cheating on metrics and provides tips for using tools and dashboards to effectively measure agile performance.
Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for the last 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with a detailed introduction to the technical components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics for how to build, and design structures when using the Data Vault modeling technique. The target audience is anyone wishing to explore implementing a Data Vault style data model for an Enterprise Data Warehouse, Operational Data Warehouse, or Dynamic Data Integration Store. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
Data Warehousing in the Cloud: Practical Migration Strategies – SnapLogic
Dave Wells of Eckerson Group discusses why cloud data warehousing has become popular, the many benefits, and the corresponding challenges. Migrating an existing data warehouse to the cloud is a complex process of moving schema, data, and ETL. The complexity increases when architectural modernization, restructuring of database schema, or rebuilding of data pipelines is needed.
Optimizing IT Costs & Services With Big Data (Little Effort!) - Case Studies... – TeamQuest Corporation
IT organizations have a wealth of Service Management and Service Delivery tools, processes and metrics that typically exist in relative isolation. This session will present detailed real-life examples of how existing tools and metrics can be brought together using big data techniques to optimize costs and performance of IT environments.
Vijay Kumar Singh is a software engineer with 3 years of experience in SQL/PLSQL development on Oracle and SQL Server databases. He has worked on projects in banking, capital markets, and retail domains. His responsibilities have included developing stored procedures and functions, preparing test cases, analyzing requirements, and coordinating with clients. He is seeking new assignments involving database development with Oracle SQL/PLSQL.
IT professional with 9 years of data warehousing experience in the areas of ETL design and development. Excellent experience in requirement gathering, designing, developing, documenting, and testing of ETL jobs and mappings as parallel jobs using DataStage to populate tables in data warehouses and data marts.
Parthiban Loganathan is a software engineer with over 4 years of experience in software development using technologies like Informatica Power Center, Unix Shell Scripting, and SQL Server. He has extensive experience in ETL processes involving data extraction, transformation, loading, and maintenance. His most recent role is as an ETL Developer using Informatica to generate extracts from various healthcare databases for clients like IBM Castlight and MVP.
Bringing Agility and Flexibility to Data Design and Integration – DATAVERSITY
Phasic Systems Inc provides agile data solutions to help organizations overcome challenges with data integration and governance. Their methods treat the entire data lifecycle as a continuous process to provide flexibility and adaptability. Phasic Systems uses agile methodologies, tools like DataStar Discovery and DataStar Unifier, and a hybrid data model called Corporate NoSQL to integrate data in days rather than months while maintaining governance. Their approach helps organizations access the right data at the right time to support business needs.
This professional summary outlines Shashank Jain's 5+ years of experience in Microsoft Business Intelligence and Data Warehousing tools like SQL Server Integration Services, Reporting Services, and Analysis Services. He has experience designing, developing, and supporting BI solutions, including ETL development, cube modeling, report scheduling, and more. Shashank is seeking a role where he can utilize his skills and experience in MSBI tools, databases, and programming languages like Java and SQL.
The document discusses optimizing a data warehouse by offloading some workloads and data to Hadoop. It identifies common challenges with data warehouses like slow transformations and queries. Hadoop can help by handling large-scale data processing, analytics, and long-term storage more cost effectively. The document provides examples of how customers benefited from offloading workloads to Hadoop. It then outlines a process for assessing an organization's data warehouse ecosystem, prioritizing workloads for migration, and developing an optimization plan.
Data warehouses need to be modernized to handle big data, integrate multiple data silos, reduce costs, and reduce time to market. A modern data warehouse blueprint includes a data lake to land and ingest structured, unstructured, external, social, machine, and streaming data alongside a traditional data warehouse. Key challenges for modernization include making data discoverable and usable for business users, rethinking ETL to allow for data blending, and enabling self-service BI over Hadoop. Common tactics for modernization include using a data lake as a landing zone, offloading infrequently accessed data to Hadoop, and exploring data in Hadoop to discover new insights.
This document discusses an agile approach to developing a data warehouse. It advocates using an Agile Enterprise Data Model to provide vision and guidance. The "Spock Approach" is described, which uses an operational data store, dimensional data warehouse, and iterative development of data marts. Data visualization techniques like data hexes are recommended to improve planning and visibility. Leadership, version control, adaptability, refinement, and refactoring are identified as important ongoing processes for an agile data warehouse project.
- The document contains the resume of Vivek Kumar detailing his IT experience including 5+ years as a Siebel EIM Consultant, EIM Developer, Siebel Batch Interface Developer, PL/SQL Developer, Informatica Developer and Business Analyst.
- It lists his technical skills like Siebel, Informatica, SQL, and various projects he has worked on including data migration projects from legacy systems to Siebel for clients like GE, KPMG, and Global Benefits Group.
- Provides summaries of some of these projects outlining his responsibilities like requirements gathering, data mapping, ETL design, testing and support.
Vasudevan Venkatraman has over 11 years of experience working in the IT industry, including 7+ years of experience with Oracle PL/SQL, data warehousing, and 3+ years in performance consulting and applications database administration. He has experience designing and developing applications using Oracle PL/SQL, Hadoop, and big data technologies. Currently he works as an Assistant Consultant at TCS focusing on data warehousing projects using Oracle and Hadoop.
This resume summarizes Ketan Jalan's experience as a developer with skills in Java, Unix scripting, SQL, and middleware tools like SeeBeyond SRE, Datastage, Axway, and Reactivity. He has over 6 years of experience in roles supporting manufacturing domains. His current role involves sustaining Exchange applications that use various technologies. Previous roles include developing enhancements, analyzing code and interfaces, and providing production support. He also has experience in team management, requirements gathering, and customer interaction.
In this document, we will present a very brief introduction to Big Data (what is Big Data?), Hadoop (how does Hadoop fit the picture?) and Cloudera Hadoop (what is the difference between Cloudera Hadoop and regular Hadoop?).
Please note that this document is for Hadoop beginners looking for a place to start.
- The document provides a summary of a candidate's experience as a Test Engineer including 2.5 years of experience in software testing, specifically testing of data warehouse/ETL processes. Key tools used include Informatica, Oracle, Quality Centre, SQL, and UNIX. Details are given about 3 projects involving ETL testing for various clients across different industries.
Ashish Maheshwari has over 8 years of experience in data modeling, integration, analytics, mining, ETL processes, BI applications, and data warehousing. He currently works as a Technical Lead at Ness Technologies, Bangalore and has experience working on projects for clients such as FGL, Marks, and Target. His skills include Oracle SQL, PL/SQL, Datastage, data warehousing concepts, R programming, and Agile development processes.
This document contains a resume for Chakravarthy Uppara. It summarizes his contact information, objective, 6+ years of experience in database development using SQL Server and SSIS. It details his roles and responsibilities in projects for Tesco and Accenture developing ETL processes and interfaces to integrate various systems. His technical skills include SQL Server, SSIS, SSAS and .NET. He holds a B-Tech in Information Technology and is currently employed as a Senior Software Engineer at Tesco in Bangalore, India.
Vikas Sogani is seeking a position as a lead data analyst with over 8 years of experience in data warehousing, ETL design, and business analysis. He has extensive experience working on projects for clients like Barclays and GE Healthcare. Currently, he is working as a lead data analyst for Barclays on their retail banking data warehouse. His responsibilities include requirements gathering, data modeling, ETL development and testing, and resolving data quality issues. He has strong skills in Teradata, Informatica, and business domains like retail banking and risk management.
This document provides a summary of William (Bill) Gulley's professional experience and qualifications. He has over 10 years of experience as a Business/Systems Analyst with a focus on data warehousing, ETL, and Agile methodologies. His technical skills include SQL, SSIS, Informatica, and working with technologies like Teradata, SQL Server, Oracle, and Hadoop. He has experience leading requirements gathering and analysis in both Agile and waterfall projects across multiple industries.
Alok Singh is seeking challenging assignments in Business Intelligence/Data warehousing. He has nearly 7 years of experience in BI/DW, ETL, data integration, and data warehousing solution design. He is proficient in SQL, ETL tools like Informatica and SSIS, and visualization tools like QlikView and Tableau. He has experience designing and developing ETL solutions, requirements gathering, and data analysis. His past roles include positions at Technologia, Subex, and Reliance Communications where he worked on projects involving Teradata, Oracle, billing systems, and fraud detection. He has a bachelor's degree in electronics and telecommunications.
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf – Software Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
Dev Dives: Automate and orchestrate your processes with UiPath Maestro – UiPathCommunity
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
HCL Nomad Web – Best Practices and Managing Multiuser Environments – panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed "automatically" in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web presents unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser's cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA... – SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
Linux Support for SMARC: How Toradex Empowers Embedded Developers – Toradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here's how:
- Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
- Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP's i.MX 8M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
- Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
- Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
- Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex's Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and support a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx – Justin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, "The Coding War Games."
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don't find ourselves having the same discussion again in a decade?
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive – ScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we'll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we'll cover how Rust's unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Mobile App Development Company in Saudi Arabia – Steve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With 11+ years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading Mobile App Development Company In Saudi Arabia, we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API – UiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
Role of Data Annotation Services in AI-Powered ManufacturingAndrew Leo
Â
From predictive maintenance to robotic automation, AI is driving the future of manufacturing. But without high-quality annotated data, even the smartest models fall short.
Discover how data annotation services are powering accuracy, safety, and efficiency in AI-driven manufacturing systems.
Precision in data labeling = Precision on the production floor.
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Â
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just toolsâthey're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Aqusag Technologies
Â
In late April 2025, a significant portion of Europe, particularly Spain, Portugal, and parts of southern France, experienced widespread, rolling power outages that continue to affect millions of residents, businesses, and infrastructure systems.
Technology Trends in 2025: AI and Big Data AnalyticsInData Labs
Â
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy nowand implement the key findings to improve your business.
HCL Nomad Web â Best Practices und Verwaltung von Multiuser-Umgebungenpanagenda
Â
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-und-verwaltung-von-multiuser-umgebungen/
HCL Nomad Web wird als die nächste Generation des HCL Notes-Clients gefeiert und bietet zahlreiche Vorteile, wie die Beseitigung des Bedarfs an Paketierung, Verteilung und Installation. Nomad Web-Client-Updates werden âautomatischâ im Hintergrund installiert, was den administrativen Aufwand im Vergleich zu traditionellen HCL Notes-Clients erheblich reduziert. Allerdings stellt die Fehlerbehebung in Nomad Web im Vergleich zum Notes-Client einzigartige Herausforderungen dar.
Begleiten Sie Christoph und Marc, während sie demonstrieren, wie der Fehlerbehebungsprozess in HCL Nomad Web vereinfacht werden kann, um eine reibungslose und effiziente Benutzererfahrung zu gewährleisten.
In diesem Webinar werden wir effektive Strategien zur Diagnose und LĂśsung häufiger Probleme in HCL Nomad Web untersuchen, einschlieĂlich
- Zugriff auf die Konsole
- Auffinden und Interpretieren von Protokolldateien
- Zugriff auf den Datenordner im Cache des Browsers (unter Verwendung von OPFS)
- Verständnis der Unterschiede zwischen Einzel- und Mehrbenutzerszenarien
- Nutzung der Client Clocking-Funktion
Quantum Computing Quick Research Guide by Arthur MorganArthur Morgan
Â
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Quantum Computing Quick Research Guide by Arthur MorganArthur Morgan
Â
Data Warehouse Agility – Array Conference 2011
1. 25568 Genesee Trail Rd
Golden, Colorado 80401
(303) 526-0340
• Data Vault Modeling and Approach • DW2.0 and Unstructured Data • Master Data Management and Metadata
Data Warehousing Agility
BI-Event May 17
Hans Hultgren
© 2011 Genesee Academy, LLC
2. Welcome
• Definition of agility
• Types of agility
• Discuss current approaches
• Hyper-agility
• Observations from the field
– Also topics of operational data warehousing, operational BI, agile project management techniques, agility-oriented tools, and operational integration
3. Data Warehouse Agility
• Agility
– The overall measure of adaptability in terms of speed & scope.
– Overall performance in adapting to change.
NOTE: Not warehouse machine throughput, near-real-time (NRT) processing, and operational DW performance…
Ability of the data warehouse to adapt to change
Versus
Performance of an existing (steady-state) warehouse
4. Data Warehouse Agility
• Agility
– Agile in IT
• Agile Project Management
• Agile Software Development
– Agile Manifesto
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
• Agile Model Driven Development (AMDD)
• Test-Driven Development (TDD)
5. Data Warehouse Agility
• Agility in the Data Warehouse
– Agility in terms of data warehousing is related to the ability to build incrementally.
– The approach today is more concerned with the development of a business intelligence / data warehousing program – the capability to increment (adapt and grow).
– Since the business is always changing (new reporting needs, new business processes, new business units, new data sources, etc.), the EDW program is an ongoing initiative that needs to focus on adapting to these changes.
– Note: distinguish between operational integration and data warehousing.
6. Types of Data Warehouse Agility
[Diagram: a Data Warehouse at the center, surrounded by its change drivers – New Source, New Mart, New Attribute, New Subject Area, and Change DW]
7. Types of Data Warehouse Agility
– Presentation Layer Agility – ability to adapt to new business requirements based on existing data elements in the EDW.
• Bottom Line: Ability to quickly and flexibly spin off new data marts
– New Data Source Agility – ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts.
• Bottom Line: Ability to quickly adapt to new data sources* using existing structures
– New Attribute Agility – ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources, and to integrate new attributes in terms of business context.
• Bottom Line: Ability to quickly incorporate new attributes in the EDW and apply business context to these attributes
– EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart.
• Bottom Line: EDW response time; a function of people, process & tools
– Changes in the DW – ability to absorb other changes such as integration logic, mappings, and business rules.
8. Presentation Layer Agility
– Presentation Layer Agility – ability to adapt to new business requirements based on existing data elements in the EDW.
• Bottom Line: Ability to quickly and flexibly spin off new data marts
– In this layer, agility is measured as a function of the time it takes to design, construct and deliver a new data mart.
– Variables in this layer include:
• Strength of the BI team to capture requirements and define the data mart.
• Ability of the ETL integration team to understand the EDW model and mart.
• Strength and repeatability of ETL processes for sourcing the EDW.
• Strength and repeatability of ETL development, testing and delivery.
– Constraints:
• Dependent upon the existence of the data in the EDW.
• Dependent upon the level of business alignment of the data in the EDW.
9. New Data Source Agility
– New Data Source Agility – ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts.
• Bottom Line: Ability to quickly adapt to new data sources* using existing structures
– In this layer, agility is measured as a function of the time it takes to design, model, build and load data into the EDW from a new source.
– Variables in this layer include:
• Strength of the DW team to design the required model changes.
• Strength and repeatability of EDW development, testing and delivery.
• Ability of the ETL integration team to understand the new EDW model.
• Strength and repeatability of ETL processes for mapping and loading the new source into the EDW.
– Constraints:
• Level of alignment of the new source data with the existing model.
• Dependent upon the level of business alignment with the data in the EDW.
10. New Attribute Technical Agility
– New Attribute (Technical) Agility – ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources.
• Bottom Line: Ability to quickly incorporate new attributes in the EDW
– In this layer, agility is measured as a function of the time it takes to design, map, add and load a new attribute from a source.
– Variables in this layer include:
• Strength of the DW team to design the required model changes.
• Strength and repeatability of EDW development, testing and delivery.
• Ability of the ETL integration team to understand the new EDW attribute(s).
• Strength and repeatability of ETL processes for mapping and loading new source attributes into the EDW.
– Constraints:
• Level of alignment of the new attribute with the existing model.
• Dependent upon business context being defined.
11. New Attribute Business Context
– New Attribute (Business) Context Agility – ability to integrate new attributes in terms of business context.
• Bottom Line: Ability to quickly apply business context to new attributes
– In this layer, agility is measured as a function of the time it takes to align business context with a new attribute from a source.
– Variables in this layer include:
• Ability of the BI / DW team to accurately assess the business context of the new source attribute.
– Constraints:
• Level of alignment of the new attribute with the existing model.
• Dependent upon the level of business alignment with the data in the EDW.
12. EDW Machine Agility
– EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart.
• Bottom Line: EDW response time; a function of people, process & tools
– In this layer, agility is measured as an overall function of the EDW machine to integrate a new subject area from stage to mart.
– Variables in this layer include:
• Strength of the BI / DW development team.
• Strength and repeatability of EDW development, testing and delivery.
• Strength and ability of the ETL integration team.
• Strength and repeatability of all BI / DW processes.
– Constraints:
• Executive sponsorship of the EDW program.
• Well-defined organizational structure for BIW, BICC, Architecture and Governance.
14. DW Agility Current Approaches
– Incremental Data Warehouse Development
• Data Vault modeling, 2G, Anchor, etc.
– Agile BI Programs (People, Process, Models & Data)
• Methodologies (Centennium, Platon, etc.)
• Templates, Tools & Automation (Wherescape, etc.)
– Alternate & New Paradigms for the Agile DW
15. DW Agility Components
– Absorb Changes
• Capture the Change
• Understand the Change
– A major constraint on agility is the required data warehouse modeling changes...
• So we can capture the data (create the buckets)
• So we can understand the data (context, meaning)
– Align to business keys, classify, describe (metadata)
16. Data Warehouse Agility
• Why create a Data Model for the DW?
• Model Data versus Meaning?
– Separate the capture of data from the meaning?
– The structure of a table versus the semantics
– Business meaning versus data loading
– As XML is to EDI
18. Concept of Name/Value Pair
Cust_ID Lname Fname Add City State Zip Bdate
121202 Lundquist Carl 22 Bird St NYC NY 98291 10/9/1977
123335 Dahlgren Eva 7 Academy Madison NJ 07940 2/12/1982
139090 Lundberg Scott 444 7th St Tuborg MN 70098 4/22/1988
119944 Hultquist Darla 17 South Randolf PA 91121 9/22/1967
120334 Forsberg Sven 117 East A NYC NY 98292 8/19/1976
Each Value or "data item" (record value for each attribute) is provided in a list format, paired with the corresponding Name or "field name" (column header) from the normalized table structure.
Moving to Name/Value Pair…
19. Concept of Name/Value Pair
Name Value
Cust_ID Lname Fname Add City State Zip Bdate
121202 Lundquist Carl 22 Bird St NYC NY 98291 10/9/1977
Cust_ID Lname Fname Add City State Zip Bdate
123335 Dahlgren Eva 7 Academy Madison NJ 07940 2/12/1982
Cust_ID Lname Fname Add City State Zip Bdate
139090 Lundberg Scott 444 7th St Tuborg MN 70098 4/22/1988
Cust_ID Lname Fname Add City State Zip Bdate
119944 Hultquist Darla 17 South Randolf PA 91121 9/22/1967
Cust_ID Lname Fname Add City State Zip Bdate
120334 Forsberg Sven 117 East A NYC NY 98292 8/19/1976
20. Moving to Name/Value Pair
Cust_ID Lname Fname Add City State Zip Bdate
121202 Lundquist Carl 22 Bird St NYC NY 98291 10/9/1977
123335 Dahlgren Eva 7 Academy Madison NJ 07940 2/12/1982
139090 Lundberg Scott 444 7th St Tuborg MN 70098 4/22/1988
119944 Hultquist Darla 17 South Randolf PA 91121 9/22/1967
120334 Forsberg Sven 117 East A NYC NY 98292 8/19/1976
[Diagram: transpose the table rows into NAME and VALUE columns]
Transpose …with column headings…
21. Name/Value Pair
Name      Value
Cust_ID   121202
Lname     Lundquist
Fname     Carl
Add       22 Bird St
City      NYC
State     NY
Zip       98291
Bdate     10/9/1977
Cust_ID   123335
Lname     Dahlgren
Fname     Eva
Add       7 Academy
City      Madison
State     NJ
Zip       07940
Bdate     2/12/1982
Cust_ID   139090
Lname     Lundberg
Fname     Scott
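As an editor's illustration of the transpose these slides walk through: a minimal Python sketch, not from the original deck, that flattens the normalized customer rows into Name/Value pairs. The row data mirrors the slide's table; the to_name_value_pairs helper is a hypothetical name.

```python
# Illustrative sketch: transpose normalized rows into Name/Value pairs.
ROWS = [
    {"Cust_ID": "121202", "Lname": "Lundquist", "Fname": "Carl",
     "Add": "22 Bird St", "City": "NYC", "State": "NY",
     "Zip": "98291", "Bdate": "10/9/1977"},
    {"Cust_ID": "123335", "Lname": "Dahlgren", "Fname": "Eva",
     "Add": "7 Academy", "City": "Madison", "State": "NJ",
     "Zip": "07940", "Bdate": "2/12/1982"},
]

def to_name_value_pairs(rows):
    """Flatten each row into (name, value) tuples, one per attribute."""
    for row in rows:
        for name, value in row.items():
            yield name, value

for name, value in to_name_value_pairs(ROWS):
    print(f"{name}\t{value}")
```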
22. Name/Value Pair
Name      Value
Cust_ID   121202
Lname     Lundquist
Fname     Carl
Add       22 Bird St
City      NYC
State     NY
Zip       98291
Bdate     10/9/1977
Cust_ID   123335
Lname     Dahlgren
Fname     Eva
Add       7 Academy
City      Madison
State     NJ
Zip       07940
Bdate     2/12/1982
Cust_ID   139090
Lname     Lundberg
Fname     Scott
The concept of the "record" is effectively lost in this transformation. Now a RECORD is a set of Name/Value Pair instances…
CON: Lose resolution on the record.
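To make the CON concrete: once the record boundary is gone, records can only be rebuilt by convention. A hedged sketch, assuming pairs arrive in source order and every record begins with Cust_ID; the rebuild_records helper is hypothetical, not part of the deck.

```python
# Hypothetical sketch: rebuild records from an ordered pair stream.
# Assumes each record begins with Cust_ID; any out-of-order pair
# silently corrupts the grouping -- this is the lost resolution.
PAIRS = [
    ("Cust_ID", "121202"), ("Lname", "Lundquist"), ("Fname", "Carl"),
    ("Cust_ID", "123335"), ("Lname", "Dahlgren"), ("Fname", "Eva"),
]

def rebuild_records(pairs, record_start="Cust_ID"):
    record = {}
    for name, value in pairs:
        if name == record_start and record:
            yield record   # previous record is assumed complete
            record = {}
        record[name] = value
    if record:
        yield record

for rec in rebuild_records(PAIRS):
    print(rec)
```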
23. Name/Value Pair
Name      Value
Cust_ID   121202
Lname     Lundquist
Fname     Carl
Add       22 Bird St
City      NYC
State     NY
Zip       98291
Bdate     10/9/1977
Cust_ID   123335
Lname     Dahlgren
Fname     Eva
Add       7 Academy
City      Madison
State     NJ
Zip       07940
Bdate     2/12/1982
Cust_ID   139090
Lname     Lundberg
Fname     Scott
Also, the attributes are not defined in advance – we don't know what to expect, and we can't check for attribute meaning, definitions, domain values or data types.
CON: Attributes are not pre-defined.
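A sketch of this second CON, under stated assumptions: with no pre-defined attributes there is nothing to validate an incoming pair against. The KNOWN_ATTRIBUTES registry below is a hypothetical stand-in for schema metadata that a pure Name/Value store does not have.

```python
# Hypothetical sketch: without a predefined attribute registry, unknown
# names and malformed values load as-is -- no meaning, domain, or type check.
KNOWN_ATTRIBUTES = {"Cust_ID": str.isdigit, "Zip": str.isdigit}

def load_pair(name, value):
    check = KNOWN_ATTRIBUTES.get(name)
    if check is None:
        print(f"WARN: '{name}' not pre-defined; no domain/type check possible")
    elif not check(value):
        print(f"WARN: '{name}' value {value!r} fails its domain check")
    return name, value   # the store accepts the pair either way

load_pair("Zip", "98291")       # known attribute, passes the check
load_pair("CustClass", "Big")   # unknown attribute: loads, meaning undefined
```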
24. Name/Value Pair
Name      Value
Cust_ID   121202
Lname     Lundquist
Fname     Carl
Add       22 Bird St
City      NYC
State     NY
Zip       98291
Bdate     10/9/1977
CustClass Big
Cust_ID   123335
Lname     Dahlgren
Fname     Eva
Add       7 Academy
City      Madison
State     NJ
Zip       07940
Bdate     2/12/1982
CustClass Small
Cust_ID   139090
New attributes that are introduced into the source feed are added instantly to the DW. There is no modeling delay, no code change, and no ETL impact…
PRO: Absorb new attributes instantly.
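And the PRO in sketch form: a new source field such as CustClass lands through the same single insert path, with no model change, no DDL, and no ETL rework. The load_feed helper and in-memory store are illustrative assumptions, not the deck's implementation.

```python
# Hypothetical sketch: one insert path absorbs any attribute, old or new.
store = []   # stand-in for the persisted Name/Value table

def load_feed(feed):
    for name, value in feed:
        store.append((name, value))   # no schema to alter, no mapping to edit

load_feed([("Cust_ID", "121202"), ("Lname", "Lundquist")])
load_feed([("Cust_ID", "123335"), ("Lname", "Dahlgren"),
           ("CustClass", "Small")])   # brand-new attribute, same code path
print(store)
```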
25. Hyper Agility
• The solution to deal with these issues requires a further level of abstraction, which in effect moves the persisted (historized, permanent, integrated) data store even further away from the business context it is intended to represent.
• The DW model – the data model itself – is then not readable (not understandable). In fact, ETL professionals will also find themselves further removed from this model. To the extent that a model is intuitive, self-descriptive, and aligned with business meaning, this approach takes a step in the other direction.
• Moving towards addressing these business-driven agility requirements causes the model itself to move much further away (an order of magnitude away) from the business – so far as to become, effectively, a technical solution utilizing only abstract representations.
26. Hyper Agility
• The context – the meaning of the data – will in these cases need to be managed in a different way.
• This can include a form of persisted and historized metadata concerning the mappings and business rules. In effect, a form of EAI within the DW.
• Or it might include a more traditional secondary DW layer.
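One way to picture the persisted, historized metadata idea, as a minimal sketch: business context lives beside the abstract store as its own versioned mapping. The structure, field names, and meaning_of helper are assumptions for illustration, not the deck's design.

```python
# Hypothetical sketch: business context as a versioned mapping kept
# alongside the abstract Name/Value store.
from datetime import date

CONTEXT = [
    # (attribute, business definition, effective_from)
    ("CustClass", "Customer size classification (Big/Small)", date(2011, 5, 1)),
    ("Bdate", "Customer birth date, M/D/YYYY", date(2011, 1, 1)),
]

def meaning_of(attribute, as_of):
    """Return the business definition in force for an attribute on a date."""
    versions = [(start, desc) for attr, desc, start in CONTEXT
                if attr == attribute and start <= as_of]
    return max(versions)[1] if versions else None

print(meaning_of("CustClass", date(2011, 6, 1)))
```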
27. DW AGILITY SUMMARY
• Consider specific Agility Requirements
• Classify Agility Types and consider Alternatives
• Distinguish between operational integration and DW
• Look to modeling techniques optimized for the Data Warehouse
• Look at the entire picture – people, process, models and data
• Consider specific methodologies, templates and tools
• Determine if hyper agility is a requirement
28. Questions?
www.GeneseeAcademy.com
CDVDM Certification Seminar
June 23-24
October 27-28
© 2011 Genesee Academy, LLC    [email protected]
25568 Genesee Trail Rd    USA +1 303.526.0340
Golden, Colorado 80401    Sweden 070 250 2102