Data Vault ReConnect Speed Presenting AM Part Two – Hans Hultgren
The document discusses using a Data Vault approach to warehouse large, complex datasets that may be unstructured or streaming. It describes how Data Vault separates data storage from metadata and schemas to allow for flexibility. It also emphasizes that becoming an agile organization requires changes to both tools and company culture or "DNA". Finally, it compares Data Vault to other data modeling techniques and emphasizes learning from experience.
This document discusses Data Vault fundamentals and best practices. It introduces Data Vault modeling, which involves modeling hubs, links, and satellites to create an enterprise data warehouse that can integrate data sources, provide traceability and history, and adapt incrementally. The document recommends using data virtualization rather than physical data marts to distribute data from the Data Vault. It also provides recommendations for further reading on Data Vault, Ensemble modeling, data virtualization, and certification programs.
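For readers new to the hub, link, and satellite vocabulary, here is a minimal sketch of the three core table shapes, written as generic SQL DDL held in Python strings; the customer/order example and every name in it are illustrative assumptions, not taken from the deck.

    # Minimal sketch of the three core Data Vault shapes; names are hypothetical.
    HUB_CUSTOMER = """
    CREATE TABLE hub_customer (
        customer_hk    CHAR(32)     NOT NULL PRIMARY KEY,  -- hash of business key
        customer_id    VARCHAR(50)  NOT NULL,              -- the business key itself
        load_dts       TIMESTAMP    NOT NULL,              -- traceability: when loaded
        record_source  VARCHAR(100) NOT NULL               -- traceability: from where
    );
    """

    LINK_CUSTOMER_ORDER = """
    CREATE TABLE link_customer_order (
        customer_order_hk CHAR(32)     NOT NULL PRIMARY KEY,  -- hash of both keys
        customer_hk       CHAR(32)     NOT NULL,              -- FK to hub_customer
        order_hk          CHAR(32)     NOT NULL,              -- FK to hub_order (not shown)
        load_dts          TIMESTAMP    NOT NULL,
        record_source     VARCHAR(100) NOT NULL
    );
    """

    SAT_CUSTOMER = """
    CREATE TABLE sat_customer (
        customer_hk    CHAR(32)     NOT NULL,   -- FK to hub_customer
        load_dts       TIMESTAMP    NOT NULL,   -- history: one row per detected change
        hash_diff      CHAR(32)     NOT NULL,   -- change detection across attributes
        customer_name  VARCHAR(200),
        customer_city  VARCHAR(100),
        record_source  VARCHAR(100) NOT NULL,
        PRIMARY KEY (customer_hk, load_dts)
    );
    """

Hubs hold only business keys, links only relationships, and satellites all descriptive history, which is what lets the model absorb new sources incrementally.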
Data Lakehouse, Data Mesh, and Data Fabric (r2) – James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I'll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I'll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I'll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I'll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
This document provides an introduction and overview of implementing Data Vault 2.0 on Snowflake. It begins with an agenda and the presenter's background. It then discusses why customers are asking for Data Vault and provides an overview of the Data Vault methodology, including its core components of hubs, links, and satellites. It then shows how Snowflake features like separation of workloads and agile warehouse scaling support Data Vault implementations. It also addresses modeling semi-structured data and building virtual information marts using views.
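As a rough sketch of the two Snowflake ideas named above (semi-structured data and view-based information marts), the statement below, held in a Python string, exposes a satellite with a JSON payload in a VARIANT column as a virtual dimension; all object names are assumptions for illustration.

    # Hedged sketch: a virtual information mart as a view over a satellite
    # whose payload is semi-structured (Snowflake VARIANT). Names hypothetical.
    DIM_CUSTOMER_VIEW = """
    CREATE OR REPLACE VIEW dim_customer AS
    SELECT
        h.customer_id,
        s.payload:name::STRING AS customer_name,  -- path into the JSON payload
        s.payload:city::STRING AS customer_city,
        s.load_dts
    FROM hub_customer h
    JOIN sat_customer_json s
      ON s.customer_hk = h.customer_hk
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY h.customer_hk
        ORDER BY s.load_dts DESC) = 1             -- keep only the current version
    """

Because the mart is a view, no data is copied out of the Vault; consumers always see the latest satellite rows.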
Big data architectures and the data lake – James Serra
The document provides an overview of big data architectures and the data lake concept. It discusses why organizations are adopting data lakes to handle increasing data volumes and varieties. The key aspects covered include:
- Defining top-down and bottom-up approaches to data management
- Explaining what a data lake is and how Hadoop can function as the data lake
- Describing how a modern data warehouse combines features of a traditional data warehouse and data lake
- Discussing how federated querying allows data to be accessed across multiple sources
- Highlighting benefits of implementing big data solutions in the cloud
- Comparing shared-nothing, massively parallel processing (MPP) architectures to symmetric multi-processing (SMP) architectures
Data Vault Modeling and Methodology introduction that I provided to a Montreal event in September 2011. It covers an introduction and overview of the Data Vault components for Business Intelligence and Data Warehousing. I am Dan Linstedt, the author and inventor of Data Vault Modeling and methodology.
If you use the images anywhere in your presentations, please credit http://LearnDataVault.com as the source (me).
Thank you kindly,
Daniel Linstedt
This document discusses data mesh, a distributed data management approach for microservices. It outlines the challenges of implementing microservice architecture including data decoupling, sharing data across domains, and data consistency. It then introduces data mesh as a solution, describing how to build the necessary infrastructure using technologies like Kubernetes and YAML to quickly deploy data pipelines and provision data across services and applications in a distributed manner. The document provides examples of how data mesh can be used to improve legacy system integration, batch processing efficiency, multi-source data aggregation, and cross-cloud/environment integration.
Building an Effective Data Warehouse Architecture – James Serra
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
Given at Oracle Open World 2011: Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It has been in use globally for over 10 years now but is not widely known. The purpose of this presentation is to provide an overview of the features of a Data Vault modeled EDW that distinguish it from the more traditional third normal form (3NF) or dimensional (i.e., star schema) modeling approaches used in most shops today. Topics will include dealing with evolving data requirements in an EDW (i.e., model agility), partitioning of data elements based on rate of change (and how that affects load speed and storage requirements), and where it fits in a typical Oracle EDW architecture. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
Enterprise Data Management Framework Overview – John Bao Vuu
A solid data management foundation to support big data analytics and, more importantly, a data-driven culture is necessary for today's organizations.
A mature Data Management Program can reduce operational costs and enable rapid business growth and development. A Data Management program must evolve to monetize data assets, deliver breakthrough innovation, and help drive business strategies in new markets.
This document discusses Snowflake's data governance capabilities including challenges around data silos, complexity of data management, and balancing security and governance with data utilization. It provides an overview of Snowflake's platform for ingesting and sharing data across various sources and consumers. Key governance capabilities in Snowflake like object tagging, classification, anonymization, access history and row/column level policies are described. The document also previews upcoming conditional masking policies and provides examples of implementing object tagging and access policies in Snowflake.
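To make two of the listed capabilities concrete, here is a small sketch (Snowflake SQL held in a Python string) of object tagging plus a column masking policy; the table, column, and role names are assumptions for illustration, while the CREATE TAG and CREATE MASKING POLICY statements follow Snowflake's documented forms.

    # Hedged sketch of object tagging and column-level masking in Snowflake.
    GOVERNANCE_DDL = """
    CREATE TAG IF NOT EXISTS pii_level;

    ALTER TABLE customers MODIFY COLUMN email
        SET TAG pii_level = 'high';                      -- classify the column

    CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
        CASE WHEN CURRENT_ROLE() IN ('DATA_STEWARD') THEN val
             ELSE '***MASKED***'
        END;

    ALTER TABLE customers MODIFY COLUMN email
        SET MASKING POLICY email_mask;                   -- enforced at query time
    """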
Data Management, Metadata Management, and Data Governance – Working Together – DATAVERSITY
The data disciplines listed in the title must work together. The key to success requires understanding the boundaries and overlaps between the disciplines. Wouldn't it be great to be able to present the relationships between the disciplines in a simple all-in-one diagram? At the end of this webinar, you will be able to do just that.
This new RWDG webinar with Bob Seiner will outline how Data Management, Metadata Management, and Data Governance can be optimized to work together. Bob will share a diagram that has successfully communicated the relationship between these disciplines to leadership, resulting in the disciplines working in harmony and delivering success.
Bob will share the following in this webinar:
- Categories of disciplines focused on managing data as an asset
- A definition of Data Management that embraces numerous data disciplines
- The importance of Metadata Management to all data disciplines
- Why data and metadata require formal governance
- A graphic that effectively exhibits the relationship between the disciplines
A data warehouse is a central repository of historical data from an organization's various sources designed for analysis and reporting. It contains integrated data from multiple systems optimized for querying and analysis rather than transactions. Data is extracted, cleaned, and loaded from operational sources into the data warehouse periodically. The data warehouse uses a dimensional model to organize data into facts and dimensions for intuitive analysis and is optimized for reporting rather than transaction processing like operational databases. Data warehousing emerged to meet the growing demand for analysis that operational systems could not support due to impacts on performance and limitations in reporting capabilities.
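To ground the facts-and-dimensions idea, a one-query star-schema sketch follows (generic SQL held in a Python string); the sales example and all names in it are illustrative assumptions.

    # Minimal star-schema sketch: one fact table joined to two dimensions and
    # aggregated the way a typical report would be. All names are hypothetical.
    SALES_BY_MONTH_AND_REGION = """
    SELECT
        d.year,
        d.month,
        s.region,
        SUM(f.sales_amount) AS total_sales            -- additive measure
    FROM fact_sales f
    JOIN dim_date  d ON d.date_key  = f.date_key      -- dimension: when
    JOIN dim_store s ON s.store_key = f.store_key     -- dimension: where
    GROUP BY d.year, d.month, s.region
    ORDER BY d.year, d.month, s.region
    """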
Agile Data Engineering - Intro to Data Vault Modeling (2016) – Kent Graziano
The document provides an introduction to Data Vault data modeling and discusses how it enables agile data warehousing. It describes the core structures of a Data Vault model including hubs, links, and satellites. It explains how the Data Vault approach provides benefits such as model agility, productivity, and extensibility. The document also summarizes the key changes in the Data Vault 2.0 methodology.
This document discusses how big data analytics is used in the telecom business. It defines big data and analytics, and explains why big data analytics is needed. It provides examples of how major telecom companies like Reliance Jio, Vodafone, and Globe Telecom are using big data analytics for applications like fraud prevention, targeted marketing, customer segmentation, and network optimization. Case studies show how these companies are gaining business benefits and competitive advantages from analyzing the large amounts of customer data in their possession.
The document discusses data mesh vs data fabric architectures. It defines data mesh as a decentralized data processing architecture with microservices and event-driven integration of enterprise data assets across multi-cloud environments. The key aspects of data mesh are that it is decentralized, processes data at the edge, uses immutable event logs and streams for integration, and can move all types of data reliably. The document then provides an overview of how data mesh architectures have evolved from hub-and-spoke models to more distributed designs using techniques like kappa architecture and describes some use cases for event streaming and complex event processing.
Data Mesh is a new socio-technical approach to data architecture, first described by Zhamak Dehghani and popularised through a guest blog post on Martin Fowler's site.
Since then, community interest has grown, due to Data Mesh's ability to explain and address the frustrations that many organisations are experiencing as they try to get value from their data. The 2022 publication of Zhamak's book on Data Mesh further provoked conversation, as have the growing number of experience reports from companies that have put Data Mesh into practice.
So what's all the fuss about?
On one hand, Data Mesh is a new approach in the field of big data. On the other hand, Data Mesh is an application of the lessons we have learned from domain-driven design and microservices to a data context.
In this talk, Chris and Pablo will explain how Data Mesh relates to current thinking in software architecture and the historical development of data architecture philosophies. They will outline what benefits Data Mesh brings, what trade-offs it comes with and when organisations should and should not consider adopting it.
Emerging Trends in Data Architecture – What's the Next Big Thing? – DATAVERSITY
With technological innovation and change occurring at an ever-increasing rate, it's hard to keep track of what's hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Data governance Program PowerPoint Presentation Slides – SlideTeam
The document discusses the need for data governance programs in companies. It outlines why companies suffer without effective data governance, such as applications being unable to communicate and inconsistencies in data leading to increased costs. The document then compares manual and automated approaches to data governance. It provides details on key aspects of building a data governance program, including establishing a framework, defining roles and responsibilities, and outlining a roadmap for improving data governance over time.
At Polestar, we hope to bring the power of data to organizations across industries, helping them analyze billions of data points and data sets to provide real-time insights and enabling them to make critical decisions to grow their business.
The document discusses modern data architectures. It presents conceptual models for data ingestion, storage, processing, and insights/actions. It compares traditional vs modern architectures. The modern architecture uses a data lake for storage and allows for on-demand analysis. It provides an example of how this could be implemented on Microsoft Azure using services like Azure Data Lake Storage, Azure Databricks, and Azure Data Warehouse. It also outlines common data management functions such as data governance, architecture, development, operations, and security.
Introduction to Data Warehouse. Summarized from the first chapter of 'The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses' by Ralph Kimball.
Data Architecture Strategies: Data Architecture for Digital Transformation – DATAVERSITY
MDM, data quality, data architecture, and more: combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
How a Semantic Layer Makes Data Mesh Work at Scale – DATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn't address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component for supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture; a minimal sketch of the shared-definitions idea follows the list below.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
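Below is a deliberately simple Python sketch of that shared-definitions idea; it is illustrative only and not any vendor's actual semantic-layer API.

    # Illustrative-only sketch: a hub-owned registry of metric definitions
    # that spoke teams reuse instead of re-deriving their own SQL.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Metric:
        name: str
        sql: str    # the one canonical expression every team reuses
        owner: str  # the hub team that governs this definition

    REGISTRY = {
        "revenue": Metric("revenue", "SUM(order_total)", "finance-hub"),
        "active_users": Metric("active_users",
                               "COUNT(DISTINCT user_id)", "product-hub"),
    }

    def metric_sql(name: str) -> str:
        """Spoke teams look definitions up rather than redefining them."""
        return REGISTRY[name].sql

Keeping the definitions in one governed place is what preserves consistency while product creation itself stays decentralized.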
Data Vault ReConnect Speed Presenting AM Part One – Hans Hultgren
First set of 5x5 Speed Presenting Updates:
1) Core Business Concept
2) Ensemble Modeling
3) Re-Defining the Link
4) Re-Defining the Satellite
5) Architectural Layers
Data Vault ReConnect Speed Presenting PM Part Four – Hans Hultgren
Third set of 5x5 Speed Presenting Updates:
1) Selling Data Vault - Elwyn Lloyd Jones
2) DV as Leverage for Data Migration - Antoine Stelma
3) QUIPU - Juan-José van der Linden
4) Volvo Data Vault Automation - Frederik Naessens
5) Are we ready for a DV Conversion Standard - Frederik Naessens, Stijn Roelens, Kristof Vanduren
- A strong relationship with the founder of Data Vault for over 3 years now.
- Supporting your business with 40+ certified consultants.
- Incorporated as the preferred Enterprise Data Warehouse modelling paradigm in the Logica BI Framework.
- Satisfied customers in many countries and industry sectors.
Data Vault ReConnect Speed Presenting PM Part Three – Hans Hultgren
The document discusses how Data Vault can be implemented efficiently in SAP HANA. Thanks to HANA's columnar architecture, a single broad satellite per hub performs well, though splitting satellites by rate of change can still be efficient for storage, and multiple satellites are preferable when data comes from multiple sources, to improve write efficiency. It also recommends creating one processed information table per hub, rather than SQL views, to allow efficient referential joins in HANA.
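As a generic sketch of that rate-of-change split (not the deck's actual HANA DDL; all names are hypothetical), slowly and rapidly changing attributes land in separate satellites so frequent updates rewrite only narrow rows:

    # Hedged sketch: one broad satellite split into two by rate of change.
    SAT_CUSTOMER_SLOW = """
    CREATE TABLE sat_customer_slow (     -- rarely-changing attributes
        customer_hk   CHAR(32)  NOT NULL,
        load_dts      TIMESTAMP NOT NULL,
        customer_name VARCHAR(200),
        birth_date    DATE,
        PRIMARY KEY (customer_hk, load_dts)
    );
    """

    SAT_CUSTOMER_FAST = """
    CREATE TABLE sat_customer_fast (     -- volatile attributes
        customer_hk    CHAR(32)  NOT NULL,
        load_dts       TIMESTAMP NOT NULL,
        loyalty_points INTEGER,
        last_login_dts TIMESTAMP,
        PRIMARY KEY (customer_hk, load_dts)
    );
    """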
Data vault seminar May 5-6 Dommel - The factory and the workshop – johannesvdb
This document discusses an open-source, metadata-driven data warehousing project at a Dutch water board. It describes the basic architecture of using open-source tools like Pentaho and Quipu to build a source data vault and a business data vault, and to generate data marts. It also covers lessons learned, including that it is possible to quickly build an EDW with open-source software, though some challenges around automation and performance still remain. The goal is to deliver added business value in a cost-effective way.
A Lean Data Warehouse, compared with a traditional one (with a dimensional or 3rd normal form model), is faster to deliver, freer of waste, and inherently more adaptable to change. From my experience in the trenches, each of these benefits fits squarely in the 'must have' category. Data Vault is an excellent logical architecture with which to design a Lean Data Warehouse. This article describes the priorities of a Lean Data Warehouse, and compares the two traditional modeling methods with Data Vault, concluding that Data Vault is more suited to deliver on those Lean priorities.
Presentation given at the DOAG conference.
Metadata is an often-neglected topic, as it is either considered boring or simply goes unnoticed. The rather abstract descriptions such as "metadata is data about data" are not exactly helpful either.
The presentation covers the different kinds of metadata (business, technical, process) and explains how they were used in a Data Vault project, for example to define standards or to generate code.
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW) – Andreas Buckenhofer
Part 4 of 4
The slides contain a DWH lecture given to students in their 5th semester. Content:
- Introduction DWH and Business Intelligence
- DWH architecture
- DWH project phases
- Logical DWH Data Model
- Multidimensional data modeling
- Data import strategies / data integration / ETL
- Frontend: Reporting and analysis, information design
- OLAP
This is a presentation I gave in 2006 for Bill Inmon. The presentation covers Data Vault and how it integrates with Bill Inmon's DW2.0 vision. This is focused on the business intelligence side of the house.
If you want to use these slides, please put (C) Dan Linstedt, all rights reserved, http://LearnDataVault.com
Agile Methods and Data Warehousing (2016 update) – Kent Graziano
This presentation takes a look at the Agile Manifesto and the 12 Principles of Agile Development and discusses how these apply to Data Warehousing and Business Intelligence projects. Several examples and details from my past experience are included. Includes more details on using Data Vault as well. (I gave this presentation at OUGF14 in Helsinki, Finland and again in 2016 for TDWI Nashville.)
Agile BI via Data Vault and Modelstorming – Daniel Upton
Audience: Business Intelligence Architects, Project Managers and Sponsors. This slideshow accompanies a video presentation of the same name, available at http://youtu.be/e0cHFdeGEeE.
DAMA, Oregon Chapter, 2012 presentation - an introduction to Data Vault modeling. I will be covering parts of the methodology and a comparison and contrast of issues in general for the EDW space, followed by a brief technical introduction to the Data Vault modeling method.
After the presentation I will be providing a demonstration of the ETL loading layers, LIVE!
You can find more on-line training at: http://LearnDataVault.com/training
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW) – Andreas Buckenhofer
The document provides information about Andreas Buckenhofer and Daimler TSS. It discusses Daimler TSS's locations, what attendees will learn about data warehouse data modeling and OLAP, and an overview of data modeling for OLTP applications, Codd's normal forms, and dimensional modeling for data marts.
Data Vault: Data Warehouse Design Goes Agile – Daniel Upton
Data Warehouse (especially EDW) design needs to get Agile. This whitepaper introduces Data Vault to newcomers, and describes how it adds agility to DW best practices.
Agile Data Warehouse Design for Big Data Presentation – Vishal Kumar
Synopsis:
[Video link: http://www.youtube.com/watch?v=ZNrTxSU5IQ0 ]
Jim Stagnitto and John DiPietro of consulting firm a2c will discuss Agile Data Warehouse Design - a step-by-step method for data warehousing / business intelligence (DW/BI) professionals to better collect and translate business intelligence requirements into successful dimensional data warehouse designs.
The method utilizes BEAM✲ (Business Event Analysis and Modeling) - an agile approach to dimensional data modeling that can be used throughout analysis and design to improve productivity and communication between DW designers and BI stakeholders. BEAM✲ builds upon the body of mature "best practice" dimensional DW design techniques, and collects "just enough" non-technical business process information from BI stakeholders to allow the modeler to slot their business needs directly and simply into proven DW design patterns.
BEAM✲ encourages DW/BI designers to move away from the keyboard and their entity relationship modeling tools and begin "white board" modeling interactively with BI stakeholders. With the right guidance, BI stakeholders can and should model their own BI data requirements, so that they can fully understand and govern what they will be able to report on and analyze.
The BEAM✲ method is fully described in Agile Data Warehouse Design, a text co-written by Lawrence Corr and Jim Stagnitto.
About the speakers:
Jim Stagnitto, Director of a2c's Data Services Practice
Data Warehouse Architect: specializing in powerful designs that extract the maximum business benefit from Intelligence and Insight investments.
Master Data Management (MDM) and Customer Data Integration (CDI) strategist and architect.
Data Warehousing, Data Quality, and Data Integration thought-leader: co-author with Lawrence Corr of "Agile Data Warehouse Design", guest author of Ralph Kimball's "Data Warehouse Designer" column, and contributing author to Ralph and Joe Caserta's latest book: "The DW ETL Toolkit".
John DiPietro, Chief Technology Officer at A2C IT Consulting
John DiPietro is the Chief Technology Officer for a2c. Mr. DiPietro is responsible for setting the vision, strategy, delivery, and methodologies for a2c's Solution Practice Offerings for all national accounts. The a2c CTO brings with him an expansive depth and breadth of specialized skills in his field.
Sponsor Note:
Thanks to:
Microsoft NERD for providing an awesome venue for the event.
http://A2C.com IT Consulting for providing the food/drinks.
http://Cognizeus.com for providing a book to give away as a raffle prize.
The recent focus on Big Data in the data management community brings with it a paradigm shift – from the more traditional top-down, "design then build" approach to data warehousing and business intelligence, to the more bottom-up, "discover and analyze" approach to analytics with Big Data. Where does data modeling fit in this new world of Big Data? Does it go away, or can it evolve to meet the emerging needs of these exciting new technologies? Join this webinar to discuss:
Big Data – A Technical & Cultural Paradigm Shift
Big Data in the Larger Information Management Landscape
Modeling & Technology Considerations
Organizational Considerations
The Role of the Data Architect in the World of Big Data
This document discusses key performance indicators (KPIs) for measuring agile projects. It begins by defining metrics and KPIs, noting that KPIs should be tied to strategic objectives and have defined targets. It then discusses characteristics of good KPIs and provides examples of both traditional and agile KPIs related to time, effort, scope, and quality. The document cautions that too many KPIs can be useless and advocates keeping metrics simple. It also discusses challenges like cheating on metrics and provides tips for using tools and dashboards to effectively measure agile performance.
Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for the last 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with a detailed introduction to the technical components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics for how to build, and design structures when using the Data Vault modeling technique. The target audience is anyone wishing to explore implementing a Data Vault style data model for an Enterprise Data Warehouse, Operational Data Warehouse, or Dynamic Data Integration Store. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
Data Warehousing in the Cloud: Practical Migration Strategies – SnapLogic
Dave Wells of Eckerson Group discusses why cloud data warehousing has become popular, the many benefits, and the corresponding challenges. Migrating an existing data warehouse to the cloud is a complex process of moving schema, data, and ETL. The complexity increases when architectural modernization, restructuring of database schema, or rebuilding of data pipelines is needed.
Optimizing IT Costs & Services With Big Data (Little Effort!) - Case Studies... – TeamQuest Corporation
IT organizations have a wealth of Service Management and Service Delivery tools, processes and metrics that typically exist in relative isolation. This session will present detailed real-life examples of how existing tools and metrics can be brought together using big data techniques to optimize costs and performance of IT environments.
Vijay Kumar Singh is a software engineer with 3 years of experience in SQL/PLSQL development on Oracle and SQL Server databases. He has worked on projects in banking, capital markets, and retail domains. His responsibilities have included developing stored procedures and functions, preparing test cases, analyzing requirements, and coordinating with clients. He is seeking new assignments involving database development with Oracle SQL/PLSQL.
IT professional with 9 years of data warehousing experience in the areas of ETL design and development. Excellent experience in requirement gathering, designing, developing, documenting, and testing of ETL jobs and mappings as parallel jobs using DataStage to populate tables in data warehouses and data marts.
Parthiban Loganathan is a software engineer with over 4 years of experience in software development using technologies like Informatica Power Center, Unix Shell Scripting, and SQL Server. He has extensive experience in ETL processes involving data extraction, transformation, loading, and maintenance. His most recent role is as an ETL Developer using Informatica to generate extracts from various healthcare databases for clients like IBM Castlight and MVP.
Bringing Agility and Flexibility to Data Design and Integration – DATAVERSITY
Phasic Systems Inc provides agile data solutions to help organizations overcome challenges with data integration and governance. Their methods treat the entire data lifecycle as a continuous process to provide flexibility and adaptability. Phasic Systems uses agile methodologies, tools like DataStar Discovery and DataStar Unifier, and a hybrid data model called Corporate NoSQL to integrate data in days rather than months while maintaining governance. Their approach helps organizations access the right data at the right time to support business needs.
This professional summary outlines Shashank Jain's 5+ years of experience in Microsoft Business Intelligence and Data Warehousing tools like SQL Server Integration Services, Reporting Services, and Analysis Services. He has experience designing, developing, and supporting BI solutions, including ETL development, cube modeling, report scheduling, and more. Shashank is seeking a role where he can utilize his skills and experience in MSBI tools, databases, and programming languages like Java and SQL.
The document discusses optimizing a data warehouse by offloading some workloads and data to Hadoop. It identifies common challenges with data warehouses like slow transformations and queries. Hadoop can help by handling large-scale data processing, analytics, and long-term storage more cost effectively. The document provides examples of how customers benefited from offloading workloads to Hadoop. It then outlines a process for assessing an organization's data warehouse ecosystem, prioritizing workloads for migration, and developing an optimization plan.
Data warehouses need to be modernized to handle big data, integrate multiple data silos, reduce costs, and reduce time to market. A modern data warehouse blueprint includes a data lake to land and ingest structured, unstructured, external, social, machine, and streaming data alongside a traditional data warehouse. Key challenges for modernization include making data discoverable and usable for business users, rethinking ETL to allow for data blending, and enabling self-service BI over Hadoop. Common tactics for modernization include using a data lake as a landing zone, offloading infrequently accessed data to Hadoop, and exploring data in Hadoop to discover new insights.
This document discusses an agile approach to developing a data warehouse. It advocates using an Agile Enterprise Data Model to provide vision and guidance. The "Spock Approach" is described, which uses an operational data store, dimensional data warehouse, and iterative development of data marts. Data visualization techniques like data hexes are recommended to improve planning and visibility. Leadership, version control, adaptability, refinement, and refactoring are identified as important ongoing processes for an agile data warehouse project.
- The document contains the resume of Vivek Kumar detailing his IT experience including 5+ years as a Siebel EIM Consultant, EIM Developer, Siebel Batch Interface Developer, PL/SQL Developer, Informatica Developer and Business Analyst.
- It lists his technical skills like Siebel, Informatica, SQL, and various projects he has worked on including data migration projects from legacy systems to Siebel for clients like GE, KPMG, and Global Benefits Group.
- Provides summaries of some of these projects outlining his responsibilities like requirements gathering, data mapping, ETL design, testing and support.
Vasudevan Venkatraman has over 11 years of experience working in the IT industry, including 7+ years of experience with Oracle PL/SQL, data warehousing, and 3+ years in performance consulting and applications database administration. He has experience designing and developing applications using Oracle PL/SQL, Hadoop, and big data technologies. Currently he works as an Assistant Consultant at TCS focusing on data warehousing projects using Oracle and Hadoop.
This resume summarizes Ketan Jalan's experience as a developer with skills in Java, Unix scripting, SQL, and middleware tools like SeeBeyond SRE, Datastage, Axway, and Reactivity. He has over 6 years of experience in roles supporting manufacturing domains. His current role involves sustaining Exchange applications that use various technologies. Previous roles include developing enhancements, analyzing code and interfaces, and providing production support. He also has experience in team management, requirements gathering, and customer interaction.
In this document, we will present a very brief introduction to Big Data (what is Big Data?), Hadoop (how does Hadoop fit the picture?) and Cloudera Hadoop (what is the difference between Cloudera Hadoop and regular Hadoop?).
Please note that this document is for Hadoop beginners looking for a place to start.
- The document provides a summary of a candidate's experience as a Test Engineer including 2.5 years of experience in software testing, specifically testing of data warehouse/ETL processes. Key tools used include Informatica, Oracle, Quality Centre, SQL, and UNIX. Details are given about 3 projects involving ETL testing for various clients across different industries.
Ashish Maheshwari has over 8 years of experience in data modeling, integration, analytics, mining, ETL processes, BI applications, and data warehousing. He currently works as a Technical Lead at Ness Technologies, Bangalore and has experience working on projects for clients such as FGL, Marks, and Target. His skills include Oracle SQL, PL/SQL, Datastage, data warehousing concepts, R programming, and Agile development processes.
This document contains a resume for Chakravarthy Uppara. It summarizes his contact information, objective, 6+ years of experience in database development using SQL Server and SSIS. It details his roles and responsibilities in projects for Tesco and Accenture developing ETL processes and interfaces to integrate various systems. His technical skills include SQL Server, SSIS, SSAS and .NET. He holds a B-Tech in Information Technology and is currently employed as a Senior Software Engineer at Tesco in Bangalore, India.
Vikas Sogani is seeking a position as a lead data analyst with over 8 years of experience in data warehousing, ETL design, and business analysis. He has extensive experience working on projects for clients like Barclays and GE Healthcare. Currently, he is working as a lead data analyst for Barclays on their retail banking data warehouse. His responsibilities include requirements gathering, data modeling, ETL development and testing, and resolving data quality issues. He has strong skills in Teradata, Informatica, and business domains like retail banking and risk management.
This document provides a summary of William (Bill) Gulley's professional experience and qualifications. He has over 10 years of experience as a Business/Systems Analyst with a focus on data warehousing, ETL, and Agile methodologies. His technical skills include SQL, SSIS, Informatica, and working with technologies like Teradata, SQL Server, Oracle, and Hadoop. He has experience leading requirements gathering and analysis in both Agile and waterfall projects across multiple industries.
Alok Singh is seeking challenging assignments in Business Intelligence/Data warehousing. He has nearly 7 years of experience in BI/DW, ETL, data integration, and data warehousing solution design. He is proficient in SQL, ETL tools like Informatica and SSIS, and visualization tools like QlikView and Tableau. He has experience designing and developing ETL solutions, requirements gathering, and data analysis. His past roles include positions at Technologia, Subex, and Reliance Communications where he worked on projects involving Teradata, Oracle, billing systems, and fraud detection. He has a bachelor's degree in electronics and telecommunications.
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf – Software Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
Dev Dives: Automate and orchestrate your processes with UiPath Maestro – UiPathCommunity
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
HCL Nomad Web – Best Practices and Managing Multiuser Environments – panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed "automatically" in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web presents unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser's cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA... – SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
Linux Support for SMARC: How Toradex Empowers Embedded Developers – Toradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here's how:
- Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
- Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP's i.MX 8M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
- Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
- Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
- Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex's Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and support a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx – Justin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, "The Coding War Games."
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don't find ourselves having the same discussion again in a decade?
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive – ScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we'll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we'll cover how Rust's unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Mobile App Development Company in Saudi Arabia – Steve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With 11+ years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading Mobile App Development Company In Saudi Arabia, we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API – UiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
Role of Data Annotation Services in AI-Powered ManufacturingAndrew Leo
Â
From predictive maintenance to robotic automation, AI is driving the future of manufacturing. But without high-quality annotated data, even the smartest models fall short.
Discover how data annotation services are powering accuracy, safety, and efficiency in AI-driven manufacturing systems.
Precision in data labeling = Precision on the production floor.
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Â
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just toolsâthey're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Aqusag Technologies
Â
In late April 2025, a significant portion of Europe, particularly Spain, Portugal, and parts of southern France, experienced widespread, rolling power outages that continue to affect millions of residents, businesses, and infrastructure systems.
Technology Trends in 2025: AI and Big Data AnalyticsInData Labs
Â
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy nowand implement the key findings to improve your business.
HCL Nomad Web â Best Practices und Verwaltung von Multiuser-Umgebungenpanagenda
Â
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-und-verwaltung-von-multiuser-umgebungen/
HCL Nomad Web wird als die nächste Generation des HCL Notes-Clients gefeiert und bietet zahlreiche Vorteile, wie die Beseitigung des Bedarfs an Paketierung, Verteilung und Installation. Nomad Web-Client-Updates werden âautomatischâ im Hintergrund installiert, was den administrativen Aufwand im Vergleich zu traditionellen HCL Notes-Clients erheblich reduziert. Allerdings stellt die Fehlerbehebung in Nomad Web im Vergleich zum Notes-Client einzigartige Herausforderungen dar.
Begleiten Sie Christoph und Marc, während sie demonstrieren, wie der Fehlerbehebungsprozess in HCL Nomad Web vereinfacht werden kann, um eine reibungslose und effiziente Benutzererfahrung zu gewährleisten.
In diesem Webinar werden wir effektive Strategien zur Diagnose und LĂśsung häufiger Probleme in HCL Nomad Web untersuchen, einschlieĂlich
- Zugriff auf die Konsole
- Auffinden und Interpretieren von Protokolldateien
- Zugriff auf den Datenordner im Cache des Browsers (unter Verwendung von OPFS)
- Verständnis der Unterschiede zwischen Einzel- und Mehrbenutzerszenarien
- Nutzung der Client Clocking-Funktion
Quantum Computing Quick Research Guide by Arthur MorganArthur Morgan
Â
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Quantum Computing Quick Research Guide by Arthur MorganArthur Morgan
Â
Data Warehouse Agility – Array Conference 2011
1. 25568 Genesee Trail Rd
Golden, Colorado 80401
(303) 526-0340
• Data Vault Modeling and Approach • DW2.0 and Unstructured Data • Master Data Management and Metadata
Data Warehousing Agility
BI-Event May 17
Hans Hultgren
© 2011 Genesee Academy, LLC
2. Welcome
• Definition of agility
• Types of agility
• Discuss current approaches
• Hyper-agility
• Observations from the field
– Also topics of operational data warehousing, operational BI, agile project management techniques, agility-oriented tools, and operational integration
3. Data Warehouse Agility
• Agility
– The overall measure of adaptability in terms of speed & scope.
– Overall performance in adapting to change.
NOTE: Not warehouse machine throughput, near-real-time (NRT) processing, and operational DW performance…
Ability of the data warehouse to adapt to change
Versus
Performance of an existing (steady-state) warehouse
4. Data Warehouse Agility
• Agility
– Agile in IT
• Agile Project Management
• Agile Software Development
– Agile Manifesto
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
• Agile Model Driven Development (AMDD)
• Test-Driven Development (TDD)
5. Data Warehouse Agility
• Agility in the Data Warehouse
– Agility in terms of data warehousing is related to the ability to build incrementally.
– The approach today is more concerned with the development of a business intelligence / data warehousing program – the capability to increment (adapt and grow).
– Since the business is always changing (new reporting needs, new business processes, new business units, new data sources, etc.), the EDW program is an ongoing initiative that needs to focus on adapting to these changes.
– Note: distinguish between operational integration and data warehousing.
6. Types of Data Warehouse Agility
[Diagram: a Data Warehouse at the center, surrounded by its change drivers – New Source, New Mart, New Attribute, New Subject Area, and Change DW]
7. Types of Data Warehouse Agility
– Presentation Layer Agility – ability to adapt to new business requirements based on existing data elements in the EDW.
• Bottom Line: Ability to quickly and flexibly spin off new data marts
– New Data Source Agility – ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts.
• Bottom Line: Ability to quickly adapt to new data sources* using existing structures
– New Attribute Agility – ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources, and to integrate new attributes in terms of business context.
• Bottom Line: Ability to quickly incorporate new attributes in the EDW and apply business context to these attributes
– EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart.
• Bottom Line: EDW response time; a function of people, process & tools
– Changes in the DW – ability to absorb other changes such as integration logic, mappings, and business rules.
8. Presentation Layer Agility
– Presentation Layer Agility – ability to adapt to new business requirements based on existing data elements in the EDW.
• Bottom Line: Ability to quickly and flexibly spin off new data marts
– In this layer, agility is measured as a function of the time it takes to design, construct and deliver a new data mart.
– Variables in this layer include:
• Strength of the BI team to capture requirements and define the data mart.
• Ability of the ETL integration team to understand the EDW model and mart.
• Strength and repeatability of ETL processes for sourcing the EDW.
• Strength and repeatability of ETL development, testing and delivery.
– Constraints:
• Dependent upon the existence of the data in the EDW.
• Dependent upon the level of business alignment of the data in the EDW.
9. New Data Source Agility
– New Data Source Agility – ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts.
• Bottom Line: Ability to quickly adapt to new data sources* using existing structures
– In this layer, agility is measured as a function of the time it takes to design, model, build and load data into the EDW from a new source.
– Variables in this layer include:
• Strength of the DW team to design the required model changes.
• Strength and repeatability of EDW development, testing and delivery.
• Ability of the ETL integration team to understand the new EDW model.
• Strength and repeatability of ETL processes for mapping and loading the new source into the EDW.
– Constraints:
• Level of alignment of the new source data with the existing model.
• Dependent upon the level of business alignment with the data in the EDW.
10. New Attribute Technical Agility
– New Attribute (Technical) Agility – ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources.
• Bottom Line: Ability to quickly incorporate new attributes in the EDW
– In this layer, agility is measured as a function of the time it takes to design, map, add and load a new attribute from a source.
– Variables in this layer include:
• Strength of the DW team to design the required model changes.
• Strength and repeatability of EDW development, testing and delivery.
• Ability of the ETL integration team to understand the new EDW attribute(s).
• Strength and repeatability of ETL processes for mapping and loading new source attributes into the EDW.
– Constraints:
• Level of alignment of the new attribute with the existing model.
• Dependent upon business context being defined.
11. New Attribute Business Context
– New Attribute (Business) Context Agility – ability to integrate new attributes in terms of business context.
• Bottom Line: Ability to quickly apply business context to new attributes
– In this layer, agility is measured as a function of the time it takes to align business context with a new attribute from a source.
– Variables in this layer include:
• Ability of the BI / DW team to accurately assess the business context of the new source attribute.
– Constraints:
• Level of alignment of the new attribute with the existing model.
• Dependent upon the level of business alignment with the data in the EDW.
12. EDW Machine Agility
– EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart.
• Bottom Line: EDW response time; a function of people, process & tools
– In this layer, agility is measured as an overall function of the EDW machine to integrate a new subject area from stage to mart.
– Variables in this layer include:
• Strength of the BI / DW development team.
• Strength and repeatability of EDW development, testing and delivery.
• Strength and ability of the ETL integration team.
• Strength and repeatability of all BI / DW processes.
– Constraints:
• Executive sponsorship of the EDW program.
• Well-defined organizational structure for BIW, BICC, Architecture and Governance.
14. DW Agility Current Approaches
– Incremental Data Warehouse Development
• Data Vault modeling, 2G, Anchor, etc.
– Agile BI Programs (People, Process, Models & Data)
• Methodologies (Centennium, Platon, etc.)
• Templates, Tools & Automation (Wherescape, etc.)
– Alternate & New Paradigms for the Agile DW
15. DW Agility Components
– Absorb Changes
• Capture the Change
• Understand the Change
– A major constraint on agility is the required data warehouse modeling changes...
• So we can capture the data (create the buckets)
• So we can understand the data (context, meaning)
– Align to business keys, classify, describe (metadata)
16. Data Warehouse Agility
• Why create a Data Model for the DW?
• Model Data versus Meaning?
– Separate the capture of data from the meaning?
– The structure of a table versus the semantics
– Business meaning versus data loading
– As XML is to EDI
18. Concept of Name/Value Pair
Cust_ID Lname Fname Add City State Zip Bdate
121202 Lundquist Carl 22 Bird St NYC NY 98291 10/9/1977
123335 Dahlgren Eva 7 Academy Madison NJ 07940 2/12/1982
139090 Lundberg Scott 444 7th St Tuborg MN 70098 4/22/1988
119944 Hultquist Darla 17 South Randolf PA 91121 9/22/1967
120334 Forsberg Sven 117 East A NYC NY 98292 8/19/1976
Each Value or "data item" (record value for each attribute) is provided in a list format, paired with the corresponding Name or "field name" (column header) from the normalized table structure.
Moving to Name/Value Pair…
19. Concept of Name/Value Pair
Name Value
Cust_ID Lname Fname Add City State Zip Bdate
121202 Lundquist Carl 22 Bird St NYC NY 98291 10/9/1977
Cust_ID Lname Fname Add City State Zip Bdate
123335 Dahlgren Eva 7 Academy Madison NJ 07940 2/12/1982
Cust_ID Lname Fname Add City State Zip Bdate
139090 Lundberg Scott 444 7th St Tuborg MN 70098 4/22/1988
Cust_ID Lname Fname Add City State Zip Bdate
119944 Hultquist Darla 17 South Randolf PA 91121 9/22/1967
Cust_ID Lname Fname Add City State Zip Bdate
120334 Forsberg Sven 117 East A NYC NY 98292 8/19/1976
20. Moving to Name/Value Pair
Cust_ID Lname Fname Add City State Zip Bdate
121202 Lundquist Carl 22 Bird St NYC NY 98291 10/9/1977
123335 Dahlgren Eva 7 Academy Madison NJ 07940 2/12/1982
139090 Lundberg Scott 444 7th St Tuborg MN 70098 4/22/1988
119944 Hultquist Darla 17 South Randolf PA 91121 9/22/1967
120334 Forsberg Sven 117 East A NYC NY 98292 8/19/1976
[Diagram: transpose the table rows into NAME and VALUE columns]
Transpose …with column headings…
21. Name/Value Pair
Name      Value
Cust_ID   121202
Lname     Lundquist
Fname     Carl
Add       22 Bird St
City      NYC
State     NY
Zip       98291
Bdate     10/9/1977
Cust_ID   123335
Lname     Dahlgren
Fname     Eva
Add       7 Academy
City      Madison
State     NJ
Zip       07940
Bdate     2/12/1982
Cust_ID   139090
Lname     Lundberg
Fname     Scott
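As an editor's illustration of the transpose these slides walk through: a minimal Python sketch, not from the original deck, that flattens the normalized customer rows into Name/Value pairs. The row data mirrors the slide's table; the to_name_value_pairs helper is a hypothetical name.

```python
# Illustrative sketch: transpose normalized rows into Name/Value pairs.
ROWS = [
    {"Cust_ID": "121202", "Lname": "Lundquist", "Fname": "Carl",
     "Add": "22 Bird St", "City": "NYC", "State": "NY",
     "Zip": "98291", "Bdate": "10/9/1977"},
    {"Cust_ID": "123335", "Lname": "Dahlgren", "Fname": "Eva",
     "Add": "7 Academy", "City": "Madison", "State": "NJ",
     "Zip": "07940", "Bdate": "2/12/1982"},
]

def to_name_value_pairs(rows):
    """Flatten each row into (name, value) tuples, one per attribute."""
    for row in rows:
        for name, value in row.items():
            yield name, value

for name, value in to_name_value_pairs(ROWS):
    print(f"{name}\t{value}")
```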
22. Name/Value Pair
Name      Value
Cust_ID   121202
Lname     Lundquist
Fname     Carl
Add       22 Bird St
City      NYC
State     NY
Zip       98291
Bdate     10/9/1977
Cust_ID   123335
Lname     Dahlgren
Fname     Eva
Add       7 Academy
City      Madison
State     NJ
Zip       07940
Bdate     2/12/1982
Cust_ID   139090
Lname     Lundberg
Fname     Scott
The concept of the "record" is effectively lost in this transformation. Now a RECORD is a set of Name/Value Pair instances…
CON: Lose resolution on the record.
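To make the CON concrete: once the record boundary is gone, records can only be rebuilt by convention. A hedged sketch, assuming pairs arrive in source order and every record begins with Cust_ID; the rebuild_records helper is hypothetical, not part of the deck.

```python
# Hypothetical sketch: rebuild records from an ordered pair stream.
# Assumes each record begins with Cust_ID; any out-of-order pair
# silently corrupts the grouping -- this is the lost resolution.
PAIRS = [
    ("Cust_ID", "121202"), ("Lname", "Lundquist"), ("Fname", "Carl"),
    ("Cust_ID", "123335"), ("Lname", "Dahlgren"), ("Fname", "Eva"),
]

def rebuild_records(pairs, record_start="Cust_ID"):
    record = {}
    for name, value in pairs:
        if name == record_start and record:
            yield record   # previous record is assumed complete
            record = {}
        record[name] = value
    if record:
        yield record

for rec in rebuild_records(PAIRS):
    print(rec)
```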
23. Name/Value Pair
Name      Value
Cust_ID   121202
Lname     Lundquist
Fname     Carl
Add       22 Bird St
City      NYC
State     NY
Zip       98291
Bdate     10/9/1977
Cust_ID   123335
Lname     Dahlgren
Fname     Eva
Add       7 Academy
City      Madison
State     NJ
Zip       07940
Bdate     2/12/1982
Cust_ID   139090
Lname     Lundberg
Fname     Scott
Also, the attributes are not defined in advance – we don't know what to expect, and we can't check for attribute meaning, definitions, domain values or data types.
CON: Attributes are not pre-defined.
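A sketch of this second CON, under stated assumptions: with no pre-defined attributes there is nothing to validate an incoming pair against. The KNOWN_ATTRIBUTES registry below is a hypothetical stand-in for schema metadata that a pure Name/Value store does not have.

```python
# Hypothetical sketch: without a predefined attribute registry, unknown
# names and malformed values load as-is -- no meaning, domain, or type check.
KNOWN_ATTRIBUTES = {"Cust_ID": str.isdigit, "Zip": str.isdigit}

def load_pair(name, value):
    check = KNOWN_ATTRIBUTES.get(name)
    if check is None:
        print(f"WARN: '{name}' not pre-defined; no domain/type check possible")
    elif not check(value):
        print(f"WARN: '{name}' value {value!r} fails its domain check")
    return name, value   # the store accepts the pair either way

load_pair("Zip", "98291")       # known attribute, passes the check
load_pair("CustClass", "Big")   # unknown attribute: loads, meaning undefined
```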
24. Name/Value Pair
Name      Value
Cust_ID   121202
Lname     Lundquist
Fname     Carl
Add       22 Bird St
City      NYC
State     NY
Zip       98291
Bdate     10/9/1977
CustClass Big
Cust_ID   123335
Lname     Dahlgren
Fname     Eva
Add       7 Academy
City      Madison
State     NJ
Zip       07940
Bdate     2/12/1982
CustClass Small
Cust_ID   139090
New attributes that are introduced into the source feed are added instantly to the DW. There is no modeling delay, no code change, and no ETL impact…
PRO: Absorb new attributes instantly.
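And the PRO in sketch form: a new source field such as CustClass lands through the same single insert path, with no model change, no DDL, and no ETL rework. The load_feed helper and in-memory store are illustrative assumptions, not the deck's implementation.

```python
# Hypothetical sketch: one insert path absorbs any attribute, old or new.
store = []   # stand-in for the persisted Name/Value table

def load_feed(feed):
    for name, value in feed:
        store.append((name, value))   # no schema to alter, no mapping to edit

load_feed([("Cust_ID", "121202"), ("Lname", "Lundquist")])
load_feed([("Cust_ID", "123335"), ("Lname", "Dahlgren"),
           ("CustClass", "Small")])   # brand-new attribute, same code path
print(store)
```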
25. Hyper Agility
• The solution to deal with these issues requires a further level of abstraction, which in effect moves the persisted (historized, permanent, integrated) data store even further away from the business context it is intended to represent.
• The DW model – the data model itself – is then not readable (not understandable). In fact, ETL professionals will also find themselves further removed from this model. To the extent that a model is intuitive, self-descriptive, and aligned with business meaning, this approach takes a step in the other direction.
• Moving towards addressing these business-driven agility requirements causes the model itself to move much further away (an order of magnitude away) from the business – so far as to become, effectively, a technical solution utilizing only abstract representations.
26. Hyper Agility
• The context – the meaning of the data – will in these cases need to be managed in a different way.
• This can include a form of persisted and historized metadata concerning the mappings and business rules. In effect, a form of EAI within the DW.
• Or it might include a more traditional secondary DW layer.
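One way to picture the persisted, historized metadata idea, as a minimal sketch: business context lives beside the abstract store as its own versioned mapping. The structure, field names, and meaning_of helper are assumptions for illustration, not the deck's design.

```python
# Hypothetical sketch: business context as a versioned mapping kept
# alongside the abstract Name/Value store.
from datetime import date

CONTEXT = [
    # (attribute, business definition, effective_from)
    ("CustClass", "Customer size classification (Big/Small)", date(2011, 5, 1)),
    ("Bdate", "Customer birth date, M/D/YYYY", date(2011, 1, 1)),
]

def meaning_of(attribute, as_of):
    """Return the business definition in force for an attribute on a date."""
    versions = [(start, desc) for attr, desc, start in CONTEXT
                if attr == attribute and start <= as_of]
    return max(versions)[1] if versions else None

print(meaning_of("CustClass", date(2011, 6, 1)))
```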
27. DW AGILITY SUMMARY
• Consider specific Agility Requirements
• Classify Agility Types and consider Alternatives
• Distinguish between operational integration and DW
• Look to modeling techniques optimized for the Data Warehouse
• Look at the entire picture – people, process, models and data
• Consider specific methodologies, templates and tools
• Determine if hyper agility is a requirement
28. Questions?
www.GeneseeAcademy.com
CDVDM Certification Seminar
June 23-24
October 27-28
© 2011 Genesee Academy, LLC    [email protected]
25568 Genesee Trail Rd    USA +1 303.526.0340
Golden, Colorado 80401    Sweden 070 250 2102