Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
The Microsoft Well Architected Framework For Data AnalyticsStephanie Locke
With more than a decade of organizations running large data & analytics workloads in the cloud, Microsoft has extended its architecture framework to provide best practices and guidance for businesses. In this session, we’ll introduce the 'Well-Architected Framework', go into detail about effective data architectures, and give you concrete next steps you can take whether you already have a cloud data architecture or are planning your first implementation.
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...Lace Lofranco
Talk Description:
The Modern Data Warehouse architecture is a response to the emergence of Big Data, Machine Learning and Advanced Analytics. DevOps is a key aspect of successfully operationalising a multi-source Modern Data Warehouse.
While there are many examples of how to build CI/CD pipelines for traditional applications, applying these concepts to Big Data Analytical Pipelines is a relatively new and emerging area. In this demo-heavy session, we will see how to apply DevOps principles to an end-to-end Data Pipeline built on the Microsoft Azure Data Platform with technologies such as Data Factory, Databricks, Data Lake Gen2, Azure Synapse, and Azure DevOps.
Resources: https://aka.ms/mdw-dataops
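As a concrete illustration of what DevOps for a data pipeline can mean in practice, the sketch below shows a unit test for a single transformation step, the kind of check a build pipeline (for example in Azure DevOps) could run before promoting Data Factory or Databricks artifacts. The clean_orders function and its columns are hypothetical and not taken from the session materials.

```python
# Minimal sketch of a CI-friendly data pipeline test. In a DataOps setup this kind
# of test runs in the build pipeline before artifacts are promoted to the next stage.
import pandas as pd

def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Example transformation: drop rows without an order id and normalise amounts."""
    cleaned = raw.dropna(subset=["order_id"]).copy()
    cleaned["amount"] = cleaned["amount"].astype(float).round(2)
    return cleaned

def test_clean_orders_removes_incomplete_rows():
    raw = pd.DataFrame(
        {"order_id": ["A1", None, "A3"], "amount": ["10.00", "3.2", "7.5"]}
    )
    result = clean_orders(raw)
    assert len(result) == 2                          # row without an order_id is dropped
    assert result["amount"].tolist() == [10.0, 7.5]  # amounts normalised to floats

if __name__ == "__main__":
    test_clean_orders_removes_incomplete_rows()
    print("pipeline unit test passed")
```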
The data architecture of solutions is frequently not given the attention it deserves: too little attention is paid to designing and specifying the data architecture within individual solutions and their constituent components. This is due to the behaviours of both solution architects and data architects.
Solution architecture tends to concern itself with the functional, technology and software components of the solution, and frequently omits the detail of the data aspects of the solution, leaving a solution data architecture gap.
Data architecture, in turn, tends not to get involved with the data aspects of individual technology solutions, leaving a data architecture gap. Together, these gaps result in a data blind spot for the organisation.
Data architecture tends to concern itself with the data landscape downstream of individual solutions. It needs to shift left into the domain of solutions and their data, and engage more actively with the data dimensions of individual solutions. Data architecture can take the lead in closing these data gaps through a shift-left of its scope and activities, as well as by providing standards and common data tooling for solution data architecture.
The objective of data design for solutions is the same as that for overall solution design:
• To capture sufficient information to enable the solution design to be implemented
• To unambiguously define the data requirements of the solution and to confirm and agree those requirements with the target solution consumers
• To ensure that the implemented solution meets the requirements of the solution consumers and that no deviations have taken place during the solution implementation journey
Good solution data architecture helps avoid problems with solution operation and use, such as:
• Poor and inconsistent data quality
• Poor performance, throughput, response times and scalability
• Long data update and response times caused by poorly designed data structures, affecting solution usability and leading to lost productivity and transaction abandonment
• Poor reporting and analysis
• Poor data integration
• Poor solution serviceability and maintainability
• Manual workarounds for data integration and for data extraction for reporting and analysis
Data-design-related solution problems frequently manifest themselves only after the solution goes live. The benefits of solution data architecture are not always evident initially.
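As a small illustration of the point about data structures and response times, the sketch below runs the same lookup against an unindexed and an indexed table; the table and column names are purely illustrative.

```python
# The same aggregation query with and without an index on the filter column.
import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 10_000, i * 0.01) for i in range(500_000)],
)
conn.commit()

def timed_lookup(label):
    start = time.perf_counter()
    conn.execute("SELECT SUM(amount) FROM orders WHERE customer_id = 4242").fetchone()
    print(f"{label}: {time.perf_counter() - start:.4f}s")

timed_lookup("full scan (no index)")    # scans all 500,000 rows
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
timed_lookup("indexed lookup")          # touches only the matching rows
```

The indexed lookup is typically far faster, which is the kind of gap that separates a usable solution from one that drives lost productivity and transaction abandonment.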
A journey into the business world of artificial intelligence. Explore, at a high level, ongoing business experiments in creating new value.
* Review AI as a priority for value generation
* Explore ongoing experimentation
* Touch on how businesses are monetising AI
* Understand the intent of adoption by industries
* Discuss the state of customer trust in AI
Part 1 of a nine-part research series, "What matters in AI", published on https://www.andremuscat.com
The document discusses data mesh vs data fabric architectures. It defines data mesh as a decentralized data processing architecture with microservices and event-driven integration of enterprise data assets across multi-cloud environments. The key aspects of data mesh are that it is decentralized, processes data at the edge, uses immutable event logs and streams for integration, and can move all types of data reliably. The document then provides an overview of how data mesh architectures have evolved from hub-and-spoke models to more distributed designs using techniques like kappa architecture and describes some use cases for event streaming and complex event processing.
Intuit's Data Mesh - Data Mesh Learning Community meetup 5.13.2021Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Learning meetup on 5/13/2021.
Welcome to my post on ‘Architecting Modern Data Platforms’. Here I discuss how to design cutting-edge data analytics platforms that meet the ever-evolving data and analytics needs of the business.
https://www.ankitrathi.com
The document discusses modern data architectures. It presents conceptual models for data ingestion, storage, processing, and insights/actions. It compares traditional vs modern architectures. The modern architecture uses a data lake for storage and allows for on-demand analysis. It provides an example of how this could be implemented on Microsoft Azure using services like Azure Data Lake Storage, Azure Databricks, and Azure Data Warehouse. It also outlines common data management functions such as data governance, architecture, development, operations, and security.
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleDATAVERSITY
Learn about using a semantic layer to enable actionable insights for everyone and streamline data and analytics access throughout your organization. This session will offer practical advice based on a decade of experience making semantic layers work for Enterprise customers.
Attend this session to learn about:
- Delivering critical business data to users faster than ever at scale using a semantic layer
- Enabling data teams to model and deliver a semantic layer on data in the cloud
- Maintaining a single source of governed metrics and business data
- Achieving speed-of-thought query performance and consistent KPIs across any BI/AI tool, including Excel, Power BI, Tableau, Looker, DataRobot, Databricks, and more
- Providing dimensional analysis capability that accelerates performance with no need to extract data from the cloud data warehouse
Who should attend this session?
Data & Analytics leaders and practitioners (e.g., Chief Data Officers, data scientists, data literacy, business intelligence, and analytics professionals).
This document discusses data mesh, a distributed data management approach for microservices. It outlines the challenges of implementing microservice architecture including data decoupling, sharing data across domains, and data consistency. It then introduces data mesh as a solution, describing how to build the necessary infrastructure using technologies like Kubernetes and YAML to quickly deploy data pipelines and provision data across services and applications in a distributed manner. The document provides examples of how data mesh can be used to improve legacy system integration, batch processing efficiency, multi-source data aggregation, and cross-cloud/environment integration.
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture, a shift left towards a modern distributed architecture that allows domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
The document discusses the challenges of modern data, analytics, and AI workloads. Most enterprises struggle with siloed data systems that make integration and productivity difficult. The future of data lies with a data lakehouse platform that can unify data engineering, analytics, data warehousing, and machine learning workloads on a single open platform. The Databricks Lakehouse platform aims to address these challenges with its open data lake approach and capabilities for data engineering, SQL analytics, governance, and machine learning.
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
Developing a Data Strategy for your organization can seem like a daunting task – but it’s worth the effort. Getting your Data Strategy right can provide significant value, as data drives many of the key initiatives in today’s marketplace – from digital transformation, to marketing, to customer centricity, to population health, and more. This webinar will help demystify Data Strategy and its relationship to Data Architecture and will provide concrete, practical ways to get started.
This document discusses challenges with centralized data architectures and proposes a data mesh approach. It outlines 4 challenges: 1) centralized teams fail to scale sources and consumers, 2) point-to-point data sharing is difficult to decouple, 3) bridging operational and analytical systems is complex, and 4) legacy data stacks rely on outdated paradigms. The document then proposes a data mesh architecture with domain data as products and an operational data platform to address these challenges by decentralizing control and improving data sharing, discovery, and governance.
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
With the aid of any number of data management and processing tools, data flows through multiple on-prem and cloud storage locations before it’s delivered to business users. As a result, IT teams — including IT Ops, DataOps, and DevOps — are often overwhelmed by the complexity of creating a reliable data pipeline that includes the automation and observability they require.
The answer to this widespread problem is a centralized data pipeline orchestration solution.
Join Stonebranch’s Scott Davis, Global Vice President, and Ravi Murugesan, Sr. Solution Engineer, to learn how DataOps teams orchestrate their end-to-end data pipelines with a platform approach to managing automation.
Key Learnings:
- Discover how to orchestrate data pipelines across a hybrid IT environment (on-prem and cloud)
- Find out how DataOps teams are empowered with event-based triggers for real-time data flow
- See examples of reports, dashboards, and proactive alerts designed to help you reliably keep data flowing through your business — with the observability you require
- Discover how to replace clunky legacy approaches to streaming data in a multi-cloud environment
- See what’s possible with the Stonebranch Universal Automation Center (UAC)
Improving Data Literacy Around Data ArchitectureDATAVERSITY
Data Literacy is an increasing concern, as organizations look to become more data-driven. As the rise of the citizen data scientist and self-service data analytics becomes increasingly common, the need for business users to understand core Data Management fundamentals is more important than ever. At the same time, technical roles need a strong foundation in Data Architecture principles and best practices. Join this webinar to understand the key components of Data Literacy, and practical ways to implement a Data Literacy program in your organization.
Build real-time streaming data pipelines to AWS with Confluentconfluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
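For readers who want to picture what such a streaming pipeline looks like in code, here is a minimal producer sketch using the confluent-kafka Python client; the bootstrap server, credentials, and topic name are placeholders, and the snippet is illustrative rather than taken from the webinar.

```python
# Produce a few events to a Kafka topic on a (placeholder) Confluent Cloud cluster.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",  # placeholder
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",      # placeholder credentials
    "sasl.password": "<API_SECRET>",
})

def on_delivery(err, msg):
    # Called once per message when the broker acknowledges (or rejects) it.
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}")

# Emit change events as they happen; downstream consumers or sink connectors pick
# them up continuously instead of waiting for a centralized batch load.
for order_id in ("A1", "A2", "A3"):
    producer.produce("orders", key=order_id,
                     value=f'{{"order_id": "{order_id}"}}', callback=on_delivery)

producer.flush()  # block until all queued messages are delivered
```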
Modernizing to a Cloud Data ArchitectureDatabricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
This document provides an overview and summary of the author's background and expertise. It states that the author has over 30 years of experience in IT working on many BI and data warehouse projects. It also lists that the author has experience as a developer, DBA, architect, and consultant. It provides certifications held and publications authored as well as noting previous recognition as an SQL Server MVP.
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
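To make the “single source of governed metrics” idea more tangible, here is a small, vendor-neutral Python sketch in which metric definitions live in one shared registry and every consumer computes them the same way; all names and figures are illustrative and do not reflect any particular semantic layer product.

```python
# Metric definitions live in one shared registry; each domain team (hub or spoke)
# computes them through the same definition instead of re-deriving its own version.
import pandas as pd

METRICS = {
    "gross_revenue": lambda df: df["amount"].sum(),
    "average_order_value": lambda df: df["amount"].sum() / df["order_id"].nunique(),
}

def compute_metric(name: str, df: pd.DataFrame) -> float:
    """Every consumer (BI tool, notebook, data product) calls the same definition."""
    return float(METRICS[name](df))

orders = pd.DataFrame({"order_id": [1, 1, 2, 3], "amount": [10.0, 5.0, 20.0, 7.5]})
print(compute_metric("gross_revenue", orders))        # 42.5
print(compute_metric("average_order_value", orders))  # about 14.17
```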
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
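As a rough idea of what a Data QA component can look like inside an ML pipeline, the sketch below validates a training frame before it reaches feature engineering or training; the column names and thresholds are hypothetical, and in a real setup this is where a dedicated data-validation framework would typically be plugged in.

```python
# A data-quality gate: fail the pipeline run instead of silently training on bad data.
import pandas as pd

def validate_training_data(df: pd.DataFrame) -> list[str]:
    problems = []
    if df.empty:
        problems.append("dataset is empty")
    if df["label"].isna().any():
        problems.append("missing labels found")
    if not df["age"].between(0, 120).all():
        problems.append("age values outside expected range 0-120")
    null_ratio = df["income"].isna().mean()
    if null_ratio > 0.05:
        problems.append(f"income null ratio {null_ratio:.1%} exceeds 5% threshold")
    return problems

df = pd.DataFrame({"label": [0, 1, 1], "age": [34, 29, 130], "income": [50_000, None, 42_000]})
issues = validate_training_data(df)
if issues:
    raise ValueError("data quality check failed: " + "; ".join(issues))
```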
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Business Intelligence & Data Analytics– An Architected ApproachDATAVERSITY
Business intelligence (BI) and data analytics are increasing in popularity as more organizations are looking to become more data-driven. Many tools have powerful visualization techniques that can create dynamic displays of critical information. To ensure that the data displayed on these visualizations is accurate and timely, a strong Data Architecture is needed. Join this webinar to understand how to create a robust Data Architecture for BI and data analytics that takes both business and technology needs into consideration.
Tackling Data Quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many Data Quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process, and technology. Join Nigel Turner and Donna Burbank as they provide practical ways to control Data Quality issues in your organization.
IBM's Big Data platform provides tools for managing and analyzing large volumes of structured, unstructured, and streaming data. It includes Hadoop for storage and processing, InfoSphere Streams for real-time streaming analytics, InfoSphere BigInsights for analytics on data at rest, and PureData System for Analytics (formerly Netezza) for high performance data warehousing. The platform enables businesses to gain insights from all available data to capitalize on information resources and make data-driven decisions.
IBM's Big Data platform provides tools for managing and analyzing large volumes of data from various sources. It allows users to cost effectively store and process structured, unstructured, and streaming data. The platform includes products like Hadoop for storage, MapReduce for processing large datasets, and InfoSphere Streams for analyzing real-time streaming data. Business users can start with critical needs and expand their use of big data over time by leveraging different products within the IBM Big Data platform.
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
Element Fleet has the largest benchmark database in our industry and we needed a robust and linearly scalable platform to turn this data into actionable insights for our customers. The platform needed to support advanced analytics, streaming data sets, and traditional business intelligence use cases.
In this presentation, we will discuss how we built a single, unified platform for both Advanced Analytics and traditional Business Intelligence using Cassandra on DSE. With Cassandra as our foundation, we are able to plug in the appropriate technology to meet varied use cases. The platform we’ve built supports real-time streaming (Spark Streaming/Kafka), batch and streaming analytics (PySpark, Spark Streaming), and traditional BI/data warehousing (C*/FiloDB). In this talk, we are going to explore the entire tech stack and the challenges we faced trying to support the above use cases. We will specifically discuss how we ingest and analyze IoT (vehicle telematics) data in real time and batch, combine data from multiple data sources into a single data model, and support standardized and ad-hoc reporting requirements.
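The following is a generic sketch of the ingest pattern described above (Kafka into Spark Structured Streaming, landing in Cassandra), not the presenter's actual code; the topic, keyspace, and table names are placeholders, and it assumes the spark-sql-kafka and spark-cassandra-connector packages are on the classpath.

```python
# Kafka -> Spark Structured Streaming -> Cassandra, with placeholder names throughout.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("telematics-ingest").getOrCreate()

schema = (StructType()
          .add("vehicle_id", StringType())
          .add("speed_kph", DoubleType())
          .add("fuel_level", DoubleType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "vehicle-telematics")          # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_batch(batch_df, batch_id):
    # Each micro-batch is appended to the Cassandra table used for reporting.
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="fleet", table="telematics")           # placeholder names
     .mode("append")
     .save())

query = events.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()
```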
About the Speaker
Jim Peregord Vice President - Analytics, Business Intelligence, Data Management, Element Corp.
If your business is heavily dependent on the Internet, you may be facing an unprecedented level of network traffic analytics data. How to make the most of that data is the challenge. This presentation from Kentik VP Product and former EMA analyst Jim Frey explores the evolving need, the architecture and key use cases for BGP and NetFlow analysis based on scale-out cloud computing and Big Data technologies.
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
This document discusses how organizations can save money on database management systems (DBMS) by moving from expensive commercial DBMS to more affordable open-source options like PostgreSQL. It notes that PostgreSQL has matured and can now handle mission critical workloads. The document recommends partnering with EnterpriseDB to take advantage of their commercial support and features for PostgreSQL. It highlights how customers have seen cost savings of 35-80% by switching to PostgreSQL and been able to reallocate funds to new business initiatives.
Which Change Data Capture Strategy is Right for You?Precisely
Change Data Capture or CDC is the practice of moving the changes made in an important transactional system to other systems, so that data is kept current and consistent across the enterprise. CDC keeps reporting and analytic systems working on the latest, most accurate data.
Many different CDC strategies exist. Each strategy has advantages and disadvantages. Some put an undue burden on the source database. They can cause queries or applications to become slow or even fail. Some bog down network bandwidth, or have big delays between change and replication.
Each business process has different requirements, as well. For some business needs, a replication delay of more than a second is too long. For others, a delay of less than 24 hours is excellent.
Which CDC strategy will match your business needs? How do you choose?
View this webcast on-demand to learn:
• Advantages and disadvantages of different CDC methods
• The replication latency your project requires
• How to keep data current in Big Data technologies like Hadoop
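To ground the comparison, here is a minimal sketch of one of the simpler strategies, query-based (high-water-mark) capture, which polls the source for rows changed since the last extract; the table and column names are illustrative. This approach is easy to build but adds query load to the source and cannot see deletes, the kind of trade-off the webcast weighs against other CDC methods.

```python
# High-water-mark CDC: pull only rows modified since the previous extract.
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Ada",    "2024-01-01T09:00:00"),
    (2, "Grace",  "2024-01-02T10:30:00"),
    (3, "Edsger", "2024-01-03T08:15:00"),
])

def extract_changes(conn, last_high_water_mark):
    """Return rows modified after the previous extract, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ? ORDER BY updated_at",
        (last_high_water_mark,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else last_high_water_mark
    return rows, new_mark

changes, watermark = extract_changes(source, "2024-01-01T12:00:00")
print(changes)    # rows 2 and 3 only; row 1 was already replicated
print(watermark)  # becomes the starting point for the next extract
```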
This document discusses big data management for OSS/BSS applications. It defines big data and describes the Hadoop framework for distributed processing of large, complex data sets. The document outlines using a big data solution with Hadoop to provide data warehousing, reporting, and revenue assurance across usage, provisioning, billing, and network data for telecom applications. Key benefits include a scalable, low-cost solution for insights, monitoring, and reconciling various systems and records.
The document outlines a multi-month implementation plan for a BI project with the following key stages:
1) Preparation and Planning in Month 1 involving prioritization, hardware installation, staffing, and software procurement.
2) ETL development from Month 1-3 involving requirement analysis, design, development and testing of the ETL processes.
3) Initial deployment from Month 2-3 setting up the metadata framework and data governance with report reductions.
4) Ongoing development from Month 4-10 involving further report reductions, incremental deployments, building the data library and dashboards. Headcount savings also take effect during this stage.
5) Long term operations starting from Month 11 involving targeting
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Jaroslav Gergic
The recent boom in big data processing and the democratization of the big data space have been enabled by the fact that most of the concepts that originated in the research labs of companies such as Google, Amazon, Yahoo and Facebook are now available as open source. Technologies such as Hadoop and Cassandra let businesses around the world become more data-driven and tap into their massive data feeds to mine valuable insights.
At the same time, we are still at a certain stage of the maturity curve of these new big data technologies and of the entire big data technology stack. Many of the technologies originated from a particular use case and attempts to apply them in a more generic fashion are hitting the limits of their technological foundations. In some areas, there are several competing technologies for the same set of use cases, which increases risks and costs of big data implementations.
We will show how GoodData solves the entire big data pipeline today, starting from raw data feeds all the way up to actionable business insights. All of this is provided as a hosted multi-tenant environment that lets customers solve a single analytical use case, or many analytical use cases for thousands of their own customers, all using the same platform and tools while being abstracted away from the technological details of the big data stack.
Pete Zybrick will discuss techniques for analyzing, extracting, and validating large datasets using tools from Cloudera and AWS. He will provide examples using the Federal Reserve Economic Database (FRED) and SiteCatalyst data. The presentation will cover programmatically analyzing the data structures, defining extraction and validation rules, bulk importing data into Impala and Redshift, and productivity tools for business users to access subsets of large datasets.
Case study presenting the problem statement and target architecture for a very high-volume external event data pipeline and an on-premise Teradata EDW integration pipeline, using cloud OLAP engines (Google BigQuery, Amazon Athena) to support batch and real-time analytics.
3 Things to Learn:
-How data is driving digital transformation to help businesses innovate rapidly
-How Choice Hotels (one of largest hoteliers) is using Cloudera Enterprise to gain meaningful insights that drive their business
-How Choice Hotels has transformed business through innovative use of Apache Hadoop, Cloudera Enterprise, and deployment in the cloud — from developing customer experiences to meeting IT compliance requirements
The document discusses optimizing a data warehouse by offloading some workloads and data to Hadoop. It identifies common challenges with data warehouses like slow transformations and queries. Hadoop can help by handling large-scale data processing, analytics, and long-term storage more cost effectively. The document provides examples of how customers benefited from offloading workloads to Hadoop. It then outlines a process for assessing an organization's data warehouse ecosystem, prioritizing workloads for migration, and developing an optimization plan.
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresKangaroot
Postgres is the leading open source database management system and has been developed by a very active community for more than 15 years. Gaby Schilders is Sales Engineer at EnterpriseDB, supplier of the EDB Postgres data platform.
Gaby Schilders, Sales Engineer at EnterpriseDB, will explain why companies take open source as the centerpiece for modernising their IT infrastructure, increasing their scalability and taking full advantage of what today's technologies offer them.
Experimentation plays a vital role in business growth at eBay by providing valuable insights and predictions on how users will react to changes made to the eBay website and applications. On a given day, eBay has several hundred experiments running at the same time. Our experimentation data processing pipeline handles billions of rows of user behavioral and transactional data per day to generate detailed reports covering 100+ metrics over 50 dimensions.
In this session, we will share our journey of how we moved this complex process from Data warehouse to Hadoop. We will give an overview of the experimentation platform and data processing pipeline. We will highlight the challenges and learnings we faced implementing this platform in Hadoop and how this transformation led us to build a scalable, flexible and reliable data processing workflow in Hadoop. We will cover our work done on performance optimizations, methods to establish resilience and configurability, efficient storage formats and choices of different frameworks used in the pipeline.
The document provides an overview of an experimentation platform built on Hadoop. It discusses experimentation workflows, why Hadoop was chosen as the framework, the system architecture, and challenges faced and lessons learned. Key points include:
- The platform supports A/B testing and reporting on hundreds of metrics and dimensions for experiments.
- Data is ingested from various sources and stored in Hadoop for analysis using technologies like Hive, Spark, and Scoobi.
- Challenges included optimizing joins and jobs for large datasets, addressing data skew, and ensuring job resiliency. Tuning configuration parameters and job scheduling helped improve performance.
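One of the skew mitigations commonly used in this situation is key salting; the sketch below shows the idea in PySpark with illustrative column names and a made-up salt factor, and is not taken from the eBay pipeline itself.

```python
# Salting a hot join key so its rows spread across more partitions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-salting").getOrCreate()

facts = spark.createDataFrame(
    [("exp_1", 10), ("exp_1", 12), ("exp_1", 9), ("exp_2", 3)],
    ["experiment_id", "clicks"],
)  # "exp_1" is the skewed key carrying most of the rows
dims = spark.createDataFrame([("exp_1", "checkout"), ("exp_2", "search")],
                             ["experiment_id", "page"])

SALT_BUCKETS = 8
salted_facts = facts.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))
# Replicate the small side once per salt bucket so every salted key still matches.
salted_dims = dims.crossJoin(
    spark.range(SALT_BUCKETS).select(F.col("id").cast("int").alias("salt")))

joined = salted_facts.join(salted_dims, ["experiment_id", "salt"]).drop("salt")
joined.groupBy("experiment_id", "page").agg(F.sum("clicks").alias("clicks")).show()
```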
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
Data warehouses have been the standard tool for analyzing data created by business operations. In recent years, increasing data volumes, new types of data formats, and emerging analytics technologies such as machine learning have given rise to modern data lakes. Connecting application databases, data warehouses, and data lakes using real-time data pipelines can significantly improve the time to action for business decisions. More: http://info.mapr.com/WB_MapR-StreamSets-Data-Warehouse-Modernization_Global_DG_17.08.16_RegistrationPage.html
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
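The workshop itself builds the application in Rust on the Tokio runtime; purely as a language-neutral sketch of the connect-and-query step, here is the same flow using the Cassandra-compatible Python driver, with placeholder contact points, keyspace, and table.

```python
# Connect to ScyllaDB over the CQL protocol, create a table, insert, and read back.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("demo")
session.execute("""
    CREATE TABLE IF NOT EXISTS devices (device_id text PRIMARY KEY, temperature double)
""")

# Prepared statements avoid re-parsing the query and help keep per-request latency low.
insert = session.prepare("INSERT INTO devices (device_id, temperature) VALUES (?, ?)")
session.execute(insert, ("sensor-42", 21.5))

row = session.execute("SELECT temperature FROM devices WHERE device_id = %s",
                      ("sensor-42",)).one()
print(row.temperature)

cluster.shutdown()
```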
Powering a Billion Dreams: Scaling Meesho’s E-commerce Revolution with Scylla...ScyllaDB
With over a billion Indians set to shop online, Meesho is redefining e-commerce by making it accessible, affordable, and inclusive at an unprecedented scale. But scaling for Bharat isn’t just about growth—it’s about building a tech backbone that can handle massive traffic surges, dynamic pricing, real-time recommendations, and seamless user experiences. In this session, we’ll take you behind the scenes of Meesho’s journey in democratizing e-commerce while operating at Monster Scale. Discover how ScyllaDB plays a crucial role in handling millions of transactions, optimizing catalog ranking, and ensuring ultra-low-latency operations. We’ll deep dive into our real-world use cases, performance optimizations, and the key architectural decisions that have helped us scale effortlessly.
Navigating common mistakes and critical success factors
Is your team considering or starting a database migration? Learn from the frontline experience gained guiding hundreds of high-stakes migration projects – from startups to Google and Twitter. Join us as Miles Ward and Tim Koopmans have a candid chat about what tends to go wrong and how to steer things right.
We will explore:
- What really pushes teams to the database migration tipping point
- How to scope and manage the complexity of a migration
- Proven migration strategies and antipatterns
- Where complications commonly arise and ways to prevent them
Expect plenty of war stories, along with pragmatic ways to make your own migration as “blissfully boring” as possible.
Achieving Extreme Scale with ScyllaDB: Tips & TradeoffsScyllaDB
Explore critical strategies – and antipatterns – for achieving low latency at extreme scale
If you’re getting started with ScyllaDB, you’re probably intrigued by its potential to achieve predictable low latency at extreme scale. But how do you ensure that you’re maximizing that potential for your team’s specific workloads and technical requirements?
This webinar offers practical advice for navigating the various decision points you’ll face as you evaluate ScyllaDB for your project and move into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
Join us for an inside look at the lessons learned across thousands of real-world distributed database projects.
Securely Serving Millions of Boot Artifacts a Day by João Pedro Lima & Matt ...ScyllaDB
Cloudflare’s boot infrastructure dynamically generates and signs boot artifacts for nodes worldwide, ensuring secure, scalable, and customizable deployments. This talk dives into its architecture, scaling decisions, and how it enables seamless testing while maintaining a strong chain of trust.
How Agoda Scaled 50x Throughput with ScyllaDB by Worakarn IsarathamScyllaDB
Learn about Agoda's performance tuning strategies for ScyllaDB. Worakarn shares how they optimized disk performance, fine-tuned compaction strategies, and adjusted SSTable settings to match their workload for peak efficiency.
How Yieldmo Cut Database Costs and Cloud Dependencies Fast by Todd ColemanScyllaDB
Yieldmo processes hundreds of billions of ad requests daily with subsecond latency. Initially using DynamoDB for its simplicity and stability, they faced rising costs, suboptimal latencies, and cloud provider lock-in. This session explores their journey to ScyllaDB’s DynamoDB-compatible API.
There’s a common adage that it takes 10 years to develop a file system. As ScyllaDB reaches that 10 year milestone in 2025, it’s the perfect time to reflect on the last decade of ScyllaDB development – both hits and misses. It’s especially appropriate given that our project just reached a critical mass with certain scalability and elasticity goals that we dreamed up years ago. This talk will cover how we arrived at ScyllaDB X Cloud achieving our initial vision, and share where we’re heading next.
Reduce Your Cloud Spend with ScyllaDB by Tzach LivyatanScyllaDB
This talk will explore why ScyllaDB Cloud is a cost-effective alternative to DynamoDB, highlighting efficient design implementations like shared compute, local NVMe storage, and storage compression. It will also discuss new X Cloud features, better plans and pricing, and a direct cost comparison between ScyllaDB and DynamoDB
Migrating 50TB Data From a Home-Grown Database to ScyllaDB, Fast by Terence LiuScyllaDB
Terence shares how Clearview AI's infra needs evolved and why they chose ScyllaDB after first-principles research. From fast ingestion to production queries, the talk explores their journey with Rust, embedded DB readers, and the ScyllaDB Rust driver—plus config tips for bulk ingestion and achieving data parity.
Vector Search with ScyllaDB by Szymon WasikScyllaDB
Vector search is an essential element of contemporary machine learning pipelines and AI tools. This talk will share preliminary results on the forthcoming vector storage and search features in ScyllaDB. By leveraging Scylla's scalability and USearch library's performance, we have designed a system with exceptional query latency and throughput. The talk will cover vector search use cases, our roadmap, and a comparison of our initial implementation with other vector databases.
Workload Prioritization: How to Balance Multiple Workloads in a Cluster by Fe...ScyllaDB
Workload Prioritization is a ScyllaDB exclusive feature for controlling how different workloads compete for system resources. It's used to prioritize urgent application requests that require immediate response times versus others that can tolerate slighter delays (e.g., large scans). Join this session for a demo of how applying workload prioritization reduces infrastructure costs while ensuring predictable performance at scale.
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...ScyllaDB
Should you move code to data or data to code? Conventional wisdom favors the former, but cloud trends push the latter. This session by the creator of PACELC explores the shift, its risks, and the ongoing debate in data virtualization between push- and pull-based processing.
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...ScyllaDB
Scaling from 66M to 25B+ records in a core financial system is tough—every number must be right, and data must be fresh. In this session, Dmytro shares real-world strategies to balance accuracy with real-time performance and avoid scaling pitfalls. Expect purely practical, no-BS insights for engineers.
Object Storage in ScyllaDB by Ran Regev, ScyllaDBScyllaDB
In this talk we take a look at how Object Storage is used by Scylla. We focus on current usage, namely for backup, and we look at the shift in implementation from an external tool to native Scylla. We take a close look at the complexity of backup and restore, mostly in the face of topology changes and token assignments. We also take a glimpse into the future and see how Scylla is going to use Object Storage as its native storage. We explore a few possible applications of it and understand the tradeoffs.
Lessons Learned from Building a Serverless Notifications System by Srushith R...ScyllaDB
Reaching your audience isn’t just about email. Learn how we built a scalable, cost-efficient notifications system using AWS serverless—handling SMS, WhatsApp, and more. From architecture to throttling challenges, this talk dives into key decisions for high-scale messaging.
A Dist Sys Programmer's Journey into AI by Piotr SarnaScyllaDB
This talk explores the culture shock of transitioning from distributed databases to AI. While AI operates at massive scale, distributed storage and compute remain essential. Discover key differences, unexpected parallels, and how database expertise applies in the AI world.
High Availability: Lessons Learned by Paul PreuveneersScyllaDB
How does ScyllaDB keep your data safe, and your mission critical applications running smoothly, even in the face of disaster? In this talk we’ll discuss what we have learned about High Availability, how it is implemented within ScyllaDB and what that means for your business. You’ll learn about ScyllaDB cloud architecture design, consistency, replication and even load balancing and much more.
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...ScyllaDB
Natura, a top global cosmetics brand with 3M+ beauty consultants in Latin America, processes massive data for orders, campaigns, and analytics. In this talk, Rodrigo Luchini & Marcus Monteiro share how Natura leverages ScyllaDB’s CDC Source Connector for real-time sales insights.
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...ScyllaDB
This is a case study on managing mutable big data: Exploring the evolution of the persistence layer in a processing graph, tackling design challenges, and refining key operational principles along the way.
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxJustin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, ‘The Coding War Games.’
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don’t find ourselves having the same discussion again in a decade?
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
AI and Data Privacy in 2025: Global TrendsInData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding AI data privacy is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
-AI and data privacy: Key findings
-Statistics on AI data privacy in today’s world
-Tips on how to overcome data privacy challenges
-Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, presentation slides, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfSoftware Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
Role of Data Annotation Services in AI-Powered ManufacturingAndrew Leo
From predictive maintenance to robotic automation, AI is driving the future of manufacturing. But without high-quality annotated data, even the smartest models fall short.
Discover how data annotation services are powering accuracy, safety, and efficiency in AI-driven manufacturing systems.
Precision in data labeling = Precision on the production floor.
TrsLabs - Fintech Product & Business ConsultingTrs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic investments, inorganic growth, and business model pivoting are critical activities that businesses don't undertake every day. In cases like these, it may benefit your business to engage a temporary external consultant.
An unbiased plan, driven by clear-cut deliverables and market dynamics and free from the influence of internal office politics, empowers business leaders to make the right choices.
Getting things done within budget and within a timeframe is key to growing a business, whether you are a start-up or a large company.
Talk to us & Unlock the competitive advantage
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungenpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-und-verwaltung-von-multiuser-umgebungen/
HCL Nomad Web is hailed as the next generation of the HCL Notes client and offers numerous advantages, such as eliminating the need for packaging, distribution, and installation. Nomad Web client updates are installed “automatically” in the background, which significantly reduces administrative effort compared to traditional HCL Notes clients. However, troubleshooting in Nomad Web poses unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how the troubleshooting process in HCL Nomad Web can be simplified to ensure a smooth and efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including:
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder in the browser cache (using OPFS)
- Understanding the differences between single-user and multi-user scenarios
- Using the Client Clocking feature
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
What is Model Context Protocol(MCP) - The new technology for communication bw...Vishnu Singh Chundawat
The MCP (Model Context Protocol) is a framework designed to manage context and interaction within complex systems. This SlideShare presentation will provide a detailed overview of the MCP Model, its applications, and how it plays a crucial role in improving communication and decision-making in distributed systems. We will explore the key concepts behind the protocol, including the importance of context, data management, and how this model enhances system adaptability and responsiveness. Ideal for software developers, system architects, and IT professionals, this presentation will offer valuable insights into how the MCP Model can streamline workflows, improve efficiency, and create more intuitive systems for a wide range of use cases.
How Can I use the AI Hype in my Business Context?Daniel Lehner
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI, but is anyone really using it to create real value?
Most companies want to leverage AI. Few know how.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a LinkedIn webinar for Tecnovy on 28.04.2025.
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell
With expertise in data architecture, performance tracking, and revenue forecasting, Andrew Marnell plays a vital role in aligning business strategies with data insights. Andrew Marnell’s ability to lead cross-functional teams ensures businesses achieve sustainable growth and operational excellence.
Generative Artificial Intelligence (GenAI) in BusinessDr. Tathagat Varma
My talk for the Indian School of Business (ISB) Emerging Leaders Program Cohort 9. In this talk, I discussed key issues around adoption of GenAI in business - benefits, opportunities and limitations. I also discussed how my research on Theory of Cognitive Chasms helps address some of these issues
5. Framework Pillars
01 Security
● Access Management & Controls
● Data Protection - Encryption, Data Masking
● Compliance - ISO, HIPAA, PCI DSS
● Data Governance
02 Scalability
● Horizontal Scaling
● Vertical Scaling
● Auto Scaling
03 Availability
● Reliability
● Resiliency of the System
● Availability - System Uptime
04 Efficiency
● Performance Efficiency
● Cost Efficiency - Cost Optimized
05 Operational Excellence
● Serviceability - Easy Operations & Maintenance
● Maintenance - Data Pipeline Maintenance
● Reduced Ops Activities & Cost
6. Data Modernization Journey
■ Cloud Migration / Adoption - the 5 Rs of transformation: Rehost, Refactor, Revise, Rebuild, and Replace
01 Data Discovery - Analysis of the existing data architecture and system design; evaluating the need for, and requirements of, the new data system
02 Data Architecture & Assessment - Designing the new data platform; assessing data modelling, data governance, and security
03 Data Architecture & Engineering - Data platform implementation, data pipeline development, and enhancement POCs; designing the end-to-end cycle
04 Data Migration & Pipeline Development/Conversion - Actual pipeline development and conversion on the new platform; implementing, testing, and validating pipelines and data on the new platform
05 Go Live & DataOps - Soft launch / early cutover to integrate with other systems and obtain sign-off from business users; implementing operations for the new platform and the modified pipelines
7. Design Framework Pillars & Considerations
Who? - Teams: Architects, Engineering, Operations; Business Teams; Technology Evangelist
What? - End User SLA Assessment & SLA Setting; System Assessment & Technology Evaluations; Signed-Up Services vs Open Source vs Hybrid Evaluations; Business Assessment; Sign Up for Services
When? / How? - Business Drive; Technology Drive; Management & Engagement Drive
9. Public cloud providers offer various services for implementing a data platform - DB / DW / Data Lake / Data Mesh / SQL / NoSQL, etc.
Cloud Native
AWS
● AWS Glue
● EMR
● Kinesis, OpenSearch
● RDS, Aurora
● Redshift
● DynamoDB, DocumentDB
Azure
● Azure Data Factory
● HDInsight
● Azure Stream Analytics
● Azure SQL, Managed SQL
● SQL Server, PostgreSQL
● MariaDB, Cosmos DB, Managed Cassandra
GCP
● Dataflow, Data Fusion
● Dataproc
● Pub/Sub, Stream
● Cloud SQL, Cloud Spanner
● BigQuery, BigLake
● Bigtable, Firestore, Memorystore
10. Data on Cloud - Common Offerings
There is a wide variety of offerings for implementing a data platform and designing data pipelines using cloud-native and open-source services.
12. Evaluation - Pre-Requisites
Evaluation Criteria - Specific evaluation, or open evaluation to select the best fit for the given use case
Existing/Cross-Application Platform to be Evaluated
● Platform Offerings - Existing Support Tier / Billing Plans
● Managed/Native Services / BYOL Services
● Existing System Licenses, Integrators - BI, Ops Tools
New Platforms to be Explored
● Platform Offerings - Probable Platform to be Evaluated; Cost Comparisons Done?
● Managed/Native Services / BYOL Services
● Existing System Licenses, Integrators - BI, Ops Tools
13. Evaluation - Inputs
● Capex vs Opex
● % Storage vs Scan vs Processed
● Compute vs Storage Utilization
● Data Challenges
● System Challenges
14. Evaluation - CheckPoints
1 Data Operations & Business Critical Requirements
● Data Pipeline Management - Monitoring & Operations
● Business Requirements - 24x7 Monitoring vs SLAs
● Critical Applications - Availability & SLAs
2 Data Analytics Checkpoint
● Data Analytics - BI Tooling
● Predictive Analytics - Algorithms, Tools, Libraries Used
● AI/ML Use Cases - Customer Facing vs In-House
● Enterprise vs Cloud Native
3 Business Checkpoint
● Data Availability - SLAs
● End User Agreements
● Business Requirements - Specific to Tooling
● Existing Cost Utilization
● Performance Ratio - Current vs Expected
● Modernization Drive
4 Data Processing Checkpoint
● Target System Integrations
● Data Usage - Hot Data vs Cold Data
● Data Stored vs Data Processed vs Reads
● Data Pipelines - Batch vs Streaming
● Data Pipeline Complexity - S/M/C/VC
● Data Pipeline Scheduling - Tools, Cron Jobs, Native Schedulers, Event Based
● ETL vs ELT Requirements
5 Data Platform Checkpoint
● Type of Data - Structured, Semi-Structured, Unstructured
● Sources of Data - Files, DBs, IoT Devices, APIs
● Consumers of Data - Users vs Systems
● Frequency of Data - Batch, Real-Time
● Data Storage - Active vs Passive
● Data Modelling - Schema, Tables, DB Objects
15. Evaluation - Metrics (Checkpoint / Category / Metrics)
Data Checkpoint
● Data Integrators - No. of Sources; No. of Targets; No. of Specific Systems; Total Storage Volume; Daily Delta Volume
● Data Modelling - Frequency of Schema Evolution; No. of Objects; % of NoSQL Objects; % of PL/SQL Objects
Data Processing
● Data Pipelines - No. of S/M/C Jobs; No. of External Functions Integrated (Java/Python/SQL); No. of ETL Jobs (Tool Based); No. of Compute-Intensive Jobs; No. of Storage-Intensive Jobs
Business Checkpoint
● Operations - No. of Times SLA Challenged; No. of End Users Affected
● Reliability - No. of Times Data Compromised; No. of DR Activities; No. of End Users Impacted
● Performance Efficiency - Total Batch Time; No. of Times Batch SLA Impacted; No. of End User Reports; No. of End Users/Consumers; No. of Poor-Performing Reports/Queries
● Cost Utilization - Overall Billing (Capex); Total Operations & Maintenance Cost
Data Operations
● Monitoring - No. of Support Team Members; No. of Monitoring Dashboards
Data Analytics
● Analytics - No. of ML Jobs/Algorithms; ML Integrators
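The deck stops at listing metrics. As a purely illustrative sketch, not part of the original slides, the snippet below shows one way checkpoint-level findings could be recorded and rolled up into a single weighted score when comparing candidate platforms. The checkpoint names follow slide 14; the weights and scores are hypothetical.

```python
# Illustrative only: roll per-checkpoint evaluation scores up into one number.
from dataclasses import dataclass


@dataclass
class CheckpointScores:
    """Scores (0-5) assigned per checkpoint after reviewing the metrics."""
    data_platform: float
    data_processing: float
    data_analytics: float
    business: float
    data_operations: float


# Hypothetical weights reflecting what matters most for the given use case.
WEIGHTS = {
    "data_platform": 0.25,
    "data_processing": 0.25,
    "data_analytics": 0.15,
    "business": 0.25,
    "data_operations": 0.10,
}


def weighted_score(scores: CheckpointScores) -> float:
    """Combine checkpoint scores into a single weighted comparison score."""
    return sum(getattr(scores, name) * weight for name, weight in WEIGHTS.items())


# Example comparison with made-up scores for two candidate warehouses.
candidates = {
    "BigQuery": CheckpointScores(4.5, 4.5, 4.0, 4.0, 4.0),
    "Synapse": CheckpointScores(4.0, 4.0, 4.0, 4.0, 3.5),
}
for name, scores in candidates.items():
    print(f"{name}: {weighted_score(scores):.2f}")
```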
17. Evaluation - Pre-Requisites
Evaluation Criteria - Specific evaluation, or open evaluation to select the best fit for the given use case
Existing/Cross-Application Platform to be Evaluated
● Platform Offerings - Existing Support Tier / Billing Plans
● Managed/Native Services / BYOL Services
● Existing System Licenses, Integrators - BI, Ops Tools
New Platforms to be Explored
● Platform Offerings - Probable Platform to be Evaluated; Cost Comparisons Done?
● Managed/Native Services / BYOL Services
● Existing System Licenses, Integrators - BI, Ops Tools
18. Evaluation - Inputs
■ Domain - Retail; DW - Teradata; ETL - DataStage
■ Platform - Recently signed up for Google Cloud Platform
■ Data Platform - Evaluate GCP services to set up the data warehouse platform
■ DW Size - 120 TB (70 TB active + 50 TB passive)
■ Daily Volume - 1 TB (80% batch + 20% streaming)
■ Data - Structured & semi-structured (JSON, XML)
■ Data Pipelines - Mostly ELT - DataStage to Teradata (landing layer), Teradata SQL to transform data
■ Data Analytics - Tableau reports - customer reports
■ Enterprise Scheduler - Control-M; Ticketing Tool - JIRA; Alerting via Slack, email
■ Monitoring dashboards, 24x7 support team
19. DW - Google BigQuery vs Azure Synapse
Observations by checkpoint (BigQuery vs Synapse):
1 Data Platform Checkpoint
● Supports more than 90% of requirements
● SaaS offering, cloud managed
● Very well integrated
2 Data Processing Checkpoint
● Native drivers to support batch & streaming
● Highest data processing speed
● Storage vs compute - scaling in and out
● Automatic scaling, performance efficient
3 Data Analytics Checkpoint
● Can be integrated with any BI tool
● Supports AI/ML libraries and jobs
● Performance efficient - data processing, scanning
4 Business Checkpoint
● High availability
● Automatic failover, no DR required
● Performance & cost efficient
● Pay-as-you-go vs commitment comparison based on overall usage
5 Data Operations
● Customized logging & monitoring
● Native vs customized dashboards
● Integration with various alerting and messaging tools
20. Evaluation - Final Report
DW
● Approach 1: BigQuery
● Approach 2: BigQuery
ETL + ELT Pipelines
● Approach 1: Modify DataStage jobs to use the BQ connector to load data into the BQ landing layer
● Approach 2: Convert DataStage load jobs to BQ load jobs that pull data from the source and load it into BQ (depending on the types of source systems and integration complexity) - see the load-job sketch after this slide
Data Storage
● Both approaches: Store active data in BQ native tables with roll-up policies, and store passive datasets on a GCS layer depending on table usage; external tables can be built on the GCS datasets - see the external-table sketch at the end of this section
Data Analytics
● Both approaches: Tableau connections can be replaced with BQ connections
Data Pipeline Scheduler & Maintenance
● Both approaches: Control-M can be used to trigger the pipelines, and orchestration can be done using Composer (see the Composer sketch at the end of this section); existing ticketing and alerting tools can be used as is
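Not part of the original deck: a minimal, hypothetical sketch of what "convert DataStage load jobs to BQ load jobs" (Approach 2) could look like with the google-cloud-bigquery client. The project, dataset, bucket, and file names are placeholders, and a real job would declare the schema explicitly and add error handling.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-retail-project")  # hypothetical project id

# Load a daily extract that the source system has already staged on GCS
# into the BigQuery landing layer (append-only; schema auto-detected here
# only for brevity).
load_job = client.load_table_from_uri(
    "gs://retail-landing/orders/2025-05-06/*.json",   # hypothetical path
    "my-retail-project.landing.orders",               # hypothetical landing table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,
    ),
)
load_job.result()  # block until the load job finishes
print(f"Loaded {load_job.output_rows} rows into landing.orders")
```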
BigQuery was chosen here post evaluation, based primarily on the initial sign-up to GCP as well as the storage ratio between active and passive data. Azure Synapse can offer the same capabilities; the choice here is business and enterprise driven.
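Likewise a hypothetical sketch, not from the deck: exposing passive datasets kept on GCS to BigQuery through an external table, as mentioned in the Data Storage row above. Dataset, table, and bucket names are assumed.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-retail-project")  # hypothetical project id

# Passive (cold) data stays on GCS as Parquet files; an external table lets
# BigQuery query it in place without loading it into native storage.
# Both the table name and the GCS path below are hypothetical placeholders.
client.query(
    """
    CREATE EXTERNAL TABLE IF NOT EXISTS `my-retail-project.archive.orders_passive`
    OPTIONS (
      format = 'PARQUET',
      uris = ['gs://retail-archive/orders/*.parquet']
    )
    """
).result()
```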
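Finally, an illustrative sketch of how the former Teradata transform SQL could run as a Cloud Composer (managed Airflow) task, with Control-M simply triggering the DAG rather than executing the SQL itself. This is an assumption about one possible setup, not the deck's prescribed implementation; the DAG id, tables, and SQL are hypothetical, and the syntax targets Airflow 2.4+ with the Google provider installed.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="retail_dw_daily_elt",
    start_date=datetime(2025, 1, 1),
    schedule=None,   # triggered externally (e.g. by Control-M), not by cron
    catchup=False,
) as dag:
    # Former Teradata transform step, rewritten as a BigQuery SQL job.
    transform_orders = BigQueryInsertJobOperator(
        task_id="transform_orders",
        configuration={
            "query": {
                "query": """
                    INSERT INTO `my-retail-project.dw.fact_orders`
                    SELECT order_id, customer_id, order_ts, total_amount
                    FROM `my-retail-project.landing.orders`
                    WHERE DATE(order_ts) = CURRENT_DATE()
                """,
                "useLegacySql": False,
            }
        },
    )
```

Existing alerting (Slack, email) and ticketing (JIRA) integrations would hang off the DAG's failure callbacks, which keeps the operational tooling from the current estate largely unchanged.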