Data Platform
Architecture Principles
and Evaluation Criteria
Pooja Kelgaonkar,
Senior Data Architect at Rackspace Technology
Pooja Kelgaonkar
■ Senior Data Architect - GCP, Snowflake
■ Specialist in Data Modernization Implementations
■ Expertise in “Data” domain
■ Learner, Tech Blogger, Tech evangelist
■ Reading, listening to Indian classical music
Agenda
■ Architecture Principles
■ Data Modernization
■ Data Platform Offerings
■ Evaluation Criteria
■ Sample Use Case - Evaluation Comparison
Data Architecture
Principles
Framework Pillars
01 Security
● Access Management & Controls
● Data Protection - Encryption, Data Masking
● Compliance - ISO, HIPAA, PCI DSS
● Data Governance
02 Scalability
● Horizontal Scaling
● Vertical Scaling
● Auto Scaling
03 Availability
● Reliability
● Resiliency of System
● Availability - System Uptime
04 Efficiency
● Performance Efficiency
● Cost Efficiency - Cost Optimized
05 Operational Excellence
● Serviceability - Easy Operations & Maintenance
● Maintenance - Data Pipeline Maintenance
● Reduced Ops Activities & Cost
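The availability pillar is usually expressed as an uptime percentage. As a minimal sketch, the function below converts an availability target into the downtime it allows over a period. The function name, the 30-day period, and the example targets are illustrative assumptions, not values from the deck.

```python
def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the minutes of downtime allowed in the period for an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Example targets are assumptions chosen for illustration only.
for target in (99.0, 99.9, 99.99):
    budget = downtime_budget_minutes(target)
    print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per 30 days")
```

A calculation like this helps when translating an end-user SLA into an operational target for the platform.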
■ Cloud Migration / Adoption - 5 Rs of Transformation
■ Rehost, Refactor, Revise, Rebuild and Replace
Data Modernization Journey
01 Data Discovery
Analysis of the existing data architecture and system design; evaluation of the needs and requirements of the new data system.
02 Data Architecture & Assessment
Designing the new data platform; assessment of data modelling, data governance and security.
03 Data Architecture & Engineering
Data platform implementation, data pipeline development and enhancement POCs. Designing the end-to-end cycle.
04 Data Migration & Pipeline Development/Conversion
Actual pipeline development and conversion on the new platform. Implementing, testing and validating pipelines and data on the new platform.
05 Go Live & DataOps
Soft launch / early cutover to integrate with other systems and obtain sign-off from business users. Implementing operations for the new platform and modified pipelines.
Design Framework Pillars & Considerations
Teams - Architects, Engineering, Operations
Who?
When? How?
Business Drive
Technology Drive
Management & Engagement Drive
What?
End User SLAs Assessment & SLA Setting
System Assessment & Technology Evaluations
Signed Up Services vs Open Source vs Hybrid Evaluations
Business Assessment
Technology Evangelist
Sign Up for Services
Business Teams
Data Platform
Offerings
Public cloud providers offer a range of services for implementing a data platform:
DB / DW / Data Lake / Data Mesh / SQL / NoSQL etc.
Cloud Native
AWS
● AWS Glue
● EMR
● Kinesis, OpenSearch
● RDS, Aurora
● Redshift
● DynamoDB, DocumentDB
Azure
● Azure Data Factory
● HDInsight
● Azure Stream Analytics
● Azure SQL, Managed SQL
● SQL Server, PostgreSQL
● MariaDB, Cosmos DB, Managed Cassandra
GCP
● Dataflow, Data Fusion
● Dataproc
● Pub/Sub, Stream
● Cloud SQL, Cloud Spanner
● BigQuery, BigLake
● Bigtable, Firestore, Memorystore
Data on Cloud - Common Offerings
A variety of native and open source services are available to implement a data
platform and design data pipelines.
Data Platform -
Evaluation Criteria
(Assessment Phase)
Evaluation - Pre-Requisites
Evaluation Criteria
■ Existing/Cross Application Platform to be Evaluated
● Platform Offerings - Existing Support Tier / Billing Plans
● Managed/Native Services / BYOL Services
● Existing System Licenses, Integrators - BI, Ops Tools
■ New Platforms to be Explored
● Platform Offerings - Probable Platform to be Evaluated, Cost Comparisons Done?
● Managed/Native Services / BYOL Services
● Existing System Licenses, Integrators - BI, Ops Tools
■ Specific Evaluation or Open Evaluation to Select Best Fit for Given Use Case
Evaluation - Inputs
■ Capex vs Opex
■ % of Data Stored vs Scanned vs Processed
■ Compute vs Storage Utilization
■ Data Challenges
■ System Challenges
Evaluation - CheckPoints
1 Data Operations & Business Critical Requirements
● Data Pipeline Management - Monitoring & Operations
● Business Requirements - 24X7 Monitoring vs SLAs
● Critical Applications - Availability & SLAs
2 Data Analytics Checkpoint
● Data Analytics - BI Tooling
● Predictive Analytics - Algorithms, Tools, Libraries Used
● AI/ML Use Cases - Customer Facing vs In-House
● Enterprise vs Cloud Native
3 Business Checkpoint
● Data Availability - SLAs
● End User Agreements
● Business Requirements - Specific to Tooling
● Existing Cost Utilization
● Performance Ratio - Current vs Expected
● Modernization Drive
4 Data Processing Checkpoint
● Target Systems Integrations
● Data Usage - Hot Data vs Cold Data
● Data Stored vs Data Processed vs Reads
● Data Pipelines - Batch vs Streaming
● Data Pipeline Complexity - S/M/C/VC
● Data Pipeline Scheduling - Tools, Cron Jobs, Native Schedulers, Event Based
● ETL vs ELT Requirements
5 Data Platform Checkpoint
● Type of Data - Structured, Semi-Structured, Unstructured
● Sources of Data - Files, DBs, IoT Devices, APIs
● Consumers of Data - Users vs System
● Frequency of Data - Batch, Real Time
● Data Storage - Active vs Passive
● Data Modelling - Schema, Tables, DB Objects
Evaluation - Metrics
Metrics by Checkpoint and Category
Data Checkpoint
■ Data Integrators: No of Sources, No of Targets, No of Specific Systems, Total Storage Volume, Daily Delta Volume
■ Data Modelling: Frequency of Schema Evolution, No of Objects, % of NoSQL Objects, % of PL/SQL Objects
Data Processing
■ Data Pipelines: No of S/M/C Jobs, No of External Functions Integrated (Java/Python/SQL), No of ETL Jobs (Tool Based), No of Compute Intense Jobs, No of Storage Intense Jobs
Business Checkpoint
■ Operations: No of Times SLA Challenged, No of End Users Affected
■ Reliability: No of Times Data Compromised, No of DR Activities, No of End Users Impacted
■ Performance Efficiency: Total Batch Time, No of Times Batch SLA Impacted, No of End User Reports, No of End Users/Consumers, No of Poor Performing Reports/Queries
■ Cost Utilization: Overall Billing (Capex), Total Operations and Maintenance Cost
Data Operations
■ Monitoring: No of Support Team Members, No of Monitoring Dashboards
Data Analytics
■ Analytics: No of ML Jobs/Algorithms, ML Integrators
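One way to turn the checkpoints and metrics above into a comparable number is a weighted scoring matrix. The sketch below is only an illustration: the weights, the 0-5 scale, the candidate names, and the scores are all hypothetical and would need to be set per engagement.

```python
from typing import Dict

# Hypothetical weights per checkpoint; they sum to 1.0 and should be tuned per engagement.
WEIGHTS: Dict[str, float] = {
    "data_platform": 0.25,
    "data_processing": 0.25,
    "data_analytics": 0.15,
    "business": 0.20,
    "data_operations": 0.15,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-checkpoint scores (assumed 0-5 scale) into one weighted score."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Illustrative scores only; they are not measurements of any real platform.
candidates = {
    "platform_a": {"data_platform": 4, "data_processing": 5, "data_analytics": 4,
                   "business": 3, "data_operations": 4},
    "platform_b": {"data_platform": 4, "data_processing": 4, "data_analytics": 4,
                   "business": 4, "data_operations": 3},
}

# Rank candidates from highest to lowest weighted score.
ranking = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
for name in ranking:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

Keeping the weights explicit makes it easier to discuss with business teams why one platform ranks above another.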
Data Platform -
Evaluation Use Case
Evaluation - Inputs
■ Domain - Retail, DW - Teradata, ETL - DataStage
■ Platform - Recently Signed Up for Google Cloud Platform
■ Data Platform - Evaluate GCP Services to Set Up Data Warehouse Platform
■ DW Size - 120 TB (70 TB Active + 50 TB Passive)
■ Daily Volume - 1 TB (80% Batch + 20% Streaming)
■ Data - Structured & Semi-structured (JSON, XML)
■ Data Pipelines - Mostly ELT - DataStage to Teradata (landing layer), Teradata SQL to Transform Data
■ Data Analytics - Tableau Reports - Customer Reports
■ Enterprise Scheduler - Control-M, Ticketing Tool - JIRA, Alerting via Slack, Email
■ Monitoring Dashboards, 24X7 Support Team
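The sizing inputs above can be reduced to a few ratios that feed the storage and pipeline decisions. The following is a small sketch using the figures from the list (70 TB active, 50 TB passive, 1 TB per day with an 80/20 batch/streaming split); the function and key names are hypothetical.

```python
def summarize_inputs(active_tb: float, passive_tb: float,
                     daily_tb: float, batch_share: float) -> dict:
    """Derive storage and ingestion ratios from the raw sizing inputs."""
    total_tb = active_tb + passive_tb
    return {
        "total_tb": total_tb,
        "active_pct": round(active_tb / total_tb * 100, 1),
        "passive_pct": round(passive_tb / total_tb * 100, 1),
        "daily_batch_tb": round(daily_tb * batch_share, 3),
        "daily_streaming_tb": round(daily_tb * (1 - batch_share), 3),
    }


# Figures taken from the use case inputs listed above.
summary = summarize_inputs(active_tb=70, passive_tb=50, daily_tb=1.0, batch_share=0.80)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these inputs, roughly 58% of the warehouse is active data and about 42% is passive, which is the ratio the storage recommendation later relies on.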
DW - Google BigQuery vs Azure Synapse
BigQuery | Synapse | Observations
1 Data Platform Checkpoint
● Supports More Than 90% of Requirements
● SaaS Offering, Cloud Managed
● Very Well Integrated
2 Data Processing Checkpoint
● Native Drivers to Support Batch & Stream
● Highest Data Processing Speed
● Storage vs Compute - Scaling In and Out
● Automatic Scaling, Performance Efficient
3 Data Analytics Checkpoint
● Can Be Integrated With Any BI Tools
● Support AI/ML Libraries and Jobs
● Performance Efficient - Data Processing, Scanning
4 Business Checkpoint
● High Availability
● Automatic Failover, No DR Required
● Performance & Cost Efficient
● Pay as You Go vs Commitment Comparison Based on Overall Usage
5 Data Operations
● Customized Logging & Monitoring
● Native vs Customized Dashboards
● Integration With Various Alerting, Messaging Tools
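The pay-as-you-go vs commitment comparison comes down to a break-even calculation on overall usage. The sketch below assumes a simple model with a flat per-TB on-demand price and a flat monthly commitment cost. All prices and usage figures are placeholders, not actual BigQuery or Synapse pricing, and the function names are hypothetical.

```python
from typing import Tuple


def break_even_tb(on_demand_price_per_tb: float, monthly_commitment_cost: float) -> float:
    """Return the monthly usage (in TB) at which both plans cost the same."""
    if on_demand_price_per_tb <= 0:
        raise ValueError("on_demand_price_per_tb must be positive")
    return monthly_commitment_cost / on_demand_price_per_tb


def cheaper_plan(monthly_tb: float, on_demand_price_per_tb: float,
                 monthly_commitment_cost: float) -> Tuple[str, float]:
    """Return the cheaper plan name and the on-demand cost for the given usage."""
    on_demand_cost = monthly_tb * on_demand_price_per_tb
    plan = "on_demand" if on_demand_cost <= monthly_commitment_cost else "commitment"
    return plan, on_demand_cost


# Placeholder prices for illustration only; use the provider's current price list.
PRICE_PER_TB = 5.0
COMMITMENT = 2000.0

print(f"break-even at {break_even_tb(PRICE_PER_TB, COMMITMENT):.0f} TB per month")
for usage_tb in (300, 500):
    plan, cost = cheaper_plan(usage_tb, PRICE_PER_TB, COMMITMENT)
    print(f"{usage_tb} TB per month: on-demand cost {cost:.0f}, cheaper plan is {plan}")
```

Real pricing models have more dimensions (storage, slots or compute units, regions), so this only shows the shape of the comparison.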
Evaluation - Final Report
Approach 1 vs Approach 2
■ DW
● Approach 1: BigQuery
● Approach 2: BigQuery
■ ETL + ELT Pipelines
● Approach 1: Modify DS jobs to use the BQ connector to load data to the BQ landing layer
● Approach 2: Convert DS load jobs to BQ load jobs to pull data from source and load to BQ (this depends on the types of source systems and integration complexity)
■ Data Storage
● Both approaches: Store active data in BQ native tables with roll-up policies and store passive datasets on the GCS layer depending on usage of tables. External tables can be built on GCS datasets.
■ Data Analytics
● Both approaches: Tableau connections can be replaced with BQ connections
■ Data Pipeline Scheduler & Maintenance
● Both approaches: Control-M can be used to trigger the pipelines; orchestration can be done using Composer. Existing ticketing and alerting tools can be used as is.
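The storage recommendation (active data in BQ native tables, passive datasets on GCS depending on table usage) implies some rule for deciding which tier a table belongs to. A minimal sketch of such a rule is shown below, assuming the last-queried date of each table is known. The 90-day window, the tier labels, and the table names are hypothetical.

```python
from datetime import date


def storage_tier(last_queried: date, today: date, active_window_days: int = 90) -> str:
    """Classify a table as 'bq_native' if queried recently, else 'gcs_external'."""
    age_days = (today - last_queried).days
    return "bq_native" if age_days <= active_window_days else "gcs_external"


# Hypothetical tables and last-queried dates, for illustration only.
today = date(2024, 6, 1)
tables = {
    "sales_daily": date(2024, 5, 1),
    "sales_archive_2015": date(2024, 1, 1),
}
placement = {name: storage_tier(last, today) for name, last in tables.items()}
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

In practice the usage signal and threshold would come from the assessment phase metrics rather than a fixed window.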
BigQuery is selected after evaluation, based on the initial sign-up to GCP as well as the ratio between active and
passive storage. Azure Synapse can offer the same capabilities; however, the choices here are business and enterprise driven.
Thank You
Stay in Touch
Pooja Kelgaonkar
poojakelgaonkar@gmail.com & pooja.kelgaonkar@rackspace.com
www.linkedin.com/in/poojakelgaonkar
poojakelgaonkar.medium.com
Workload Prioritization: How to Balance Multiple Workloads in a Cluster by Fe...
ScyllaDB
 
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...
ScyllaDB
 
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...
ScyllaDB
 
Object Storage in ScyllaDB by Ran Regev, ScyllaDB
Object Storage in ScyllaDB by Ran Regev, ScyllaDBObject Storage in ScyllaDB by Ran Regev, ScyllaDB
Object Storage in ScyllaDB by Ran Regev, ScyllaDB
ScyllaDB
 
Lessons Learned from Building a Serverless Notifications System by Srushith R...
Lessons Learned from Building a Serverless Notifications System by Srushith R...Lessons Learned from Building a Serverless Notifications System by Srushith R...
Lessons Learned from Building a Serverless Notifications System by Srushith R...
ScyllaDB
 
A Dist Sys Programmer's Journey into AI by Piotr Sarna
A Dist Sys Programmer's Journey into AI by Piotr SarnaA Dist Sys Programmer's Journey into AI by Piotr Sarna
A Dist Sys Programmer's Journey into AI by Piotr Sarna
ScyllaDB
 
High Availability: Lessons Learned by Paul Preuveneers
High Availability: Lessons Learned by Paul PreuveneersHigh Availability: Lessons Learned by Paul Preuveneers
High Availability: Lessons Learned by Paul Preuveneers
ScyllaDB
 
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...
How Natura Uses ScyllaDB and ScyllaDB Connector to Create a Real-time Data Pi...
ScyllaDB
 
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...
Persistence Pipelines in a Processing Graph: Mutable Big Data at Salesforce b...
ScyllaDB
 
Ad

Recently uploaded (20)

Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 

Data Platform Architecture Principles and Evaluation Criteria

  • 1. Data Platform Architecture Principles and Evaluation Criteria Pooja Kelgaonkar, Senior Data Architect at Rackspace Technology
  • 2. Pooja Kelgaonkar ■ Senior Data Architect - GCP, Snowflake ■ Specialist in Data Modernization Implementations ■ Expertise in “Data” domain ■ Learner, Tech Blogger, Tech Evangelist ■ Reading, listening to Indian classical music
  • 3. Agenda ■ Architecture Principles ■ Data Modernization ■ Data Platform Offerings ■ Evaluation Criteria ■ Sample Use Case - Evaluation Comparison
  • 5. Framework Pillars
      01 Security ● Access Management & Controls ● Data Protection - Encryption, Data Masking ● Compliance - ISO, HIPAA, PCI DSS ● Data Governance
      02 Scalability ● Horizontal Scaling ● Vertical Scaling ● Auto Scaling
      03 Availability ● Reliability ● Resiliency of System ● Availability - System Uptime
      04 Efficiency ● Performance Efficiency ● Cost Efficiency - Cost Optimized
      05 Operational Excellence ● Serviceability - Easy Operations & Maintenance ● Maintenance - Data Pipeline Maintenance ● Reduced Ops Activities & Cost
  • 6. Data Modernization Journey ■ Cloud Migration / Adoption - 5Rs of transformation ■ Rehost, Refactor, Revise, Rebuild and Replace
      ● Data Discovery - Analysis of the existing data architecture and system design; evaluation of the needs and requirements of the new data system.
      ● Data Architecture & Assessment - Design of the new data platform; assessment of data modelling, data governance and security.
      ● Data Architecture & Engineering - Data platform implementation, data pipeline development and enhancement POCs; design of the end-to-end cycle.
      ● Data Migration & Pipeline Development/Conversion - Actual pipeline development and conversion on the new platform; implementing, testing and validating pipelines and data on the new platform.
      ● Go Live & DataOps - Soft launch / early cutover to integrate with other systems and obtain sign-off from business users; implementing operations for the new platform and the modified pipelines.
  • 7. Design Framework Pillars & Considerations Teams Architects Engineering Operations Who? When? How? Business Drive Technology Drive Management & Engagement Drive What? End User SLAs Assessment & SLA Setting System Assessment & Technology Evaluations Signed Up Services vs Open Source vs Hybrid Evaluations Business Assessment Technology Evangelist Sign Up for Services Business Teams
  • 9. Cloud Native - Public cloud providers offer a range of services for implementing a data platform: DB, DW, Data Lake, Data Mesh, SQL, NoSQL, etc.
      AWS ● AWS Glue ● EMR ● Kinesis, OpenSearch ● RDS, Aurora ● Redshift ● DynamoDB, DocumentDB
      Azure ● Azure Data Factory ● HDInsight ● Azure Stream Analytics ● Azure SQL, Managed SQL ● SQL Server, PostgreSQL ● MariaDB, Cosmos DB, Managed Cassandra
      GCP ● Dataflow, Data Fusion ● Dataproc ● Pub/Sub, Stream ● Cloud SQL, Cloud Spanner ● BigQuery, BigLake ● Bigtable, Firestore, Memorystore
  • 10. Data on Cloud - Common Offerings. A variety of offerings is available to implement a data platform and design data pipelines using native and open source services.
  • 11. Data Platform - Evaluation Criteria (Assessment Phase)
  • 12. Evaluation - Pre-Requisites
      Evaluation Criteria
      ● Existing/Cross Application Platform to be Evaluated - Platform Offerings: Existing Support Tier / Billing Plans. Managed/Native Services / BYOL Services: Existing System Licenses, Integrators - BI, OPS Tools.
      ● New Platforms to be Explored - Platform Offerings: Probable Platform to be Evaluated, Cost Comparisons Done? Managed/Native Services / BYOL Services: Existing System Licenses, Integrators - BI, OPS Tools.
      Specific Evaluation or Open Evaluation to Select Best Fit for Given Use Case
  • 13. Evaluation - Inputs ● Capex vs Opex % ● % of Data Storage vs Scan vs Processed ● Compute vs Storage Utilization % ● Data Challenges ● System Challenges
  • 14. Evaluation - Checkpoints
      1 Data Operations & Business Critical Requirements ● Data Pipeline Management - Monitoring & Operations ● Business Requirements - 24x7 Monitoring vs SLAs ● Critical Applications - Availability & SLAs
      2 Data Analytics Checkpoint ● Data Analytics - BI Tooling ● Predictive Analytics - Algorithms, Tools, Libraries Used ● AI/ML Use Cases - Customer Facing vs In-House ● Enterprise vs Cloud Native
      3 Business Checkpoint ● Data Availability - SLAs ● End User Agreements ● Business Requirements - Specific to Tooling ● Existing Cost Utilization ● Performance Ratio - Current vs Expected ● Modernization Drive
      4 Data Processing Checkpoint ● Target Systems Integrations ● Data Usage - Hot Data vs Cold Data ● Data Stored vs Data Processed vs Reads ● Data Pipelines - Batch vs Streaming ● Data Pipeline Complexity - S/M/C/VC ● Data Pipeline Scheduling - Tools, Cron Jobs, Native Schedulers, Event Based ● ETL vs ELT Requirements
      5 Data Platform Checkpoint ● Type of Data - Structured, Semi-Structured, Unstructured ● Sources of Data - Files, DBs, IoT, Devices, APIs ● Consumers of Data - Users vs System ● Frequency of Data - Batch, Real-Time ● Data Storage - Active vs Passive ● Data Modelling - Schema, Tables, DB Objects
  • 15. Evaluation - Metrics (Checkpoint / Category / Metrics)
      Data Checkpoint
      ● Data Integrators - No of Sources, No of Targets, No of Specific Systems, Total Storage Volume, Daily Delta Volume
      ● Data Modelling - Frequency of Schema Evolution, No of Objects, % of NoSQL Objects, % of PL/SQL Objects
      Data Processing
      ● Data Pipelines - No of S/M/C Jobs, No of External Functions Integrated (Java/Python/SQL), No of ETL Jobs (Tool Based), No of Compute Intense Jobs, No of Storage Intense Jobs
      Business Checkpoint
      ● Operations - No of Times SLA Challenged, No of End Users Affected
      ● Reliability - No of Times Data Compromised, No of DR Activities, No of End Users Impacted
      ● Performance Efficiency - Total Batch Time, No of Times Batch SLA Impacted, No of End User Reports, No of End Users/Consumers, No of Poor Performing Reports/Queries
      ● Cost Utilization - Overall Billing (Capex), Total Operations and Maintenance Cost
      Data Operations
      ● Monitoring - No of Support Team Members, No of Monitoring Dashboards
      Data Analytics
      ● Analytics - No of ML Jobs/Algorithms, ML Integrators
  • 17. Evaluation - Pre-Requisites
      Evaluation Criteria
      ● Existing/Cross Application Platform to be Evaluated - Platform Offerings: Existing Support Tier / Billing Plans. Managed/Native Services / BYOL Services: Existing System Licenses, Integrators - BI, OPS Tools.
      ● New Platforms to be Explored - Platform Offerings: Probable Platform to be Evaluated, Cost Comparisons Done? Managed/Native Services / BYOL Services: Existing System Licenses, Integrators - BI, OPS Tools.
      Specific Evaluation or Open Evaluation to Select Best Fit for Given Use Case
  • 18. Evaluation - Inputs ■ Domain - Retail, DW - Teradata, ETL - DataStage ■ Platform - Recently signed up for Google Cloud Platform ■ Data Platform - Evaluate GCP services to set up a data warehouse platform ■ DW Size - 120 TB (70 TB Active + 50 TB Passive) ■ Daily Volume - 1 TB (80% Batch + 20% Streaming) ■ Data - Structured & Semi-structured (JSON, XML) ■ Data Pipelines - Mostly ELT - DataStage to Teradata (landing layer), Teradata SQL to transform data ■ Data Analytics - Tableau Reports - Customer Reports ■ Enterprise Scheduler - Control-M, Ticketing Tool - JIRA, Alerting via Slack, Email ■ Monitoring Dashboards, 24x7 Support Team
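
The storage and volume figures in these inputs translate directly into the active/passive and batch/streaming ratios used later in the evaluation. A minimal sketch of that arithmetic follows, assuming only the numbers stated on the slide; the function and constant names are illustrative and not part of the original deck.

```python
# Figures come from the use case inputs; all names here are illustrative.
ACTIVE_TB = 70.0     # active warehouse data
PASSIVE_TB = 50.0    # passive (rarely used) warehouse data
DAILY_TB = 1.0       # daily incoming volume
BATCH_SHARE = 0.80   # share of daily volume arriving in batch
STREAM_SHARE = 0.20  # share of daily volume arriving as streams


def storage_split(active_tb: float, passive_tb: float) -> dict:
    """Return total size and the active/passive percentages."""
    total = active_tb + passive_tb
    return {
        "total_tb": total,
        "active_pct": round(100 * active_tb / total, 1),
        "passive_pct": round(100 * passive_tb / total, 1),
    }


def daily_split(daily_tb: float, batch_share: float, stream_share: float) -> dict:
    """Return the daily volume split into batch and streaming terabytes."""
    return {
        "batch_tb": round(daily_tb * batch_share, 3),
        "stream_tb": round(daily_tb * stream_share, 3),
    }


if __name__ == "__main__":
    print(storage_split(ACTIVE_TB, PASSIVE_TB))
    print(daily_split(DAILY_TB, BATCH_SHARE, STREAM_SHARE))
```

Roughly 58% of the warehouse is active and 42% passive, and about 0.8 TB per day arrives in batch against 0.2 TB as streams.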
  • 19. DW - Google BigQuery vs Azure Synapse - Observations
      1 Data Platform Checkpoint ● Supports More Than 90% of Requirements ● SaaS Offering, Cloud Managed ● Very Well Integrated
      2 Data Processing Checkpoint ● Native Drivers to Support Batch & Stream ● Highest Data Processing Speed ● Storage vs Compute - Scaling In and Out ● Automatic Scaling, Performance Efficient
      3 Data Analytics Checkpoint ● Can Be Integrated With Any BI Tools ● Supports AI/ML Libraries and Jobs ● Performance Efficient - Data Processing, Scanning
      4 Business Checkpoint ● High Availability ● Automatic Failover, No DR Required ● Performance & Cost Efficient ● Pay as You Go vs Commitment Comparison Based on Overall Usage
      5 Data Operations ● Customized Logging & Monitoring ● Native vs Customized Dashboards ● Integration With Various Alerting, Messaging Tools
  • 20. Evaluation - Final Report
      DW ● Approach 1: BigQuery ● Approach 2: BigQuery
      ETL + ELT Pipelines ● Approach 1: Modify DS (DataStage) jobs to use the BQ connector to load data into the BQ landing layer. ● Approach 2: Convert DS load jobs to BQ load jobs that pull data from the source and load it into BQ (depending on the types of source systems and integration complexity).
      Data Storage ● Approach 1 and 2: Store active data in BQ native tables with roll-up policies, and store passive datasets on the GCS layer depending on table usage. External tables can be built on the GCS datasets.
      Data Analytics ● Approach 1 and 2: Tableau connections can be replaced with BQ connections.
      Data Pipeline Scheduler & Maintenance ● Approach 1 and 2: Control-M can be used to trigger the pipelines; orchestration can be done using Composer. Existing ticketing and alerting tools can be used as is.
      BigQuery was selected after the evaluation. The choice rests on the initial sign-up to GCP and on the ratio of active to passive storage. Azure Synapse can offer the same capabilities; the choice here is business and enterprise driven.
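
The storage recommendation above (active data in BQ native tables, passive datasets on GCS behind external tables, decided by table usage) can be sketched as a simple placement rule. This is only an illustration under stated assumptions: the deck does not define how "usage" is measured, so the `days_since_last_query` field, the 90-day threshold, and the placement labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableUsage:
    """Usage summary for one warehouse table (hypothetical shape)."""
    name: str
    days_since_last_query: int


# Hypothetical cut-off; the deck only says placement depends on table usage.
PASSIVE_AFTER_DAYS = 90


def placement(table: TableUsage, passive_after_days: int = PASSIVE_AFTER_DAYS) -> str:
    """Suggest where a table should live under the active/passive split."""
    if table.days_since_last_query >= passive_after_days:
        # Passive data: keep files on GCS and expose them via an external table.
        return "gcs_external_table"
    # Active data: keep in a BigQuery native table.
    return "bigquery_native_table"


if __name__ == "__main__":
    for t in (TableUsage("sales_daily", 2), TableUsage("sales_2015", 400)):
        print(t.name, "->", placement(t))
```

In practice the threshold would come from the measured active/passive ratio and the cost comparison gathered during the assessment phase.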
  • 21. Thank You Stay in Touch Pooja Kelgaonkar [email protected] & [email protected] www.linkedin.com/in/poojakelgaonkar poojakelgaonkar.medium.com