Growing into a proactive Data Platform
Yaar Reuveni & Nir Hedvat
Becoming a Proactive Data Platform
Yaar Reuveni
• 6 Years at Liveperson
• 1 Reporting & BI
• 3 Data Platform
• 2 Data Platform team lead
• I love to travel
• And
Nir Hedvat
• Software Engineer B.Sc
• 3 years as a C++ Developer at IBM Rational Rhapsody™
• 1.5 years at LivePerson
• Cloud and Parallel Computing Enthusiast
• Love Math and Powerlifting
Agenda
• Our Scale & Operation
• Evolution in becoming proactive
i. Hope & Low awareness
ii. Storming & Troubleshooting
iii. Fortifying
iv. Internalization & Comprehension
v. Being Proactive
• Showcases
• Implementation
Our Scale
• 2 M Daily chats
• 100 M Daily monitored visitor sessions
• 20 B Events per day
• 2 TB Raw data per day
• 2 PB Total in Hadoop clusters
• Hundreds of producers × event types × consumers
LivePerson technology stack
Stage 1: Hope & Low awareness
We built it and it’s awesome
[Diagram labels: Online producer, Offline producer, local files, DSPT*, Jobs, Raw Data]
* DSPT - Data single point of truth
Stage 1: Hope & Low awareness
We’ve got customers
[Diagram labels: Dashboards, Data Science, Apps, Reporting, Data Access, Ad-Hoc Queries]
Stage 2: Storming & Troubleshooting
You’ve got NOC & SCS on speed dial
Issues arise:
• Data loss
• Data delays
• Partial data out of frame
• Missing/faulty calculations for consumers
• One producer does not send for over a week
Stage 2: Storming & Troubleshooting
You’ve got NOC & SCS on speed dial
Common issue types and sources:
• Hadoop ops
• Production ops
• Events schema
• New data producers
• High new features rate (LE2.0)
• Data stuck in pipeline
• Bugs
Stage 3: Fortifying
Every interruption derives a new protection
• Monitors on jobs, failures, success rate
• Monitors on service status
• Simple data freshness checks, e.g. measure the newest event
• Measure latency of specific parts of the pipeline
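A "newest event" freshness check like the one listed above can be very small. The sketch below is only an illustration of the idea, not the monitor used in production. The function name `is_fresh`, the `max_lag` threshold, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(event_times, now, max_lag):
    """Return True if the newest event is no older than max_lag."""
    if not event_times:
        return False  # no data at all is treated as stale
    newest = max(event_times)
    return now - newest <= max_lag

# Hypothetical sample: events created 90, 45 and 12 minutes before "now".
now = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [now - timedelta(minutes=m) for m in (90, 45, 12)]

fresh_30 = is_fresh(events, now, timedelta(minutes=30))
fresh_10 = is_fresh(events, now, timedelta(minutes=10))
```

A check like this only says whether something arrived recently. It cannot say how much is missing, which is the gap the auditing stage addresses.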
Stage 4: Internalization & Comprehension
Auditing requirements
• Measurement principles:
– Loss
• How much?
• Which customer?
• What Type?
• Where in the pipeline?
– Freshness
• Percentiles
• Trends
– Statistics
• Event type count
• Event per LP customer
• Trends
Stage 4: Internalization & Comprehension
Auditing architecture
[Diagram labels: Producers, Events, Audit Events, Control, Audit Loader, Audit Aggregator, Freshness, Audit DB]
Stage 4: Internalization & Comprehension
Mechanism
1. Enrich events with audit metadata
[Diagram: Data event = Common Header + Audit Header + Data]
2. Send control events every X minutes
[Diagram: Control Event (audit aggregation) = Common Header + Audit Header]
Stage 4: Internalization & Comprehension
Mechanism
[Diagram: Old Data Flow - a stream of Data events, each carrying a Common Header only]
[Diagram: Audited Data Flow - Data events carrying a Common Header and an Audit Header, interleaved with Control Events (audit aggregation) that carry both headers]
Stage 4: Internalization & Comprehension
How to measure loss?
• Tag all events going through our API with an auditing header:
<host_name>:<bulk_id>:<sequence_id>
Where:
• host_name - the logical identification of the producer server
• bulk_id - an arbitrary unique number that identifies a bulk (changes every X minutes)
• sequence_id - an auto-incremented, persistent number used to identify missing bulks
• Every X minutes, send an audit control event:
{
  "eventType": "AuditControlEvent",
  "Bulks": [{"bulk_id": "srv-xyz:111:97", "data_tier": "shark producer", "total_count": 785},
            {"bulk_id": "srv-xyz:112:98", "data_tier": "shark producer", "total_count": 1715}]
}
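The tagging and control-event steps can be sketched on the producer side roughly as follows. This is a minimal sketch under assumptions. `AuditTagger` and its methods are hypothetical names. The bulk_id is drawn from a simple counter, where the slide only asks for an arbitrary unique number. The sequence_id is kept in memory here, whereas the slide calls for a persistent one.

```python
import itertools
from collections import Counter

class AuditTagger:
    """Tags outgoing events and counts them per bulk (hypothetical helper)."""

    def __init__(self, host_name, data_tier, first_bulk_id=111, first_sequence_id=1):
        self.host_name = host_name
        self.data_tier = data_tier
        self._bulk_ids = itertools.count(first_bulk_id)      # assumption: a counter
        self._sequence = itertools.count(first_sequence_id)  # not persisted in this sketch
        self._counts = Counter()
        self.start_new_bulk()

    def start_new_bulk(self):
        """Call every X minutes: new bulk_id and the next sequence_id."""
        self.current_header = (
            f"{self.host_name}:{next(self._bulk_ids)}:{next(self._sequence)}"
        )

    def tag(self, event):
        """Return a copy of the event enriched with the audit header."""
        self._counts[self.current_header] += 1
        return dict(event, audit_header=self.current_header)

    def control_event(self):
        """Build one control event for all bulks counted so far, then reset."""
        bulks = [
            {"bulk_id": header, "data_tier": self.data_tier, "total_count": n}
            for header, n in self._counts.items()
        ]
        self._counts.clear()
        return {"eventType": "AuditControlEvent", "Bulks": bulks}

tagger = AuditTagger("srv-xyz", "shark producer", first_sequence_id=97)
tagged = [tagger.tag({"payload": i}) for i in range(3)]
tagger.start_new_bulk()
tagged.append(tagger.tag({"payload": 3}))
control = tagger.control_event()
```

Because the sequence_id grows by one per bulk, a consumer that sees 97 and 99 but never 98 can tell that a whole bulk is missing, even though no control event for it ever arrived.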
Stage 4: Internalization & Comprehension
What’s next?
• Immediate gain: loss can be researched directly on the raw data
Next:
• Count events per auditing bulk
• Load into a DB for dashboarding:
Audit metadata      Data tier   Insertion time   Events count
srv-xyz:1a2b3c:25   Producer    08:34            750
srv-xyz:1a2b3c:25   HDFS        09:05            405
srv-xyz:1a2b3c:25   HDFS        10:13            250
In this example, assume we look at the table after 11:34 and treat anything that takes more than 3 hours to arrive as loss. Server srv-xyz created 750 events in bulk 1a2b3c, but only 405 + 250 = 655 of them arrived within 3 hours, so we detect a loss of 95 events from this server.
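The arithmetic in this example can be written as a small check over the audit rows. This is a sketch, not the dashboard query itself. The function `detect_loss`, the tuple layout, and the calendar date attached to the times are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def detect_loss(rows, source_tier, target_tier, now, max_delay):
    """rows: (audit_metadata, tier, insertion_time, count) tuples.

    Returns {audit_metadata: missing_count} for bulks whose arrival
    window (max_delay after creation) has already closed.
    """
    produced = {}  # audit metadata -> (creation time, produced count)
    for audit, tier, inserted, count in rows:
        if tier == source_tier:
            produced[audit] = (inserted, count)

    arrived = {}   # audit metadata -> count that arrived in time
    for audit, tier, inserted, count in rows:
        if tier == target_tier and audit in produced:
            if inserted - produced[audit][0] <= max_delay:
                arrived[audit] = arrived.get(audit, 0) + count

    losses = {}
    for audit, (created_at, total) in produced.items():
        if now - created_at > max_delay:  # only judge closed windows
            missing = total - arrived.get(audit, 0)
            if missing > 0:
                losses[audit] = missing
    return losses

day = datetime(2015, 1, 1)  # arbitrary date, only the times matter
rows = [
    ("srv-xyz:1a2b3c:25", "Producer", day.replace(hour=8, minute=34), 750),
    ("srv-xyz:1a2b3c:25", "HDFS", day.replace(hour=9, minute=5), 405),
    ("srv-xyz:1a2b3c:25", "HDFS", day.replace(hour=10, minute=13), 250),
]
losses = detect_loss(rows, "Producer", "HDFS",
                     now=day.replace(hour=11, minute=35),
                     max_delay=timedelta(hours=3))
```

With the table's numbers, `losses` maps `srv-xyz:1a2b3c:25` to 95. Running the same check before 11:34 returns nothing, because the 3-hour window is still open.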
Stage 4: Internalization & Comprehension
How to measure freshness?
• Run incrementally on the raw data
• Group events by
– Total
– Event type
– LP customer
• Per event, calculate: insertion time - creation time
• Per group:
– Total count
– Min, max & average
– Count into time buckets (0-30; 30-60; 60-120; 120-∞)
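The per-group statistics listed above could be computed along these lines. This is a minimal sketch for a single group. The slide does not state the unit of the bucket boundaries, so the code treats all times as plain numbers in one shared unit. `summarize` and the sample events are hypothetical.

```python
import math

# Bucket boundaries from the slide: 0-30, 30-60, 60-120, 120-infinity.
BUCKET_EDGES = [0, 30, 60, 120, math.inf]

def summarize(latencies):
    """Return count, min, max, average and bucket counts for one group."""
    buckets = {}
    for low, high in zip(BUCKET_EDGES, BUCKET_EDGES[1:]):
        # Each bucket is half-open: low <= latency < high.
        buckets[f"{low}-{high}"] = sum(low <= x < high for x in latencies)
    return {
        "count": len(latencies),
        "min": min(latencies),
        "max": max(latencies),
        "avg": sum(latencies) / len(latencies),
        "buckets": buckets,
    }

# Hypothetical events: freshness = insertion time - creation time.
events = [
    {"created": 100, "inserted": 110},
    {"created": 100, "inserted": 145},
    {"created": 100, "inserted": 230},
]
latencies = [e["inserted"] - e["created"] for e in events]
stats = summarize(latencies)
```

The same function would be applied once per group (total, event type, LP customer).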
Stage 5: Being Proactive
Tools - loss dashboard
Stage 5: Being Proactive
Tools - loss detailed dashboard
Stage 5: Being Proactive
Tools - loss trends
Stage 5: Being Proactive
Tools - freshness
Stage 5: Being Proactive
Tools - data statistics
Showcase I
Bug in a new producer
Showcase II
Deployment issue
• Constant loss
• Only in one farm
• Depends on traffic
• Only a specific producer type
• From all of its nodes
Showcase III
Consumer jobs issues
• Our auditing detected a loss in Alpha
• Data stuck in a job failure dir
• Functional monitoring missed it
• We streamed the stuck data
Showcase IV
Producer issues
• Offline producer gets stuck
• Functional monitoring misses it
Implementation
Auditing architecture
[Diagram labels: Producers, Events, Audit Events, Control, Audit Loader, Audit Aggregator, Freshness, Audit DB]
Implementation
Audit Loader
• Storm topology
• Load audit events from Kafka to MySQL
Bulk      Tier   TS      Count
xyz:123   WRPA   08:34   750
xyz:123   DSPT   09:05   405
xyz:123   DSPT   10:13   250
[Diagram labels: Audit Events, Audit Loader, Audit DB]
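The core of the loader step, turning control events into table rows, can be sketched without the Storm and Kafka plumbing. The sketch makes several assumptions: `sqlite3` stands in for MySQL, control events arrive as an in-memory list rather than from Kafka, and the table and column names are hypothetical.

```python
import sqlite3

def control_event_to_rows(event, received_at):
    """Flatten one audit control event into (bulk, tier, ts, count) rows."""
    return [
        (b["bulk_id"], b["data_tier"], received_at, b["total_count"])
        for b in event["Bulks"]
    ]

def load(conn, timed_events):
    """Insert rows for an iterable of (control_event, received_at) pairs."""
    with conn:  # one transaction for the whole batch
        for event, received_at in timed_events:
            conn.executemany(
                "INSERT INTO audit (bulk, tier, ts, event_count) VALUES (?, ?, ?, ?)",
                control_event_to_rows(event, received_at),
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (bulk TEXT, tier TEXT, ts TEXT, event_count INTEGER)")

event = {
    "eventType": "AuditControlEvent",
    "Bulks": [{"bulk_id": "xyz:123", "data_tier": "WRPA", "total_count": 750}],
}
load(conn, [(event, "08:34")])
rows = conn.execute("SELECT bulk, tier, ts, event_count FROM audit").fetchall()
```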
Implementation
Audit Aggregator
• Load data from HDFS
• Aggregate events according to audit metadata
• Save aggregated audit data to MySQL
• Spark implementation
[Diagram labels: HDFS, Data, Aggregate (∑ #1 = N1, ∑ #2 = N2, ∑ #3 = N3), Collect & Save, DB, ZooKeeper, Offset]
Audit Aggregator job
First Generation
• Our jobs work incrementally or manually
• Offset management by ZooKeeper
• A failure during the saving stage leads to a lost offset
• Saving data and offset on the same stream
Audit Aggregator job
Overcoming Pitfalls
Audit Aggregator job
Revised Design
[Diagram labels: HDFS, Aggregate (∑ #1 = N1, ∑ #2 = N2, ∑ #3 = N3), Collect & Save, Data, Offset, DB]
Bulk      Tier   TS      Count
xyz:123   WRPA   08:34   750
xyz:123   DSPT   09:05   405
xyz:123   DSPT   10:13   250
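One way to read the revised design is that the aggregated counts and the job offset are written to the same store in a single transaction, so a failure can never leave one without the other. The sketch below illustrates that idea under assumptions. `sqlite3` stands in for MySQL, the HDFS read and the Spark job are replaced by an in-memory list, and the table names, the `job_offset` layout, and the function names are hypothetical.

```python
import sqlite3
from collections import Counter

def aggregate(events):
    """Count events per audit header (the aggregation step)."""
    return Counter(event["audit_header"] for event in events)

def save_with_offset(conn, counts, tier, ts, new_offset):
    """Write the aggregated rows and the new offset in ONE transaction."""
    with conn:  # commit on success, roll back both writes on any error
        conn.executemany(
            "INSERT INTO audit (bulk, tier, ts, event_count) VALUES (?, ?, ?, ?)",
            [(bulk, tier, ts, n) for bulk, n in counts.items()],
        )
        conn.execute(
            "UPDATE job_offset SET value = ? WHERE job = ?",
            (new_offset, "audit_aggregator"),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (bulk TEXT, tier TEXT, ts TEXT, event_count INTEGER)")
conn.execute("CREATE TABLE job_offset (job TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO job_offset VALUES ('audit_aggregator', 0)")
conn.commit()

events = [{"audit_header": "xyz:123"}] * 3 + [{"audit_header": "xyz:124"}]
save_with_offset(conn, aggregate(events), "DSPT", "09:05", new_offset=4)

# Simulated failure while saving the offset (NULL violates NOT NULL):
# the rows inserted in the same transaction are rolled back as well.
try:
    save_with_offset(conn, aggregate(events), "DSPT", "10:13", new_offset=None)
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT bulk, event_count FROM audit ORDER BY bulk").fetchall()
offset = conn.execute("SELECT value FROM job_offset").fetchone()[0]
```

After the failed second run the table still holds only the first run's rows and the offset is still 4, so a retry starts from a consistent state.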
Audit Aggregator job
Why Spark
• Precedent - Spark Streaming for online auditing
• We see our future with Spark
• Cluster utilization
• Performance
– In-memory computation
– Supports multiple shuffles
– Unified data processing: batch/streaming
Implementation
Data Freshness
• End-to-end latency assessment
• Freshness per criteria
• Output - various stats
Freshness job
Design
[Diagram labels: HDFS, Event (×4), Map, Reduce; output per Total, LP Customer and Event Type: Min, Max, Avg, Buckets, Count]
Freshness job
Mechanism
• Driver
– Collects LP events from HDFS
• Map
– Compute freshness latencies
– Segment events per criterion by generating a composite key
• Reduce
– Compute count, min, max, avg and buckets
– Write stats to HDFS
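The map and reduce steps above can be mimicked in plain Python to show how a composite key lets one pass produce statistics for several criteria at once. This is a sketch under assumptions, not the actual job. The key layout `(criterion, value)`, the function names, and the sample events are hypothetical. Bucket counting and the HDFS read and write are left out for brevity.

```python
from collections import defaultdict

def map_event(event):
    """Emit (composite key, latency) once per grouping criterion."""
    latency = event["inserted"] - event["created"]
    yield ("total", "*"), latency
    yield ("event_type", event["event_type"]), latency
    yield ("customer", event["customer"]), latency

def reduce_latencies(latencies):
    """Fold the latencies of one key into count, min, max and avg."""
    count, total = 0, 0
    low, high = float("inf"), float("-inf")
    for latency in latencies:
        count += 1
        total += latency
        low, high = min(low, latency), max(high, latency)
    return {"count": count, "min": low, "max": high, "avg": total / count}

def run_job(events):
    shuffled = defaultdict(list)  # stands in for the shuffle phase
    for event in events:
        for key, latency in map_event(event):
            shuffled[key].append(latency)
    return {key: reduce_latencies(values) for key, values in shuffled.items()}

events = [
    {"event_type": "chat", "customer": "acme", "created": 0, "inserted": 10},
    {"event_type": "chat", "customer": "zeta", "created": 0, "inserted": 50},
    {"event_type": "visit", "customer": "acme", "created": 5, "inserted": 25},
]
stats = run_job(events)
```

Each event is emitted three times, once per criterion, so the reducer never needs to know which criterion it is handling.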
Freshness job
Output usage
Hadoop Platform
Overcoming Pitfalls
• Our data model is built over Avro
• Avro comes with schema evolution
• Avro data is stored along with its schema
• High model-modification rate
• LOBs schema changes are synchronized
Producer → Consumer
Hadoop Platform
Overcoming Pitfalls
• An MR/Spark job is compiled against a specific schema revision when using SpecificRecord
• Using GenericRecord removes the burden of recompiling each time the schema changes
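The benefit of generic, by-name field access can be illustrated without the Avro library. The sketch below is a simplified imitation of Avro's schema resolution in plain Python, not the Avro API. Records are dictionaries, and `resolve`, `READER_FIELDS`, and the field names are hypothetical. It applies two resolution rules: a field missing from the written record takes the reader's default, and a field unknown to the reader is ignored.

```python
_NO_DEFAULT = object()  # marks reader fields that have no default value

def resolve(record, reader_fields):
    """Project a written record onto the reader's fields, by name."""
    resolved = {}
    for name, default in reader_fields:
        if name in record:
            resolved[name] = record[name]
        elif default is not _NO_DEFAULT:
            resolved[name] = default  # field added after the record was written
        else:
            raise KeyError(f"field {name!r} is missing and has no default")
    return resolved

# The job's view of an event: 'channel' was added later, with a default.
READER_FIELDS = [
    ("event_type", _NO_DEFAULT),
    ("customer", _NO_DEFAULT),
    ("channel", "unknown"),
]

old_record = {"event_type": "chat", "customer": "acme"}
new_record = {"event_type": "chat", "customer": "acme", "channel": "web", "extra": 1}

old_view = resolve(old_record, READER_FIELDS)
new_view = resolve(new_record, READER_FIELDS)
```

Because fields are looked up by name at run time, the same job code handles both records without being rebuilt when a producer adds a field.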
THANK YOU!
We are hiring
YouTube.com/LivePersonDev
Twitter.com/LivePersonDev
Facebook.com/LivePersonDev
Slideshare.net/LivePersonDev
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
BI Brainz
 
Reduce SRE Stress: Minimizing Service Downtime with Grafana, InfluxDB and Tel...
Reduce SRE Stress: Minimizing Service Downtime with Grafana, InfluxDB and Tel...Reduce SRE Stress: Minimizing Service Downtime with Grafana, InfluxDB and Tel...
Reduce SRE Stress: Minimizing Service Downtime with Grafana, InfluxDB and Tel...
InfluxData
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Lucidworks
 
Advanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applicationsAdvanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applications
Aljoscha Krettek
 
How Totango uses Apache Spark
How Totango uses Apache SparkHow Totango uses Apache Spark
How Totango uses Apache Spark
Oren Raboy
 
Sybase BAM Overview
Sybase BAM OverviewSybase BAM Overview
Sybase BAM Overview
Xu Jiang
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Databricks
 
Sql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptSql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.ppt
Qingsong Yao
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
DATAVERSITY
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
VMware Tanzu
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Sid Anand
 
Ad

Growing into a proactive Data Platform

  • 2. Yaar Reuveni & Nir Hedvat Becoming a Proactive Data Platform
  • 3. Yaar Reuveni • 6 Years at LivePerson • 1 Reporting & BI • 3 Data Platform • 2 Data Platform team lead • I love to travel • And
  • 4. Nir Hedvat • Software Engineer B.Sc • 3 years as a C++ Developer at IBM Rational Rhapsody™ • 1.5 years at LivePerson • Cloud and Parallel Computing Enthusiast • Love Math and Powerlifting
  • 5. Agenda • Our Scale & Operation • Evolution in becoming proactive i. Hope & Low awareness ii. Storming & Troubleshooting iii. Fortifying iv. Internalization & Comprehension v. Being Proactive • Showcases • Implementation
  • 6. Our Scale • 2 M Daily chats • 100 M Daily monitored visitor sessions • 20 B Events per day • 2 TB Raw data per day • 2 PB Total in Hadoop clusters • Hundreds of producers × event types × consumers
  • 8. Stage 1: Hope & Low awareness We built it and it’s awesome Online producer Offline producer local files DSPT Jobs Raw Data * DSPT - Data single point of truth
  • 9. Stage 1: Hope & Low awareness We’ve got customers Dashboards Data Science Apps Reporting Data Science Data Access Ad-Hoc Queries
  • 10. Stage 2: Storming & Troubleshooting You’ve got NOC & SCS on speed dial Issues arise: • Data loss • Data delays • Partial data out of frame • Missing/faulty calculations for consumers • One producer does not send for over a week
  • 11. Stage 2: Storming & Troubleshooting You’ve got NOC & SCS on speed dial Common issue types and generators: • Hadoop ops • Production ops • Events schema • New data producers • High new features rate (LE2.0) • Data stuck in pipeline • Bugs
  • 12. Stage 3: Fortifying Every interruption derives a new protection
  • 13. Stage 3: Fortifying Every interruption derives a new protection
  • 14. Stage 3: Fortifying Every interruption derives a new protection • Monitors on jobs, failures, success rate • Monitors on service status • Simple data freshness checks e.g. measure the newest event • Measure latency of specific parts of the pipeline
  • 15. Stage 4: Internalization & Comprehension: Auditing requirements
      Measurement principles:
      – Loss: How much? Which customer? What type? Where in the pipeline?
      – Freshness: percentiles, trends
      – Statistics: event type count, events per LP customer, trends
  • 16. Stage 4: Internalization & Comprehension: Auditing architecture (diagram labels: Producers, Events, Audit Events, Control, Audit Loader, Audit Aggregator, Freshness, Audit DB)
  • 17. Stage 4: Internalization & Comprehension: Mechanism
      1. Enrich events with audit metadata (data event: Data + Common Header + Audit Header)
      2. Send control events every X minutes (control event, audit aggregation: Common Header + Audit Header)
  • 18. Stage 4: Internalization & Comprehension: Mechanism (diagram: in the old data flow, events carry Data + Common Header; in the audited data flow, events carry Data + Common Header + Audit Header and are accompanied by control events carrying Common Header + Audit Header)
  • 19. Stage 4: Internalization & Comprehension: How to measure loss?
      • Tag all events going through our API with an auditing header: <host_name>:<bulk_id>:<sequence_id>
        Where:
        – host_name: the logical identification of the producer server
        – bulk_id: an arbitrary unique number that identifies a bulk (changes every X minutes)
        – sequence_id: an auto-incremented persistent number used to identify missing bulks
      • Every X minutes, send an audit control event:
        { eventType: AuditControlEvent, Bulks: [ {bulk_id: "srv-xyz:111:97", data_tier: "shark producer", total_count: 785}, {bulk_id: "srv-xyz:112:98", data_tier: "shark producer", total_count: 1715} ] }
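The tagging scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `AuditTagger`, its methods, and the `audit_header` field name are hypothetical, and a real producer would persist the sequence number across restarts rather than keep it in memory.

```python
import itertools
from collections import Counter


class AuditTagger:
    """Hypothetical helper: tags events with <host_name>:<bulk_id>:<sequence_id>
    and counts how many events were sent in each bulk."""

    def __init__(self, host_name, data_tier, start_sequence=0):
        self.host_name = host_name
        self.data_tier = data_tier
        # In-memory counter; the slide calls for a persistent one.
        self._sequence = itertools.count(start_sequence)
        self._counts = Counter()
        self._current = None

    def start_bulk(self, bulk_id):
        """Open a new bulk; meant to be called every X minutes."""
        self._current = f"{self.host_name}:{bulk_id}:{next(self._sequence)}"

    def tag(self, event):
        """Return a copy of the event enriched with the current audit header."""
        if self._current is None:
            raise RuntimeError("start_bulk() must be called before tag()")
        self._counts[self._current] += 1
        return {**event, "audit_header": self._current}

    def control_event(self):
        """Build a control event covering every bulk counted so far, then reset."""
        bulks = [
            {"bulk_id": header, "data_tier": self.data_tier, "total_count": n}
            for header, n in self._counts.items()
        ]
        self._counts.clear()
        return {"eventType": "AuditControlEvent", "Bulks": bulks}
```

A producer would call `start_bulk()` on a timer, `tag()` on every outgoing event, and send the result of `control_event()` alongside the data events.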
  • 20. Stage 4: Internalization & Comprehension: What’s next?
      • Immediate gain: loss can be researched directly on the raw data
      Next:
      • Count events per auditing bulk
      • Load the counts into a DB for dashboarding:
        Audit metadata      Data tier   Insertion time   Events count
        srv-xyz:1a2b3c:25   Producer    08:34            750
        srv-xyz:1a2b3c:25   HDFS        09:05            405
        srv-xyz:1a2b3c:25   HDFS        10:13            250
      In this example, assume we look at the table after 11:34 and treat anything delayed by more than 3 hours as lost. Server srv-xyz created 750 events in bulk 1a2b3c, but only 405 + 250 = 655 of them arrived within 3 hours, so we detect a loss of 95 events from this server.
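The arithmetic in this example can be written as a small check over the audit table. The sketch assumes a single downstream tier (HDFS in the example) and rows shaped as `(audit_header, tier, insertion_time, count)`; the function name `detect_loss` and the row layout are illustrative, not taken from the talk.

```python
from datetime import datetime, timedelta

PRODUCER_TIER = "Producer"  # tier name as shown in the example table


def detect_loss(rows, now, max_delay=timedelta(hours=3)):
    """Return {audit_header: missing_count} for bulks whose grace period has passed.

    rows: iterable of (audit_header, tier, insertion_time, count) tuples.
    Events arriving later than max_delay after the producer row count as lost.
    """
    produced, arrived = {}, {}
    for header, tier, inserted_at, count in rows:
        if tier == PRODUCER_TIER:
            produced[header] = (inserted_at, count)
        else:
            arrived.setdefault(header, []).append((inserted_at, count))

    losses = {}
    for header, (created_at, total) in produced.items():
        deadline = created_at + max_delay
        if now < deadline:
            continue  # still inside the allowed window, too early to judge
        received = sum(c for t, c in arrived.get(header, []) if t <= deadline)
        if received < total:
            losses[header] = total - received
    return losses
```

With the three rows from the table and a current time just after 11:34, the function reports 95 missing events for `srv-xyz:1a2b3c:25`.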
  • 21. Stage 4: Internalization & Comprehension: How to measure freshness?
      • Run incrementally on the raw data
      • Group events by: total, event type, LP customer
      • Per event, calculate insertion time − creation time
      • Per group: total count; min, max & average; counts in time buckets (0-30; 30-60; 60-120; 120-∞)
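A minimal sketch of the per-group statistics, assuming latencies have already been computed as insertion time minus creation time. The bucket edges are the ones on the slide, which does not state their unit; the function names and the handling of negative latencies are assumptions.

```python
BUCKET_EDGES = (0, 30, 60, 120)  # edges from the slide; the unit is not stated there


def bucket_label(latency):
    """Map a latency to its bucket label, e.g. 45 -> '30-60'.

    Assumption: negative latencies (clock skew) fall into the first bucket.
    """
    for low, high in zip(BUCKET_EDGES, BUCKET_EDGES[1:]):
        if latency < high:
            return f"{low}-{high}"
    return f"{BUCKET_EDGES[-1]}-inf"


def freshness_stats(latencies):
    """Compute count, min, max, average and bucket counts for one group."""
    latencies = list(latencies)
    if not latencies:
        raise ValueError("a group must contain at least one event")
    buckets = {}
    for latency in latencies:
        label = bucket_label(latency)
        buckets[label] = buckets.get(label, 0) + 1
    return {
        "count": len(latencies),
        "min": min(latencies),
        "max": max(latencies),
        "avg": sum(latencies) / len(latencies),
        "buckets": buckets,
    }
```

The same function would be applied once per group (total, per event type, per LP customer).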
  • 22. Stage 5: Being Proactive Tools - loss dashboard
  • 23. Stage 5: Being Proactive Tools - loss detailed dashboard
  • 24. Stage 5: Being Proactive Tools - loss trends
  • 25. Stage 5: Being Proactive Tools - freshness
  • 26. Stage 5: Being Proactive Tools - freshness
  • 27. Stage 5: Being Proactive Tools - data statistics
  • 28. Showcase I Bug in a new producer
  • 29. Showcase II Deployment issue • Constant loss • Only in one farm • Depends on traffic • Only a specific producer type • From all of its nodes
  • 30. Showcase III Consumer jobs issues • Our auditing detected a loss in Alpha • Data stuck in a job failure dir • Functional monitoring missed it • We streamed the stuck data
  • 31. Showcase IV Producer issues • Offline producer gets stuck • Functional monitoring misses
  • 34. Implementation: Audit Loader
      • Storm topology
      • Loads audit events from Kafka into MySQL
      Diagram: Audit Events → Audit Loader → Audit DB, with example rows:
        Bulk      Tier   TS      Count
        xyz:123   WRPA   08:34   750
        xyz:123   DSPT   09:05   405
        xyz:123   DSPT   10:13   250
  • 36. Implementation: Audit Aggregator • Load data from HDFS • Aggregate events according to audit metadata • Save aggregated audit data to MySQL • Spark implementation
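The aggregation step amounts to counting events per audit header. The sketch here uses plain Python as a stand-in for the Spark job, which the slides do not show; the `audit_header` field name and the function name are hypothetical.

```python
from collections import Counter


def aggregate_audit_counts(events, tier):
    """Count events per audit header and return rows shaped for the audit DB.

    events: iterable of dicts; events without an audit header are skipped.
    tier:   name of the data tier the events were read from (e.g. "DSPT").
    """
    counts = Counter(e["audit_header"] for e in events if "audit_header" in e)
    # Sorted so the output is deterministic regardless of input order.
    return [(header, tier, n) for header, n in sorted(counts.items())]
```

In a Spark job the same idea would typically be expressed as a key-by-header followed by a count, with the resulting rows written to the database.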
  • 37. Audit Aggregator job: First Generation (diagram: data is read from HDFS and aggregated per bulk (∑ #1 = N1, ∑ #2 = N2, ∑ #3 = N3), then collected and saved to the DB; the offset is kept in ZooKeeper)
  • 38. Audit Aggregator job: Overcoming Pitfalls • Our jobs work incrementally or manually • Offset management by ZooKeeper • Failing during saving stage leads to lost offset • Saving data and offset on same stream
  • 39. Audit Aggregator job: Revised Design (diagram: data is read from HDFS and aggregated per bulk (∑ #1 = N1, ∑ #2 = N2, ∑ #3 = N3), then collected and saved; both the data and the offset go to the DB)
        Bulk      Tier   TS      Count
        xyz:123   WRPA   08:34   750
        xyz:123   DSPT   09:05   405
        xyz:123   DSPT   10:13   250
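One plausible reading of the revised design is that the aggregated rows and the job offset are written to the same database in a single transaction, so a failure while saving cannot leave the offset out of step with the data. The sketch uses SQLite from the Python standard library as a stand-in for MySQL; the table names, column names and function names are all hypothetical.

```python
import sqlite3


def init_schema(conn):
    """Create the illustrative tables: audit counts plus one offset row per job."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_counts "
        "(bulk TEXT, tier TEXT, ts TEXT, event_count INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_offset (job TEXT PRIMARY KEY, position INTEGER)"
    )


def save_with_offset(conn, job, rows, new_offset):
    """Write aggregated rows and the new offset atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.executemany("INSERT INTO audit_counts VALUES (?, ?, ?, ?)", rows)
        conn.execute(
            "INSERT OR REPLACE INTO job_offset (job, position) VALUES (?, ?)",
            (job, new_offset),
        )


def load_offset(conn, job):
    """Return the last committed offset for the job, or 0 if none was saved."""
    row = conn.execute(
        "SELECT position FROM job_offset WHERE job = ?", (job,)
    ).fetchone()
    return row[0] if row else 0
```

If the insert fails halfway, the rollback discards the partial rows and leaves the previous offset in place, so the next run reprocesses the same input.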
  • 40. Audit Aggregator job: Why Spark • Precedent: Spark Streaming for online auditing • We see our future with Spark • Cluster utilization • Performance – In-memory computation – Supports multiple shuffles – Unified data processing: batch/streaming
  • 42. Implementation: Data Freshness • End-to-end latency assessment • Freshness per criteria • Output: various stats
  • 43. Freshness job: Design (diagram: events from HDFS pass through Map and Reduce, grouped by Total, LP Customer and Event Type; the outputs are Count, Min, Max, Avg and Buckets)
  • 44. Freshness job: Mechanism
      • Driver – Collects LP events from HDFS
      • Map – Computes freshness latencies – Segments events per criteria by generating a composite key
      • Reduce – Computes count, min, max, avg and buckets – Writes stats to HDFS
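The composite-key idea can be illustrated with a small in-memory map/reduce in plain Python. This is not the Hadoop job from the talk: the field names (`event_type`, `lp_customer`, `creation_time`, `insertion_time`) and the key layout are assumptions, and the bucket counts are left out to keep the sketch short.

```python
from collections import defaultdict


def map_event(event):
    """Map step: emit one (composite_key, latency) pair per grouping criterion."""
    latency = event["insertion_time"] - event["creation_time"]
    yield ("total",), latency
    yield ("event_type", event["event_type"]), latency
    yield ("lp_customer", event["lp_customer"]), latency


def reduce_latencies(latencies):
    """Reduce step: summarize all latencies that share one composite key."""
    return {
        "count": len(latencies),
        "min": min(latencies),
        "max": max(latencies),
        "avg": sum(latencies) / len(latencies),
    }


def run_freshness_job(events):
    """Driver: group mapped pairs by key (standing in for the shuffle), then reduce."""
    grouped = defaultdict(list)
    for event in events:
        for key, latency in map_event(event):
            grouped[key].append(latency)
    return {key: reduce_latencies(values) for key, values in grouped.items()}
```

Because each event is emitted once per criterion, a single pass over the data yields the total, per-event-type and per-customer statistics together.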
  • 46. Hadoop Platform Overcoming Pitfalls • Our data model is built over Avro • Avro comes with schema evolution • Avro data is stored along with its schema • High model-modification rate • LOBs schema changes are synchronized Producer → Consumer
  • 47. Hadoop Platform Overcoming Pitfalls • MR/Spark job is revision-compiled when using SpecificRecord • Using GenericRecord removes the burden of recompiling each time schema changes