© Rocana, Inc. All Rights Reserved. | 1
Joey Echeverria, Platform Technical Lead
Strata+Hadoop World, March 31st 2016
San Jose, CA
Embeddable data transformation for real-time streams
Slides
http://j.mp/rocana-transform-slides
Questions
http://j.mp/hw-questions
Context
Joey
• Where I work: Rocana – Platform Technical Lead
• Where I used to work: Cloudera (’11-’15), NSA
• Distributed systems, security, data processing, big data
Signing today at 1pm at the Cloudera booth
History
“Legacy” data architecture
[Diagram: Data Producers → Flume/Sqoop → HDFS (Avro/Parquet Files) → MapReduce, Spark, Impala → Visualization/Query]
Stream data architecture
[Diagram: Data Producers → Kafka (Avro Serialized Records) → Storm, Flink, Spark Streaming, Kafka Consumers → Real-time Visualization, HDFS (Avro/Parquet Files)]
Stream processing
A primer
Stream processing
• Filter
• Extract
• Project
• Aggregate
• Join
• Model
• Data transformation
Apache Storm
• "Distributed real-time computation system"
• Applications packaged into topologies (think MapReduce job)
• Topologies operate over streams of tuples
• Spout: source of a stream
• Bolt: arbitrary operation such as filtering, aggregating, joining, or executing arbitrary functions
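The spout/bolt vocabulary above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Storm API: `spout`, `filter_bolt`, `map_bolt`, and `run_topology` are hypothetical names, and tuples are modelled as dicts.

```python
from typing import Callable, Iterable, Iterator, List


def spout(lines: Iterable[str]) -> Iterator[dict]:
    """Source of the stream: wraps each raw line in a tuple (a dict here)."""
    for line in lines:
        yield {"line": line}


def filter_bolt(predicate: Callable[[dict], bool]):
    """Bolt that drops tuples for which the predicate is false."""
    def run(stream):
        return (t for t in stream if predicate(t))
    return run


def map_bolt(fn: Callable[[dict], dict]):
    """Bolt that applies an arbitrary function to every tuple."""
    def run(stream):
        return (fn(t) for t in stream)
    return run


def run_topology(source: Iterator[dict], bolts: List[Callable]) -> List[dict]:
    """Chain the bolts after the spout and drain the resulting stream."""
    stream = source
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


result = run_topology(
    spout(["GET /a 200", "GET /b 500", "GET /c 200"]),
    [
        filter_bolt(lambda t: t["line"].endswith("200")),
        map_bolt(lambda t: {**t, "path": t["line"].split()[1]}),
    ],
)
```

A real topology distributes spouts and bolts across a cluster; the sketch only shows how tuples flow from a source through a chain of operations.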
Apache Spark
• Supports batch and stream processing
• Continuous stream of records discretized into a DStream
• DStream: a sequence of RDDs (batches of records)
• Micro-batch
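A rough sketch of what discretizing a stream into micro-batches means, in plain Python rather than the Spark API. The `discretize` helper is hypothetical, and it assumes each record carries a millisecond timestamp that stands in for its arrival time.

```python
from collections import defaultdict


def discretize(records, batch_ms):
    """Group (ts_ms, value) records into consecutive batches of batch_ms each.

    Intervals that received no records are simply omitted in this sketch.
    """
    batches = defaultdict(list)
    for ts, value in records:
        batches[ts // batch_ms].append(value)
    return [batches[key] for key in sorted(batches)]


records = [(0, "a"), (400, "b"), (1000, "c"), (1900, "d"), (2100, "e")]
micro_batches = discretize(records, batch_ms=1000)
```

Each inner list plays the role of one batch of records; downstream operations then run once per batch instead of once per record.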
Apache Flink
• Supports batch and stream processing
• DataStream: unbounded collection of records
• Operations can apply to individual records or windows of records
• Supports record-at-a-time processing (like Storm)
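To make the difference between per-record and windowed operations concrete, here is a small plain-Python sketch. It does not use the Flink API; `per_record` and `tumbling_count_window` are hypothetical helpers, and the window is count-based purely for simplicity.

```python
def per_record(stream, fn):
    """Apply fn to each record as it arrives (record-at-a-time)."""
    for record in stream:
        yield fn(record)


def tumbling_count_window(stream, size, fn):
    """Apply fn to consecutive, non-overlapping windows of `size` records."""
    window = []
    for record in stream:
        window.append(record)
        if len(window) == size:
            yield fn(window)
            window = []
    if window:
        # Flush the final, partially filled window.
        yield fn(window)


latencies = [10, 20, 30, 40, 50]
doubled = list(per_record(latencies, lambda x: x * 2))
window_sums = list(tumbling_count_window(latencies, 2, sum))
```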
Apache Kafka
• Pub-sub messaging system implemented as a distributed commit log
• Popular as a source and sink for data streams
• Scalability, durability, and easy-to-understand delivery guarantees
• Can do stream processing directly in Kafka consumers
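The commit-log idea and the last bullet (processing directly in a consumer) can be illustrated with a toy in-memory model. This is not the Kafka client API: `CommitLog` and `Consumer` are hypothetical classes, and there are no partitions, brokers, or persistence.

```python
class CommitLog:
    """Append-only sequence of records addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own offset and transforms records as it reads them."""

    def __init__(self, log, transform):
        self.log = log
        self.transform = transform
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return [self.transform(record) for record in batch]


log = CommitLog()
for line in ["a", "b", "c"]:
    log.append(line)

consumer = Consumer(log, str.upper)
first = consumer.poll()
log.append("d")
second = consumer.poll()
```

Because the consumer owns its offset, it can resume where it left off and the log itself never changes when records are read.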
Data transformation
Filter
filter
Extract
127.0.0.1 Mozilla/5.0 laura [31/Mar/2016] "GET /index.html HTTP/1.0" 200 2326
ts: 1436576671000
body: <binary blob>
event_type_id: 100
...
extract
ts: 1436576671000
body: <binary blob>
event_type_id: 100
attributes: {
  ip: "127.0.0.1"
  user_agent: "Mozilla/5.0"
  user_id: "laura"
  date: "[31/Mar/2016]"
  request: "GET /index.html HTTP/1.0"
  status_code: "200"
  size: "2326"
}
Project
ts: 1436576671000
body: <binary blob>
event_type_id: 100
attributes: {
  ip: "127.0.0.1"
  user_agent: "Mozilla/5.0"
  user_id: "laura"
  date: "[31/Mar/2016]"
  request: "GET /index.html HTTP/1.0"
  status_code: "200"
  size: "2326"
}
project
ts: 1459444413000
ip: "127.0.0.1"
user_agent: "Mozilla/5.0"
user_id: "laura"
request: "GET /index.html HTTP/1.0"
status_code: 200
size: 2326
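The filter, extract, and project steps shown on the last three slides can be sketched in plain Python. This is a minimal illustration under assumptions, not Rocana Transform itself: the regular expression, the function names, and the choice to carry `ts` over unchanged are all hypothetical (the slide shows a different `ts` after projection but does not say how it is derived).

```python
import re

# Assumed layout of the sample line: ip, user agent, user id, [date], "request", status, size.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) (?P<user_agent>\S+) (?P<user_id>\S+) '
    r'(?P<date>\[[^\]]+\]) "(?P<request>[^"]*)" '
    r'(?P<status_code>\d{3}) (?P<size>\d+)$'
)


def extract(event):
    """Parse the raw body into string attributes; other fields are untouched."""
    match = LOG_PATTERN.match(event["body"].decode("utf-8"))
    return {**event, "attributes": match.groupdict() if match else {}}


def project(event):
    """Keep selected fields at the top level and convert numeric ones."""
    attrs = event["attributes"]
    return {
        "ts": event["ts"],
        "ip": attrs["ip"],
        "user_agent": attrs["user_agent"],
        "user_id": attrs["user_id"],
        "request": attrs["request"],
        "status_code": int(attrs["status_code"]),
        "size": int(attrs["size"]),
    }


def pipeline(events):
    for event in events:
        if event["event_type_id"] != 100:  # filter
            continue
        extracted = extract(event)
        if extracted["attributes"]:        # skip bodies that did not parse
            yield project(extracted)


raw = {
    "ts": 1436576671000,
    "event_type_id": 100,
    "body": b'127.0.0.1 Mozilla/5.0 laura [31/Mar/2016] '
            b'"GET /index.html HTTP/1.0" 200 2326',
}
flat = list(pipeline([raw, {**raw, "event_type_id": 7}]))
```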
Problem
Who
• Developers
• Data engineers
• Sysadmins
• Analysts
Tools
The dark art of data science
• Feature engineering
• “Getting a mess of raw data that can be used as input to a machine learning algorithm” - @josh_wills
• Video from Midwest.io 2014
Data transformation for all
Rocana Transform
• Library
• Java
• Rocana configuration
• JSON, plus comments and specific numeric types, minus excess quoting
Data model
• Event schema
• id: A globally unique identifier for this event
• ts: Epoch timestamp in milliseconds
• event_type_id: ID indicating the type of the event
• location: Location from which the event was generated
• host: Hostname, IP, or other device identifier from which the event was generated
• service: Service or process from which the event was generated
• body: Raw event content in bytes
• attributes: Event type-specific key/value pairs
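One way to picture the event schema is as a typed record. The Python dataclass below is only a sketch: the field names come from the slide, while the Python types (and the string-to-string attribute map) are assumptions, and the sample values are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    id: str                # globally unique identifier
    ts: int                # epoch timestamp in milliseconds
    event_type_id: int     # type of the event
    location: str          # where the event was generated
    host: str              # hostname, IP, or other device identifier
    service: str           # service or process that generated the event
    body: bytes = b""      # raw event content
    attributes: Dict[str, str] = field(default_factory=dict)  # type-specific key/value pairs


event = Event(
    id="example-id",
    ts=1436576671000,
    event_type_id=100,
    location="aws/us-west-2a",
    host="example01.rocana.com",
    service="dhclient",
    body=b"raw syslog line",
)
event.attributes["syslog_pid"] = "865"
```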
Example event
{
  "id": "JRHAIDMLCKLEAPMIQDHFLO3MXYXV7NVBEJNDKZGS2XVSEINGGBHA====",
  "event_type_id": 100,
  "ts": 1436576671000,
  "location": "aws/us-west-2a",
  "host": "example01.rocana.com",
  "service": "dhclient",
  "body": "<36>Jul 10 18:04:31 gs09.example.com dhclient[865] DHCPACK from …",
  "attributes": {
    "syslog_timestamp": "1436576671000",
    "syslog_process": "dhclient",
    "syslog_pid": "865",
    "syslog_facility": "3",
    "syslog_severity": "6",
    "syslog_hostname": "example01",
    "syslog_message": "DHCPACK from 10.10.1.1 (xid=0x5c64bdb0)"
  }
}
Filter, extract, and flatten
• Filter out events without type id 100
• Filter out events without hostname prefix "ex"
• Extract a numeric prefix from the syslog message
• Flatten syslog attributes to top-level fields in a different Avro schema
Filter, extract, and flatten
{
  load-event: {},
  // Filter by event_type_id
  filter: { expression: "${event_type_id == 100}" },
  // Extract hostname prefix
  regex: { ... },
  filter: { expression: "${host_prefix.match.group.1 == 'ex'}" },
  // Extract a numeric prefix from the syslog message
  regex: { ... },
  // Build flattened record
  build-avro-record: { ... },
  // Accumulate output record
  accumulate-output: {
    value: "${output_record}"
  }
}
Extract hostname prefix
{
  load-event: {},
  filter: { expression: "${event_type_id == 100}" },
  regex: {
    pattern: "^(.{2}).*$",
    value: "${attr.syslog_hostname}",
    destination: "host_prefix"
  },
  filter: { expression: "${host_prefix.match.group.1 == 'ex'}" },
  ...
}
Extract numeric prefix
...
filter: { expression: "${host_prefix.match.group.1 == 'ex'}" },
regex: {
  pattern: "^([0-9]*)",
  value: "${attributes['syslog_message']}",
  destination: "msg",
  match-actions: {
    set-values: { extracted_field: "${msg.match.group.1}" }
  },
  no-match-actions: {
    set-values: { extracted_field: "" }
  }
},
...
Build flattened record
...
build-avro-record: {
  schema-uri: "resource:avro-schemas/flattened-syslog.avsc",
  destination: "output_record",
  field-mapping: {
    ts: "${ts}",
    event_type_id: "${event_type_id}",
    source: "${source}",
    syslog_facility: "${convert:toInt(attributes['syslog_facility'])}",
    syslog_severity: "${convert:toInt(attributes['syslog_severity'])}",
    ...
    syslog_message: "${attributes['syslog_message']}",
    syslog_pid: "${convert:toInt(attributes['syslog_pid'])}",
    extracted_field: "${extracted_field}"
  },
},
...
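For readers who want to trace the logic of the filter, extract, and flatten configuration, here is a rough plain-Python equivalent. It is a sketch under assumptions, not what Rocana Transform executes: `transform` is a hypothetical function, the output is a dict rather than an Avro record, and only the fields spelled out on the slides are mapped.

```python
import re


def transform(event):
    """Return a flattened record, or None if the event is filtered out."""
    if event["event_type_id"] != 100:
        return None
    attrs = event["attributes"]

    # Extract the two-character hostname prefix and filter on it.
    prefix = re.match(r"^(.{2}).*$", attrs["syslog_hostname"])
    if prefix is None or prefix.group(1) != "ex":
        return None

    # Extract a numeric prefix from the syslog message (may be empty).
    msg = re.match(r"^([0-9]*)", attrs["syslog_message"])
    extracted_field = msg.group(1) if msg else ""

    # Flatten selected attributes to typed top-level fields.
    return {
        "ts": event["ts"],
        "event_type_id": event["event_type_id"],
        "syslog_facility": int(attrs["syslog_facility"]),
        "syslog_severity": int(attrs["syslog_severity"]),
        "syslog_message": attrs["syslog_message"],
        "syslog_pid": int(attrs["syslog_pid"]),
        "extracted_field": extracted_field,
    }


event = {
    "event_type_id": 100,
    "ts": 1436576671000,
    "attributes": {
        "syslog_pid": "865",
        "syslog_facility": "3",
        "syslog_severity": "6",
        "syslog_hostname": "example01",
        "syslog_message": "DHCPACK from 10.10.1.1 (xid=0x5c64bdb0)",
    },
}
record = transform(event)
filtered = transform({**event, "event_type_id": 101})
```

In Python, `[0-9]*` can match zero digits, so the second pattern matches every message and yields an empty string when there is no numeric prefix; the explicit no-match fallback is kept only to mirror the configuration.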
Extract metrics from log data
Extract metrics
• Input: HTTP status logs
• Extract request latency
• Extract counts by HTTP status code
• Metric types
• Gauge: A value that varies over time (think latency, CPU %, etc.)
• Counter: A value that accumulates over time (think event volume, status codes, etc.)
Example metric event
{
  "id": "JRHAIDMLCKLEAPMIQDHFLO3MXBBQ7NVBEJNDKZGS2XVSEINGGBHA====",
  "event_type_id": 107,
  "ts": 1436576671000,
  "location": "aws/us-west-2a",
  "host": "web01.rocana.com",
  "service": "httpd",
  "attributes": {
    "m.http.request.latency": "4.2000000000E1|g",
    "m.http.status.401.count": "1.0000000000E0|c"
  }
}
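The metric attributes in the example event appear to follow a `m.<name>` key with a `<value>|<type>` string, where `g` and `c` presumably stand for gauge and counter. That reading is inferred from the example alone, so the decoder below is a sketch under that assumption; `parse_metrics` is a hypothetical helper.

```python
def parse_metrics(attributes):
    """Decode 'm.<name>': '<value>|<type>' attributes into (value, kind) pairs."""
    kinds = {"g": "gauge", "c": "counter"}  # assumed meaning of the suffixes
    metrics = {}
    for key, encoded in attributes.items():
        if not key.startswith("m."):
            continue  # not a metric attribute
        value, _, kind = encoded.rpartition("|")
        metrics[key[2:]] = (float(value), kinds[kind])
    return metrics


attributes = {
    "m.http.request.latency": "4.2000000000E1|g",
    "m.http.status.401.count": "1.0000000000E0|c",
}
metrics = parse_metrics(attributes)
```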
Extract metrics
{
  load-event: {},
  build-metric: {
    gauge-mapping: {
      http.request.latency: "${convert:toDouble(attributes['latency'])}"
    },
    destination: "latency_metric"
  },
  accumulate-output: { value: "${latency_metric}" },
  build-metric: {
    dynamic-counter-mapping: [
      "${string:format('http.status.%s.count', attributes['sc_status'])}", 1D
    ],
    destination: "status_metric"
  },
  accumulate-output: { value: "${status_metric}" }
}
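The two `build-metric` steps can be traced with a small plain-Python sketch. It is illustrative only: `build_metrics` and the dict shapes are hypothetical, and `1D` is assumed to denote the double value 1.0.

```python
def build_metrics(event):
    """Produce one gauge metric and one dynamically named counter metric."""
    attrs = event["attributes"]

    # Gauge: the request latency converted to a double.
    latency_metric = {
        "gauges": {"http.request.latency": float(attrs["latency"])}
    }

    # Counter: the name is built from the status code, the increment is 1.0.
    counter_name = "http.status.%s.count" % attrs["sc_status"]
    status_metric = {"counters": {counter_name: 1.0}}

    # Both metrics are accumulated as separate output records.
    return [latency_metric, status_metric]


output = build_metrics({"attributes": {"latency": "42", "sc_status": "401"}})
```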
Architecture
Architecture
• Components: Configuration file, Java action objects, Context (variables), Driver
1. Parse config
2. Initialize context
3. Execute actions
4. Read/write variables
5. Copy output
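The five driver steps can be sketched as a tiny interpreter. This is a plain-Python illustration of the control flow only, not the library's Java classes: `Context`, `BUILDERS`, `parse_config`, and `run` are hypothetical names, and the configuration is modelled as a list of (keyword, arguments) pairs because a keyword such as `filter` may appear more than once.

```python
class Context:
    """Holds the variables that actions read and write, plus the output."""

    def __init__(self, event):
        self.variables = dict(event)
        self.output = []


def filter_action(predicate):
    # An action returns False to stop processing the current event.
    return lambda ctx: predicate(ctx.variables)


def set_value(name, fn):
    def action(ctx):
        ctx.variables[name] = fn(ctx.variables)
        return True
    return action


def accumulate_output(name):
    def action(ctx):
        ctx.output.append(ctx.variables[name])
        return True
    return action


BUILDERS = {
    "filter": filter_action,
    "set": set_value,
    "accumulate-output": accumulate_output,
}


def parse_config(config):
    """Step 1: turn (keyword, args) pairs into action objects."""
    return [BUILDERS[keyword](*args) for keyword, args in config]


def run(actions, event):
    ctx = Context(event)       # step 2: initialize context
    for action in actions:     # step 3: execute actions
        if not action(ctx):    # step 4: actions read/write variables
            break
    return list(ctx.output)    # step 5: copy output


actions = parse_config([
    ("filter", (lambda v: v["event_type_id"] == 100,)),
    ("set", ("host_upper", lambda v: v["host"].upper())),
    ("accumulate-output", ("host_upper",)),
])
kept = run(actions, {"event_type_id": 100, "host": "web01"})
dropped = run(actions, {"event_type_id": 7, "host": "web01"})
```

Parsing the configuration once and reusing the resulting actions for every event keeps the per-event cost down to executing the action list.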
Custom actions
• Actions loaded at runtime using Java services framework
• Add your jar to the classpath
• Custom actions appear as top-level keywords just like regular actions
• Implement the execute() method of the Action interface
• Implement the build() method of the ActionBuilder interface
Custom actions
• Parse custom log formats
• Cisco ACS
• Citrix
• Juniper
• Customer-specific formats
• Lookup IP addresses in the MaxMind GeoIP2 database
• Reference dataset lookups
• Device id to device name
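As an illustration of the builder/execute split and of a reference dataset lookup, here is a plain-Python analogue. The real mechanism is Java's services framework with the `Action` and `ActionBuilder` interfaces, whose signatures are not shown in the slides; the registry, the `device-lookup` keyword, and the sample table below are all hypothetical.

```python
ACTION_BUILDERS = {}


def register(keyword):
    """Make a builder available under a top-level configuration keyword."""
    def decorator(builder):
        ACTION_BUILDERS[keyword] = builder
        return builder
    return decorator


@register("device-lookup")
def build_device_lookup(config):
    """Builder: validates configuration once and returns the action."""
    table = config["table"]
    source = config["source"]
    destination = config["destination"]

    def execute(variables):
        # Action: runs per event, reading and writing context variables.
        variables[destination] = table.get(variables[source], "unknown")

    return execute


action = ACTION_BUILDERS["device-lookup"]({
    "table": {"42": "core-switch-1"},
    "source": "device_id",
    "destination": "device_name",
})

known = {"device_id": "42"}
action(known)
missing = {"device_id": "7"}
action(missing)
```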
Putting it all together
• Stream processing is causing us to re-think how we analyze data
• Limiting the accessibility of the data transformation side increases costs and decreases velocity
• Reduce your reliance on developers to code custom pipelines
• Re-use transformation configuration in any stream processing framework or batch job
Coming soon
• Rocana Transform will be released under the Apache License 2.0 (ASL 2.0)
• The base configuration library is available today:
• https://github.com/scalingdata/rocana-configuration
Questions?
• Signing "Hadoop Security" today at 1pm at the Cloudera booth

Learning the basics of Apache NiFi for iot OSS Europe 2020
Timothy Spann
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 

Embeddable data transformation for real time streams

  • 1. Embeddable data transformation for real-time streams. Joey Echeverria, Platform Technical Lead. Strata+Hadoop World, March 31st 2016, San Jose, CA. © Rocana, Inc. All Rights Reserved.
  • 2. Slides: http://j.mp/rocana-transform-slides
  • 3. Questions: http://j.mp/hw-questions
  • 4. Context
  • 5. Joey
      • Where I work: Rocana – Platform Technical Lead
      • Where I used to work: Cloudera (’11-’15), NSA
      • Distributed systems, security, data processing, big data
  • 6. Signing today at 1pm at the Cloudera booth
  • 7. History
  • 8. “Legacy” data architecture (diagram): Data Producers; Flume/Sqoop; HDFS (Avro/Parquet files); MapReduce, Spark, Impala; Visualization/Query
  • 9. Stream data architecture (diagram): Data Producers; Kafka (Avro serialized records); Storm, Flink, Spark Streaming; Real-time Visualization; Kafka Consumers; HDFS (Avro/Parquet files)
  • 10. Stream data architecture (the diagram from slide 9, repeated)
  • 11. Stream processing: a primer
  • 12. Stream processing
      • Filter
      • Extract
      • Project
      • Aggregate
      • Join
      • Model
  • 13. Stream processing (the list from slide 12, repeated)
  • 14. Stream processing
      • Filter
      • Extract
      • Project
      • Aggregate
      • Join
      • Model
      • Data transformation
  • 15. Apache Storm
      • "Distributed real-time computation system"
      • Applications packaged into topologies (think MapReduce job)
      • Topologies operate over streams of tuples
      • Spout: source of a stream
      • Bolt: arbitrary operation such as filtering, aggregating, joining, or executing arbitrary functions
  • 16. Apache Spark
      • Supports batch and stream processing
      • Continuous stream of records discretized into a DStream
      • DStream: a sequence of RDDs (batches of records)
      • Micro-batch
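The micro-batch idea on slide 16 can be sketched without Spark itself. The plain-Java sketch below groups timestamped records into fixed-width batches, roughly the way a continuous stream is discretized into a sequence of batches. `MicroBatcher`, `discretize`, and the batch width are hypothetical names chosen for illustration; this is not Spark's API, and it assumes non-negative timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a stand-in for the discretization step, not Spark's DStream API.
class MicroBatcher {
    /**
     * Groups record timestamps (epoch milliseconds, assumed non-negative) into
     * batches that are batchMillis wide, returned in order of batch start time.
     */
    static List<List<Long>> discretize(List<Long> timestamps, long batchMillis) {
        Map<Long, List<Long>> batches = new TreeMap<>();
        for (long ts : timestamps) {
            // Every record in [batchStart, batchStart + batchMillis) lands in one batch.
            long batchStart = (ts / batchMillis) * batchMillis;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(ts);
        }
        return new ArrayList<>(batches.values());
    }
}
```

Each inner list plays the role of one batch of records; a record-at-a-time system would instead hand every record to the operator as it arrives.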
  • 17. Apache Flink
      • Supports batch and stream processing
      • DataStream: unbounded collection of records
      • Operations can apply to individual records or windows of records
      • Supports record-at-a-time processing (like Storm)
  • 18. Apache Kafka
      • Pub-sub messaging system implemented as a distributed commit log
      • Popular as a source and sink for data streams
      • Scalability, durability, and easy-to-understand delivery guarantees
      • Can do stream processing directly in Kafka consumers
  • 19. Data transformation
  • 20. Filter (diagram)
  • 21. Extract
      Raw log line:
        127.0.0.1 Mozilla/5.0 laura [31/Mar/2016] "GET /index.html HTTP/1.0" 200 2326
      Before extract:
        ts: 1436576671000
        body: <binary blob>
        event_type_id: 100
        ...
      After extract:
        ts: 1436576671000
        body: <binary blob>
        event_type_id: 100
        attributes: {
          ip: "127.0.0.1"
          user_agent: "Mozilla/5.0"
          user_id: "laura"
          date: "[31/Mar/2016]"
          request: "GET /index.html HTTP/1.0"
          status_code: "200"
          size: "2326"
        }
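As a rough illustration of the extract step on slide 21, the sketch below pulls the seven fields out of the raw log line with a regular expression and stores them as string attributes. `AccessLogExtractor` and the pattern are assumptions made for this example; the deck does not show how the parsing was actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts string attributes from the slide's log format.
class AccessLogExtractor {
    // Fields in order: ip, user agent, user id, [date], "request", status code, size.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) (\\[[^\\]]*\\]) \"([^\"]*)\" (\\d{3}) (\\d+)$");

    /** Returns the extracted attributes, or an empty map if the line does not match. */
    static Map<String, String> extract(String line) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            attributes.put("ip", m.group(1));
            attributes.put("user_agent", m.group(2));
            attributes.put("user_id", m.group(3));
            attributes.put("date", m.group(4));
            attributes.put("request", m.group(5));
            attributes.put("status_code", m.group(6));
            attributes.put("size", m.group(7));
        }
        return attributes;
    }
}
```

Every value stays a string at this stage, which matches the quoted values in the slide's `attributes` map; type conversion is left to a later step.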
  • 22. Project
      Before project:
        ts: 1436576671000
        body: <binary blob>
        event_type_id: 100
        attributes: {
          ip: "127.0.0.1"
          user_agent: "Mozilla/5.0"
          user_id: "laura"
          date: "[31/Mar/2016]"
          request: "GET /index.html HTTP/1.0"
          status_code: "200"
          size: "2326"
        }
      After project:
        ts: 1459444413000
        ip: "127.0.0.1"
        user_agent: "Mozilla/5.0"
        user_id: "laura"
        request: "GET /index.html HTTP/1.0"
        status_code: 200
        size: 2326
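The project step on slide 22 keeps a subset of the attributes as top-level fields and converts the numeric ones from strings. A minimal sketch of that idea follows; `EventProjector` is a hypothetical name, and the new `ts` value is passed in by the caller because the slide does not show how it is derived.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: flattens selected attributes into a typed, top-level record.
class EventProjector {
    /** Assumes status_code and size are present and hold integer strings. */
    static Map<String, Object> project(long ts, Map<String, String> attributes) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("ts", ts);
        // String fields are copied through unchanged.
        for (String key : new String[] {"ip", "user_agent", "user_id", "request"}) {
            out.put(key, attributes.get(key));
        }
        // Numeric fields are converted from their string form.
        out.put("status_code", Integer.parseInt(attributes.get("status_code")));
        out.put("size", Integer.parseInt(attributes.get("size")));
        return out;
    }
}
```

Fields that are not listed, such as `date`, are dropped from the projected record, as in the slide's output.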
  • 23. Problem
  • 24. Who
      • Developers
      • Data engineers
      • Sysadmins
      • Analysts
  • 25. Tools
  • 26. The dark art of data science
      • Feature engineering
      • “Getting a mess of raw data that can be used as input to a machine learning algorithm” - @josh_wills
      • Video from Midwest.io 2014
  • 27. Data transformation for all
  • 28. Rocana Transform
      • Library
      • Java
      • Rocana configuration
      • JSON + comments + specific numeric types - excess quoting
  • 29. Data model
      • Event schema
          • id: A globally unique identifier for this event
          • ts: Epoch timestamp in milliseconds
          • event_type_id: ID indicating the type of the event
          • location: Location from which the event was generated
          • host: Hostname, IP, or other device identifier from which the event was generated
          • service: Service or process from which the event was generated
          • body: Raw event content in bytes
          • attributes: Event type-specific key/value pairs
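For readers who think in Java types, the event schema on slide 29 could be modeled roughly as the immutable class below. `SketchEvent` is a hypothetical name, and the field types other than `ts` (epoch milliseconds) and `body` (bytes) are assumptions inferred from the example values in the deck, not the published Avro schema.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory form of the event schema; types are assumptions.
final class SketchEvent {
    final String id;                      // globally unique identifier
    final long ts;                        // epoch timestamp in milliseconds
    final int eventTypeId;                // type of the event
    final String location;                // where the event was generated
    final String host;                    // hostname, IP, or other device identifier
    final String service;                 // service or process that generated the event
    final byte[] body;                    // raw event content
    final Map<String, String> attributes; // event type-specific key/value pairs

    SketchEvent(String id, long ts, int eventTypeId, String location, String host,
                String service, byte[] body, Map<String, String> attributes) {
        this.id = id;
        this.ts = ts;
        this.eventTypeId = eventTypeId;
        this.location = location;
        this.host = host;
        this.service = service;
        // Defensive copies keep the event unaffected by later changes to the inputs.
        this.body = body.clone();
        this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
    }
}
```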
  • 30. Example event
      {
        "id": "JRHAIDMLCKLEAPMIQDHFLO3MXYXV7NVBEJNDKZGS2XVSEINGGBHA====",
        "event_type_id": 100,
        "ts": 1436576671000,
        "location": "aws/us-west-2a",
        "host": "example01.rocana.com",
        "service": "dhclient",
        "body": "<36>Jul 10 18:04:31 gs09.example.com dhclient[865] DHCPACK from …",
        "attributes": {
          "syslog_timestamp": "1436576671000",
          "syslog_process": "dhclient",
          "syslog_pid": "865",
          "syslog_facility": "3",
          "syslog_severity": "6",
          "syslog_hostname": "example01",
          "syslog_message": "DHCPACK from 10.10.1.1 (xid=0x5c64bdb0)"
        }
      }
  • 31. Filter, extract, and flatten
  • 32. Filter, extract, and flatten
      • Filter out events without type id 100
      • Filter out events without hostname prefix "ex"
      • Extract a numeric prefix from the syslog message
      • Flatten syslog attributes to top-level fields in a different Avro schema
  • 33. Filter, extract, and flatten
      {
        load-event: {},

        // Filter by event_type_id
        filter: { expression: "${event_type_id == 100}" },

        // Extract hostname prefix
        regex: { ... },
        filter: { expression: "${host_prefix.match.group.1 == 'ex'}" },

        // Extract a numeric prefix from the syslog message
        regex: { ... },

        // Build flattened record
        build-avro-record: { ... },

        // Accumulate output record
        accumulate-output: { value: "${output_record}" }
      }
  • 34. Extract hostname prefix
      {
        load-event: {},
        filter: { expression: "${event_type_id == 100}" },
        regex: {
          pattern: "^(.{2}).*$",
          value: "${attr.syslog_hostname}",
          destination: "host_prefix"
        },
        filter: { expression: "${host_prefix.match.group.1 == 'ex'}" },
        ...
      }
  • 35. Extract numeric prefix
      ...
      filter: { expression: "${host_prefix.match.group.1 == 'ex'}" },
      regex: {
        pattern: "^([0-9]*)",
        value: "${attributes['syslog_message']}",
        destination: "msg",
        match-actions: {
          set-values: { extracted_field: "${msg.match.group.1}" }
        },
        no-match-actions: {
          set-values: { extracted_field: "" }
        }
      },
      ...
  • 36. Build flattened record
      ...
      build-avro-record: {
        schema-uri: "resource:avro-schemas/flattened-syslog.avsc",
        destination: "output_record",
        field-mapping: {
          ts: "${ts}",
          event_type_id: "${event_type_id}",
          source: "${source}",
          syslog_facility: "${convert:toInt(attributes['syslog_facility'])}",
          syslog_severity: "${convert:toInt(attributes['syslog_severity'])}",
          ...
          syslog_message: "${attributes['syslog_message']}",
          syslog_pid: "${convert:toInt(attributes['syslog_pid'])}",
          extracted_field: "${extracted_field}"
        },
      },
      ...
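To make the behaviour of the configuration on slides 33 to 36 concrete, here is a hand-written Java approximation of the same four steps. It is a sketch of what the configuration appears to express, not how Rocana Transform executes it: `SyslogFlattener` is a hypothetical name, the output is a plain map rather than an Avro record, and the numeric attributes are assumed to be present.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-coded approximation of the filter / extract / flatten configuration.
class SyslogFlattener {
    private static final Pattern NUMERIC_PREFIX = Pattern.compile("^([0-9]*)");

    /** Returns the flattened record, or empty if the event is filtered out. */
    static Optional<Map<String, Object>> transform(
            int eventTypeId, long ts, Map<String, String> attributes) {
        // Filter by event_type_id.
        if (eventTypeId != 100) {
            return Optional.empty();
        }
        // Filter by hostname prefix (the config captures the first two characters).
        String hostname = attributes.getOrDefault("syslog_hostname", "");
        if (!hostname.startsWith("ex")) {
            return Optional.empty();
        }
        // Extract a numeric prefix from the syslog message; empty string if none.
        String message = attributes.getOrDefault("syslog_message", "");
        Matcher m = NUMERIC_PREFIX.matcher(message);
        String extracted = m.find() ? m.group(1) : "";

        // Build the flattened record, converting numeric attributes to ints.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("ts", ts);
        record.put("event_type_id", eventTypeId);
        record.put("syslog_facility", Integer.parseInt(attributes.get("syslog_facility")));
        record.put("syslog_severity", Integer.parseInt(attributes.get("syslog_severity")));
        record.put("syslog_message", message);
        record.put("syslog_pid", Integer.parseInt(attributes.get("syslog_pid")));
        record.put("extracted_field", extracted);
        return Optional.of(record);
    }
}
```

The contrast with the configuration is the point of the talk: the same logic written by hand is tied to one event type and needs a developer to change it.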
  • 37. Extract metrics from log data
  • 38. Extract metrics
      • Input: HTTP status logs
      • Extract request latency
      • Extract counts by HTTP status code
      • Metric types
          • Gauge: A value that varies over time (think latency, CPU %, etc.)
          • Counter: A value that accumulates over time (think event volume, status codes, etc.)
  • 39. Example metric event
      {
        "id": "JRHAIDMLCKLEAPMIQDHFLO3MXBBQ7NVBEJNDKZGS2XVSEINGGBHA====",
        "event_type_id": 107,
        "ts": 1436576671000,
        "location": "aws/us-west-2a",
        "host": "web01.rocana.com",
        "service": "httpd",
        "attributes": {
          "m.http.request.latency": "4.2000000000E1|g",
          "m.http.status.401.count": "1.0000000000E0|c"
        }
      }
  • 40. Extract metrics
      {
        load-event: {},
        build-metric: {
          gauge-mapping: {
            http.request.latency: "${convert:toDouble(attributes['latency'])}"
          },
          destination: "latency_metric"
        },
        accumulate-output: { value: "${latency_metric}" },
        build-metric: {
          dynamic-counter-mapping: [
            "${string:format('http.status.%s.count', attributes['sc_status'])}", 1D
          ],
          destination: "status_metric"
        },
        accumulate-output: { value: "${status_metric}" }
      }
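The metric values on slide 39 look like a number in scientific notation followed by `|g` or `|c`. Reading those suffixes as gauge and counter, and the `m.` prefix as something added to the configured metric name, is an inference from slides 38 to 40 rather than a documented format. Under that assumption, a sketch that reproduces the two example strings could look like this (`MetricEncoder` and its methods are hypothetical):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical encoder that reproduces the value strings shown on slide 39.
class MetricEncoder {
    /** Formats the value with ten fraction digits in scientific notation, then appends the type. */
    static String encode(double value, char type) {
        // Locale.ROOT keeps '.' as the decimal separator and 'E' as the exponent marker.
        DecimalFormat format = new DecimalFormat(
            "0.0000000000E0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value) + "|" + type;
    }

    static String gauge(double value) {
        return encode(value, 'g');
    }

    static String counter(double value) {
        return encode(value, 'c');
    }

    /** Builds a per-status counter name, assuming an "m." prefix on metric attributes. */
    static String statusCounterName(String status) {
        return String.format("m.http.status.%s.count", status);
    }
}
```

With these helpers, a latency of 42 becomes `4.2000000000E1|g` and a single 401 response becomes `1.0000000000E0|c` under the key `m.http.status.401.count`.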
  • 41. Architecture
  • 42. Architecture (diagram)
      • Components: Configuration file, Driver, Java action objects, Context, Variables
      • 1. Parse config
      • 2. Initialize context
      • 3. Execute actions
      • 4. Read/write variables
      • 5. Copy output
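The five numbered steps on slide 42 suggest a simple driver loop. The sketch below is one way that loop might look; `TransformContext`, `SketchAction`, and `TransformDriver` are hypothetical names, and step 1 (parsing the configuration into action objects) is not implemented here, so the driver receives an already-built list of actions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical context: shared variables plus the accumulated output.
class TransformContext {
    final Map<String, Object> variables = new HashMap<>();
    final List<Object> output = new ArrayList<>();
}

// Hypothetical action contract for this sketch.
interface SketchAction {
    /** Returns false to stop processing the current event, as a filter would. */
    boolean execute(TransformContext context);
}

class TransformDriver {
    private final List<SketchAction> actions;

    // Step 1 (parse config) is assumed to have produced this list of actions.
    TransformDriver(List<SketchAction> actions) {
        this.actions = actions;
    }

    List<Object> process(Object event) {
        // Step 2: initialize a fresh context for this event.
        TransformContext context = new TransformContext();
        context.variables.put("event", event);
        // Step 3: execute actions in order; step 4 (read/write variables)
        // happens inside each action through the context.
        for (SketchAction action : actions) {
            if (!action.execute(context)) {
                break;
            }
        }
        // Step 5: copy the accumulated output out of the context.
        return new ArrayList<>(context.output);
    }
}
```

Because the driver only depends on the action list and the context, the same loop could in principle be called from inside any stream processing framework's per-record callback.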
  • 43. Custom actions
      • Actions loaded at runtime using Java services framework
      • Add your jar to the classpath
      • Custom actions appear as top-level keywords just like regular actions
      • Implement the execute() method of the Action interface
      • Implement the build() method of the ActionBuilder interface
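A custom action along the lines of slide 43 might be structured as below. The deck names the `Action` and `ActionBuilder` interfaces and their `execute()` and `build()` methods but does not show their signatures, so the parameter types, the `keyword()` method, and the `uppercase` action are all assumptions for illustration. Discovery uses the standard `java.util.ServiceLoader`, which finds builders listed in a `META-INF/services` file on the classpath; no such file is part of this sketch, so `discover()` returns an empty map when run as-is.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Assumed shape of the interfaces named on the slide; real signatures may differ.
interface Action {
    void execute(Map<String, Object> variables);
}

interface ActionBuilder {
    String keyword(); // hypothetical: the top-level keyword used in the configuration
    Action build(Map<String, String> config);
}

// Example custom action: writes an upper-cased copy of one variable to another.
class UppercaseActionBuilder implements ActionBuilder {
    public String keyword() {
        return "uppercase";
    }

    public Action build(Map<String, String> config) {
        final String source = config.get("value");
        final String destination = config.get("destination");
        return variables ->
            variables.put(destination, String.valueOf(variables.get(source)).toUpperCase());
    }
}

class ActionRegistry {
    /** Collects every ActionBuilder registered through META-INF/services. */
    static Map<String, ActionBuilder> discover() {
        Map<String, ActionBuilder> builders = new HashMap<>();
        for (ActionBuilder builder : ServiceLoader.load(ActionBuilder.class)) {
            builders.put(builder.keyword(), builder);
        }
        return builders;
    }
}
```

Splitting the builder from the action lets configuration be validated once when the action is built, while `execute()` stays cheap enough to run per event.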
  • 44. Custom actions
      • Parse custom log formats
          • Cisco ACS
          • Citrix
          • Juniper
          • Customer-specific formats
      • Lookup IP addresses in the MaxMind GeoIP2 database
      • Reference dataset lookups
          • Device id to device name
  • 45. Putting it all together
      • Stream processing is causing us to re-think how we analyze data
      • Limiting accessibility of data transformation side increases costs and decreases velocity
      • Reduce your reliance on developers to code custom pipelines
      • Re-use transformation configuration in any stream processing framework or batch job
  • 46. Coming soon
      • Rocana transform will be released under the ASL 2.0
      • The base configuration library is available today:
      • https://github.com/scalingdata/rocana-configuration
  • 47. Questions?
      • Signing "Hadoop Security" today at 1pm at the Cloudera booth