Hadoop 101 ETL
+ Automation Smackdown
Learning Big Data: Which approach makes me the most valuable as a developer?
Bio - Pete Carapetyan
• Java dev last 15 years, dev 20 years
• Grew up automating in a different industry
• Apparent obsession with systems & automation
• Since 2000 as dataFundamentals, now 2 man shop
Special Skills - Special Snowflakes
• Let me show you these Hadoop & Avro skills.
• Then, we code for the special snowflakes. (data)
• Thus we are more valuable, and can up our bill rates!
• This is Approach #1: Manual or Special Snowflake
My 2013 Manual Hadoop Story
• 15 ETL jobs [Partial scope]

• Brilliant, ninja level team

• 1 year of competitive NIH* copy-paste spaghetti coding, AKA the special snowflake approach

• Not a fun year
*NIH: Not Invented Here
[Demo Basics of ETL Job]
Special Snowflake Approach: Human drama!
What limitations of this manual "special skills, special snowflakes" approach do we observe?
How To Un-Pack Either Approach?
What if we remove the human drama?
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Now, what happens if we automate?
Automated Approach
Carrie
Our own internal project for automating big data.
Name inspired by the horror film…
Also inspired by The Phoenix Project
• Results, not drama

• Focus only on bottleneck

• Brent as bottleneck
On Brent
• Brent is a team’s best asset! Brent is a ninja.
• Brent is my dark side only when treating every situation like a special snowflake.
• Brent enjoys the attention.
• Brent is not the drama queen; others bring the drama to him.
Brent?
Automation Basics
1. Brent spends time on clean design, not NIH*
• [Camel] - Integration Server
2. Brent automates the rule, codes the exception

• Apply metadata to templates
• Automated VM dev infrastructure
* NIH: Not Invented Here
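"Apply metadata to templates" can be sketched in a few lines. This is only an illustration of the idea, not Carrie's actual generator: the template text, the field names (inbox_dir, delimiter, hdfs_dir) and the paths are all hypothetical.

```python
from string import Template

# Hypothetical template for one generated artifact. The real Carrie
# templates are not shown in the talk; this only illustrates the idea.
ROUTE_TEMPLATE = Template(
    "from: file:$inbox_dir\n"
    "transform: delimited-to-avro (delimiter '$delimiter')\n"
    "to: hdfs:$hdfs_dir\n"
)


def generate(template, metadata):
    """Fill the template from metadata; raises KeyError if a field is missing."""
    return template.substitute(metadata)


# Hypothetical metadata for one ETL job.
job = {
    "inbox_dir": "/etl/in/customer_feed",
    "delimiter": "|",
    "hdfs_dir": "/data/customer_feed",
}
generated = generate(ROUTE_TEMPLATE, job)
print(generated)
```

The rule lives in the template and the per-job differences live in the metadata, so a 16th ETL job is a new dictionary rather than a new copy-paste.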
Demo Clean
• Clean project folder

• Clean hadoop file system

• Clean hadoop DDL
https://www.youtube.com/watch?v=qR7XTzv5P_M&index=2&list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
Later Demo
Integration Server
• Raw Linux OS (CentOS)

• Java

• Maven

• Ruby

• networking

• maven repo - binaries

• [created with vagrant]
https://www.youtube.com/watch?v=xgheERvulqw&index=3&list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
Demo Metadata Collection
• Simple properties

• Collected using a cheesy UI

• UI written in Ruby
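"Simple properties" could be as plain as key=value pairs. A minimal sketch of reading such metadata follows; the property names and the file format are assumptions for illustration and are not taken from Carrie itself.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Hypothetical metadata for one ETL job; the names are illustrative only.
SAMPLE = """
# one ETL job
job.name = customer_feed
delimiter = |
columns = id:int,name:string,signup_date:string
"""

metadata = parse_properties(SAMPLE)
print(metadata)
```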
Demo Generated Code
• Camel ETL binary

• OSGi, versioned, modular jar

• Only 3 primary outputs!

• simple

• clean

• well designed (?)

• JUnit/integration tested

• Supporting scripting

• messy
Demo Server Deploy
• One line deploy/run command

• Compiles on server with Maven

• Also runnable as jar
Does it work?
• Make custom file

• Drop into ETL folder

• Inspect
Demo - Review
• Schema created

• DDL run

• Avro binary (JSON) transform

• Data Migration

• FTP to server

• Into HDFS partition

• Alter Table: Date Partition
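The last two steps (land the file in a dated HDFS partition, then alter the table) might look like the sketch below. It assumes Hive-style DDL and a dt= partition column; the table name, base directory and helper names are hypothetical and not the generated code from the demo.

```python
from datetime import date


def partition_path(base_dir, day):
    """HDFS directory for one day's data, e.g. /data/feed/dt=2014-06-24."""
    return f"{base_dir}/dt={day.isoformat()}"


def add_partition_ddl(table, base_dir, day):
    """Build a Hive-style statement that registers the day's partition.

    IF NOT EXISTS keeps the statement safe to run more than once.
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}') "
        f"LOCATION '{partition_path(base_dir, day)}'"
    )


# Hypothetical table and directory names.
ddl = add_partition_ddl("customer_feed", "/data/customer_feed", date(2014, 6, 24))
print(ddl)
```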
Transform to Avro
• Not detailed in this talk

• Demo’d here as a binary

• Code listed at end of talk
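For orientation only, here is a rough sketch of the first half of that transform: deriving an Avro record schema from column metadata and typing the delimited rows to match. It uses the standard library only and stops short of Avro binary encoding, which would need an Avro library. The column list, record name and helper names are hypothetical; the real code is in the avro_from_delimited repository listed at the end.

```python
import csv
import io
import json

# Maps Avro primitive type names to Python casts (a small subset).
CASTS = {"int": int, "long": int, "double": float, "string": str}


def avro_schema(name, columns):
    """Build an Avro record schema (as a dict) from (column, type) pairs."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": col, "type": typ} for col, typ in columns],
    }


def read_records(text, columns, delimiter="|"):
    """Turn delimited rows into dicts typed according to the column list."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        {col: CASTS[typ](value) for (col, typ), value in zip(columns, row)}
        for row in reader
    ]


# Hypothetical columns and sample data.
columns = [("id", "int"), ("name", "string"), ("amount", "double")]
schema = avro_schema("CustomerFeed", columns)
records = read_records("1|Ann|9.5\n2|Bob|12.0\n", columns)
print(json.dumps(schema))
print(records)
```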
Modular Binaries
• Each ETL

• Own binary, OSGi

• Own codebase

• Fully versioned

• Fully customizable after generation
• Runs alone or as part of Camel container(s)
• Tests on build
• Contains own supporting scripts
Takeaways
• Brent codes the exception manually, and the rule by template.
• Brent has time to focus on design.
• Brent may lose some amount of desired attention :(
• Resulting code is
• clean
• consistent, easy to maintain
• But is there a Home Run?
• defined as not possible via special snowflake approach
Home Run 1: Infrastructure As Code Demo
• [Jeff]
Home Run 2: Big Data, Beyond Hadoop!
1. Pick your provider
• Hadoop
• Cassandra
• Couchbase
• etc
2. Adopt your templates, VMs, etc
Home Run 3: Idempotent Effort
• Idempotent effort? Running it again has no bad effect: each subsequent run leaves things in the same state.
• Walkup - The 10 minute test
• Walkaway - Requirements
• Features
• Testing, technical debt, already in place for code
• VMs and recipes for dev, test, prod
• OSGi etc modularity for binaries
• Does what we see here pass this test?
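One way to picture idempotent effort is a setup step that can be run any number of times and always ends in the same state. This is a generic sketch with a hypothetical function and hypothetical folder names, not the recipes used in the demo.

```python
import os
import tempfile


def ensure_layout(root, subdirs):
    """Create any missing directories; return the ones created on this run."""
    created = []
    for name in subdirs:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(name)
    return created


# Hypothetical ETL folder layout, built in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    first = ensure_layout(root, ["in", "out", "archive"])
    second = ensure_layout(root, ["in", "out", "archive"])
    final_state = sorted(os.listdir(root))

print(first, second, final_state)
```

The first run does the work, the second run finds nothing to do, and neither run fails. That is the property the walkup and walkaway tests rely on.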
What to leave with
• De-mystify: how to Avro/Hadoop a delimited file
• Review motives for automating this process
• Code automation basics
• Infrastructure automation basics
• Code for above
Further Hadoop Tutorial Resources
• Hortonworks
• best free stuff? Except networking vas
• Cloudera
• Lots but appear to prefer to get paid
• Apache Hadoop
• haven’t tried but it is Apache
Wish To See More?
• In office demos
• Your data
Code, Content, Contacts
• This Slide Deck: http://www.slideshare.net/datafundamentals/hadoop-big-data-35762308
• or just remember slideshare.net/datafundamentals; it may be the only one there
• YouTube - 11 minute version of code demo - https://www.youtube.com/playlist?list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
• Dev Code
• Carrie (Ruby UI and generator) https://github.com/datafundamentals/df_ui_carrie
• Avro from delimited https://bitbucket.org/datafundamentals/avro_from_delimited
• Camel-Avro https://bitbucket.org/datafundamentals/camel-avro-etl
• Ops Code - cookbook recipes
• https://github.com/datafundamentals
• Contact
• pete@datafundamentals.com, jeff@datafundamentals.com
Be careful out there!
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014

More Related Content

What's hot (20)

Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Fwdays
 
Write Once, Run Everywhere - Ember.js Munich
Write Once, Run Everywhere - Ember.js MunichWrite Once, Run Everywhere - Ember.js Munich
Write Once, Run Everywhere - Ember.js Munich
Mike North
 
Cloud tools
Cloud toolsCloud tools
Cloud tools
John McCaffrey
 
Why ruby and rails
Why ruby and railsWhy ruby and rails
Why ruby and rails
Reuven Lerner
 
Integration Testing with Selenium
Integration Testing with SeleniumIntegration Testing with Selenium
Integration Testing with Selenium
All Things Open
 
Webcomponents are your frameworks best friend
Webcomponents are your frameworks best friendWebcomponents are your frameworks best friend
Webcomponents are your frameworks best friend
Filip Bruun Bech-Larsen
 
Frameworks and webcomponents
Frameworks and webcomponentsFrameworks and webcomponents
Frameworks and webcomponents
Filip Bruun Bech-Larsen
 
淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2
Wen-Tien Chang
 
CI/CD at bol.com
CI/CD at bol.comCI/CD at bol.com
CI/CD at bol.com
Maarten Dirkse
 
DrupalCon 2011 Highlight
DrupalCon 2011 HighlightDrupalCon 2011 Highlight
DrupalCon 2011 Highlight
Supakit Kiatrungrit
 
Cvcc performance tuning
Cvcc performance tuningCvcc performance tuning
Cvcc performance tuning
John McCaffrey
 
Coscup
CoscupCoscup
Coscup
Giivee The
 
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
HUJAK - Hrvatska udruga Java korisnika / Croatian Java User Association
 
bol.com Dutch Container Day presentation
bol.com Dutch Container Day presentationbol.com Dutch Container Day presentation
bol.com Dutch Container Day presentation
Maarten Dirkse
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on Rails
Avi Kedar
 
Python to go
Python to goPython to go
Python to go
Weng Wei
 
Freelancing and side-projects on Rails
Freelancing and side-projects on RailsFreelancing and side-projects on Rails
Freelancing and side-projects on Rails
John McCaffrey
 
User-percieved performance
User-percieved performanceUser-percieved performance
User-percieved performance
Mike North
 
Capybara + RSpec - ruby dsl-based web ui qa automation
Capybara + RSpec - ruby dsl-based web ui qa automationCapybara + RSpec - ruby dsl-based web ui qa automation
Capybara + RSpec - ruby dsl-based web ui qa automation
COMAQA.BY
 
Untangling - fall2017 - week 8
Untangling - fall2017 - week 8Untangling - fall2017 - week 8
Untangling - fall2017 - week 8
Derek Jacoby
 
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Fwdays
 
Write Once, Run Everywhere - Ember.js Munich
Write Once, Run Everywhere - Ember.js MunichWrite Once, Run Everywhere - Ember.js Munich
Write Once, Run Everywhere - Ember.js Munich
Mike North
 
Integration Testing with Selenium
Integration Testing with SeleniumIntegration Testing with Selenium
Integration Testing with Selenium
All Things Open
 
Webcomponents are your frameworks best friend
Webcomponents are your frameworks best friendWebcomponents are your frameworks best friend
Webcomponents are your frameworks best friend
Filip Bruun Bech-Larsen
 
淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2
Wen-Tien Chang
 
Cvcc performance tuning
Cvcc performance tuningCvcc performance tuning
Cvcc performance tuning
John McCaffrey
 
bol.com Dutch Container Day presentation
bol.com Dutch Container Day presentationbol.com Dutch Container Day presentation
bol.com Dutch Container Day presentation
Maarten Dirkse
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on Rails
Avi Kedar
 
Python to go
Python to goPython to go
Python to go
Weng Wei
 
Freelancing and side-projects on Rails
Freelancing and side-projects on RailsFreelancing and side-projects on Rails
Freelancing and side-projects on Rails
John McCaffrey
 
User-percieved performance
User-percieved performanceUser-percieved performance
User-percieved performance
Mike North
 
Capybara + RSpec - ruby dsl-based web ui qa automation
Capybara + RSpec - ruby dsl-based web ui qa automationCapybara + RSpec - ruby dsl-based web ui qa automation
Capybara + RSpec - ruby dsl-based web ui qa automation
COMAQA.BY
 
Untangling - fall2017 - week 8
Untangling - fall2017 - week 8Untangling - fall2017 - week 8
Untangling - fall2017 - week 8
Derek Jacoby
 

Viewers also liked (15)

Pril 1
Pril 1Pril 1
Pril 1
mishytka
 
ใบความรู้ กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
ใบความรู้  กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1pageใบความรู้  กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
ใบความรู้ กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
Prachoom Rangkasikorn
 
2 diarecreacionalcomfandi10 1
2 diarecreacionalcomfandi10 12 diarecreacionalcomfandi10 1
2 diarecreacionalcomfandi10 1
Heimer Perez
 
Institute of Clinical Research India
Institute of Clinical Research IndiaInstitute of Clinical Research India
Institute of Clinical Research India
Institute of Clinical Research India
 
гимназисты 5 б
гимназисты 5 бгимназисты 5 б
гимназисты 5 б
Olga Gorbenko
 
Rassegna Stampa2_AZ Holding
Rassegna Stampa2_AZ HoldingRassegna Stampa2_AZ Holding
Rassegna Stampa2_AZ Holding
Carmine Evangelista
 
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Innovation Enterprise
 
Resume
ResumeResume
Resume
mustuprince
 
Pedestriantv media-kit-2013
Pedestriantv media-kit-2013Pedestriantv media-kit-2013
Pedestriantv media-kit-2013
Samantha Anderson
 
Year 9
Year 9Year 9
Year 9
hodder
 
Web design winter start
Web design  winter startWeb design  winter start
Web design winter start
Konrad Roeder
 
Affordable e waste recycling for the marketing agencies of sydney
Affordable e waste recycling for the marketing agencies of sydneyAffordable e waste recycling for the marketing agencies of sydney
Affordable e waste recycling for the marketing agencies of sydney
smtwastebrokers
 
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati
 
ใบความรู้ กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
ใบความรู้  กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1pageใบความรู้  กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
ใบความรู้ กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
Prachoom Rangkasikorn
 
2 diarecreacionalcomfandi10 1
2 diarecreacionalcomfandi10 12 diarecreacionalcomfandi10 1
2 diarecreacionalcomfandi10 1
Heimer Perez
 
гимназисты 5 б
гимназисты 5 бгимназисты 5 б
гимназисты 5 б
Olga Gorbenko
 
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Innovation Enterprise
 
Year 9
Year 9Year 9
Year 9
hodder
 
Web design winter start
Web design  winter startWeb design  winter start
Web design winter start
Konrad Roeder
 
Affordable e waste recycling for the marketing agencies of sydney
Affordable e waste recycling for the marketing agencies of sydneyAffordable e waste recycling for the marketing agencies of sydney
Affordable e waste recycling for the marketing agencies of sydney
smtwastebrokers
 
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati
 

Similar to Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014 (20)

Dev ops lessons learned - Michael Collins
Dev ops lessons learned  - Michael CollinsDev ops lessons learned  - Michael Collins
Dev ops lessons learned - Michael Collins
Devopsdays
 
Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®
Hannes Lowette
 
Automated Acceptance Testing from Scratch
Automated Acceptance Testing from ScratchAutomated Acceptance Testing from Scratch
Automated Acceptance Testing from Scratch
Excella
 
August Webinar - Water Cooler Talks: A Look into a Developer's Workbench
August Webinar - Water Cooler Talks: A Look into a Developer's WorkbenchAugust Webinar - Water Cooler Talks: A Look into a Developer's Workbench
August Webinar - Water Cooler Talks: A Look into a Developer's Workbench
Howard Greenberg
 
Setting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce AppsSetting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce Apps
Daniel Stange
 
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirstBuilding CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Jun-ichi Sakamoto
 
Steamlining your puppet development workflow
Steamlining your puppet development workflowSteamlining your puppet development workflow
Steamlining your puppet development workflow
Tomas Doran
 
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet
 
Open stack jobs avoiding the axe
Open stack jobs   avoiding the axeOpen stack jobs   avoiding the axe
Open stack jobs avoiding the axe
Jim Leitch
 
BTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesBTV PHP - Building Fast Websites
BTV PHP - Building Fast Websites
Jonathan Klein
 
Simplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query ToolSimplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query Tool
DataWorks Summit
 
Stackato
StackatoStackato
Stackato
Jonas Brømsø
 
From Heroku to Amazon AWS
From Heroku to Amazon AWSFrom Heroku to Amazon AWS
From Heroku to Amazon AWS
Sebastian Schleicher
 
Test Automation with Twist and Sahi
Test Automation with Twist and SahiTest Automation with Twist and Sahi
Test Automation with Twist and Sahi
ericjamesblackburn
 
DevOps Days Ohio
DevOps Days OhioDevOps Days Ohio
DevOps Days Ohio
Kelly Looney
 
Continuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneContinuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at Dashlane
Dashlane
 
Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.
Junichi Ishida
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
ConSol Consulting & Solutions Software GmbH
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
ConSol Consulting & Solutions Software GmbH
 
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps JourneyGartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Kelly Looney
 
Dev ops lessons learned - Michael Collins
Dev ops lessons learned  - Michael CollinsDev ops lessons learned  - Michael Collins
Dev ops lessons learned - Michael Collins
Devopsdays
 
Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®
Hannes Lowette
 
Automated Acceptance Testing from Scratch
Automated Acceptance Testing from ScratchAutomated Acceptance Testing from Scratch
Automated Acceptance Testing from Scratch
Excella
 
August Webinar - Water Cooler Talks: A Look into a Developer's Workbench
August Webinar - Water Cooler Talks: A Look into a Developer's WorkbenchAugust Webinar - Water Cooler Talks: A Look into a Developer's Workbench
August Webinar - Water Cooler Talks: A Look into a Developer's Workbench
Howard Greenberg
 
Setting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce AppsSetting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce Apps
Daniel Stange
 
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirstBuilding CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Jun-ichi Sakamoto
 
Steamlining your puppet development workflow
Steamlining your puppet development workflowSteamlining your puppet development workflow
Steamlining your puppet development workflow
Tomas Doran
 
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet
 
Open stack jobs avoiding the axe
Open stack jobs   avoiding the axeOpen stack jobs   avoiding the axe
Open stack jobs avoiding the axe
Jim Leitch
 
BTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesBTV PHP - Building Fast Websites
BTV PHP - Building Fast Websites
Jonathan Klein
 
Simplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query ToolSimplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query Tool
DataWorks Summit
 
Test Automation with Twist and Sahi
Test Automation with Twist and SahiTest Automation with Twist and Sahi
Test Automation with Twist and Sahi
ericjamesblackburn
 
Continuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneContinuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at Dashlane
Dashlane
 
Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.
Junichi Ishida
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
ConSol Consulting & Solutions Software GmbH
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
ConSol Consulting & Solutions Software GmbH
 
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps JourneyGartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Kelly Looney
 

Recently uploaded (20)

Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 

Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014

  • 1. Hadoop 101 ETL + Automation Smackdown Learning Big Data: Which approach makes me the most valuable as developer?
  • 2. Bio - Pete Carapetyan • Java dev last 15 years, dev 20 years • Grew up automating in a different industry • Apparent obsession with systems & automation • Since 2000 as dataFundamentals, now 2 man shop
  • 3. Special Skills - Special Snowflakes • Let me show you these Hadoop & Avro skills. • Then, we code for the special snowflakes. (data) • Thus we are more valuable, and can up our bill rates! • This is Approach #1: Manual or Special Snowflake
  • 4. My 2013 Manual Hadoop Story • 15 ETL jobs [Partial scope] • Brilliant, ninja level team • 1 year of competitive NIH* copy paste spaghetti coding - AKA special snowflake approach • Not a fun year *NIH: Not Invented Here
  • 5. [Demo Basics of ETL Job]
  • 6. Special Snowflake Approach: Human drama! What limitations of this manual special skills special snowflakes approach do we observe?
  • 7. How To Un-Pack Either Approach? What if we remove the human drama?
  • 14. Now, what happens if we automate? Automated Approach
  • 15. Carrie Our own internal project for automating big data. Name inspired by the horror film…
  • 17. Also inspired by The Phoenix Project • Results, not drama • Focus only on bottleneck • Brent as bottleneck
  • 18. On Brent • Brent is a team’s best asset! Brent is a ninja. • Brent is my dark side only when treating every situation like a special snowflake. • Brent enjoys the attention. • Brent is not the drama queen, others bring the drama to him. Brent?
  • 19. Automation Basics 1. Brent spends time on clean design, not NIH* • [Camel] - Integration Server 2. Brent automates the rule, codes the exception • Apply metadata to templates • Automated VM dev infrastructure * NIH: Not Invented Here
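
A minimal sketch of "automate the rule, code the exception" by applying metadata to a template. It is not taken from Carrie; the class name, the `${...}` placeholder syntax, and the property keys (`jobName`, `inputDir`, `hdfsDir`) are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Carrie project.
class TemplateSketch {

    // Replace each ${key} placeholder in the template with its metadata value.
    static String render(String template, Map<String, String> metadata) {
        String out = template;
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            out = out.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // The "rule": one template shared by every ETL job.
        String template = "job=${jobName} source=${inputDir} target=${hdfsDir}";

        // The per-job metadata: the only part that differs between jobs.
        Map<String, String> orders = new LinkedHashMap<>();
        orders.put("jobName", "orders");
        orders.put("inputDir", "/etl/in/orders");
        orders.put("hdfsDir", "/data/orders");

        System.out.println(render(template, orders));
    }
}
```

Anything the template cannot express is the "exception": it gets hand-written after generation rather than forcing every job to be hand-written.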
  • 20. Demo Clean • Clean project folder • Clean hadoop file system • Clean hadoop DDL https://www.youtube.com/watch?v=qR7XTzv5P_M&index=2&list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
  • 21. Later Demo Integration Server • Raw Linux OS (CentOS) • Java • Maven • Ruby • networking • maven repo - binaries • [created with vagrant] https://www.youtube.com/watch?v=xgheERvulqw&index=3&list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
  • 22. Demo Metadata Collection • Simple properties • Collected using a cheesy UI • UI written in Ruby
  • 23. Demo Generated Code • Camel ETL binary • OSGi, versioned, modular jar • Only 3 primary outputs! • simple • clean • well designed (?) • JUnit/integration tested • Supporting scripting • messy
  • 24. Demo Server Deploy • One line deploy/run command • Compiles on server with Maven • Also runnable as jar
  • 25. Does it work? • Make custom file • Drop into ETL folder • Inspect
  • 26. Demo - Review • Schema created • DDL run • Avro binary (JSON) transform • Data Migration • FTP to server • Into HDFS partition • Alter Table: Date Partition
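
The last two steps of the review (landing data in a date partition, then altering the table) might look like the sketch below. It assumes a Hive-style table partitioned on a column named `dt`; the table name, base directory, and partition column are hypothetical and not taken from the demo.

```java
import java.time.LocalDate;

// Hypothetical sketch: assumes a Hive-style table partitioned by a "dt" column.
class PartitionSketch {

    // HDFS directory holding one day's data, e.g. /data/orders/dt=2014-06-24
    static String partitionDir(String baseDir, LocalDate day) {
        // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd)
        return baseDir + "/dt=" + day;
    }

    // DDL that registers the day's directory as a partition of the table.
    // IF NOT EXISTS keeps the statement safe to run more than once.
    static String addPartitionDdl(String table, String baseDir, LocalDate day) {
        return "ALTER TABLE " + table
                + " ADD IF NOT EXISTS PARTITION (dt='" + day + "')"
                + " LOCATION '" + partitionDir(baseDir, day) + "'";
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2014, 6, 24);
        System.out.println(partitionDir("/data/orders", day));
        System.out.println(addPartitionDdl("orders", "/data/orders", day));
    }
}
```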
  • 27. Transform to Avro • Not detailed in this talk • Demo’d here as a binary • Code listed at end of talk
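
The talk's own transform lives in the repositories listed on the last slide. As a rough, library-free illustration of the schema step only, the sketch below derives an Avro record schema (JSON) from the header line of a delimited file. It assumes every column is a string and that column names are already valid Avro names; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the code from the talk's avro_from_delimited repository.
class AvroSchemaSketch {

    // Build an Avro record schema as JSON, typing every column as a string.
    static String schemaFor(String recordName, String headerLine, String delimiter) {
        // Quote the delimiter so characters like '|' are not treated as regex syntax.
        String[] columns = headerLine.split(Pattern.quote(delimiter));
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"record\",\"name\":\"").append(recordName).append("\",\"fields\":[");
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"name\":\"").append(columns[i].trim()).append("\",\"type\":\"string\"}");
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(schemaFor("Order", "id|customer|amount", "|"));
    }
}
```

A real job would map columns to proper Avro types and then write records with the Avro library; that part is deliberately left out here.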
  • 28. Modular Binaries • Each ETL • Own binary, OSGi • Own codebase • Fully versioned • Fully customizable after generation • Runs alone or as part of Camel container(s) • Tests on build • Contains own supporting scripts
  • 29. Takeaways • Brent coding the exception manually, rule by template. • Brent has time to focus on design. • Brent may lose some amount of desired attention :( • Resulting code is • clean • consistent, easy to maintain • But is there a Home Run? • defined as not possible via special snowflake approach
  • 30. Home Run 1: Infrastructure As Code Demo • [Jeff]
  • 31. Home Run 2: Big Data, Beyond Hadoop! 1. Pick your provider • Hadoop • Cassandra • Couchbase • etc 2. Adopt your templates, VMs, etc
  • 32. Home Run 3: Idempotent Effort • Idempotent effort? Each subsequent run doesn’t have bad effect. • Walkup - The 10 minute test • Walkaway - Requirements • Features • Testing, technical debt, already in place for code • VMs and recipes for dev, test, prod • OSGi etc modularity for binaries • Does what we see here pass this test?
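
One way to picture "each subsequent run doesn't have bad effect" is a setup step that can be repeated freely. The sketch below is an illustration only, with a hypothetical landing directory; it relies on `Files.createDirectories`, which does not fail when the directory already exists.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of an idempotent setup step; not taken from the talk's code.
class IdempotentSketch {

    // Create the directory (and any missing parents) if needed.
    // Running this once or many times leaves the same end state.
    static Path ensureDir(Path dir) {
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path landing = Path.of(System.getProperty("java.io.tmpdir"), "etl-sketch", "landing");
        ensureDir(landing);
        ensureDir(landing); // second run: no error, nothing changes
        System.out.println(Files.isDirectory(landing));
    }
}
```

By contrast, `Files.createDirectory` throws if the directory is already there, so a script built on it fails on its second run.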
  • 33. What to leave with • De-mystify: how to Avro/Hadoop a delimited file • Review motives for automating this process • Code automation basics • Infrastructure automation basics • Code for above
  • 34. Further Hadoop Tutorial Resources • Hortonworks • best free stuff? Except networking vas • Cloudera • Lots but appear to prefer to get paid • Apache Hadoop • haven’t tried but it is Apache
  • 35. Wish To See More? • In office demos • Your data
  • 36. Code, Content, Contacts • This Slide Deck: http://www.slideshare.net/datafundamentals/hadoop-big-data-35762308 • or just remember slideshare.net/datafundamentals it may be the only one there • Youtube - 11 minute version of code demo - https://www.youtube.com/playlist?list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe • Dev Code • Carrie (ruby UI and generator) https://github.com/datafundamentals/df_ui_carrie • Avro from delimited https://bitbucket.org/datafundamentals/avro_from_delimited • Camel-Avro https://bitbucket.org/datafundamentals/camel-avro-etl • Ops Code - cookbook recipes • https://github.com/datafundamentals • Contact • [email protected], [email protected] Be careful out there!