LEARN • NETWORK • COLLABORATE • INFLUENCE
Deploying Big Data platforms
Chris Kernaghan
Principal Consultant
Cholera epidemic: first use of big data
Big Data Epidemiology by Google
How I really got started in Big Data
John, we need to give Chris more grey hair
Let’s throw him into a Big Data demo
My examples
Areas of focus
• Data acquisition and curation
• Data storage
• Compute infrastructure
• Analysis and insight
Everything as Code*
* Well, as much as possible
Data Acquisition and curation
Areas of focus
Data Lake
HANA
How big was the Panama Papers data set?
Data Lake
Panama Papers Technology stack
SQL
The tools used supported 370 journalists from around the world
Infrastructure was a pool of up to 40 servers run in AWS
Data quality and curation are not one time activities
Remove the human element as much as possible
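One way to take people out of the loop is to express curation rules as code that runs on every load. The following is a minimal sketch only: the record fields (`id`, `country`, `amount`) and the three rules are hypothetical and do not come from the talk.

```python
from typing import Callable

# Hypothetical curation rules; each returns True when a record passes.
RULES: dict[str, Callable[[dict], bool]] = {
    "has_id": lambda r: bool(r.get("id")),
    "country_is_iso2": lambda r: isinstance(r.get("country"), str)
    and len(r["country"]) == 2,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def curate(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split records into clean ones and rejects paired with the failed rule names."""
    clean, rejected = [], []
    for record in records:
        failed = [name for name, rule in RULES.items() if not rule(record)]
        if failed:
            rejected.append((record, failed))
        else:
            clean.append(record)
    return clean, rejected
```

Because the rules live in one table, the same checks run on every load, and adding a rule is a reviewed code change rather than a manual step.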
Data security
• Data lake
– What data do you collect?
– Do you have restrictions on what data can be combined?
– How long does your data live?
• Geographical concerns
– Where does your data reside?
• Authentication
– Who is accessing your data?
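Several of the data security questions (how long data lives, where it resides, what may be combined) can be recorded as metadata on each data set and checked automatically. This is a sketch under assumptions: the `DatasetPolicy` record, its field names, and the example values are hypothetical, and authentication is out of scope here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DatasetPolicy:
    """Hypothetical per-data-set policy record."""
    name: str
    region: str                                  # where the data resides
    collected_on: date                           # when the data was acquired
    retention_days: int                          # how long the data may live
    may_combine_with: frozenset = frozenset()    # data sets it may be joined with


def is_expired(policy: DatasetPolicy, today: date) -> bool:
    """True once the retention period has passed."""
    return today > policy.collected_on + timedelta(days=policy.retention_days)


def in_allowed_region(policy: DatasetPolicy, allowed_regions: set) -> bool:
    """True when the data set is stored in a permitted region."""
    return policy.region in allowed_regions


def can_combine(a: DatasetPolicy, b: DatasetPolicy) -> bool:
    """Both data sets must explicitly allow the combination."""
    return b.name in a.may_combine_with and a.name in b.may_combine_with
```

A scheduled job could run these checks over the whole catalogue and flag expired or misplaced data sets before anyone queries them.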
Data Storage
Areas of focus
How BIG is Big Data?
Storage Considerations
• IOPS are still important
– Big data still uses a lot of spinning disk
• Replication and Redundancy
– Eats a lot of disk space
• Build for failure
• Sometimes you have to go in-memory
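To make the cost of replication concrete: HDFS keeps three copies of each block by default, so usable capacity is roughly a third of raw disk before any working space is set aside. The sizing sketch below is illustrative only; the 25% headroom and the 100 TB data volume are assumptions, not figures from the talk.

```python
def raw_capacity_tb(data_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk needed to hold data_tb at the given replication factor.

    headroom is the fraction of raw disk kept free for temporary and
    intermediate files (an assumed value; tune it for your workload).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return data_tb * replication / (1 - headroom)


# 100 TB of data, 3 copies, 25% of raw disk kept free
print(raw_capacity_tb(100))  # 400.0
```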
Compute infrastructure
Areas of focus
Structured Reporting Versus Big Data/Science
Compute requirements
• Structured reporting systems run business processes
– Sized and static
– Under change control
– Business centric
• Data science systems answer difficult questions irregularly
– Cloud or heavy use of virtualisation
– Developer centric
– Rapidly evolving
What you still need to remember
• Compute is cheap
• Scalability is critical
• Software definition for consistency
• Automate as much as possible
100 Hadoop nodes
122 GB RAM each = 12.2 TB RAM
Build time of 3 hrs
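The memory total follows directly from the node count, using decimal units (1 TB = 1,000 GB):

```python
nodes = 100
ram_per_node_gb = 122

# Decimal units: 1 TB = 1000 GB
total_ram_tb = nodes * ram_per_node_gb / 1000
print(f"{total_ram_tb} TB")  # 12.2 TB
```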
Use of scripted builds from VM to application
• Disk definition
• Network definition
• Software install
• Deployment was consistent for each and every node of the cluster
– Hostnames defined the same way
– Configuration files created the same way
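Consistency of this kind comes from generating every hostname and file from one template rather than typing them per node. A minimal sketch follows; the naming pattern, the `cluster.example.internal` domain, and the config keys are hypothetical, not the ones used in the talk.

```python
from string import Template

# Assumed naming convention: role plus zero-padded index.
HOSTNAME_PATTERN = "hadoop-{role}-{index:03d}.cluster.example.internal"

# Assumed config layout; a real node needs many more settings.
CONFIG_TEMPLATE = Template(
    "hostname=$hostname\n"
    "role=$role\n"
    "namenode=$namenode\n"
)


def hostname(role: str, index: int) -> str:
    """Build a hostname the same way for every node."""
    return HOSTNAME_PATTERN.format(role=role, index=index)


def render_configs(worker_count: int) -> dict[str, str]:
    """Return {hostname: config text} for one master and worker_count workers."""
    namenode = hostname("master", 1)
    nodes = [("master", 1)] + [("worker", i) for i in range(1, worker_count + 1)]
    return {
        hostname(role, i): CONFIG_TEMPLATE.substitute(
            hostname=hostname(role, i), role=role, namenode=namenode
        )
        for role, i in nodes
    }
```

Calling `render_configs(99)` yields 100 configurations that differ only in the values substituted into the template.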
• Faster deployment
– Automated build: 3 hrs to build and deploy 100 nodes
– Manual build: 800+ hrs to build and deploy 100 nodes
• Use of automated tools to detect failure and start a new node (Elastic Beanstalk)
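Taking the slide's figures at face value, and treating the manual figure as a lower bound of 800 hours, the per-node effort and the speed-up work out as follows:

```python
nodes = 100
automated_hours = 3   # whole cluster, from the slide
manual_hours = 800    # lower bound, from the slide

manual_hours_per_node = manual_hours / nodes
speedup = manual_hours / automated_hours

print(f"manual effort per node: at least {manual_hours_per_node:.0f} h")
print(f"automated build is at least {speedup:.0f}x faster")
```

That is at least 8 hours of manual work per node, and a speed-up of roughly 267 times for the whole cluster.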
• Reusability of script
– Heavy use of parameters means it is adaptable
• Use of Git meant distributed development was handled easily
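A parameterised build script might expose its settings along these lines. The flag names and defaults are hypothetical; the talk does not show the actual script.

```python
import argparse


def parse_build_args(argv=None):
    """Parse cluster build parameters (all flags are illustrative)."""
    parser = argparse.ArgumentParser(description="Build a cluster from parameters")
    parser.add_argument("--nodes", type=int, default=100, help="number of nodes to build")
    parser.add_argument("--ram-gb", type=int, default=122, help="RAM per node in GB")
    parser.add_argument("--disks", type=int, default=4, help="data disks per node")
    parser.add_argument("--subnet", default="10.0.0.0/24", help="network to place nodes in")
    return parser.parse_args(argv)


# Same script, smaller cluster: only the parameter changes.
args = parse_build_args(["--nodes", "10"])
print(args.nodes, args.ram_gb)
```

Keeping the script in Git then means a change to a default is reviewed and versioned like any other code change.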
Analysis and Insight
Areas of focus
Query the Data
• Programmatically
– Python
– R
• Application
– Lumira
– Business Objects
– Spark
– SQL
– Excel
– Elasticsearch
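As a small illustration of the programmatic route, Python can run SQL directly. This sketch uses the standard-library `sqlite3` module with an in-memory table as a stand-in for a real data store; the table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (source TEXT, size_mb REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("email", 1.5), ("pdf", 20.0), ("email", 0.5)],
)

# Aggregate in SQL, then consume the result in Python.
rows = conn.execute(
    "SELECT source, COUNT(*), SUM(size_mb) FROM events "
    "GROUP BY source ORDER BY source"
).fetchall()
conn.close()

print(rows)  # [('email', 2, 2.0), ('pdf', 1, 20.0)]
```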
Analysis and Visualisation
• Quick Analysis
– Lumira, Excel
• Graph
– Neo4j, Synerscope
• Charts
– Business Objects, Grafana, Kibana
• Dynamic
– D3
http://www.wikiviz.org/wiki/Tools
Things to remember
• Remember the type of platform you are using
• Storage is cheap but not all storage is equal
• Scalability is critical
• Version control rocks
• Automate everything you can
• Value is in the data but not all data is valuable
• Data should not live forever
• Key Takeaways

DevOps in the Modern Era - Thoughtfully Critical Podcast
Chris Wahl
 
Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025
Suyash Joshi
 
Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.
hok12341073
 
How to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptxHow to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptx
Version 1 Analytics
 
If You Use Databricks, You Definitely Need FME
If You Use Databricks, You Definitely Need FMEIf You Use Databricks, You Definitely Need FME
If You Use Databricks, You Definitely Need FME
Safe Software
 

Deploying Big Data Platforms

  • 1. LEARN • NETWORK • COLLABORATE • INFLUENCE
  • 2. Deploying Big Data platforms. Chris Kernaghan, Principal Consultant
  • 3. Cholera epidemic: first use of big data
  • 4. Big Data Epidemiology by Google
  • 5. How I really got started in Big Data: “John, we need to give Chris more grey hair.” “Let’s throw him into a Big Data demo.”
  • 6. My examples
  • 8. Areas of focus
      • Data acquisition and curation
      • Data storage
      • Compute infrastructure
      • Analysis and Insight
      • Everything as Code* (*well, as much as possible)
  • 9. Areas of focus: Data acquisition and curation
  • 10. Data Lake, HANA
  • 11. How big was the Panama Papers data set?
  • 12. How big was the Panama Papers data set?
  • 13. Panama Papers technology stack: Data Lake, SQL
  • 14. The tools used supported 370 journalists from around the world. Infrastructure was a pool of up to 40 servers run in AWS.
  • 15. Data quality and curation are not one-time activities. Remove the human element as much as possible.
  • 16. Data security
      • Data lake
          – What data do you collect?
          – Do you have restrictions on what data can be combined?
          – How long does your data live?
  • 17. Data security
      • Geographical concerns
          – Where does your data reside?
  • 18. Data security
      • Authentication
          – Who is accessing your data?
  • 19. Areas of focus: Data storage
  • 20. How BIG is Big Data?
  • 22. Storage Considerations
      • IOPS are still important
          – Big data still uses a lot of spinning disk
      • Replication and redundancy
          – Eats a lot of disk space
      • Build for failure
      • Sometimes you have to go in-memory
  • 23. Areas of focus: Compute infrastructure
  • 24. Structured Reporting Versus Big Data/Science: Compute requirements
      • Structured reporting systems run business processes
          – Sized and static
          – Under change control
          – Business centric
  • 25. Structured Reporting Versus Big Data/Science: Compute requirements
      • Data science systems answer difficult questions irregularly
          – Cloud or heavy use of virtualisation
          – Developer centric
          – Rapidly evolving
  • 26. What you still need to remember
      • Compute is cheap
      • Scalability is critical
  • 27. What you still need to remember
      • Software definition for consistency
      • Automate as much as possible
  • 28. 100 Hadoop nodes, 122GB RAM each = 12.2TB RAM. Build time of 3 hrs.
  • 29. Use of scripted builds from VM to application
      • Disk definition
      • Network definition
      • Software install
  • 30. Use of scripted builds from VM to application
      • Deployment was consistent for each and every node of the cluster
          – Hostnames defined the same way
          – Configuration files created the same way
  • 31. Use of scripted builds from VM to application
      • Faster deployment
          – Automated build: 3 hrs to build and deploy 100 nodes
          – Manual build: 800+ hrs to build and deploy 100 nodes
      • Use of automated tools to detect failure and start a new node (Elastic Beanstalk)
  • 32. Use of scripted builds from VM to application
      • Reusability of script
          – Heavy use of parameters means it is adaptable
      • Use of Git meant distributed development was handled easily
  • 34. Areas of focus: Analysis and Insight
  • 35. Query the Data
      • Programmatically
          – Python
          – R
      • Application
          – Lumira
          – Business Objects
          – Spark
          – SQL
          – Excel
          – ElasticSearch
  • 36. Analysis and Visualisation
      • Quick analysis
          – Lumira, Excel
      • Graph
          – Neo4J, Synerscope
      • Charts
          – Business Objects, Grafana, Kibana
      • Dynamic
          – D3
      • http://www.wikiviz.org/wiki/Tools
  • 37. Things to remember
      • Remember the type of platform you are using
      • Storage is cheap but not all storage is equal
      • Scalability is critical
      • Version control rocks
      • Automate everything you can
      • Value is in the data but not all data is valuable
      • Data should not live forever
  • 38. Key Takeaways
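
The scripted-build slides (29 to 32) stress that every node gets its hostname and configuration files generated the same way, and that heavy use of parameters keeps the script reusable. The deck does not show the actual scripts, so the following is only a minimal Python sketch of that idea. The names `NodeSpec`, `build_cluster_spec` and `render_config`, the `hadoop` hostname prefix, the disk count, the subnet and the config keys are all hypothetical; only the node count (100) and the RAM per node (122GB) come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of one cluster node."""
    hostname: str
    ram_gb: int
    data_disks: int
    subnet: str


def build_cluster_spec(prefix, count, ram_gb=122, data_disks=4,
                       subnet="10.0.0.0/24"):
    """Derive every node from the same parameters so all nodes are consistent.

    data_disks and subnet are illustrative defaults, not values from the deck.
    """
    width = len(str(count))  # zero-pad so hostnames sort in node order
    return [
        NodeSpec(f"{prefix}-{i:0{width}d}", ram_gb, data_disks, subnet)
        for i in range(1, count + 1)
    ]


def render_config(node, master):
    """Render a made-up per-node config file body from the node's parameters."""
    return "\n".join([
        f"hostname={node.hostname}",
        f"master={master}",
        f"ram_gb={node.ram_gb}",
        f"data_disks={node.data_disks}",
        f"subnet={node.subnet}",
    ])


# Same function, different parameters: a 20-node cluster and a 100-node one.
small_cluster = build_cluster_spec("hadoop", 20)
cluster = build_cluster_spec("hadoop", 100)
first_config = render_config(cluster[0], master=cluster[0].hostname)
print(first_config)
```

Because the node list is derived from a handful of parameters, scaling the cluster is a change to one argument rather than a manual edit per node, and the script itself can live in Git alongside the rest of the build.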

Editor's Notes

  • #4: John Snow, London
  • #5: 2008 H1N1 flu pandemic in the US; the CDC had out-of-date data
  • #7: Examples
      • Panama Papers – transient use case
      • Under Armour – constant data use case answering lots of different questions
      • Common Sense Finance institution – transient audit data use case
      • Natures Hope – pushing structured data into the data lake to provide better temperature control as part of their data lifecycle
      • Intel – using event streaming to drive manufacturing processes
  • #10, #11: We are literally drowning in data – data lakes
      • What data do we acquire? Sensor data, web data, social media, transactional data
      • What data is actually necessary, how long does it need to live for, and what is its data life cycle?
      • What data do we need that we do not have access to?
      • How do we curate data for data lakes?
  • #12: We have four developers and three journalists. 
  • #14: Timeline
      • Working on the platform for 3 years across the various links
      • Processed the Panama Papers in around 12 months
  • #22: How do we store data – databases and files
      • Big data storage systems: HDFS
      • Cloud based: S3 or Azure Storage
      • Databases: SQL and NoSQL
      • CSV
      • Hardware: massively scalable software-defined infrastructures which expect failure
  • #29: John broke my cluster. 20 nodes, scaled to 100 nodes.
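
The sizing figures on slides 28 and 31 can be checked with a few lines of arithmetic, and the "replication eats a lot of disk space" point on slide 22 can be made concrete the same way. This is a rough Python sketch, not something from the deck: the node count, RAM per node and build times are the slide figures, while the raw disk figure and the helper `usable_capacity_tb` are hypothetical. The replication factor of 3 is the HDFS default (`dfs.replication`); the deck does not say which factor was used.

```python
NODES = 100                 # from slide 28
RAM_PER_NODE_GB = 122       # from slide 28
AUTOMATED_BUILD_HRS = 3     # from slide 31
MANUAL_BUILD_HRS = 800      # from slide 31 (stated as "800hrs +")

# Total cluster memory, using decimal terabytes as the slide does.
total_ram_tb = NODES * RAM_PER_NODE_GB / 1000

# What the manual estimate implies per node, and the overall speed-up.
manual_hrs_per_node = MANUAL_BUILD_HRS / NODES
speedup = MANUAL_BUILD_HRS / AUTOMATED_BUILD_HRS


def usable_capacity_tb(raw_tb, replication_factor=3):
    """Usable space once every block is stored replication_factor times."""
    return raw_tb / replication_factor


# Hypothetical raw disk figure, purely to illustrate the replication cost.
raw_disk_tb = 300

print(f"Total RAM: {total_ram_tb:.1f} TB")
print(f"Manual build effort per node: {manual_hrs_per_node:.0f} hrs")
print(f"Automated build is roughly {speedup:.0f}x faster")
print(f"Usable of {raw_disk_tb} TB raw: {usable_capacity_tb(raw_disk_tb):.0f} TB")
```

With three-way replication only a third of the raw disk is usable, which is why storage sizing for a data lake has to start from the replicated footprint rather than the size of the source data.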