© Hortonworks Inc. 2011
Hadoop Engineering Best Practices
Raja Aluri, Release Eng
Deepesh Khandelwal, Quality Eng
Ramya Sunil, Quality Eng
Agenda
• Source Mechanics
• Why do System Testing?
• Test Matrix
• Automated Testing Flow
• Test Planning
• Planning your own System Testing
• Q & A
Architecting the Future of Big Data
Apache-Hortonworks-Partner Source Mechanics
• Hortonworks Open Source Philosophy
• How we do Apache-first development
• How we incorporate fixes or features that have not made it into Apache yet
• How we integrate our partner contributions to the source code
• Bookkeeping of the delta between Apache and Hortonworks
Apache-Hortonworks-Partner Source flow
[Diagram: source flow. Labels: Apache Git (Hadoop branch-2, Hadoop branch-2.4); Partner; HWX; ApacheRef; HDPRef; HDP; Continuous Merges; HDP Build CI; HDP Package Repo; HDP Maven Repository; Publish Releases; QE Workflow for Testing. Legend: Read-Write Repository, Read-Only Repository]

Issue Type      Course of Action
Normal Issue    Patch in Apache first
Urgent Issue    Patch in HWX Repo first
Unit Testing
• Test individual parts of the program in isolation, white-box testing
• Homogeneous cluster, usually in-memory
• One configuration, usually one operating system with security disabled
• Limited dataset, usually a few kilobytes
[Diagram: unit testing covers components A, B, and C individually; question marks ("??") mark what it leaves open, including DB interaction, concurrent user interaction, and third-party connectors]
System Testing
• Mimics production environment
– Multiple nodes in the cluster
– Multiple concurrent users
– Different workloads
• Multiple configurations to test
• Large dataset, more complex and richer
• Encompasses different types of testing
– Functional
– Performance, Stress and Reliability
– High Availability
– Backwards Compatibility
– Integration testing
– Third party connectors
– Upgrade testing
System Testing cont...
• Heterogeneous testing
– Cross version testing
– Cross operating system testing
– Hardware configs like Disk and CPU
– Security settings, level of encryption
Test Matrix
• Total of ~15000+ configurations to test!
OS
• CentOS
• SuSE
• Debian
• Ubuntu
• Windows
JDK
• Oracle JDK
• OpenJDK
• Different versions – 1.6.x, 1.7.x, 1.8.x
Security
• Disabled
• Enabled – MIT-only, AD-only, MIT-AD
• Ranger – enabled/disabled
Encryption
• Wire encryption – enabled/disabled
• Transparent Data Encryption – enabled/disabled
DB
• MySQL
• Oracle
• Postgres
• MSSQL
File system
• HDFS
• WASB
• Other vendor-specific file systems
Others
• Tez – enabled/disabled
• Slider apps vs. standalone
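To see how quickly such a matrix grows, the dimensions can be enumerated as a Cartesian product. The following is a minimal Python sketch. The values are copied from the slide, but splitting them into independent axes, collapsing the vendor-specific file systems into a single "other" value, and the helper names `count_configurations` and `iter_configurations` are assumptions made for illustration. The naive product of these axes is 46,080, which is larger than the slide's figure of roughly 15,000; the slide does not say how its number was derived.

```python
from itertools import product
from math import prod

# Values taken from the slide; treating each group as an independent
# axis is an assumption made for this illustration.
TEST_MATRIX = {
    "os": ["CentOS", "SuSE", "Debian", "Ubuntu", "Windows"],
    "jdk_vendor": ["Oracle JDK", "OpenJDK"],
    "jdk_version": ["1.6.x", "1.7.x", "1.8.x"],
    "kerberos": ["disabled", "MIT-only", "AD-only", "MIT-AD"],
    "ranger": [True, False],
    "wire_encryption": [True, False],
    "transparent_data_encryption": [True, False],
    "db": ["MySQL", "Oracle", "Postgres", "MSSQL"],
    "filesystem": ["HDFS", "WASB", "other"],  # "other" stands in for vendor FSs
    "tez": [True, False],
    "app_mode": ["slider", "standalone"],
}


def count_configurations(matrix):
    """Return the size of the full Cartesian product of all axes."""
    return prod(len(values) for values in matrix.values())


def iter_configurations(matrix):
    """Yield every combination as a dict mapping axis name to value."""
    keys = list(matrix)
    for combo in product(*(matrix[key] for key in keys)):
        yield dict(zip(keys, combo))


if __name__ == "__main__":
    print(count_configurations(TEST_MATRIX))
    print(next(iter_configurations(TEST_MATRIX)))
```

In practice a team would likely filter out combinations that cannot occur together before scheduling runs, since the full product is rarely affordable.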
Automated Testing Flow
[Diagram: automated testing flow. Apache repos and internal commits feed the build job (continuous integration), which publishes builds to the staging repo. A QE deploy trigger then provisions VMs, deploys the HDP stack (the installer deploys bits from the staging repo to the test cluster), and runs test setup and execution, followed by test analysis. A bug tracking system is also part of the flow.]
Test Planning
20+ components in the HDP stack and growing!
[Diagram: inputs to the test plan: internal developers, Apache JIRAs and community forums, Product Management, and support tickets]
Planning your own QATS
Typical user scenarios
• Fresh install
• Upgrade stack, going from an earlier release to a newer one
• Migration, changing distributions
• Applying changes to an existing cluster
– Upgrading hardware such as CPU, memory, or disks
– Changing dependent software pieces like the OS or JDK
– Changing security settings, such as turning on Kerberos or encryption
– Changing component configs in *-site.xml, enabling HA
Planning your own QATS
E2E automation
Preparation phase
• Collect requirements on the stack and workload
• Identify appropriate hardware
CI development phase
• Build in-house CI system for deployment and testing
Testing phase
• Build basic acceptance tests
• End-to-end automation for your application
Preparation Phase
• Collect the stack requirements
– Identify all the stack components that will be installed, including third-party applications and connectors
– Identify the installer
– Identify configs
• Hardware selection
– Should be scaled appropriately to mimic the production environment
– Prefer multi-node over single-node, with component services distributed across nodes
• Collect workload information
– Use the actual workload whenever possible
– If not, simulate the workload; some tools are available
– Use Rumen to obtain job traces from existing clusters
– Use Gridmix to generate workload
– Data set size and complexity
– Number of concurrent users
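When the real workload is not available, even a crude driver can exercise the "number of concurrent users" dimension. The sketch below is not Rumen or Gridmix; it is a hypothetical Python harness in which `simulate_users`, its parameters, and the `job` callable (a stand-in for whatever submits work to the cluster) are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_users(job, num_users, jobs_per_user):
    """Run `job(user_id, job_index)` from `num_users` concurrent threads.

    Returns one list of wall-clock durations (in seconds) per simulated user.
    `job` is a hypothetical callable standing in for a real job submission.
    """

    def user_session(user_id):
        durations = []
        for job_index in range(jobs_per_user):
            start = time.perf_counter()
            job(user_id, job_index)
            durations.append(time.perf_counter() - start)
        return durations

    # One worker thread per simulated user, so sessions overlap in time.
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        return list(pool.map(user_session, range(num_users)))


if __name__ == "__main__":
    def placeholder_job(user_id, job_index):
        time.sleep(0.01)  # stands in for submitting and awaiting a job

    timings = simulate_users(placeholder_job, num_users=4, jobs_per_user=3)
    print([round(sum(t), 3) for t in timings])
```

Threads suit this kind of driver because the simulated users spend most of their time waiting on the cluster rather than computing locally.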
CI Development phase
• Implement a CI system
– Modularize the CI system, e.g. individual Jenkins jobs for provision, deploy, and test
• Determine the cadence of testing
• Establish reporting
[Diagram: Provision cluster → Deploy → Test]
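One way to picture the modular design is as independent stages chained by a thin driver, so that any stage can be rerun on its own. The sketch below uses plain Python functions as stand-ins for the individual Jenkins jobs. The stage names mirror the slide, but `run_pipeline`, the shared `context` dictionary, and the stage bodies are hypothetical placeholders rather than the presenters' actual jobs.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Stage = Callable[[Context], bool]


def provision(context: Context) -> bool:
    # Placeholder: a real job would request VMs or hardware here.
    context["cluster"] = "test-cluster-1"  # hypothetical cluster name
    return True


def deploy(context: Context) -> bool:
    # Placeholder: a real job would run the installer against the cluster.
    if "cluster" not in context:
        return False
    context["deployed"] = "yes"
    return True


def run_tests(context: Context) -> bool:
    # Placeholder: a real job would execute the acceptance tests.
    return context.get("deployed") == "yes"


def run_pipeline(
    stages: List[Tuple[str, Stage]], context: Context
) -> List[Tuple[str, bool]]:
    """Run stages in order and stop at the first failure."""
    results = []
    for name, stage in stages:
        ok = stage(context)
        results.append((name, ok))
        if not ok:
            break
    return results


DEFAULT_STAGES = [("provision", provision), ("deploy", deploy), ("test", run_tests)]

if __name__ == "__main__":
    for name, ok in run_pipeline(DEFAULT_STAGES, {}):
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

Keeping each stage separate means a failed deploy can be retried against an already provisioned cluster without repeating the provisioning step.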
Testing Phase
• Basic Acceptance Tests
– Basic service check for individual deployed components
– Basic acceptance tests to validate integrations
• Establish a baseline to track the performance of pipeline components in the future
• Compatibility tests (including apps, third-party connectors, dashboards, etc.)
• E2E automation to simulate production workloads
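A baseline only pays off when later runs are compared against it. The following is a minimal sketch, assuming runtimes are recorded in seconds per pipeline component and that a regression means exceeding the baseline by more than a chosen tolerance. The 10% default, the component names, and the `find_regressions` helper are illustrative assumptions.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {component: slowdown_ratio} for components that regressed.

    `baseline` and `current` map component names to runtimes in seconds.
    A component regresses when current / baseline exceeds 1 + tolerance.
    Components missing from `current`, or with a non-positive baseline,
    are skipped.
    """
    regressions = {}
    for component, base_seconds in baseline.items():
        if component not in current or base_seconds <= 0:
            continue
        ratio = current[component] / base_seconds
        if ratio > 1.0 + tolerance:
            regressions[component] = ratio
    return regressions


if __name__ == "__main__":
    baseline = {"ingest": 100.0, "transform": 200.0}  # hypothetical numbers
    current = {"ingest": 120.0, "transform": 205.0}
    for component, ratio in find_regressions(baseline, current).items():
        print(f"{component}: {ratio:.2f}x baseline")
```

The tolerance matters because cluster runtimes are noisy; a threshold that is too tight would flag ordinary run-to-run variation as a regression.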
Q & A
Thank You!