Cloudwick
ANKUS	

Deployment and Orchestration Framework for Big Data Frameworks

Ashrith
Cloudwick
CLOUDWICK
• Motto: Empowering the deployment and
adoption of big data across organizations

• Accelerate enterprise big data people, process and
technology transformation	

• Dedicated resources and teams to research and
develop big-data use cases
BIG DATA
• Big data is the term for a collection of data sets so large and complex
that it becomes difficult to process using on-hand database
management tools or traditional data processing applications. [1]
	

• As of 2012, limits on the size of data sets that are feasible to process in
a reasonable amount of time were on the order of exabytes of data.	

• Big data is difficult to work with using most relational database
management systems and desktop statistics and visualization packages,
requiring instead "massively parallel software running on tens, hundreds,
or even thousands of servers".
REASONS FOR BIG DATA
• Variety - Data today comes in all types of formats. Structured, numeric data in
traditional databases. Information created from line-of-business applications.
Unstructured text documents, email, video, audio, stock ticker data and financial
transactions. Managing, merging and governing different varieties of data is something
many organizations still grapple with.	

• Velocity - Data is streaming in at unprecedented speed and must be dealt with in a
timely manner. RFID tags, sensors and smart metering are driving the need to deal with
torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a
challenge for most organizations.	

• Volume - Many factors contribute to the increase in data volume. Transaction-based
data stored through the years. Unstructured data streaming in from social media.
Increasing amounts of sensor and machine-to-machine data being collected. In the past,
excessive data volume was a storage issue. But with decreasing storage costs, other
issues emerge, including how to determine relevance within large data volumes and
how to use analytics to create value from relevant data.
BIG DATA STATS
• As of 2012, about 2.5 exabytes of data are created each
day, and that number is doubling every 40 months or so.	

• It is estimated that Walmart collects more than 2.5
petabytes of data every hour from its customer
transactions. A petabyte is one quadrillion bytes, or the
equivalent of about 20 million filing cabinets’ worth of
text. An exabyte is 1,000 times that amount, or one
billion gigabytes.
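
The unit relationships and the doubling rate quoted above can be checked with a few lines of arithmetic. This is a minimal sketch assuming decimal (SI) units; the function name `projected_daily_volume` is illustrative and not part of the original deck.

```python
# Decimal (SI) byte units, matching the figures quoted on the slide.
GIGABYTE = 10**9
PETABYTE = 10**15   # one quadrillion bytes
EXABYTE = 10**18    # 1,000 petabytes, or one billion gigabytes


def projected_daily_volume(start, months_elapsed, doubling_months=40):
    """Exponential growth: the volume doubles once every `doubling_months`.

    `start` can be in any unit; the result is in the same unit.
    """
    return start * 2 ** (months_elapsed / doubling_months)


def hourly_to_daily(per_hour):
    """Convert an hourly rate into a daily rate."""
    return per_hour * 24


if __name__ == "__main__":
    # 2.5 exabytes per day in 2012, projected 80 months (two doublings) later.
    print(projected_daily_volume(2.5, 80), "exabytes per day")
    # 2.5 petabytes per hour expressed as petabytes per day.
    print(hourly_to_daily(2.5), "petabytes per day")
```

The projection only restates the slide's "doubling every 40 months" estimate; it is not a forecast.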
TOOLS & FRAMEWORKS
• Some of the big data management frameworks and tools:

• Data Platforms - where data is stored and processed in huge volumes	

• Distributed file system - Hadoop HDFS	

• Distributed processing engine - Hadoop MapReduce/YARN	

• Distributed real-time computation - Storm	

• In-memory cluster computing - Spark
TOOLS & FRAMEWORKS
• NoSQL Databases/Data Warehouses - A NoSQL (Not only SQL) database
provides a mechanism for storing and retrieving data that is modeled by
means other than the tabular relations used in relational databases.

• Column Family Stores - Cassandra, HBase, Accumulo

• Document databases - CouchDB, MongoDB, MarkLogic	

• Key-value stores - Redis, Voldemort, Oracle BDB, Riak, Amazon
SimpleDB, Tokyo Cabinet

• Graph databases - Neo4j, OrientDB, InfiniteGraph, AllegroGraph, Virtuoso
TOOLS & FRAMEWORKS
• BigData Search - Solr, ElasticSearch	

• Data Aggregation - Sqoop, Flume, Chukwa, LogStash	

• Distributed Messaging Queues - Kafka, RabbitMQ

• Distributed coordination services - ZooKeeper

• Authorization & Authentication - Kerberos, LDAP	

• Encryption & Masking - Gazzang & DataGuise
WHY DEPLOYMENT &
MANAGEMENT TOOL
• Deploying and managing so many complex
frameworks and tools is difficult

• Making all those frameworks and tools work
together is complicated

• Ankus makes it easy to provision, manage and
monitor many of the big-data frameworks
ANKUS
• Cloudwick project to accelerate big-data discovery
and testing	

• Ankus is a deployment and orchestration tool for
managing big-data frameworks	

• Definition (noun): An elephant goad with a sharp spike
and a hook that is used to prod an elephant into
motion.
ANKUS FEATURES
• Supports deployments both on-premises and in the cloud

• Cloud infrastructure supported:	

• AWS	

• Rackspace	

• OpenStack	

• Compatible operating systems - Red Hat, Debian

• Compatible with multiple big-data frameworks
ANKUS FEATURES (MORE…)
• Supported Big-Data Frameworks	

• Hadoop	

• HBase	

• Hadoop ecosystem - Hive, Pig, Oozie, Sqoop, Hue

• Cassandra	

• Kafka	

• Storm	

• Solr
ANKUS FEATURES (MORE…)
• Supported Deployment Modes:	

• Highly-available clusters	

• Secure clusters	

• Integrated monitoring	

• Integrated alerting	

• Integrated log-aggregation
10,000 FOOT VIEW
INTERNALS
CLOUD ENGINE
• Ankus has a powerful cloud manager that natively
communicates with many of the major cloud providers

• Pluggable cloud providers	

• Manages instances & volumes across various cloud providers	

• Configuration-based resource management

• Automatically creates volumes and attaches
them to the instances
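
A pluggable, configuration-driven cloud manager of the kind described above could be sketched as follows. This is a hedged illustration only: the class names, the `PROVIDERS` registry, and the config keys (`provider`, `nodes`, `volumes_gb`) are assumptions made for this sketch, not the Ankus API, and the in-memory provider stands in for a real cloud client.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Interface every pluggable provider implements (hypothetical)."""

    @abstractmethod
    def create_instance(self, name): ...

    @abstractmethod
    def create_volume(self, size_gb): ...

    @abstractmethod
    def attach_volume(self, instance_id, volume_id): ...


class InMemoryProvider(CloudProvider):
    """Stand-in that records resources instead of calling a real cloud API."""

    def __init__(self):
        self.instances = {}
        self._counter = 0

    def _next_id(self, prefix):
        self._counter += 1
        return f"{prefix}-{self._counter}"

    def create_instance(self, name):
        instance_id = self._next_id("i")
        self.instances[instance_id] = {"name": name, "volumes": []}
        return instance_id

    def create_volume(self, size_gb):
        return self._next_id("vol")

    def attach_volume(self, instance_id, volume_id):
        self.instances[instance_id]["volumes"].append(volume_id)


# New providers plug in by adding an entry here.
PROVIDERS = {"memory": InMemoryProvider}


def provision(config):
    """Create each configured node, then create and attach its volumes."""
    provider = PROVIDERS[config["provider"]]()
    for node in config["nodes"]:
        instance_id = provider.create_instance(node["name"])
        for size_gb in node.get("volumes_gb", []):
            provider.attach_volume(instance_id, provider.create_volume(size_gb))
    return provider
```

Keeping the provider behind one small interface is what lets the same configuration target AWS, Rackspace, or OpenStack without changing the provisioning loop.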
ORCHESTRATION ENGINE
• Ankus orchestrates deployments by building DAGs that
embed the steps needed to reach the desired state

• Nodes are hosts and their states

• Edges describe how nodes depend on each other to reach
the complete state of the system

• Ankus leverages the `net-*` and `puppet` gems to reach the
desired state of the system
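
The DAG idea above can be illustrated with a topological sort: a node is deployed only after everything it depends on is in place, and nodes with no unmet dependencies can be deployed in parallel. This is a minimal sketch using Python's standard `graphlib`; the role names and the `deployment_waves` helper are hypothetical and do not come from Ankus itself.

```python
from graphlib import TopologicalSorter

# Hypothetical cluster: each key depends on the entries in its set.
dependencies = {
    "zookeeper": set(),
    "namenode": {"zookeeper"},
    "datanode": {"namenode"},
    "hbase_master": {"namenode", "zookeeper"},
    "hbase_regionserver": {"hbase_master", "datanode"},
}


def deployment_waves(deps):
    """Group nodes into waves; nodes within one wave have no mutual
    dependencies and could be deployed in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if deps contain a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)                 # unlock the nodes that depend on them
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(deployment_waves(dependencies), start=1):
        print(number, wave)
```

A cycle in the dependencies would make the desired state unreachable, which is why the sketch lets `prepare()` fail loudly instead of guessing an order.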
DEPLOY ENGINE
• Ankus uses the DAGs built by the orchestration engine to deploy components

• The power behind the deploy engine is Puppet modules

• Ankus has a wrapper around the following to make it protocol-agnostic:

• `net/ssh`	

• `net/scp`	

• `net/http`	

• `puppet`
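
One way to picture a protocol-agnostic wrapper is a registry that maps a protocol name to a handler, so the deploy logic calls a single entry point regardless of transport. The sketch below is an assumption about the general pattern, not Ankus code: the `transport` decorator, the handler names, and the step tuples are all hypothetical, and the handlers only describe what they would do instead of opening real connections.

```python
# Registry mapping a protocol name to the function that carries out a step.
TRANSPORTS = {}


def transport(name):
    """Decorator that registers a handler under a protocol name."""
    def register(func):
        TRANSPORTS[name] = func
        return func
    return register


# Real handlers would call an SSH, SCP, or HTTP client library here.
@transport("ssh")
def run_over_ssh(host, payload):
    return f"ssh {host}: run '{payload}'"


@transport("scp")
def copy_over_scp(host, payload):
    return f"scp {host}: upload '{payload}'"


@transport("http")
def call_over_http(host, payload):
    return f"http {host}: GET '{payload}'"


def execute(steps):
    """Run (protocol, host, payload) steps through one uniform entry point.

    An unregistered protocol raises KeyError.
    """
    return [TRANSPORTS[protocol](host, payload) for protocol, host, payload in steps]
```

For example, `execute([("scp", "node1", "site.pp"), ("ssh", "node1", "puppet apply site.pp")])` copies a manifest and then applies it without the caller knowing which library does the work.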
METADATA MANAGEMENT
• Ankus manages metadata of every node to consistently manage the state
on each node	

• The metadata can be stored in regular files in YAML/JSON format or
in an RDBMS

• Metadata includes:

• node information (CPU, cores, RAM, etc.)

• Puppet status (last_run, install, etc.)

• service statuses
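
File-backed node metadata of the kind listed above could look like the following. This is a hedged sketch: the JSON layout, the field names beyond those mentioned on the slide, and the helper functions are assumptions for illustration, not the format Ankus actually writes.

```python
import json
from pathlib import Path


def save_metadata(path, metadata):
    """Write the per-node metadata to a JSON file."""
    Path(path).write_text(json.dumps(metadata, indent=2, sort_keys=True))


def load_metadata(path):
    """Read the per-node metadata back from a JSON file."""
    return json.loads(Path(path).read_text())


def nodes_needing_puppet_run(metadata):
    """Return hosts that have no recorded Puppet run yet."""
    return sorted(
        host for host, info in metadata.items()
        if info["puppet"]["last_run"] is None
    )


# Hypothetical metadata for two nodes.
EXAMPLE_METADATA = {
    "node1": {
        "hardware": {"cores": 4, "ram_gb": 16},
        "puppet": {"last_run": "2013-10-01T12:00:00Z", "install": True},
        "services": {"namenode": "running"},
    },
    "node2": {
        "hardware": {"cores": 8, "ram_gb": 32},
        "puppet": {"last_run": None, "install": False},
        "services": {},
    },
}
```

Comparing stored metadata against the desired state is what lets a tool decide which nodes still need work, for instance `nodes_needing_puppet_run(EXAMPLE_METADATA)` here.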
OPEN SOURCE POWER
• We at Cloudwick love open source and don’t want to reinvent the
wheel; we would rather use as many existing components as possible

• Ankus leverages the following open-source projects:	

• Puppet, MCollective, PuppetDB, Passenger	

• Nagios, NRPE	

• Ganglia & JMXTrans	

• LogStash, Lumberjack, Redis & ElasticSearch
GEM DEPENDENCIES
SCREENSHOTS - CLI	

Fig: Command line interface of Ankus
SCREENSHOTS - CONFIG	

Fig: Sample configuration for deploying a Hadoop cluster in AWS
SCREENSHOTS - DEPLOY	

Fig: Ankus deployment information
SCREENSHOTS - INFO	

Fig: Ankus information overview of the cluster
SCREENSHOTS - DESTROY	

Fig: Destroy the cluster in the cloud (AWS)
PROJECT RESOURCES
• Home Page: http://cloudwicklabs.github.io/ankus/

• Git Repository: https://github.com/ashrithr/ankus

• Issue Tracker: https://github.com/ashrithr/ankus/issues
nidhisingh691197
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Ad

Ankus, bigdata deployment and orchestration framework

  • 1. Cloudwick ANKUS Deployment and Orchestration Framework for BigData Frameworks Ashrith
  • 2. Cloudwick CLOUDWICK • Motto: Empowering the deployment and adoption of big-data across organizations • Accelerate enterprise big data people, process and technology transformation • Dedicated resources and teams to research and develop big-data use-cases
  • 3. Cloudwick BIG DATA • Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. [1] • As of 2012, limits on the size of data sets that are feasible to process in a reasonable amount of time were on the order of exabytes of data. • Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers".
  • 4. Cloudwick REASONS FOR BIG DATA • Variety - Data today comes in all types of formats. Structured, numeric data in traditional databases. Information created from line-of-business applications. Unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with. • Velocity - Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations. • Volume - Many factors contribute to the increase in data volume. Transaction-based data stored through the years. Unstructured data streaming in from social media. Increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.
  • 5. Cloudwick BIG DATA STATS • As of 2012, about 2.5 exabytes of data are created each day, and that number is doubling every 40 months or so. • It is estimated that Walmart collects more than 2.5 petabytes of data every hour from its customer transactions. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text. An exabyte is 1,000 times that amount, or one billion gigabytes.
  • 6. Cloudwick TOOLS & FRAMEWORKS • Mentioned below are some of the big data management frameworks and tools • Data Platforms - where data is stored and processed in huge volumes • Distributed file system - Hadoop HDFS • Distributed processing engine - Hadoop MapReduce/YARN • Distributed real-time computation - Storm • In-memory cluster computing - Spark
  • 7. Cloudwick TOOLS & FRAMEWORKS • NoSQL Databases/Data Warehouses - A NoSQL (Not only SQL) database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. • Column Family Stores - Cassandra, HBase, Accumulo • Document databases - CouchDB, MongoDB, MarkLogic • Key-value stores - Redis, Voldemort, Oracle BDB, Riak, Amazon SimpleDB, Tokyo Cabinet • Graph databases - Neo4j, OrientDB, Infinite Graph, Allegro, Virtuoso
  • 8. Cloudwick TOOLS & FRAMEWORKS • BigData Search - Solr, ElasticSearch • Data Aggregation - Sqoop, Flume, Chukwa, LogStash • Distributed Messaging Queues - Kafka, RabbitMQ • Distributed coordination services - ZooKeeper • Authorization & Authentication - Kerberos, LDAP • Encryption & Masking - Gazzang & DataGuise
  • 9. Cloudwick WHY DEPLOYMENT & MANAGEMENT TOOL • Deploying & managing so many complex frameworks and tools can be very complex • Making all those frameworks & tools work together is complicated • Ankus makes it easy to provision, manage and monitor many of the big-data frameworks
  • 10. Cloudwick ANKUS • Cloudwick project to accelerate big-data discovery and testing • Ankus is a deployment and orchestration tool for managing big-data frameworks • Definition (noun): An elephant goad with a sharp spike and a hook that is used to prod an elephant into motion.
  • 11. Cloudwick ANKUS FEATURES • Supports deployments both on-premises and in the cloud • Cloud infrastructure supported: • AWS • Rackspace • OpenStack • Compatible operating systems - Red Hat, Debian • Compatible with multiple big-data frameworks
  • 12. Cloudwick ANKUS FEATURES (MORE…) • Supported Big-Data Frameworks • Hadoop • HBase • Hadoop EcoSystem - Hive, Pig, Oozie, Sqoop, Hue • Cassandra • Kafka • Storm • Solr
  • 13. Cloudwick ANKUS FEATURES (MORE…) • Supported Deployment Modes: • Highly-available clusters • Secure clusters • Integrated monitoring • Integrated alerting • Integrated log-aggregation
  • 16. Cloudwick CLOUD ENGINE • Ankus has a powerful cloud manager which natively communicates with many of the major cloud providers • Pluggable cloud providers • Manages instances & volumes across various cloud providers • Configuration-based resource management • Automatically takes care of creating volumes and attaching them to the instances
  • 17. Cloudwick ORCHESTRATION ENGINE • Ankus orchestrates deployments by designing DAGs that embed the steps needed to reach the desired state • Nodes being hosts and their states • Edges being how nodes depend on each other to achieve the complete state of the system • Ankus leverages `net-*` and `puppet` gems to achieve a state of the system
  • 18. Cloudwick DEPLOY ENGINE • Ankus uses the DAGs built by the orchestration engine to deploy components • The power behind the deploy engine is puppet modules • Ankus has a wrapper around the following to make it protocol agnostic • `net/ssh` • `net/scp` • `net/http` • `puppet`
  • 19. Cloudwick METADATA MANAGEMENT • Ankus manages metadata of every node to consistently manage the state on each node • The metadata can be stored in regular files using YAML/JSON format or in an RDBMS • Metadata includes • node information (cpu, cores, ram, etc.) • puppet status (last_run, install, etc.) • service statuses
  • 20. Cloudwick OPEN SOURCE POWER • We at Cloudwick love open-source and don’t want to reinvent the wheel, so we use as many existing components as possible • Ankus leverages the following open-source projects: • Puppet, MCollective, PuppetDB, Passenger • Nagios, NRPE • Ganglia & JMXTrans • LogStash, Lumberjack, Redis & ElasticSearch
  • 22. Cloudwick SCREENSHOTS - CLI Fig: Command line interface of Ankus
  • 23. Cloudwick SCREENSHOTS - CONFIG Fig: Sample configuration for deploying a Hadoop cluster in AWS
  • 24. Cloudwick SCREENSHOTS - DEPLOY Fig: Ankus deployment information
  • 25. Cloudwick SCREENSHOTS - INFO Fig: Ankus information overview of the cluster
  • 26. Cloudwick SCREENSHOTS - DESTROY Fig: Destroy the cluster in the cloud (AWS)
  • 27. Cloudwick PROJECT RESOURCES • HomePage: http://cloudwicklabs.github.io/ankus/ • Git Repository: https://github.com/ashrithr/ankus • Issue Tracker: https://github.com/ashrithr/ankus/issues
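
The orchestration engine slide describes deployments as a DAG in which nodes are hosts and edges are dependencies between them. The sketch below illustrates that idea only; it is not Ankus code (Ankus itself is built on Ruby gems), and the function name `deployment_order` and the host names are hypothetical. It derives a deployment order with a standard topological sort (Kahn's algorithm), so that every host is handled only after the hosts it depends on.

```python
from collections import deque


def deployment_order(depends_on):
    """Return hosts ordered so that each host follows its dependencies.

    depends_on maps a host name to the set of hosts that must reach
    their desired state first (the DAG edges).
    """
    # Collect every host, including ones that appear only as dependencies.
    hosts = set(depends_on)
    for deps in depends_on.values():
        hosts.update(deps)

    # remaining[h]: dependencies of h that are not yet deployed.
    remaining = {h: set(depends_on.get(h, ())) for h in hosts}
    # dependents[h]: hosts that are waiting on h.
    dependents = {h: set() for h in hosts}
    for host, deps in remaining.items():
        for dep in deps:
            dependents[dep].add(host)

    # Start with hosts that have no dependencies (sorted for stable output).
    ready = deque(sorted(h for h, deps in remaining.items() if not deps))
    order = []
    while ready:
        host = ready.popleft()
        order.append(host)
        for child in sorted(dependents[host]):
            remaining[child].discard(host)
            if not remaining[child]:
                ready.append(child)

    # Hosts left unordered mean the graph contains a cycle.
    if len(order) != len(hosts):
        raise ValueError("dependency cycle detected; not a DAG")
    return order


# Hypothetical Hadoop-style cluster layout, for illustration only.
cluster = {
    "controller": set(),
    "namenode": {"controller"},
    "datanode1": {"namenode"},
    "datanode2": {"namenode"},
}
print(deployment_order(cluster))
```

In a real orchestrator, hosts that become ready at the same time (here the two data nodes) could be deployed in parallel; the sketch keeps them sequential for clarity.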