© 2016 DataTorrent
Chinmay Kolhatkar
Committer, Apache Apex
Engineer, DataTorrent
June 21, 2016
Apache Apex-Bigtop
Agenda
• About Apache Apex
• Apex Platform Overview
• Apex - Native Hadoop Integration
• Apex Malhar Library
• Apex as a Bigtop component
• Installing Bigtop Apex
• Apex Docker sandbox
• Apex Docker sandbox Demo
About Apache Apex
• Platform and runtime engine that enables development of scalable and fault-tolerant distributed applications
• Hadoop native (Hadoop >= 2.2)
    No separate service to manage stream processing
    Streaming engine built into Application Master and Containers
• Process streaming or batch big data
• High throughput and low latency
• Library of commonly needed business logic
• Write any custom business logic in your application
Apex Platform Overview
Apex - Native Hadoop Integration
• YARN is the resource manager
• HDFS is used for storing any persistent state
Apex Malhar Library
RDBMS
• Vertica
• MySQL
• Oracle
• JDBC
NoSQL
• Cassandra, HBase
• Aerospike, Accumulo
• Couchbase / CouchDB
• Redis, MongoDB
• Geode
Messaging
• Kafka
• Solace
• Flume, ActiveMQ
• Kinesis, NiFi
File Systems
• HDFS / Hive
• NFS
• S3
Parsers
• XML
• JSON
• CSV
• Avro
• Parquet
Transformations
• Filters
• Rules
• Expression
• Dedup
• Enrich
Analytics
• Dimensional Aggregations (with state management for historical data + query)
Protocols
• HTTP
• FTP
• WebSocket
• MQTT
• SMTP
Other
• Elasticsearch
• Script (JavaScript, Python, R)
• Solr
• Twitter
Apex as a Bigtop component
• Uses the Bigtop framework for ease of deployment
    Deployment using Puppet recipes and Vagrant
    Can spawn multi-node clusters on Docker, VMs & OpenStack
• Generates deployable binaries for the Apex engine
    RPM - CentOS 5 & 6, Fedora 20, openSUSE 42.1
    DEB - Ubuntu 14.04 & 16.04, Debian 8
• Allows validating installations
    Package test
    Smoke test
Installing Bigtop Apex
Bigtop 1.1.0 (Current)
• Add Bigtop Repository
    http://www.apache.org/dist/bigtop/bigtop-1.1.0/repos/
• Install bigtop-hadoop
    For Debian: apt-get install hadoop*
    For RPM: yum install hadoop*
• Download bigtop-apex from Bigtop CI
    https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/
• Install Apex:
    For Debian: dpkg -i apex_3.4.0-1_all.deb
    For RPM: rpm -i apex-3.4.0-1.el6.noarch.rpm
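The manual Bigtop 1.1.0 steps can be gathered into a small dry-run helper. This is only a sketch and not part of the original deck: the function name `apex_install_plan` is hypothetical, and the package file names are the ones shown on the slide, so the file actually downloaded from the Bigtop CI job may be named differently. It assumes the Bigtop repository is already configured and the Apex package sits in the current directory. The helper prints the commands rather than running them, so they can be reviewed before being executed as root.

```shell
#!/bin/sh
# Hypothetical helper (not from the deck): print, but do not run, the
# Bigtop 1.1.0 install steps for one package family, "deb" or "rpm".
# Package file names are copied from the slide and may differ in practice.
apex_install_plan() {
    case "$1" in
        deb)
            # 'hadoop*' is quoted so the shell does not expand it
            # against files in the current directory.
            echo "apt-get install 'hadoop*'"
            echo "dpkg -i apex_3.4.0-1_all.deb"
            ;;
        rpm)
            echo "yum install 'hadoop*'"
            echo "rpm -i apex-3.4.0-1.el6.noarch.rpm"
            ;;
        *)
            echo "usage: apex_install_plan deb|rpm" >&2
            return 1
            ;;
    esac
}
```

Calling `apex_install_plan deb` lists the Debian/Ubuntu steps and `apex_install_plan rpm` the RPM steps. Any other argument prints a usage message and returns a non-zero status.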
Installing Bigtop Apex
Bigtop 1.2.0 (Next Release)
• Add Bigtop Repository (Future URL)
    http://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
• Install apex
    For Debian: apt-get install apex
    For RPM: yum install apex
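Once Apex is available from the repository, the choice between the two install commands depends only on which package manager the host provides. A minimal sketch of that choice follows. The function name `apex_install_cmd` is hypothetical, and the package name `apex` is taken from the slide, which describes a release that was still in the future when the talk was given.

```shell
#!/bin/sh
# Hypothetical helper (not from the deck): print the repository-based
# install command for whichever package manager is found on PATH.
# Assumes the Bigtop 1.2.0 repository has already been added.
apex_install_cmd() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "apt-get install apex"
    elif command -v yum >/dev/null 2>&1; then
        echo "yum install apex"
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}
```

The function only prints the command. To run it, pass the output to a root shell after checking it, for example `sudo sh -c "$(apex_install_cmd)"`.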
Apex Docker sandbox
• A quick-start Apex Docker image: https://hub.docker.com/r/chinmayk/apex/
• Preconfigured and running components
    HDFS (namenode, secondarynamenode, datanode)
    YARN (resourcemanager, nodemanager, timelineserver)
• Preconfigured and installed component
    Apex
• Get started:
    Step 1: docker pull chinmayk/apex
    Step 2: docker run -it chinmayk/apex:ubuntu-14.04
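The two "get started" steps can be wrapped in a short script that shows each command before running it. This is a sketch under stated assumptions: the names `run`, `start_apex_sandbox`, `APEX_IMAGE` and `DRY_RUN` are hypothetical, and the image name and tag are copied from the slide. Unlike the slide, it pulls the same tag that it runs. An untagged `docker pull` fetches the `latest` tag, so pulling the tagged image directly is assumed to be what was intended.

```shell
#!/bin/sh
# Image name and tag as shown on the slide.
APEX_IMAGE="chinmayk/apex:ubuntu-14.04"

# Print a command, and execute it only when DRY_RUN=0.
# The default (DRY_RUN unset or 1) is to print without executing.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Pull the sandbox image, then start an interactive container from it.
start_apex_sandbox() {
    run docker pull "$APEX_IMAGE"
    run docker run -it "$APEX_IMAGE"
}
```

Running `start_apex_sandbox` as is only lists the two Docker commands. `DRY_RUN=0 start_apex_sandbox` executes them and ends in an interactive shell inside the container.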
Apex Docker sandbox (contd.)
Resources
• Apache Apex website - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
• Facebook - https://www.facebook.com/ApacheApex/
• Meetup - http://www.meetup.com/topics/apache-apex
• SlideShare - http://www.slideshare.net/ApacheApex/presentations
• More Examples - https://github.com/DataTorrent/examples
• Startup Program – Free Enterprise License for Startups, Educational Institutions, Non-Profits - https://www.datatorrent.com/startups/
• Cloud Trial - https://www.datatorrent.com/download/cloud-trial/
We Are Hiring
• jobs@datatorrent.com
• Back-End Engineers
• Front-End Engineers
• QA Automation Engineers
• Solutions Engineers
