Vinay Shukla Twitter: @neomythos
Feb 17th, 2016
Multi User Data Science with Zeppelin
Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Disclaimer
This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
developed.
Project capabilities are based on information that is publicly available within the Apache
Software Foundation project websites ("Apache"). Progress of the project capabilities
can be tracked from inception to release through Apache; however, technical feasibility,
market demand, user feedback and the overarching Apache Software Foundation
community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans,
customers should not rely upon it when making purchasing decisions.
Introducing Apache Zeppelin: Web-based Notebook for Interactive Analytics
Features
• Ad-hoc experimentation
– Spark, Hive, Shell, Flink, Tajo, Ignite, Lens, etc.
• Deeply integrated with Spark + Hadoop
– Can be managed via Ambari Stacks
• Supports multiple language backends
– Pluggable “Interpreters”
• Incubating at Apache
– 100% open source and open community
Use Case
• Data exploration and discovery
• Visualization
– Tables, graphs and charts
• Interactive snippet-at-a-time experience
• Collaboration and publishing
• “Modern Data Science Studio”
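The pluggable interpreter idea can be illustrated with a small sketch. In Zeppelin, a paragraph picks its backend with a leading `%name` directive such as `%pyspark` or `%sql`. The Python below is not Zeppelin's implementation: the `INTERPRETERS` registry, `register`, `run_paragraph`, and the two toy backends are hypothetical names used only to show how text might be routed to a backend.

```python
from typing import Callable, Dict

# Hypothetical registry: interpreter name -> backend callable (an assumption,
# not Zeppelin's actual interpreter API).
INTERPRETERS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that plugs a backend into the registry under `name`."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        INTERPRETERS[name] = fn
        return fn
    return wrap


@register("sh")
def shell_backend(body: str) -> str:
    # Toy stand-in: a real backend would execute the command.
    return f"[sh] would run: {body}"


@register("sql")
def sql_backend(body: str) -> str:
    # Toy stand-in: a real backend would submit the query.
    return f"[sql] would query: {body}"


def run_paragraph(text: str, default: str = "sh") -> str:
    """Route a paragraph to a backend based on its leading %name directive."""
    stripped = text.lstrip()
    if stripped.startswith("%"):
        parts = stripped[1:].split(None, 1)
        if not parts:
            raise ValueError("empty interpreter directive")
        name = parts[0]
        body = parts[1] if len(parts) > 1 else ""
    else:
        # No directive: fall back to the default backend.
        name, body = default, stripped
    if name not in INTERPRETERS:
        raise KeyError(f"no interpreter registered for %{name}")
    return INTERPRETERS[name](body.strip())
```

Adding a backend in this sketch means registering one more function; the dispatch code does not change, which is the point of a pluggable design.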
Apache Zeppelin
PySpark / Spark SQL
Spark & Zeppelin Pace of Innovation
[Timeline figure; TP = Technical Preview, GA = General Availability]
• HDP GA releases: HDP 2.2.4 (Spark 1.2.1), HDP 2.3.0 (Spark 1.3.1), HDP 2.3.2 (Spark 1.4.1), HDP 2.3.4 (Spark 1.5.2*), HDP 2.4.0 (Spark 1.6), HDP 2.5.x (Spark 1.6.1*)
• Spark TP: Spark 1.3.1 (5/2015), Spark 1.4.1 (8/2015), Spark 1.5.1 (Nov/2015), Spark 1.6 (Jan/2015)
• Zeppelin: TP (Oct/2015), TP Refresh, GA
• Other timeline labels: Now, Dec 2015, March 1st 2016, Q1 2016
What’s New in HDP 2.4.0?
• Spark 1.6 GA
– GA of Dynamic Resource Allocation*
• Zeppelin TP#2
– Notebook import/export features
– LDAP Authentication*
Marketing announcement coming March 1st
Requirements for Zeppelin in a Multi-Tenant Environment
• Support multiple users
• Security – Provide a security sandbox by default
• Authentication – LDAP – Integrate with the corporate identity store
• Authorization – Access control for both data & notebooks
• Encryption – Work with both wire encryption & encrypted data
• Audit – Keep track of who did what, when, & with what results, with non-repudiation
• Manageability
• Sharing/Collaboration of both data & notebooks
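The authorization and audit requirements above can be sketched together. The Python below is a hypothetical model, not Zeppelin code: `NotebookAcl`, `AuditLog`, and the `read`/`write` permission names are assumptions made for illustration. It denies access unless a user has an explicit grant, and it records every decision (who, what, when, and the result). Non-repudiation would also need tamper-evident storage, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class AuditLog:
    """Append-only, in-memory record of access decisions (illustrative only)."""
    entries: List[dict] = field(default_factory=list)

    def record(self, user: str, action: str, notebook: str, allowed: bool) -> None:
        self.entries.append({
            "who": user,
            "what": action,
            "notebook": notebook,
            "when": datetime.now(timezone.utc).isoformat(),
            "result": "allowed" if allowed else "denied",
        })


@dataclass
class NotebookAcl:
    """Hypothetical ACL: notebook -> permission name -> set of users."""
    grants: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)
    audit: AuditLog = field(default_factory=AuditLog)

    def grant(self, notebook: str, permission: str, user: str) -> None:
        self.grants.setdefault(notebook, {}).setdefault(permission, set()).add(user)

    def check(self, user: str, permission: str, notebook: str) -> bool:
        # Deny by default: no explicit grant means no access.
        allowed = user in self.grants.get(notebook, {}).get(permission, set())
        # Every decision is logged, including denials.
        self.audit.record(user, permission, notebook, allowed)
        return allowed


# Usage: alice may read the notebook; bob has no grant and is denied.
acl = NotebookAcl()
acl.grant("sales-analysis", "read", "alice")
alice_can_read = acl.check("alice", "read", "sales-analysis")
bob_can_write = acl.check("bob", "write", "sales-analysis")
```

Logging denials as well as successful accesses is a deliberate choice here, since an audit trail that only shows granted requests hides probing attempts.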
Zeppelin GA – Features
•Ambari Managed Install/Configuration
•Runs in a Kerberos Cluster
•LDAP Authentication
•SSL
•Notebook Import/Export
Coming April 2016
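Notebook import/export amounts to writing a notebook to a portable file and reading it back. The sketch below assumes a simplified, hypothetical notebook shape (a `name` plus a list of `paragraphs`, each with `text`); the real Zeppelin note format carries more fields than shown, and `export_notebook` / `import_notebook` are illustrative names, not Zeppelin APIs.

```python
import json
from pathlib import Path
from typing import Any, Dict


def export_notebook(note: Dict[str, Any], path: Path) -> None:
    """Write the notebook dict to `path` as UTF-8 JSON."""
    path.write_text(json.dumps(note, indent=2, ensure_ascii=False), encoding="utf-8")


def import_notebook(path: Path) -> Dict[str, Any]:
    """Read a notebook back and check the minimal fields this sketch assumes."""
    note = json.loads(path.read_text(encoding="utf-8"))
    if (
        not isinstance(note, dict)
        or "name" not in note
        or not isinstance(note.get("paragraphs"), list)
    ):
        raise ValueError("file is not a notebook in the assumed format")
    return note
```

Validating on import keeps a malformed or unrelated JSON file from being loaded as a notebook by mistake.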
Zeppelin Missing Features
•R Interpreter
•Better Visualizations
–ggplot2- and Shiny-equivalent visualizations
•Access Control on Notebooks
•Library Management
What is coming later? – H2, 2016
•Zeppelin Improvements
–Zeppelin Access Control
–Ambari managed LDAP Configuration
–Pluggable Visualization
–R Interpreter
Various Apache Zeppelin JIRA/Pull Requests
–Identity Propagation: https://issues.apache.org/jira/browse/ZEPPELIN-645
–LDAP Authentication: https://github.com/apache/incubator-zeppelin/pull/625
–Notebook Access Control: https://github.com/apache/incubator-zeppelin/pull/681
–Notebook Import/Export: https://issues.apache.org/jira/browse/ZEPPELIN-372
–R Interpreter: https://issues.apache.org/jira/browse/ZEPPELIN-156
Thank You
Twitter: @neomythos
