Introduction to R, Python, and Flow
Amy Wang
amy@h2o.ai
Getting Started with H2O
• Learn how R, Flow, and Python send commands to H2O for computation
• FAQ on writing R, Flow, and Python expressions
• Hands-on introduction to data science
• Understanding model outputs
• Note the limitations of the basic workflow to improve upon later
Objective
• I have H2O installed
• I have Python installed
• I have R installed
• I have the H2O World data sets
Pick up stickers or get install help at the information booth
Reading Data into H2O with R
R User
h2o_df = h2o.importFile("../data/allyears2k.csv")
STEP 1
Reading Data into H2O with R
STEP 2
2.1 R function call: h2o.importFile()
2.2 HTTP REST API request to H2O carries the path argument
2.3 H2O cluster: initiate distributed ingest
2.4 Request data (allyears2k.csv)
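Step 2.2 can be sketched in Python. This is a minimal illustration of a client turning a function call into an HTTP request that carries only the path argument, not the file contents. The `/3/ImportFiles` route, the `path` query parameter, and the helper name `build_import_request` are assumptions made for illustration and are not taken from the slides. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_import_request(host, port, path):
    """Return the URL a client might request to ask the cluster to ingest `path`.

    Assumption: the route and parameter name are illustrative only.
    """
    query = urlencode({"path": path})
    return f"http://{host}:{port}/3/ImportFiles?{query}"


url = build_import_request("localhost", 54321, "../data/allyears2k.csv")

# Decode the query string again to confirm that only the path is carried.
carried = parse_qs(urlsplit(url).query)
print(url)
print(carried)
```

The point of the sketch is that the client ships a short string; the cluster nodes then read the file themselves in steps 2.3 and 2.4.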
Reading Data into H2O with R
STEP 3
3.1 Data provided (allyears2k.csv)
3.2 Distributed H2O Frame in DKV (H2O cluster)
3.3 Return pointer to data in REST API JSON response
3.4 h2o_df object created in R, holding: cluster IP, cluster port, pointer to data
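Steps 3.3 and 3.4 can be sketched as a small client-side handle. The sketch below assumes a hypothetical JSON response shape (`{"destination_frame": ...}`) and hypothetical names (`FrameHandle`, `handle_from_response`); it only illustrates that the client object stores the cluster IP, the cluster port, and a pointer to the data, while the data itself stays in the cluster.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameHandle:
    """Client-side stand-in for a frame that lives on the cluster."""
    cluster_ip: str
    cluster_port: int
    frame_key: str  # pointer to the distributed frame, not the data itself


def handle_from_response(cluster_ip, cluster_port, response_text):
    # Assumption: the response carries the frame key under "destination_frame".
    payload = json.loads(response_text)
    return FrameHandle(cluster_ip, cluster_port, payload["destination_frame"])


# A made-up response body, used only to exercise the sketch.
response_text = '{"destination_frame": "allyears2k.hex"}'
h2o_df = handle_from_response("127.0.0.1", 54321, response_text)
print(h2o_df)
```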
R Script Starting H2O GLM
User process (standard R process):
R script calls h2o.glm()
h2o.glm() calls .h2o.startModelJob()
.h2o.startModelJob() sends POST /3/ModelBuilders/glm (HTTP, REST/JSON) over TCP/IP
H2O process:
Network layer: TCP/IP, HTTP
REST layer: REST/JSON, /3/ModelBuilders/glm endpoint
H2O - algos: Job, GLM algorithm, GLM tasks
H2O - core: Fork/Join framework, K/V store framework
Legend: User process, H2O process
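The call chain above ends in a `POST /3/ModelBuilders/glm` request. A minimal sketch of how such a request could be assembled in Python follows. The endpoint path comes from the slide; the form-encoded body, the parameter names in `params`, and the helper name `build_glm_request` are assumptions for illustration. The request object is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_glm_request(host, port, params):
    """Build (but do not send) a POST request that would start a GLM job."""
    body = urlencode(params).encode("utf-8")
    url = f"http://{host}:{port}/3/ModelBuilders/glm"
    return Request(url, data=body, method="POST")


# Hypothetical parameter names, chosen to mirror the arguments used in this deck.
params = {
    "training_frame": "allyears2k.hex",
    "response_column": "IsDepDelayed",
    "family": "binomial",
}
req = build_glm_request("localhost", 54321, params)
print(req.get_method(), req.full_url)
```

On the H2O side, the REST layer would hand such a request to the algorithm layer, which runs the job on the Fork/Join and K/V store frameworks.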
R Parity
Hands-on Introduction to Running H2O
• Import & parse a small 44,000-row airlines dataset
• Run a Logistic Regression
• Build a Deep Learning model
• Review the Model Outputs
Starting up H2O and Preloaded Workbook
Flow Users
From the terminal, change to the directory that contains the h2o.jar file and run:
> java -jar h2o.jar
Then access the Flow UI at:
http://localhost:54321
R Users
Open the intro-to-r.md.R file and run it from R (native R or RStudio):
library(h2o)
h2o.init(nthreads = -1)
Python Users
Open either intro-to-python.ipynb or intro-to-python.py with Python:
import h2o
import …
h2o.init()
Load up Preinstalled Flow Pack
For Flow Users
Import Airlines Data into H2O
Flow Users
importFiles [ "https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv" ]
setupParse paths: [ "https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv" ]
…
R Users
airlines.hex <- h2o.importFile(path = normalizePath("../data/allyears2k.csv"),
                               destination_frame = "allyears2k.hex")
** Note: Make sure that the working directory is set to where your
intro-to-r.md.R file is located; otherwise the path to the data will be wrong.
Python Users
import os
airlines_hex = h2o.import_file(path = os.path.realpath("../data/allyears2k.csv"),
                               destination_frame = "allyears2k.hex")
airlines_hex.describe()
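The working-directory note applies to Python as well, because a relative path such as `../data/allyears2k.csv` is resolved against the current working directory. The sketch below shows one way to resolve and check the path before handing it to an import call. The helper names `resolve_data_path` and `require_file` and the example base directory are hypothetical.

```python
import os


def resolve_data_path(relative_path, base_dir=None):
    """Resolve `relative_path` against `base_dir` (default: current working directory)."""
    base_dir = os.getcwd() if base_dir is None else base_dir
    return os.path.realpath(os.path.join(base_dir, relative_path))


def require_file(path):
    """Fail early with a readable message if the data file is not where we expect."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No data file at {path}; check the working directory")
    return path


# Hypothetical script location, used only to show how ".." is resolved.
resolved = resolve_data_path("../data/allyears2k.csv",
                             base_dir="/home/user/workshop/scripts")
print(resolved)
```

Calling `require_file(resolved)` before the import turns a confusing import failure into an immediate, local error message.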
The Airlines Data
• Goal: To predict departure delays using historical airlines data.
• Enum (categorical) columns: Dest, Origin, and UniqueCarrier have
a cardinality of 134, 132, and 10, respectively
• Numeric columns: DayofMonth, Year, DayOfWeek, Month,
and Distance
• Binary response column: IsDepDelayed
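The cardinalities above matter because categorical columns are typically expanded into indicator columns before a linear model or neural network sees them. The arithmetic below is a rough sketch that assumes one indicator per category level and no dropped reference level; whether H2O expands the columns exactly this way is an assumption, and the function name `expanded_width` is hypothetical.

```python
# Cardinalities and numeric columns as listed on the slide.
cardinalities = {"Dest": 134, "Origin": 132, "UniqueCarrier": 10}
numeric_columns = ["DayofMonth", "Year", "DayOfWeek", "Month", "Distance"]


def expanded_width(cardinalities, numeric_columns):
    """Number of model inputs if every category level becomes its own column."""
    return sum(cardinalities.values()) + len(numeric_columns)


width = expanded_width(cardinalities, numeric_columns)
print(width)  # 134 + 132 + 10 + 5 = 281
```

So eight predictor columns can turn into a few hundred model inputs, which is one reason regularization is used in the GLM that follows.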
Build a Logistic Regression Model
R Users
y <- "IsDepDelayed"
x <- c("Dest", "Origin", "DayofMonth", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance")
glm_model <- h2o.glm(x = x, y = y, training_frame = airlines.hex, model_id = "glm_model_from_R",
                     solver = "IRLSM", standardize = TRUE, link = "logit",
                     family = "binomial", alpha = 0.5, lambda = 1e-05)
Python Users
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
myY = "IsDepDelayed"
myX = ["Dest", "Origin", "DayofMonth", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance"]
glm_model = H2OGeneralizedLinearEstimator(family = "binomial", standardize = True, solver = "IRLSM",
                                          link = "logit", alpha = 0.5, model_id = "glm_model_from_python")
glm_model.train(x = myX, y = myY, training_frame = airlines_hex)
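The `link = "logit"`, `alpha`, and `lambda` arguments can be made concrete with a few lines of plain Python. The sketch below shows the inverse logit (sigmoid) and the usual elastic net penalty, lambda * (alpha * L1 + (1 - alpha) / 2 * L2 squared). That H2O's objective uses exactly this form is an assumption here, and the coefficient values are made up.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def elastic_net_penalty(coefs, alpha, lam):
    """Elastic net penalty: alpha mixes the L1 and L2 terms, lam scales both."""
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2)


# Made-up coefficients; alpha and lambda match the values used in this deck.
penalty = elastic_net_penalty([1.0, -2.0], alpha=0.5, lam=1e-05)
print(sigmoid(0.0), penalty)
```

With `alpha = 0.5` the penalty is an even mix of lasso and ridge, and `lambda = 1e-05` keeps the overall regularization very light.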
GLM Model Output
To Learn More About GLM
• Ramanujan Stage @ 10:45 AM
• Erdos Stage @ 2:15 PM
Build a Deep Learning Model
R Users
dl_model <- h2o.deeplearning(x = x, y = y, training_frame = airlines.hex,
                             distribution = "bernoulli", model_id = "deeplearning_model_from_R", epochs = 100,
                             hidden = c(200, 200), target_ratio_comm_to_comp = 0.02,
                             seed = 6765686131094811000, variable_importances = TRUE)
Python Users
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
deeplearning_model = H2ODeepLearningEstimator(distribution = "bernoulli",
                                              model_id = "deeplearning_model_from_python", epochs = 100,
                                              hidden = [200, 200], seed = 6765686131094811000,
                                              variable_importances = True)
deeplearning_model.train(x = myX, y = myY, training_frame = airlines_hex)
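To get a feel for what `hidden = [200, 200]` implies, the sketch below counts the weights and biases of a fully connected network. It assumes 281 inputs (134 + 132 + 10 category levels plus 5 numeric columns, one input per level) and 2 output units for the binary response; both figures are assumptions about how the data is encoded, and `count_parameters` is a hypothetical helper.

```python
def count_parameters(n_inputs, hidden, n_outputs):
    """Weights plus biases of a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden, n_outputs]
    # Each layer has (inputs * outputs) weights and one bias per output unit.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


n_params = count_parameters(n_inputs=281, hidden=[200, 200], n_outputs=2)
print(n_params)  # 56400 + 40200 + 402 = 97002
```

Under these assumptions the network has roughly twice as many parameters as the dataset has rows, which helps explain why the number of epochs and the regularization settings deserve attention.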
Deep Learning Model Output
To Learn More About Deep Learning
• Ramanujan Stage @ 5:45 PM
• Erdos Stage @ 1:30 PM
Overview
• Write R and Python expressions to clean and munge the data in a
parallelized and distributed fashion in H2O.
• Automate model builds by writing R and Python code; because
all frames and models are generated in H2O, they are also accessible
from the Flow UI.
• From the Web UI, the user can readily access the POJO, which is
the H2O-independent Java representation of the models.
H2O World - Intro to R, Python, and Flow - Amy Wang
