Ali Zaidi
Data Scientist @Microsoft [alizaidi@microsoft.com]
akzaidi
Natural Language Processing with CNTK and Spark
@alikzaidi
Want to Run CNTK with Your Spark Pipelines?
• Roope and Sudarshan are talking right now about mmlspark!
• If you’re interested in using CNTK’s deep learning algorithms within your Spark pipelines, check out their talk!
• Come to the Microsoft booth to see a demo running on Azure HDInsight!
• https://github.com/azure/mmlspark
CNTK – Deep Learning Toolkit
• CNTK, or the Microsoft Cognitive Toolkit, is an open-source, cross-platform deep learning framework capable of running on many GPUs with state-of-the-art performance.
Ready for Production!
Home page: aka.ms/CNTK (https://www.microsoft.com/en-us/cognitive-toolkit/)
Benchmarking on a single server by HKBU
“CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.”

              FCN-8    AlexNet          ResNet-50        LSTM-64
CNTK          0.037    0.040 (0.054)    0.207 (0.245)    0.122
Caffe         0.038    0.026 (0.033)    0.307 (-)        -
TensorFlow    0.063    - (0.058)        - (0.346)        0.144
Torch         0.048    0.033 (0.038)    0.188 (0.215)    0.194

GPU: G980
Getting Started with CNTK
• Model Gallery:
– https://www.microsoft.com/en-us/cognitive-toolkit/features/model-gallery/
• Python API Documentation:
– https://www.cntk.ai/pythondocs/index.html
• CNTK with Keras:
– https://docs.microsoft.com/en-us/cognitive-toolkit/using-cntk-with-keras
• Installing the wheel with pip:
– pip install https://cntk.ai/PythonWheel/GPU/cntk-2.0-cp35-cp35m-linux_x86_64.whl
Thanks to:
Jarek Kazmierczak
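How to: reader
from cntk.io import (MinibatchSource, ImageDeserializer, StreamDef, StreamDefs,
                     INFINITELY_REPEAT, FULL_DATA_SWEEP)

# CNTK 2.0 beta-era reader API, as used on this slide (later releases moved the
# image transforms to cntk.io.transforms); CIFAR-10-like sizes assumed here
image_width, image_height, num_channels, num_classes = 32, 32, 3, 10

def create_reader(map_file, mean_file, is_training):
    # image preprocessing pipeline
    transforms = [
        ImageDeserializer.crop(crop_type='Random', ratio=0.8, jitter_type='uniRatio'),
        ImageDeserializer.scale(width=image_width, height=image_height, channels=num_channels,
                                interpolations='linear'),
        ImageDeserializer.mean(mean_file)
    ]
    # deserializer: the map file's 'image' and 'label' fields become two streams
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
        features=StreamDef(field='image', transforms=transforms),
        labels=StreamDef(field='label', shape=num_classes)
    )), randomize=is_training, epoch_size=INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)
• automatic on-the-fly randomization
• readers compose, e.g. image → text caption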
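The reader's two modes can be mimicked in plain Python to show what `randomize=is_training` and `INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP` mean in practice. This is a hypothetical sketch, not CNTK's reader: `sample_stream`, `minibatches`, and the file names are made up, and everything is shuffled in memory, whereas CNTK's own randomizer is more involved.

```python
import random
from itertools import islice


def sample_stream(samples, randomize, repeat_forever, seed=0):
    """Hypothetical reader sketch: yield samples one sweep at a time.

    randomize:      reshuffle the order at the start of every sweep (training)
    repeat_forever: keep sweeping indefinitely (training); otherwise stop
                    after one full sweep over the data (evaluation)
    """
    rng = random.Random(seed)
    while True:
        order = list(samples)
        if randomize:
            rng.shuffle(order)  # on the fly: a fresh permutation for each sweep
        yield from order
        if not repeat_forever:
            return


def minibatches(stream, size):
    """Group a sample stream into lists of at most `size` samples."""
    while True:
        batch = list(islice(stream, size))
        if not batch:
            return
        yield batch


# hypothetical (image path, label) pairs, as a map file would list them
data = [("img0.png", 0), ("img1.png", 1), ("img2.png", 2), ("img3.png", 3)]

eval_batches = list(minibatches(sample_stream(data, randomize=False, repeat_forever=False), 3))
train_stream = sample_stream(data, randomize=True, repeat_forever=True, seed=42)
first_two_sweeps = list(islice(train_stream, 8))
```

Passing `randomize=is_training, repeat_forever=is_training` gives the same split between training and evaluation behaviour as the slide's `create_reader`.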
Neural networks as graphs
[Figure: computation graph: x → times W1, plus b1, σ → h1 → times W2, plus b2, σ → h2 → times Wout, plus bout, softmax → P; cross_entropy(P, y) → ce]
• nodes: functions (primitives)
– can be composed into reusable composites
• edges: values
– including dense and sparse tensors
• automatic differentiation and SGD
– ∂F/∂in = ∂F/∂out · ∂out/∂in
• deferred computation → execution engine
• editable, cloneable
Graphs are the “assembly language” of DNN tools
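A scalar, plain-Python sketch of reverse-mode automatic differentiation over an expression graph, applying the chain rule ∂F/∂in = ∂F/∂out · ∂out/∂in from the bullets above. It is illustrative only: `Node`, `sigmoid`, and `backward` are hypothetical names, and CNTK differentiates tensor-valued graphs inside its own execution engine.

```python
import math


class Node:
    """A value in an expression graph plus the edges needed for backprop."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent node, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Node(self.value * other.value, [(self, other.value), (other, self.value)])


def sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x.value))
    return Node(s, [(x, s * (1.0 - s))])


def backward(output):
    """Accumulate dF/d(in) += dF/d(out) * d(out)/d(in) over the whole graph."""
    # topological order, so a node's gradient is complete before it is propagated
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad


# h = sigmoid(w * x + b) with scalar w, x, b
w, x, b = Node(0.5), Node(2.0), Node(-1.0)
h = sigmoid(w * x + b)
backward(h)
```

Here w·x + b = 0, so h = 0.5 and σ′ = 0.25; the chain rule then gives ∂h/∂w = 0.25·x = 0.5, ∂h/∂x = 0.25·w = 0.125, and ∂h/∂b = 0.25.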
CNTK expresses (nearly) arbitrary neural networks by composing simple building blocks into complex computational networks, supporting relevant network types and applications
• “model function”
– features → predictions
– defines the model structure & parameter initialization
– holds parameters that will be learned by training
• “criterion function”
– (features, labels) → (training loss, additional metrics)
– defines training and evaluation criteria on top of the model function
– provides gradients w.r.t. training criteria
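The split between model function and criterion function, sketched in plain Python under the assumption of a single softmax layer as the model. `make_model` and `make_criterion` are hypothetical names, not CNTK API; the point is that the model owns its parameters, while the criterion reuses the model and adds a training loss and an extra metric on top.

```python
import math


def make_model(W, b):
    """Model function: features -> predictions (class probabilities).

    The returned function closes over W and b, i.e. it owns the parameters
    that training would update. W has one row per class.
    """
    def model(features):
        logits = [sum(w * f for w, f in zip(row, features)) + bias
                  for row, bias in zip(W, b)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    model.parameters = {"W": W, "b": b}
    return model


def make_criterion(model):
    """Criterion function: (features, one-hot labels) -> (training loss, error metric)."""
    def criterion(features, labels):
        probs = model(features)
        loss = -sum(y * math.log(p) for y, p in zip(labels, probs))  # cross-entropy
        predicted = max(range(len(probs)), key=probs.__getitem__)
        error = 0.0 if labels[predicted] == 1 else 1.0  # classification error
        return loss, error

    return criterion


model = make_model(W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
criterion = make_criterion(model)
loss, error = criterion([2.0, 0.0], [1, 0])
```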
How to: network
example: 2-hidden-layer feed-forward NN
$h_1 = \sigma(W_1 x + b_1)$    |    h1 = sigmoid (x @ W1 + b1)
$h_2 = \sigma(W_2 h_1 + b_2)$    |    h2 = sigmoid (h1 @ W2 + b2)
$P = \mathrm{softmax}(W_{\mathrm{out}} h_2 + b_{\mathrm{out}})$    |    P = softmax (h2 @ Wout + bout)
with input $x \in \mathbb{R}^M$ and one-hot label $y \in \mathbb{R}^J$
and cross-entropy training criterion
$ce = y^T \log P$    |    ce = cross_entropy (P, y)
$\sum_{\text{corpus}} ce = \max$
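The same forward pass written with Python lists, so the arithmetic can be checked by hand. It follows the code column's row-vector convention `x @ W + b` (each `W` is the transpose of the math column's $W$); the dimensions and all-zero weights in the usage lines are arbitrary choices for the check, not values from the talk.

```python
import math


def affine(x, W, b):
    """Row-vector convention of the code column: x @ W + b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, W1, b1, W2, b2, Wout, bout):
    h1 = sigmoid(affine(x, W1, b1))
    h2 = sigmoid(affine(h1, W2, b2))
    return softmax(affine(h2, Wout, bout))


def ce(P, y):
    """ce = y^T log P: log-probability of the labelled class (to be maximized)."""
    return sum(yj * math.log(pj) for yj, pj in zip(y, P))


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


M, H, J = 2, 3, 4  # tiny dimensions for a hand check
P = forward([1.0, -1.0], zeros(M, H), [0.0] * H, zeros(H, H), [0.0] * H,
            zeros(H, J), [0.0] * J)
y = [0, 0, 1, 0]  # one-hot label
```

With all-zero weights every hidden unit is σ(0) = 0.5 and the output logits are all zero, so P is uniform (1/4 per class) and ce = log(1/4).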
Networks as graphs
h1 = sigmoid (x @ W1 + b1)
h2 = sigmoid (h1 @ W2 + b2)
P = softmax (h2 @ Wout + bout)
ce = cross_entropy (P, y)
expression tree with
• primitive ops
• values (tensors)
• composite ops
Authoring networks as functions
# --- graph building with function objects ---
M = 40 ; H = 512 ; J = 9000 # feat/hid/out dim
# - function objects own the learnable parameters
# - here used as blocks in graph building
x = Input(M) ; y = Input(J) # feat/labels
h1 = Dense(H, activation=sigmoid)(x)
h2 = Dense(H, activation=sigmoid)(h1)
P = Dense(J, activation=softmax)(h2)
ce = cross_entropy(P, y)
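What "function objects own the learnable parameters" means can be shown with a small callable class. `Dense` here is a hypothetical plain-Python stand-in, not CNTK's `Dense`: it also takes the input dimension, which the slide's `Dense(H)` does not, and it initializes weights with a fixed seed purely for reproducibility.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


class Dense:
    """Hypothetical dense layer: a callable object that owns its W and b."""

    def __init__(self, in_dim, out_dim, activation, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
        self.b = [0.0] * out_dim
        self.activation = activation

    @property
    def parameters(self):
        return [self.W, self.b]

    def __call__(self, x):
        # row-vector convention: activation(x @ W + b)
        z = [sum(x[i] * self.W[i][j] for i in range(len(x))) + self.b[j]
             for j in range(len(self.b))]
        return self.activation(z)


M, H = 4, 3
layer1 = Dense(M, H, sigmoid, seed=1)
layer2 = Dense(H, H, sigmoid, seed=2)
h1 = layer1([1.0, 0.0, -1.0, 0.5])
h2 = layer2(h1)
```

Because the weights live on the object, calling `layer1` again reuses the same parameters rather than creating new ones, which is what lets the same block appear in several places of a graph.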
Layers API
• basic blocks:
– LSTM(), GRU(), RNNUnit()
– Stabilizer(), identity
– ForwardDeclaration(), Tensor[], SparseTensor[], Sequence[], SequenceOver[]
• layers:
– Dense(), Embedding()
– Convolution(), Convolution1D(), Convolution2D(), Convolution3D(), Deconvolution()
– MaxPooling(), AveragePooling(), GlobalMaxPooling(), GlobalAveragePooling(), MaxUnpooling()
– BatchNormalization(), LayerNormalization()
– Dropout(), Activation()
– Label()
• composition:
– Sequential(), For(), operator >>, (function tuples)
– ResNetBlock(), SequentialClique()
• sequences:
– Delay(), PastValueWindow()
– Recurrence(), RecurrenceFrom(), Fold(), UnfoldFrom()
• models:
– AttentionModel()
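The sequence entries differ mainly in what they return. A plain-Python sketch of the semantics, assuming a step function `step(state, x)`: a recurrence returns every intermediate state, a fold returns only the last one, and a delay shifts the sequence one step into the past. The lowercase names are hypothetical; CNTK's `Recurrence()`, `Fold()`, and `Delay()` operate on tensor sequences and typically wrap a learnable step such as `LSTM()`.

```python
def recurrence(step, initial_state):
    """Sequence -> sequence of states, with state_t = step(state_{t-1}, x_t)."""
    def apply(sequence):
        states, state = [], initial_state
        for x in sequence:
            state = step(state, x)
            states.append(state)
        return states
    return apply


def fold(step, initial_state):
    """Same loop as recurrence(), but only the final state is returned."""
    def apply(sequence):
        state = initial_state
        for x in sequence:
            state = step(state, x)
        return state
    return apply


def delay(sequence, initial=0):
    """Shift a sequence one step into the past: y_t = x_{t-1}, y_0 = initial."""
    items = list(sequence)
    return [initial] + items[:-1] if items else []


running_sum = recurrence(lambda s, x: s + x, 0)
total = fold(lambda s, x: s + x, 0)
```

With a running sum as the step, `running_sum([1, 2, 3])` yields every partial sum while `total([1, 2, 3])` yields only the last one, which is why a fold is the natural choice for summarizing a whole input sequence.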
Authoring networks as functions
# --- model function composition ---
M = 40 ; H = 512 ; J = 9000 # feat/hid/out dim
# function objects compose the model
model = (Dense(H, activation=sigmoid) >>
         Dense(H, activation=sigmoid) >>
         Dense(J, activation=softmax))
# criterion still graph-building
x = Input(M) ; y = Input(J) # feat/labels
P = model(x)
ce = cross_entropy(P, y)
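Forward composition with `>>` can be imitated in Python by overloading `__rshift__`, so that `f >> g` means "apply f, then g". `Fn` and `sequential` are hypothetical helpers illustrating the idea behind the slide's `Dense(...) >> Dense(...) >> Dense(...)` and the `Sequential()` entry in the Layers API list; they are not CNTK code.

```python
from functools import reduce


class Fn:
    """Wrap a callable so that `f >> g` builds 'apply f, then g'."""

    def __init__(self, f):
        self.f = f

    def __call__(self, x):
        return self.f(x)

    def __rshift__(self, other):
        return Fn(lambda x: other(self(x)))


def sequential(fns):
    """Compose a list of callables left to right, like chaining them with >>."""
    return reduce(lambda acc, g: acc >> g, fns[1:], Fn(fns[0]))


add_one = Fn(lambda x: x + 1)
double = Fn(lambda x: x * 2)
square = Fn(lambda x: x * x)

pipeline = add_one >> double >> square  # x -> ((x + 1) * 2) ** 2
same_pipeline = sequential([add_one, double, square])
```

Since `>>` is left-associative, `a >> b >> c` is `(a >> b) >> c`, which still applies `a` first; the list form is convenient when the number of layers is decided at run time.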
How to: trainer
# --- model function composition ---
M = 40 ; H = 512 ; J = 9000 # feat/hid/out dim
# function objects compose the model
model = (Dense(H, activation=sigmoid) >>
         Dense(H, activation=sigmoid) >>
         Dense(J, activation=softmax))
# criterion still graph-building
x = Input(M) ; y = Input(J) # feat/labels
P = model(x) ; ce = cross_entropy(P, y)
learner = sgd(P.parameters, …)
# lowercase name, so the Trainer class itself is not shadowed
trainer = Trainer(P, (ce), [learner])
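What the trainer does for each minibatch (forward pass, gradient of the criterion, learner update) can be spelled out for a tiny model whose gradients are easy to derive by hand. This is a hypothetical sketch: the linear model `y_hat = w * x + b`, the squared-error loss, the learning rate, and the names `train_step` and `sgd_update` are assumptions, and CNTK would obtain the gradients by automatic differentiation instead.

```python
def sgd_update(params, grads, lr):
    """Plain SGD learner: p <- p - lr * dL/dp for every parameter."""
    return {name: value - lr * grads[name] for name, value in params.items()}


def train_step(params, batch, lr):
    """Hypothetical trainer step for y_hat = w * x + b with mean squared error.

    Forward pass, hand-derived gradients, then the learner update; returns the
    new parameters and the loss measured before the update.
    """
    w, b = params["w"], params["b"]
    residuals = [(w * x + b - y, x) for x, y in batch]
    n = len(batch)
    loss = sum(r * r for r, _ in residuals) / n
    grads = {
        "w": sum(2 * r * x for r, x in residuals) / n,
        "b": sum(2 * r for r, _ in residuals) / n,
    }
    return sgd_update(params, grads, lr), loss


# synthetic data from y = 3x + 1
data = [(x, 3.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
params = {"w": 0.0, "b": 0.0}
for _ in range(500):
    params, loss = train_step(params, data, lr=0.05)
```

With these inputs the mean of x is zero, so the two parameters decouple: each step shrinks the error in w by a factor of 0.8 and the error in b by 0.9, and after 500 full-batch steps the parameters sit at w ≈ 3, b ≈ 1.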
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
mmlspark
• mmlspark makes it super easy to embed CNTK trainers, transformers, and learners directly into your Spark pipelines
• Mix and match Spark SQL, Spark ML, and mmlspark methods
• Okay, how do I get it?
– Docker: microsoft/mmlspark
– HDInsight: script action available on the GitHub page
– Databricks cloud: Maven library
– Locally: add the Maven package to your sbt file, or build from source (clone the repo and run ./runme)
Torrential Downpour from GitHub
• GitHub Torrent data (ghtorrent.org)
• Collected by Georgios Gousios from TU Delft
• Available in
– MySQL dumps (5 billion records)
– MongoDB (10 TB of entity data)
– Azure Data Lake Store
• https://github.com/Microsoft/ghinsights
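Querying the Data
[Figure: data flow from the GitHub API through an events queue to data retrieval. Source: G. Gousios, GitHub Insights: Understanding Open Source (OSCON 2016)]
[Figure: how the entities are joined: Users, Commits, Pull Requests, and Repositories, linked through comments, issues, and pr_commits; the code patch for each commit is fetched with a GH API request]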
Final Dataset
• Each commit, its associated commit message, and the pull request message and issue messages (if available)
• For natural language, use pre-trained GloVe word embeddings
– Gives us a semantic representation of words based on their co-occurrences
• For code diffs, we tokenize runs of consecutive alphanumeric characters
• We have tuples of code tokens and natural language embeddings.
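The code-diff tokenization rule (runs of consecutive alphanumeric characters) maps directly onto a regular expression. A minimal sketch, assuming ASCII letters and digits only; with this rule, underscores and diff markers such as `+` and `-` act as separators. The talk does not show the authors' exact implementation.

```python
import re

# a token is a maximal run of ASCII letters and digits
TOKEN = re.compile(r"[A-Za-z0-9]+")


def tokenize_diff(diff_text):
    """Split a code diff into runs of consecutive alphanumeric characters."""
    return TOKEN.findall(diff_text)


diff = "-import argparse, os\n+from os import getcwd\n"
tokens = tokenize_diff(diff)
```

For this diff the tokens are `import, argparse, os, from, os, import, getcwd`; punctuation, whitespace, and the leading `-`/`+` are dropped.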
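Seq2Seq: Code Patches to Natural Language
• Use a generative, attention-based neural network to model the conditional distribution of a natural language summary given a code diff
• Training corpus: $\mathcal{T} = \{(c_i, n_i)\}_{i=1}^{|\mathcal{T}|}$
– where $c_i \in \mathcal{C}$, the set of all code diffs, and $n_i \in \mathcal{N}$, the set of all messages
• Trying to find: the parameters $\theta$ that maximize the probability of each next word given the past words,
$\theta^* = \arg\max_\theta \sum_{(c, n) \in \mathcal{T}} \sum_t \log p(n_t \mid n_{<t}, c; \theta)$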
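The objective can be evaluated term by term: the log-probability of a message is the sum of the log-probabilities of each next word given the previous words and the code. In this sketch `next_word_prob` is a hypothetical stand-in for the trained decoder (here a uniform distribution over a 4-word vocabulary), so the numbers only check the bookkeeping.

```python
import math


def message_log_prob(next_word_prob, code_tokens, message):
    """log p(n | c) = sum_t log p(n_t | n_<t, c), the chain rule over next words."""
    return sum(math.log(next_word_prob(word, message[:t], code_tokens))
               for t, word in enumerate(message))


def corpus_log_likelihood(next_word_prob, corpus):
    """Objective maximized over the model parameters: sum over (code, message) pairs."""
    return sum(message_log_prob(next_word_prob, c, n) for c, n in corpus)


# hypothetical stand-in for a trained decoder: uniform over a 4-word vocabulary
def uniform_next_word(word, previous_words, code_tokens):
    return 0.25


corpus = [(["import", "os"], ["remove", "import"]),
          (["def", "main"], ["add", "main", "function"])]
ll = corpus_log_likelihood(uniform_next_word, corpus)
```

The corpus holds five message words in total, so the uniform stand-in gives 5 · log(1/4); a trained model should score higher on the same pairs.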
Encoder-Decoder Model
• What we have is a quirky encoder-decoder model
Attention
[Figure: attention between the input code tokens "<DEL> Import argpase os getcwd" and the generated message "Remove unneeded modules <DONE>"]
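The talk does not say which attention scoring function the model uses. The sketch below assumes plain dot-product scoring, one common choice, to show how a decoder state (the query) turns the encoder states of the input code tokens into attention weights and a context vector. Names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Weight each encoder state (value) by how well its key matches the query.

    query:  current decoder state
    keys:   one vector per input code token
    values: one vector per input code token (often the same as keys)
    Returns the attention weights and the weighted-average context vector.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0]]  # toy states for two input tokens
weights, context = dot_product_attention([0.0, 0.0], encoder_states, encoder_states)
```

A zero query matches both tokens equally, so the weights are uniform; a query aligned with the first state concentrates almost all of the weight on that token.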
Results
Information Retrieval
• What we have learned is a differentiable function mapping code tokens to natural language
• What if we reverse the question: can we go from natural language back to code?
• Yes! Code retrieval given natural language descriptions.
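One way to run the model in reverse is to score every candidate diff against a description and rank them. In this sketch `score` is a placeholder for whatever relevance score is used (for example, the trained model's log-probability of the description given the diff, which is an assumption, not something the talk specifies); the toy `overlap_score` only counts shared words so the ranking can be checked by hand.

```python
def retrieve(description, candidate_diffs, score, top_k=3):
    """Rank candidate code diffs for a natural language description.

    `score(description, diff)` should be higher for better matches.
    """
    ranked = sorted(candidate_diffs, key=lambda diff: score(description, diff), reverse=True)
    return ranked[:top_k]


# toy stand-in scorer: number of shared lowercase words
def overlap_score(description, diff):
    return len(set(description.lower().split()) & set(diff.lower().split()))


diffs = ["import os getcwd", "def parse args argparse", "remove unused import os"]
best = retrieve("remove unused os import", diffs, overlap_score, top_k=1)
```

Swapping `overlap_score` for a model-based score leaves `retrieve` unchanged, which keeps the ranking logic separate from the model.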
Thanks!
Ali Zaidi
Data Scientist @Microsoft [alizaidi@microsoft.com]
akzaidi
@alikzaidi

Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

Recently uploaded (20)

Geospatial Data_ Unlocking the Power for Smarter Urban Planning.docx
Geospatial Data_ Unlocking the Power for Smarter Urban Planning.docxGeospatial Data_ Unlocking the Power for Smarter Urban Planning.docx
Geospatial Data_ Unlocking the Power for Smarter Urban Planning.docx
sofiawilliams5966
 
2. Conditional_Probabilkbkjbj,vj,v,ity.ppt
2. Conditional_Probabilkbkjbj,vj,v,ity.ppt2. Conditional_Probabilkbkjbj,vj,v,ity.ppt
2. Conditional_Probabilkbkjbj,vj,v,ity.ppt
SalmitaSalman
 
Multi-Agent-Solution-Architecture-for-Unified-Loan-Platform.pptx
Multi-Agent-Solution-Architecture-for-Unified-Loan-Platform.pptxMulti-Agent-Solution-Architecture-for-Unified-Loan-Platform.pptx
Multi-Agent-Solution-Architecture-for-Unified-Loan-Platform.pptx
VikashVats1
 
time_series_forecasting_constructor_uni.pptx
time_series_forecasting_constructor_uni.pptxtime_series_forecasting_constructor_uni.pptx
time_series_forecasting_constructor_uni.pptx
stefanopinto1113
 
Alcoholic liver disease slides presentation new.pptx
Alcoholic liver disease slides presentation new.pptxAlcoholic liver disease slides presentation new.pptx
Alcoholic liver disease slides presentation new.pptx
DrShashank7
 
Understanding Tree Data Structure and Its Applications
Understanding Tree Data Structure and Its ApplicationsUnderstanding Tree Data Structure and Its Applications
Understanding Tree Data Structure and Its Applications
M Munim
 
Acounting Softwares Options & ERP system
Acounting Softwares Options & ERP systemAcounting Softwares Options & ERP system
Acounting Softwares Options & ERP system
huenkwan1214
 
delta airlines new york office (Airwayscityoffice)
delta airlines new york office (Airwayscityoffice)delta airlines new york office (Airwayscityoffice)
delta airlines new york office (Airwayscityoffice)
jamespromind
 
BADS-MBA-Unit 1 that what data science and Interpretation
BADS-MBA-Unit 1 that what data science and InterpretationBADS-MBA-Unit 1 that what data science and Interpretation
BADS-MBA-Unit 1 that what data science and Interpretation
srishtisingh1813
 
artificial intelligence (1).pptx hgggfcgfch
artificial intelligence (1).pptx hgggfcgfchartificial intelligence (1).pptx hgggfcgfch
artificial intelligence (1).pptx hgggfcgfch
DevAnshGupta609215
 
Unit---5.pdf of ba in srcc du gst before exam
Unit---5.pdf of ba in srcc du gst before examUnit---5.pdf of ba in srcc du gst before exam
Unit---5.pdf of ba in srcc du gst before exam
FireBolt6
 
语法专题3-状语从句.pdf 英语语法基础部分,涉及到状语从句部分的内容来米爱上
语法专题3-状语从句.pdf 英语语法基础部分,涉及到状语从句部分的内容来米爱上语法专题3-状语从句.pdf 英语语法基础部分,涉及到状语从句部分的内容来米爱上
语法专题3-状语从句.pdf 英语语法基础部分,涉及到状语从句部分的内容来米爱上
JunZhao68
 
Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!
Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!
Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!
yashikanigam1
 
Market Share Analysis.pptx nnnnnnnnnnnnnn
Market Share Analysis.pptx nnnnnnnnnnnnnnMarket Share Analysis.pptx nnnnnnnnnnnnnn
Market Share Analysis.pptx nnnnnnnnnnnnnn
rocky
 
9.-Composite-Dr.-B.-Nalini.pptxfdrtyuioklj
9.-Composite-Dr.-B.-Nalini.pptxfdrtyuioklj9.-Composite-Dr.-B.-Nalini.pptxfdrtyuioklj
9.-Composite-Dr.-B.-Nalini.pptxfdrtyuioklj
aishwaryavdcw
 
GROUP 7 CASE STUDY Real Life Incident.pptx
GROUP 7 CASE STUDY Real Life Incident.pptxGROUP 7 CASE STUDY Real Life Incident.pptx
GROUP 7 CASE STUDY Real Life Incident.pptx
mardoglenn21
 
refractiveindexexperimentdetailed-250528162156-4516aa1c.pptx
refractiveindexexperimentdetailed-250528162156-4516aa1c.pptxrefractiveindexexperimentdetailed-250528162156-4516aa1c.pptx
refractiveindexexperimentdetailed-250528162156-4516aa1c.pptx
KannanDamodaram
 
Understanding LLM Temperature: A comprehensive Guide
Understanding LLM Temperature: A comprehensive GuideUnderstanding LLM Temperature: A comprehensive Guide
Understanding LLM Temperature: A comprehensive Guide
Tamanna36
 
How Data Annotation Services Drive Innovation in Autonomous Vehicles.docx
How Data Annotation Services Drive Innovation in Autonomous Vehicles.docxHow Data Annotation Services Drive Innovation in Autonomous Vehicles.docx
How Data Annotation Services Drive Innovation in Autonomous Vehicles.docx
sofiawilliams5966
 
Nonverbal_Communication_Presentation.pptx
Nonverbal_Communication_Presentation.pptxNonverbal_Communication_Presentation.pptx
Nonverbal_Communication_Presentation.pptx
srtcuibinpm
 
Geospatial Data_ Unlocking the Power for Smarter Urban Planning.docx
Geospatial Data_ Unlocking the Power for Smarter Urban Planning.docxGeospatial Data_ Unlocking the Power for Smarter Urban Planning.docx
Geospatial Data_ Unlocking the Power for Smarter Urban Planning.docx
sofiawilliams5966
 
2. Conditional_Probabilkbkjbj,vj,v,ity.ppt
2. Conditional_Probabilkbkjbj,vj,v,ity.ppt2. Conditional_Probabilkbkjbj,vj,v,ity.ppt
2. Conditional_Probabilkbkjbj,vj,v,ity.ppt
SalmitaSalman
 
Multi-Agent-Solution-Architecture-for-Unified-Loan-Platform.pptx
Multi-Agent-Solution-Architecture-for-Unified-Loan-Platform.pptxMulti-Agent-Solution-Architecture-for-Unified-Loan-Platform.pptx
Multi-Agent-Solution-Architecture-for-Unified-Loan-Platform.pptx
VikashVats1
 
time_series_forecasting_constructor_uni.pptx
time_series_forecasting_constructor_uni.pptxtime_series_forecasting_constructor_uni.pptx
time_series_forecasting_constructor_uni.pptx
stefanopinto1113
 
Alcoholic liver disease slides presentation new.pptx
Alcoholic liver disease slides presentation new.pptxAlcoholic liver disease slides presentation new.pptx
Alcoholic liver disease slides presentation new.pptx
DrShashank7
 
Understanding Tree Data Structure and Its Applications
Understanding Tree Data Structure and Its ApplicationsUnderstanding Tree Data Structure and Its Applications
Understanding Tree Data Structure and Its Applications
M Munim
 
Acounting Softwares Options & ERP system
Acounting Softwares Options & ERP systemAcounting Softwares Options & ERP system
Acounting Softwares Options & ERP system
huenkwan1214
 
delta airlines new york office (Airwayscityoffice)
delta airlines new york office (Airwayscityoffice)delta airlines new york office (Airwayscityoffice)
delta airlines new york office (Airwayscityoffice)
jamespromind
 
BADS-MBA-Unit 1 that what data science and Interpretation
BADS-MBA-Unit 1 that what data science and InterpretationBADS-MBA-Unit 1 that what data science and Interpretation
BADS-MBA-Unit 1 that what data science and Interpretation
srishtisingh1813
 
artificial intelligence (1).pptx hgggfcgfch
artificial intelligence (1).pptx hgggfcgfchartificial intelligence (1).pptx hgggfcgfch
artificial intelligence (1).pptx hgggfcgfch
DevAnshGupta609215
 
Unit---5.pdf of ba in srcc du gst before exam
Unit---5.pdf of ba in srcc du gst before examUnit---5.pdf of ba in srcc du gst before exam
Unit---5.pdf of ba in srcc du gst before exam
FireBolt6
 
语法专题3-状语从句.pdf 英语语法基础部分,涉及到状语从句部分的内容来米爱上
语法专题3-状语从句.pdf 英语语法基础部分,涉及到状语从句部分的内容来米爱上语法专题3-状语从句.pdf 英语语法基础部分,涉及到状语从句部分的内容来米爱上
语法专题3-状语从句.pdf 英语语法基础部分,涉及到状语从句部分的内容来米爱上
JunZhao68
 
Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!
Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!
Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!
yashikanigam1
 
Market Share Analysis.pptx nnnnnnnnnnnnnn
Market Share Analysis.pptx nnnnnnnnnnnnnnMarket Share Analysis.pptx nnnnnnnnnnnnnn
Market Share Analysis.pptx nnnnnnnnnnnnnn
rocky
 
9.-Composite-Dr.-B.-Nalini.pptxfdrtyuioklj
9.-Composite-Dr.-B.-Nalini.pptxfdrtyuioklj9.-Composite-Dr.-B.-Nalini.pptxfdrtyuioklj
9.-Composite-Dr.-B.-Nalini.pptxfdrtyuioklj
aishwaryavdcw
 
GROUP 7 CASE STUDY Real Life Incident.pptx
GROUP 7 CASE STUDY Real Life Incident.pptxGROUP 7 CASE STUDY Real Life Incident.pptx
GROUP 7 CASE STUDY Real Life Incident.pptx
mardoglenn21
 
refractiveindexexperimentdetailed-250528162156-4516aa1c.pptx
refractiveindexexperimentdetailed-250528162156-4516aa1c.pptxrefractiveindexexperimentdetailed-250528162156-4516aa1c.pptx
refractiveindexexperimentdetailed-250528162156-4516aa1c.pptx
KannanDamodaram
 
Understanding LLM Temperature: A comprehensive Guide
Understanding LLM Temperature: A comprehensive GuideUnderstanding LLM Temperature: A comprehensive Guide
Understanding LLM Temperature: A comprehensive Guide
Tamanna36
 
How Data Annotation Services Drive Innovation in Autonomous Vehicles.docx
How Data Annotation Services Drive Innovation in Autonomous Vehicles.docxHow Data Annotation Services Drive Innovation in Autonomous Vehicles.docx
How Data Annotation Services Drive Innovation in Autonomous Vehicles.docx
sofiawilliams5966
 
Nonverbal_Communication_Presentation.pptx
Nonverbal_Communication_Presentation.pptxNonverbal_Communication_Presentation.pptx
Nonverbal_Communication_Presentation.pptx
srtcuibinpm
 

Natural Language Processing with CNTK and Apache Spark with Ali Zaidi

  • 1. Ali Zaidi, Data Scientist @Microsoft [alizaidi@microsoft.com] akzaidi. Natural Language Processing with CNTK and Spark. @alikzaidi
  • 2. Want to Run CNTK with your Spark Pipelines? • Roope and Sudarshan are talking right now about mmlspark! • If you’re interested in using CNTK’s deep learning algorithms within your Spark pipelines, check out their talk! • Come to the Microsoft booth to see a demo running on Azure HDInsight! • https://github.com/azure/mmlspark
  • 3. CNTK – Deep Learning Toolkit • CNTK, or the Microsoft Cognitive Toolkit, is an open-source, cross-platform deep learning framework capable of running on many GPUs with state-of-the-art performance.
  • 4. Ready for Production! Home page: aka.ms/CNTK (https://www.microsoft.com/en-us/cognitive-toolkit/)
  • 5. Benchmarking on a single server by HKBU: “CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.”

                    FCN-8   AlexNet         ResNet-50       LSTM-64
        CNTK        0.037   0.040 (0.054)   0.207 (0.245)   0.122
        Caffe       0.038   0.026 (0.033)   0.307 (-)       -
        TensorFlow  0.063   - (0.058)       - (0.346)       0.144
        Torch       0.048   0.033 (0.038)   0.188 (0.215)   0.194
        (G980)
  • 6. Getting Started with CNTK
      • Model Gallery: https://www.microsoft.com/en-us/cognitive-toolkit/features/model-gallery/
      • Python API Documentation: https://www.cntk.ai/pythondocs/index.html
      • CNTK with Keras: https://docs.microsoft.com/en-us/cognitive-toolkit/using-cntk-with-keras
      • Installing the wheel (.whl) with pip: pip install https://cntk.ai/PythonWheel/GPU/cntk-2.0-cp35-cp35m-linux_x86_64.whl
  • 8. How to: reader

        from cntk.io import (MinibatchSource, ImageDeserializer, StreamDef, StreamDefs,
                             INFINITELY_REPEAT, FULL_DATA_SWEEP)

        # image_width, image_height, num_channels and num_classes are defined elsewhere
        def create_reader(map_file, mean_file, is_training):
            # image preprocessing pipeline
            transforms = [
                ImageDeserializer.crop(crop_type='Random', ratio=0.8, jitter_type='uniRatio'),
                ImageDeserializer.scale(width=image_width, height=image_height,
                                        channels=num_channels, interpolations='linear'),
                ImageDeserializer.mean(mean_file)
            ]
            # deserializer
            return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
                features=StreamDef(field='image', transforms=transforms),
                labels=StreamDef(field='label', shape=num_classes)
            )), randomize=is_training,
                epoch_size=INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)

      • automatic on-the-fly randomization
      • readers compose, e.g. image → text caption
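The reader on slide 8 reshuffles samples on the fly while training (`INFINITELY_REPEAT`) and makes a single ordered pass otherwise (`FULL_DATA_SWEEP`). As a minimal pure-Python sketch of that behaviour (not CNTK's reader; the `minibatches` helper and the `(path, label)` sample format are hypothetical):

```python
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical sample type: (image path, class label).
Sample = Tuple[str, int]


def minibatches(samples: Sequence[Sample], batch_size: int,
                is_training: bool, seed: int = 0) -> Iterator[List[Sample]]:
    """Yield minibatches of samples.

    Training: reshuffle at the start of every sweep and repeat forever
    (the role INFINITELY_REPEAT plays on the slide).
    Evaluation: one pass in the original order, then stop
    (the role FULL_DATA_SWEEP plays on the slide).
    """
    if not samples:
        return
    rng = random.Random(seed)
    while True:
        order = list(samples)
        if is_training:
            rng.shuffle(order)  # on-the-fly randomization, redone every sweep
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]
        if not is_training:
            return
```

For example, five samples with `batch_size=2` and `is_training=False` give three batches of sizes 2, 2 and 1 in the original order; with `is_training=True` the generator never ends, so the caller decides how many minibatches to draw.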
  • 9. Neural networks as graphs
      [Figure: computation graph: x passes through W1, b1 and a sigmoid to give h1; h1 through W2, b2 and a sigmoid to give h2; h2 through Wout, bout and softmax to give P; cross_entropy(P, y) gives ce]
      • nodes: functions (primitives)
          – can be composed into reusable composites
      • edges: values
          – incl. tensors, sparse
      • automatic differentiation and SGD
          – ∂F/∂in = ∂F/∂out · ∂out/∂in
      • deferred computation → execution engine
      • editable, clonable
      Graphs are the “assembly language” of DNN tools. CNTK expresses (nearly) arbitrary neural networks by composing simple building blocks into complex computational networks, supporting relevant network types and applications.
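The chain rule on slide 9, ∂F/∂in = ∂F/∂out · ∂out/∂in, is all that reverse-mode automatic differentiation needs once the network is a graph. The toy scalar version below illustrates it; CNTK's engine works on tensors and is far more involved, and `Node`, `sigmoid` and `backward` are hypothetical names:

```python
import math
from typing import List, Tuple


class Node:
    """A scalar value in a computation graph, with links to its inputs."""

    def __init__(self, value: float, parents: Tuple[Tuple["Node", float], ...] = ()):
        self.value = value
        self.parents = parents  # pairs of (input node, d(this)/d(input))
        self.grad = 0.0

    def __add__(self, other: "Node") -> "Node":
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other: "Node") -> "Node":
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def sigmoid(x: Node) -> Node:
    s = 1.0 / (1.0 + math.exp(-x.value))
    return Node(s, ((x, s * (1.0 - s)),))  # local derivative of the sigmoid


def backward(output: Node) -> None:
    """Propagate dF/d(node) from the output back to every input."""
    # Topological order, so a node's grad is complete before it is pushed further.
    order: List[Node] = []
    seen = set()

    def visit(n: Node) -> None:
        if id(n) in seen:
            return
        seen.add(id(n))
        for parent, _ in n.parents:
            visit(parent)
        order.append(n)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            # Chain rule: dF/d(in) += dF/d(out) * d(out)/d(in)
            parent.grad += node.grad * local
```

For h = sigmoid(w·x + b) with w = 0.5, x = 2 and b = −1, the pre-activation is 0, so h = 0.5, and `backward(h)` leaves `w.grad = 0.5`, `x.grad = 0.125` and `b.grad = 0.25`. Gradients are accumulated with `+=` so that a node used twice (as in `a * a`) receives both contributions.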
  • 10. How to: network
      • “model function”
          – features → predictions
          – defines the model structure & parameter initialization
          – holds parameters that will be learned by training
      • “criterion function”
          – (features, labels) → (training loss, additional metrics)
          – defines training and evaluation criteria on top of the model function
          – provides gradients w.r.t. training criteria
      [Figure: computation graph of the 2-hidden-layer network, as on slide 9]
  • 11. How to: network
      example: 2-hidden-layer feed-forward NN

          h1 = σ(W1 x + b1)              h1 = sigmoid(x @ W1 + b1)
          h2 = σ(W2 h1 + b2)             h2 = sigmoid(h1 @ W2 + b2)
          P  = softmax(Wout h2 + bout)   P  = softmax(h2 @ Wout + bout)

      with input x ∈ ℝ^M and one-hot label y ∈ ℝ^J, and cross-entropy training criterion

          ce = yᵀ log P                  ce = cross_entropy(P, y)
          Σ_corpus ce = max
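Slide 10's split between a model function and a criterion function, applied to the network on slide 11, can be sketched in plain Python. This assumes dense list-of-lists weights in the slide's row-vector convention (`x @ W + b`); the helper names are illustrative, not CNTK's:

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # shape (in_dim, out_dim), matching the slide's x @ W


def affine(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Row-vector affine map x @ W + b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def model(x: Vector, params: Sequence) -> Vector:
    """Model function: features -> class probabilities P."""
    W1, b1, W2, b2, Wout, bout = params
    h1 = sigmoid(affine(x, W1, b1))
    h2 = sigmoid(affine(h1, W2, b2))
    return softmax(affine(h2, Wout, bout))


def criterion(x: Vector, y: Vector, params: Sequence) -> float:
    """Criterion function: ce = y^T log P, the per-sample log-likelihood
    that the slide sums over the corpus and maximizes."""
    P = model(x, params)
    return sum(yj * math.log(pj) for yj, pj in zip(y, P) if yj)
```

With all weights and biases set to zero, every hidden unit outputs 0.5 and P is uniform, so with J = 4 classes `criterion` returns log(1/4) for any one-hot label. Training code usually minimizes the negative of this quantity, which is the same as maximizing Σ ce over the corpus.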
  • 12. Networks as graphs
      [Figure: computation graph of the 2-hidden-layer network, drawn as an expression tree]

          h1 = sigmoid(x @ W1 + b1)
          h2 = sigmoid(h1 @ W2 + b2)
          P = softmax(h2 @ Wout + bout)
          ce = cross_entropy(P, y)

      ce: an expression tree with
      • primitive ops
      • values (tensors)
      • composite ops
      Graphs are the “assembly language” of DNN tools.
  • 13. Authoring networks as functions
      [Figure: computation graph of the 2-hidden-layer network]

          # --- graph building with function objects ---
          M, H, J = 40, 512, 9000   # feat/hid/out dims
          # - function objects own the learnable parameters
          # - here used as blocks in graph building
          x = Input(M)   # features
          y = Input(J)   # labels
          h1 = Dense(H, activation=sigmoid)(x)
          h2 = Dense(H, activation=sigmoid)(h1)
          P = Dense(J, activation=softmax)(h2)
          ce = cross_entropy(P, y)
  • 14. Layers API
      • basic blocks:
          – LSTM(), GRU(), RNNUnit()
          – Stabilizer(), identity
          – ForwardDeclaration(), Tensor[], SparseTensor[], Sequence[], SequenceOver[]
      • layers:
          – Dense(), Embedding()
          – Convolution(), Convolution1D(), Convolution2D(), Convolution3D(), Deconvolution()
          – MaxPooling(), AveragePooling(), GlobalMaxPooling(), GlobalAveragePooling(), MaxUnpooling()
          – BatchNormalization(), LayerNormalization()
          – Dropout(), Activation()
          – Label()
      • composition:
          – Sequential(), For(), operator >>, (function tuples)
          – ResNetBlock(), SequentialClique()
      • sequences:
          – Delay(), PastValueWindow()
          – Recurrence(), RecurrenceFrom(), Fold(), UnfoldFrom()
      • models:
          – AttentionModel()
  • 15. Authoring networks as functions
      [Figure: computation graph of the 2-hidden-layer network]

          # --- model function composition ---
          M, H, J = 40, 512, 9000   # feat/hid/out dims
          # function objects compose the model
          model = (Dense(H, activation=sigmoid) >>
                   Dense(H, activation=sigmoid) >>
                   Dense(J, activation=softmax))
          # criterion still graph-building
          x = Input(M)   # features
          y = Input(J)   # labels
          P = model(x)
          ce = cross_entropy(P, y)
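Slide 15 builds the model by chaining layer objects with `>>`. How such an operator can work is shown below with Python's `__rshift__` hook; this illustrates the idea and is not CNTK's implementation (`Layer` and `compose_all` are hypothetical names):

```python
from typing import Callable, List


class Layer:
    """Wraps a function so that `f >> g` means "apply f, then g"."""

    def __init__(self, fn: Callable, name: str = "layer"):
        self.fn = fn
        self.name = name

    def __call__(self, x):
        return self.fn(x)

    def __rshift__(self, other: "Layer") -> "Layer":
        # The composed layer feeds this layer's output into `other`.
        return Layer(lambda x: other(self(x)), f"{self.name} >> {other.name}")


def compose_all(layers: List[Layer]) -> Layer:
    """Fold a non-empty list of layers left to right with >>."""
    model = layers[0]
    for layer in layers[1:]:
        model = model >> layer
    return model
```

`(double >> inc)([1, 2])` applies `double` first and returns `[3, 5]`; the order of composition matters, since `(inc >> double)([1, 2])` gives `[4, 6]`.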
  • 16. How to: trainer
      [Figure: computation graph of the 2-hidden-layer network]

          # --- model function composition ---
          M, H, J = 40, 512, 9000   # feat/hid/out dims
          # function objects compose the model
          model = (Dense(H, activation=sigmoid) >>
                   Dense(H, activation=sigmoid) >>
                   Dense(J, activation=softmax))
          # criterion still graph-building
          x = Input(M)   # features
          y = Input(J)   # labels
          P = model(x)
          ce = cross_entropy(P, y)
          learner = sgd(P.parameters, …)
          # lower-case name, so the Trainer class is not shadowed
          trainer = Trainer(P, (ce), [learner])
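Slide 16 hands the criterion to a `Trainer` driven by an SGD learner. What one such loop does can be sketched for a single softmax layer, using the standard result that the gradient of softmax cross-entropy with respect to the logits is P − y. The helpers, learning rate and epoch count are illustrative assumptions, not CNTK's `Trainer`/`sgd` API:

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]  # (features, class index)


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W, b, x):
    # logits = x @ W + b, followed by softmax
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    return softmax(logits)


def loss(W, b, data: List[Example]) -> float:
    """Mean negative log-likelihood, i.e. -y^T log P averaged over the data."""
    return -sum(math.log(predict(W, b, x)[label]) for x, label in data) / len(data)


def sgd_train(data: List[Example], num_classes: int, lr: float = 0.5,
              epochs: int = 50, seed: int = 0):
    rng = random.Random(seed)
    dim = len(data[0][0])
    W = [[0.0] * num_classes for _ in range(dim)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x, label in order:  # minibatch size 1
            P = predict(W, b, x)
            # For softmax + cross-entropy, d(loss)/d(logits) = P - y (y one-hot)
            delta = [P[j] - (1.0 if j == label else 0.0) for j in range(num_classes)]
            for i in range(dim):
                for j in range(num_classes):
                    W[i][j] -= lr * x[i] * delta[j]
            for j in range(num_classes):
                b[j] -= lr * delta[j]
    return W, b
```

On four linearly separable toy points, the mean loss starts at log 2 (uniform predictions from zero weights) and drops well below that after training.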
  • 18. mmlspark
      • mmlspark makes it easy to embed CNTK trainers, transformers, and learners directly in your Spark pipelines
      • Mix and match Spark SQL, Spark ML, and mmlspark methods
      • Okay, how do I get it?
          – Docker: microsoft/mmlspark
          – HDInsight: script action available on the GitHub page
          – Databricks cloud: Maven library
          – Locally: add the Maven package to your sbt file, or build from source (clone the repo and run ./runme)
  • 21. Torrential Downpour from GitHub
      • GitHub Torrent data (ghtorrent.org)
      • Collected by Georgios Gousios from TU Delft
      • Available in
          – MySQL dumps (5 billion records)
          – MongoDB (10 TB of entity data)
          – Azure Data Lake Store
      • https://github.com/Microsoft/ghinsights
  • 23. Querying the Data
      [Figure: GitHub API, events queue, and data retrieval stages. Source: G. Gousios, GitHub Insights: Understanding Open Source (OSCON 2016)]
  • 24. Querying the Data
      [Figure: entities Users, Commits, Pull Requests, and Repositories; comments link commits and issues, pr_commits links pull requests to commits, and code patches are fetched with a GitHub API request]
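The diagram on slide 24 links commits to pull requests through `pr_commits`, and slide 25 keeps each commit together with its pull request text when one exists. In relational terms that is a LEFT JOIN through the linking table. The sketch below uses Python's built-in `sqlite3` with a hypothetical, heavily simplified schema named after the diagram's labels; the real GHTorrent dump has many more tables and columns, and the demo rows are made up:

```python
import sqlite3

# Hypothetical, simplified tables named after the slide's diagram.
SCHEMA = """
CREATE TABLE commits (id INTEGER PRIMARY KEY, sha TEXT, message TEXT);
CREATE TABLE pull_requests (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE pr_commits (pull_request_id INTEGER, commit_id INTEGER);
"""

# LEFT JOINs keep commits that have no pull request; their pr_body is NULL.
# A commit linked to several pull requests yields one row per pull request.
QUERY = """
SELECT c.sha, c.message, pr.body AS pr_body
FROM commits AS c
LEFT JOIN pr_commits AS pc ON pc.commit_id = c.id
LEFT JOIN pull_requests AS pr ON pr.id = pc.pull_request_id
ORDER BY c.id
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with two made-up commits, one of them in a pull request."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?)",
                     [(1, "a1", "Fix typo in README"),
                      (2, "b2", "Remove unneeded modules")])
    conn.execute("INSERT INTO pull_requests VALUES (?, ?)", (10, "Clean up imports"))
    conn.execute("INSERT INTO pr_commits VALUES (?, ?)", (10, 2))
    return conn


def commit_texts(conn: sqlite3.Connection):
    """Return (sha, commit message, pull request body or None) tuples."""
    return conn.execute(QUERY).fetchall()
```

`commit_texts(build_demo_db())` returns one row per commit, with `None` as the pull request body for the commit that was never part of a pull request.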
  • 25. Final Dataset
      • Each commit, its associated commit message, pull request message (if available), and issue messages (if available)
      • For natural language, use pre-trained GloVe word embeddings
          – Gives us a semantic representation of words based on their co-occurrences
      • For code diffs, we tokenize all consecutive alphanumeric characters
      • We have tuples of code tokens and natural language embeddings.
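Slide 25 tokenizes code diffs into runs of consecutive alphanumeric characters and represents messages with pre-trained GloVe vectors. A minimal sketch, assuming ASCII letters and digits only (so underscores and punctuation split tokens), lower-cased whitespace tokenization for messages, and GloVe's plain-text format of one word followed by its vector components per line; the helper names are hypothetical and no real vectors are included:

```python
import re
from typing import Dict, Iterable, List

# Runs of consecutive ASCII letters and digits.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize_code(diff: str) -> List[str]:
    """Split a code diff into alphanumeric tokens; everything else is a separator."""
    return TOKEN_RE.findall(diff)


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe's text format: a word followed by its space-separated vector."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def embed_message(message: str, vectors: Dict[str, List[float]]) -> List[List[float]]:
    """Look up each lower-cased word; words missing from the vocabulary are skipped."""
    return [vectors[w] for w in message.lower().split() if w in vectors]
```

`tokenize_code("-import argparse")` yields `['import', 'argparse']`: the diff marker is dropped along with other punctuation.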
  • 26. Seq2Seq: Code Patches to Natural Language
      • Use a generative attention-based neural network to model the conditional distribution of a natural language summary given a code diff
      • Training corpus: pairs of code diffs and messages, drawn from the set of all code diffs and the set of all messages
      • Trying to find: the model that maximizes the probability of the next word given the past words
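The objective on slide 26 can be written out in the standard maximum-likelihood form for sequence-to-sequence models. The notation here is illustrative: c is a code diff, n = (w_1, …, w_ℓ) is its message, and {(c_j, n_j)}, j = 1…J, are the training pairs. The first line factorizes the message probability into next-word probabilities, the second fits the parameters θ on the corpus, and the third generates a summary for a new diff:

```latex
\begin{aligned}
p_\theta(n \mid c) &= \prod_{i=1}^{\ell} p_\theta\left(w_i \mid w_1, \dots, w_{i-1}, c\right) \\
\theta^{*} &= \arg\max_{\theta} \sum_{j=1}^{J} \log p_\theta\left(n_j \mid c_j\right) \\
n^{*} &= \arg\max_{n} \; p_{\theta^{*}}(n \mid c)
\end{aligned}
```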
  • 27. Encoder-Decoder Model
      • What we have is a quirky encoder-decoder model
      [Figure: attention-based encoder-decoder; the encoder reads the diff tokens “<DEL> import argparse os getcwd” and the decoder emits “Remove unneeded modules <DONE>”]
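Slide 27 names attention but not its scoring function. Assuming simple dot-product scoring (one common choice, not necessarily the one used in this model), one attention step looks like this in plain Python; the function names are hypothetical:

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention: score each encoder state against the decoder query,
    normalize the scores with softmax, and return (weights, context vector)."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * state[k] for w, state in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context
```

A zero query scores every encoder state equally, so the context vector is their average; a query aligned with one state puts almost all of the weight on it.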
  • 29. Information Retrieval
      • What we have learned is a differentiable function mapping code tokens to natural language
      • What if we reverse the question and ask which code best matches a given natural-language description?
      • Yes! Code retrieval given natural language descriptions.
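With a trained model, code retrieval can be done by scoring every candidate snippet against the description and sorting by score. The sketch keeps the scorer pluggable; `overlap_score` is a toy token-overlap stand-in (an assumption, used only so the example runs without trained weights), where the real system would plug in the learned model's score, such as log p(description | code):

```python
import re
from typing import Callable, List, Sequence, Tuple


def retrieve(query: str, candidates: Sequence[str],
             score: Callable[[str, str], float], k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate code snippets by score(code, description), best first."""
    scored = [(code, score(code, query)) for code in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def overlap_score(code: str, description: str) -> float:
    """Toy stand-in for a trained model's score: the fraction of
    description words that also appear as alphanumeric code tokens."""
    code_tokens = set(re.findall(r"[a-z0-9]+", code.lower()))
    words = description.lower().split()
    return sum(w in code_tokens for w in words) / len(words) if words else 0.0
```

For the query "read file from path", a snippet defining `read_file(path)` ranks above unrelated snippets because three of the four query words appear among its tokens.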
  • 30. Thanks! Ali Zaidi, Data Scientist @Microsoft [alizaidi@microsoft.com] akzaidi @alikzaidi