Natural Language Processing with CNTK and Apache Spark with Ali Zaidi

Natural Language Processing with CNTK and Spark
Ali Zaidi
Data Scientist @Microsoft [alizaidi@microsoft.com]
akzaidi
@alikzaidi

Want to Run CNTK with Your Spark Pipelines?
• Roope and Sudarshan are talking right now
about mmlspark!
• If you’re interested in using CNTK’s deep
learning algorithms within your Spark pipelines,
check out their talk!
• Come to the Microsoft booth to see a demo
running on Azure HDInsight!
• https://github.com/azure/mmlspark

CNTK – Deep Learning Toolkit
• CNTK, or the Microsoft Cognitive Toolkit, is an open-source,
cross-platform deep learning framework capable of running on
many GPUs with state-of-the-art performance.

Ready for Production!
Home page: aka.ms/CNTK (https://www.microsoft.com/en-us/cognitive-toolkit/)

Benchmarking on a single server by HKBU
“CNTK is production-ready: State-of-the-art accuracy,
efficient, and scales to multi-GPU/multi-server.”
Framework    FCN-8   AlexNet         ResNet-50       LSTM-64
CNTK         0.037   0.040 (0.054)   0.207 (0.245)   0.122
Caffe        0.038   0.026 (0.033)   0.307 (-)       -
TensorFlow   0.063   - (0.058)       - (0.346)       0.144
Torch        0.048   0.033 (0.038)   0.188 (0.215)   0.194
(GPU: G980)

Getting Started with CNTK
• Model Gallery:
– https://www.microsoft.com/en-us/cognitive-toolkit/features/model-gallery/
• Python API Documentation:
– https://www.cntk.ai/pythondocs/index.html
• CNTK with Keras:
– https://docs.microsoft.com/en-us/cognitive-toolkit/using-cntk-with-keras
• Installing the wheel with pip:
– pip install https://cntk.ai/PythonWheel/GPU/cntk-2.0-cp35-cp35m-linux_x86_64.whl

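How to: reader
# sketch against the CNTK 2.0 Python API; image_width, image_height,
# num_channels and num_classes are assumed to be defined elsewhere
import cntk.io.transforms as xforms
from cntk.io import (MinibatchSource, ImageDeserializer, StreamDef,
                     StreamDefs, INFINITELY_REPEAT)

def create_reader(map_file, mean_file, is_training):
    # image preprocessing pipeline: random crop, scale, mean subtraction
    transforms = [
        xforms.crop(crop_type='randomside', side_ratio=0.8, jitter_type='uniratio'),
        xforms.scale(width=image_width, height=image_height, channels=num_channels,
                     interpolations='linear'),
        xforms.mean(mean_file)
    ]
    # deserializer: the map file's columns are exposed as 'image' and 'label'
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
        features=StreamDef(field='image', transforms=transforms),
        labels=StreamDef(field='label', shape=num_classes))),
        randomize=is_training,
        # endless sweeps for training, a single full sweep otherwise
        max_sweeps=INFINITELY_REPEAT if is_training else 1)
• automatic on-the-fly randomization
• readers compose, e.g. image → text caption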
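To make "automatic on-the-fly randomization" concrete, here is a plain-Python sketch of what a randomizing reader does with sweeps over the data. It is not CNTK's MinibatchSource; the `minibatches` helper and its parameters are hypothetical. It reshuffles the sample order on every sweep while training and yields a single in-order sweep otherwise.

```python
import itertools
import random

def minibatches(samples, mb_size, is_training, seed=0):
    """Hypothetical reader sketch (not CNTK): endless reshuffled sweeps
    when training, one sweep in the original order otherwise."""
    rng = random.Random(seed)
    sweeps = itertools.count() if is_training else range(1)
    for _ in sweeps:
        order = list(samples)
        if is_training:
            rng.shuffle(order)  # new random order on every sweep
        for start in range(0, len(order), mb_size):
            yield order[start:start + mb_size]

# evaluation: one ordered sweep; training: take as many minibatches as needed
eval_batches = list(minibatches(list(range(10)), 4, is_training=False))
first_train_batches = list(itertools.islice(minibatches(list(range(10)), 4, is_training=True), 3))
```

Unlike a real reader, this sketch lets the last minibatch of a sweep come up short instead of filling it from the next sweep.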

Neural networks as graphs
[Figure: the network as a graph: x → h1 = σ(W1 x + b1) → h2 = σ(W2 h1 + b2) → P = softmax(Wout h2 + bout); P and the label y feed cross_entropy, giving ce]
• nodes: functions (primitives)
– can be composed into reusable composites
• edges: values
– incl. tensors, sparse
• automatic differentiation and SGD
– ∂F / ∂in = ∂F / ∂out ∙ ∂out / ∂in
• deferred computation → execution engine
• editable, clonable
Graphs are the “assembly language” of DNN tools
CNTK expresses (nearly) arbitrary neural networks by composing simple building blocks
into complex computational networks, supporting relevant network types and applications
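The chain-rule bullet can be illustrated with a toy scalar version of reverse-mode automatic differentiation. This is a hypothetical sketch (`Node`, `sigmoid` and `backward` are illustrative names, not CNTK APIs): each node records its local derivative with respect to each input, and the backward pass accumulates ∂F/∂in = ∂F/∂out ∙ ∂out/∂in from the output toward the inputs.

```python
import math

class Node:
    """A scalar value in a computation graph that remembers how it was made."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input node, d(self)/d(input))
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value, ((self, other.value), (other, self.value)))

def sigmoid(n):
    s = 1.0 / (1.0 + math.exp(-n.value))
    return Node(s, ((n, s * (1.0 - s)),))

def backward(output):
    """Reverse-mode chain rule: dF/d_in += dF/d_out * d_out/d_in."""
    order, seen = [], set()
    def visit(n):  # topological order: inputs before the nodes that use them
        if id(n) not in seen:
            seen.add(id(n))
            for parent, _ in n.parents:
                visit(parent)
            order.append(n)
    visit(output)
    output.grad = 1.0
    for n in reversed(order):
        for parent, local in n.parents:
            parent.grad += n.grad * local

# h = sigmoid(w * x + b) for scalar w, x, b
w, x, b = Node(0.5), Node(2.0), Node(-1.0)
h = sigmoid(w * x + b)
backward(h)
```

Here w·x + b = 0, so the sigmoid's local slope is 0.25 and the gradients come out as 0.25·x, 0.25·w and 0.25.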

How to: network
• “model function”
– features → predictions
– defines the model structure & parameter initialization
– holds parameters that will be learned by training
• “criterion function”
– (features, labels) → (training loss, additional metrics)
– defines training and evaluation criteria on top of the model function
– provides gradients w.r.t. training criteria
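The split between a model function and a criterion function can be sketched in plain Python with closures. This is an illustration under assumptions, not CNTK code: `make_model` and `make_criterion` are hypothetical helpers, the model is a single softmax layer, and the loss is the negated log-probability of the correct class (minimized, the usual toolkit convention).

```python
import math

def make_model(weights, bias):
    """Model function: features -> predictions (class probabilities).
    The closure holds the parameters that training would update."""
    def model(features):
        scores = [sum(w * f for w, f in zip(row, features)) + b
                  for row, b in zip(weights, bias)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]
    return model

def make_criterion(model):
    """Criterion function: (features, label) -> (training loss, error metric),
    defined on top of the model function."""
    def criterion(features, label_index):
        probs = model(features)
        loss = -math.log(probs[label_index])  # cross-entropy with a one-hot label
        error = 0.0 if probs.index(max(probs)) == label_index else 1.0
        return loss, error
    return criterion

model = make_model(weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
criterion = make_criterion(model)
loss, error = criterion([2.0, 0.0], label_index=0)
```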

How to: network
example: 2-hidden-layer feed-forward NN
$h_1 = \sigma(W_1 x + b_1)$                    h1 = sigmoid(x @ W1 + b1)
$h_2 = \sigma(W_2 h_1 + b_2)$                  h2 = sigmoid(h1 @ W2 + b2)
$P = \mathrm{softmax}(W_{out} h_2 + b_{out})$  P = softmax(h2 @ Wout + bout)
with input $x \in \mathbb{R}^M$
and one-hot label $y \in \mathbb{R}^J$
and cross-entropy training criterion
$ce = y^T \log P$                              ce = cross_entropy(P, y)
$\sum_{\text{corpus}} ce = \max$
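As a worked example of automatic differentiation on this network, differentiating the criterion by hand gives the familiar softmax gradient. The symbol $z$ for the pre-softmax scores and $\odot$ for the elementwise product are notation introduced here; everything else follows the definitions above, with $y$ one-hot so that $\sum_j y_j = 1$. Because the slide's criterion is maximized, training would step along these gradients.

```latex
\begin{aligned}
z &= W_{out} h_2 + b_{out}, \qquad P = \mathrm{softmax}(z) \\
ce &= y^{T} \log P = y^{T} z - \log \textstyle\sum_k e^{z_k} \\
\frac{\partial ce}{\partial z} &= y - P \\
\frac{\partial ce}{\partial W_{out}} &= (y - P)\, h_2^{T}, \qquad \frac{\partial ce}{\partial b_{out}} = y - P \\
\frac{\partial ce}{\partial (W_2 h_1 + b_2)} &= h_2 \odot (1 - h_2) \odot W_{out}^{T} (y - P)
\end{aligned}
```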

Networks as graphs
h1 = sigmoid(x @ W1 + b1)
h2 = sigmoid(h1 @ W2 + b2)
P = softmax(h2 @ Wout + bout)
ce = cross_entropy(P, y)
expression tree with
• primitive ops
• values (tensors)
• composite ops
Graphs are the “assembly language” of DNN tools

Authoring networks as functions
# --- graph building with function objects ---
import cntk as C
from cntk.layers import Dense
M = 40 ; H = 512 ; J = 9000    # feat/hid/out dim
# - function objects own the learnable parameters
# - here used as blocks in graph building
x = C.input_variable(M) ; y = C.input_variable(J)    # feat/labels
h1 = Dense(H, activation=C.sigmoid)(x)
h2 = Dense(H, activation=C.sigmoid)(h1)
z = Dense(J)(h2)    # unnormalized scores; softmax is fused into the criterion
ce = C.cross_entropy_with_softmax(z, y)
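What "function objects own the learnable parameters" means can be shown with a toy plain-Python layer. This hypothetical `Dense` class only mimics the idea and is not `cntk.layers.Dense`: constructing it fixes the output dimension, the weights are created the first time it sees an input (so the input dimension is inferred), and later calls reuse the same parameters.

```python
import math
import random

class Dense:
    """Toy layer function object: it creates and owns its parameters."""
    def __init__(self, out_dim, activation=None, seed=0):
        self.out_dim = out_dim
        self.activation = activation or (lambda v: v)
        self.rng = random.Random(seed)
        self.W = None                # created lazily once the input size is known
        self.b = [0.0] * out_dim

    def __call__(self, x):
        if self.W is None:           # infer the input dimension from the first input
            self.W = [[self.rng.gauss(0.0, 0.1) for _ in x] for _ in range(self.out_dim)]
        return [self.activation(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W, self.b)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

layer = Dense(3, activation=sigmoid)
h = layer([0.5, -1.0])               # first call creates a 3x2 weight matrix
```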

Layers API
• basic blocks:
– LSTM(), GRU(), RNNUnit()
– Stabilizer(), identity
– ForwardDeclaration(), Tensor[], SparseTensor[], Sequence[], SequenceOver[]
• layers:
– Dense(), Embedding()
– Convolution(), Convolution1D(), Convolution2D(), Convolution3D(), Deconvolution()
– MaxPooling(), AveragePooling(), GlobalMaxPooling(), GlobalAveragePooling(), MaxUnpooling()
– BatchNormalization(), LayerNormalization()
– Dropout(), Activation()
– Label()
• composition:
– Sequential(), For(), operator >>, (function tuples)
– ResNetBlock(), SequentialClique()
• sequences:
– Delay(), PastValueWindow()
– Recurrence(), RecurrenceFrom(), Fold(), UnfoldFrom()
• models:
– AttentionModel()

Authoring networks as functions
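# --- model function composition ---
import cntk as C
from cntk.layers import Dense
# function objects compose the model (M, H, J as defined earlier)
model = (Dense(H, activation=C.sigmoid) >>
         Dense(H, activation=C.sigmoid) >>
         Dense(J))
# criterion still graph-building
x = C.input_variable(M) ; y = C.input_variable(J)
z = model(x)
ce = C.cross_entropy_with_softmax(z, y)    # softmax fused into the loss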
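The `>>` operator is forward composition: `f >> g` applies `f` first, then `g`. A minimal plain-Python sketch of that behaviour, with hypothetical `Fn` and `sequential` helpers standing in for CNTK's Function composition and Sequential():

```python
class Fn:
    """Wraps a one-argument callable so that `f >> g` means "apply f, then g"."""
    def __init__(self, f):
        self.f = f

    def __call__(self, x):
        return self.f(x)

    def __rshift__(self, other):
        # forward composition: feed this function's output into `other`
        return Fn(lambda x: other(self(x)))

def sequential(fns):
    """Compose a list of functions left to right, like chaining them with >>."""
    composed = Fn(lambda x: x)  # start from the identity
    for fn in fns:
        composed = composed >> fn
    return composed

# toy "layers" acting on lists of numbers
double = Fn(lambda v: [2 * a for a in v])
shift = Fn(lambda v: [a + 1 for a in v])

model = double >> shift >> double
```

Because composition happens before any data flows, the composed `model` is itself just another function and can be reused or composed further.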

How to: trainer
# --- model function composition ---
import cntk as C
from cntk.layers import Dense
# function objects compose the model (x, y, H, J as defined earlier)
model = (Dense(H, activation=C.sigmoid) >>
         Dense(H, activation=C.sigmoid) >>
         Dense(J))
# criterion still graph-building
z = model(x) ; ce = C.cross_entropy_with_softmax(z, y)
errs = C.classification_error(z, y)    # additional metric
learner = C.sgd(z.parameters, …)
trainer = C.Trainer(z, (ce, errs), [learner])
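Conceptually, the learner owns the parameter-update rule and the trainer runs minibatches through the criterion and hands the gradients to the learner. The loop below is a plain-Python sketch of plain SGD on a toy one-feature logistic regression, with hypothetical data and learning rate; it is not CNTK's Trainer or sgd learner, only the update they perform in the simplest case: parameter ← parameter − learning rate × gradient.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sgd_train(data, lr=0.5, epochs=50):
    """Plain SGD with minibatch size 1; returns parameters and per-epoch mean loss."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            # gradients of the log loss w.r.t. w and b are (p - y) * x and (p - y)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
        # mean log loss over the data after this epoch
        loss = 0.0
        for x, y in data:
            p = sigmoid(w * x + b)
            loss -= math.log(p) if y == 1 else math.log(1.0 - p)
        history.append(loss / len(data))
    return (w, b), history

data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
(w, b), history = sgd_train(data)
```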


mmlspark
• mmlspark makes it super easy to embed CNTK trainers,
transformers, and learners directly into your Spark pipelines
• Mix-and-match Spark SQL, Spark ML and mmlspark methods
together
• Okay, how do I get it?
• Docker: microsoft/mmlspark
• HDInsight: script action available on github page
• Databricks cloud: Maven library
• Locally: add the Maven package to your sbt file, or build from
source: clone the repo and run ./runme

Torrential Downpour from GitHub
• GitHub Torrent Data (ghtorrent.org)
• Collected by Georgios Gousios from TU Delft
• Available in
– MySQL dumps (5 billion records)
– MongoDB (10 TB of entity data)
– Azure Data Lake Store
• https://github.com/Microsoft/ghinsights


Querying the Data
[Diagram: GitHub API, events queue, data retrieval. Source: G. Gousios, GitHub Insights: Understanding Open Source (OSCON 2016)]

Querying the Data
[Diagram: users, commits, pull requests, issues and repositories, linked by comments, issue and pr_commits relations; code patches are fetched with a GH API request]

Final Dataset
• Each commit, its associated commit message, the pull request
message (if available), and issue messages (if available)
• For natural language, use pre-trained GloVe word
embeddings
– Gives us a semantic representation of words based on their
co-occurrences
• For code diffs, we tokenize all runs of consecutive
alphanumeric characters
• We have tuples of code tokens and natural language
embeddings.
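The code-diff tokenization rule and the GloVe lookup can be sketched with the Python standard library. The regular expression is one reading of "consecutive alphanumeric characters" (ASCII letters and digits only, so `_` splits tokens), and `load_glove` assumes the plain-text GloVe format of one word per line followed by space-separated vector components; the helper names and toy vectors are hypothetical.

```python
import re

# runs of ASCII letters and digits; anything else (space, punctuation, _) separates tokens
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")

def tokenize_diff(diff_text):
    """Split a code diff into runs of consecutive alphanumeric characters."""
    return TOKEN_PATTERN.findall(diff_text)

def load_glove(lines):
    """Parse GloVe text lines of the form: word v1 v2 ... vn."""
    embeddings = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        embeddings[word] = [float(v) for v in values]
    return embeddings

diff = "-import argparse, os\n+from os import getcwd"
tokens = tokenize_diff(diff)
glove = load_glove(["the 0.1 0.2", "commit -0.5 1.0"])
```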

Seq2Seq: Code Patches to Natural Language
• Use a generative attention-based neural network to model
the conditional distribution of a natural language summary
given a code diff
• Training corpus: (code diff, message) pairs
– where the diffs come from the set of all code diffs and the
messages from the set of all messages
• Trying to find: the model parameters that maximize the
probability of each next word given the past words
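Written out with notation introduced here as an assumption ($\mathcal{C}$ the training corpus, $d$ a code diff, $m = (m_1, \dots, m_{T_m})$ its message, $\theta$ the network parameters), the objective described above is the usual maximum-likelihood criterion for sequence-to-sequence models:

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(d,\,m) \in \mathcal{C}} \; \sum_{t=1}^{T_m} \log p_{\theta}\left(m_t \mid m_1, \dots, m_{t-1},\, d\right)
```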

Encoder-Decoder Model
• What we have is a quirky encoder-decoder
model
[Figure: encoder-decoder with attention; example input code tokens “<DEL> Import argparse os getcwd” and output message “Remove unneeded modules <DONE>”]
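The slide does not specify how the attention weights are computed. Assuming simple dot-product scoring, one attention step of the decoder looks like this in plain Python: score every encoder state against the current decoder state, normalize the scores with a softmax, and take the weighted sum of encoder states as the context vector. The `attend` name and the toy vectors are hypothetical.

```python
import math

def attend(query, keys, values):
    """Dot-product attention: returns (attention weights, context vector)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# toy encoder states, used both as keys and values; the query is the decoder state
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states, encoder_states)
```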

Information Retrieval
• What we have learned is a differentiable function
mapping from code tokens to natural language
• What if we reverse the question: given a natural language
description, can we find the code?
• Yes! Code retrieval given natural language
descriptions.
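The slides do not say how retrieval is scored. One common approach, assumed here, is to embed the query and each candidate diff into a shared vector space and rank candidates by cosine similarity; another option, also not confirmed by the slides, would be to rank diffs by the seq2seq model's log-likelihood of the query. The vectors and snippet ids below are hypothetical placeholders for encoder outputs.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, code_index, k=2):
    """Return the ids of the k code snippets most similar to the query vector."""
    ranked = sorted(code_index, key=lambda cid: cosine(query_vec, code_index[cid]),
                    reverse=True)
    return ranked[:k]

# hypothetical vectors; in practice they would come from the trained model
code_index = {
    "remove_unused_imports": [0.9, 0.1, 0.0],
    "add_logging": [0.1, 0.9, 0.2],
    "fix_off_by_one": [0.0, 0.2, 0.9],
}
top = retrieve([1.0, 0.0, 0.1], code_index)
```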

Thanks!
Ali Zaidi
Data Scientist @Microsoft [alizaidi@microsoft.com]
akzaidi
@alikzaidi
