Are we reaching a Data Science Singularity? How Cognitive Computing is emerging from Machine Learning Algorithms, Big Data Tools, and Cloud Services by Natalino Busa
Multiplatform Spark solution for Graph datasources by Javier Dominguez (Big Data Spain)
This document summarizes a presentation given by Javier Dominguez at Big Data Spain about Stratio's multiplatform solution for graph data sources. It discusses graph use cases and compares graph technologies such as Spark GraphX, GraphFrames, and Neo4j. It demonstrates the machine learning life cycle using a massive dataset from Freebase, running queries and algorithms. It shows notebooks and a business example of clustering bank data using Jaccard distance and connected components. The presentation concludes with future directions such as a semantic search engine and applying more machine learning algorithms.
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams (Databricks)
Designing a streaming application that has to process data from one or two streams is easy; any streaming framework that provides scalability, high throughput, and fault tolerance will work. But when the number of streams grows into the hundreds or thousands, managing them can be daunting. How would you share resources among thousands of streams running 24×7, manage their state, apply advanced streaming operations, and add or delete streams without restarting? This talk explains common scenarios and shows techniques that can handle thousands of streams using Spark Structured Streaming.
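To make the pattern concrete, here is a minimal PySpark sketch of running many independent Structured Streaming queries in one application; the topic names, aggregation, and memory sink are illustrative assumptions, not the talk's actual code.

```python
# Minimal sketch: N independent Structured Streaming queries in one app.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("many-streams").getOrCreate()

topics = ["events_001", "events_002", "events_003"]  # could be 100s or 1000s
for topic in topics:
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", topic)
              .load())
    counts = stream.groupBy(col("key")).count()
    (counts.writeStream
           .queryName(f"agg_{topic}")
           .outputMode("complete")
           .format("memory")  # demo sink; production would use Kafka, Delta, etc.
           .start())

spark.streams.awaitAnyTermination()  # all queries share one SparkSession's resources
```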
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas (Databricks)
Does more data always improve ML models? Is it better to use distributed ML instead of single node ML?
In this talk I will show that while more data often improves deep learning models in high-variance problem spaces (with semi-structured or unstructured data such as NLP, image, and video), more data does not significantly improve high-bias problem spaces where traditional ML is more appropriate. Additionally, even in the deep learning domain, single-node models can still outperform distributed models via transfer learning.
Data scientists face pain points in running many models in parallel, automating the experimental setup, and getting others (especially analysts) within an organization to use their models. Databricks addresses these problems with pandas UDFs, the ML runtime, and MLflow.
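As an illustration of the many-models pattern the talk attributes to pandas UDFs and MLflow, here is a minimal PySpark sketch; the per-store grouping, model, and metric are hypothetical stand-ins for the talk's examples.

```python
# Minimal sketch: one model per group via applyInPandas, tracked with MLflow.
import pandas as pd
import mlflow
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(pd.DataFrame({
    "store": ["a", "a", "b", "b"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.1, 3.9, 6.2, 8.1],
}))

def train_per_store(pdf: pd.DataFrame) -> pd.DataFrame:
    # Each group's data fits on one worker, so plain sklearn works here.
    from sklearn.linear_model import LinearRegression
    model = LinearRegression().fit(pdf[["x"]], pdf["y"])
    with mlflow.start_run(run_name=f"store-{pdf['store'].iloc[0]}"):
        mlflow.log_metric("r2", model.score(pdf[["x"]], pdf["y"]))
    return pd.DataFrame({"store": [pdf["store"].iloc[0]],
                         "coef": [float(model.coef_[0])]})

sdf.groupBy("store").applyInPandas(
    train_per_store, schema="store string, coef double").show()
```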
The Rise of Engineering-Driven Analytics by Loren Shure (Big Data Spain)
1) Engineering-driven analytics is becoming more pervasive as data sources, computing power, and machine learning techniques expand.
2) MATLAB provides tools for engineering-driven analytics including support for engineering data types, machine learning algorithms, and model deployment in embedded systems.
3) Examples demonstrate how MATLAB has been used for applications in building energy optimization, automotive emergency braking, manufacturing quality control, and medical diagnostics.
Krishnan Raman presented on LinkedIn's data obfuscation pipeline. The pipeline aims to analyze LinkedIn data to improve machine learning models, discover data quickly for analysis, and access data efficiently while complying with privacy regulations. It determines which files contain personally identifiable information (PII) to obfuscate, handles schema evolution, and preserves file names and types. WhereHows is used to track dataset lineage and locations. Obfuscated data is emitted with metrics on job progress captured as time series for monitoring the data pipeline. Challenges include unclean data, complex schemas, balancing failures against dropped rows, and accounting for changing data and schemas. Auditing data and metadata, robust monitoring systems, and re-obfuscation help address these challenges.
Counting Unique Users in Real-Time: Here's a Challenge for You! (DataWorks Summit)
Finding the number of unique users out of 10 billion events per day is challenging. At this session, we're going to describe how re-architecting our data infrastructure, relying on Druid and ThetaSketch, enables our customers to obtain these insights in real-time.
To put things into context, at NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. Specifically, we provide them with the ability to see the number of unique users who meet a given criterion.
Historically, we have used Elasticsearch to answer these types of questions, however, we have encountered major scaling and stability issues.
In this presentation we will detail the journey of rebuilding our data infrastructure, including researching, benchmarking and productionizing a new technology, Druid, with ThetaSketch, to overcome the limitations we were facing.
We will also provide guidelines and best practices with regards to Druid.
Topics include:
* The need and possible solutions
* Intro to Druid and ThetaSketch
* How we use Druid
* Guidelines and pitfalls
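For a feel of why ThetaSketch scales, here is a minimal example using the Apache DataSketches Python bindings; this is an illustrative assumption, since the talk's production path is Druid's built-in ThetaSketch aggregator rather than this library.

```python
# Minimal sketch: mergeable unique-user counting with theta sketches.
from datasketches import update_theta_sketch, theta_union

day1 = update_theta_sketch()
day2 = update_theta_sketch()
for uid in range(100_000):
    day1.update(f"user-{uid}")
for uid in range(50_000, 150_000):
    day2.update(f"user-{uid}")

# Sketches stay small and can be merged, unlike exact distinct counts.
union = theta_union()
union.update(day1)
union.update(day2)
print(round(union.get_result().get_estimate()))  # ~150,000 unique users
```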
Critical Breakthroughs and Technical Challenges in Big Data Driven Innovation discusses four key breakthroughs in Google Cloud Platform's approach to big data:
1. Batch and streaming data processing can be combined using Cloud Dataflow.
2. Real-time data ingestion at massive scales is enabled through technologies like Cloud Bigtable which can process billions of events per hour.
3. Analytics can be done at the speed of thought through BigQuery which allows complex queries on petabytes of data to return results in seconds.
4. Machine learning is made available to everyone through services that offer pre-trained models via APIs and allow custom modeling using TensorFlow on Google Cloud.
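Point 3 can be made concrete with the BigQuery Python client against a public sample table; a minimal sketch, assuming GCP credentials are configured:

```python
# Minimal sketch: an ad hoc aggregate over a public BigQuery sample table.
from google.cloud import bigquery

client = bigquery.Client()  # assumes GCP credentials are configured
query = """
    SELECT corpus, COUNT(DISTINCT word) AS unique_words
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus
    ORDER BY unique_words DESC
    LIMIT 5
"""
for row in client.query(query).result():
    print(row.corpus, row.unique_words)
```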
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal... (Databricks)
This document discusses using TigerGraph for real-time fraud detection at scale by integrating real-time deep-link graph analytics with Spark AI. It provides examples of common TigerGraph use cases including recommendation engines, fraud detection, and risk assessment. It then discusses how TigerGraph can power explainable AI by extracting over 100 graph-based features from entities and their relationships to feed machine learning models. Finally, it shares a case study of how China Mobile used TigerGraph for real-time phone-based fraud detection by analyzing over 600 million phone numbers and 15 billion call connections as a graph to detect various types of fraud in real-time.
Big Data is changing abruptly, and where it is likely heading (Paco Nathan)
Big Data technologies are changing rapidly due to shifts in hardware, data types, and software frameworks. Incumbent Big Data technologies do not fully leverage newer hardware like multicore processors and large memory spaces, while newer open source projects like Spark have emerged to better utilize these resources. Containers, clouds, functional programming, databases, approximations, and notebooks represent significant trends in how Big Data is managed and analyzed at large scale.
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ... (Databricks)
This document summarizes Walmart's transition to building an enterprise data platform on Azure Databricks to enable machine learning and data science at scale. Previously, Walmart had a complex and slow legacy technology stack. The new platform goals were to centralize data in the cloud, increase productivity with data science tools, and reduce costs. Key aspects of the new platform included using Azure and Databricks for data processing and machine learning, Airflow for orchestration, and building several machine learning models for applications like fraud detection and product recommendations. Challenges in the transition included optimizing performance and managing resources across the platforms.
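A minimal sketch of the Airflow-orchestrates-Databricks pattern the summary describes; the cluster spec, notebook path, and connection id are assumptions, not Walmart's actual configuration.

```python
# Minimal sketch: Airflow submitting a Databricks notebook run on a schedule.
from datetime import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="fraud_model_training",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    train = DatabricksSubmitRunOperator(
        task_id="train_fraud_model",
        databricks_conn_id="databricks_default",
        new_cluster={
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/ML/train_fraud_model"},
    )
```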
Graph Data: a New Data Management Frontier (Demai Ni)
Graph Data: a New Data Management Frontier -- Huawei’s view and Call for Collaboration by Demai Ni:
Huawei provides enterprise databases and is actively exploring the latest technology to provide an end-to-end data management solution on the cloud. We are looking to bridge classic RDBMS to graph databases on a distributed platform.
Margriet Groenendijk - Open data is available from an incredible number of data sources that can be linked to your own datasets. This talk will present examples of how to visualise and combine data from very different sources such as weather and climate, and statistics collected by individual countries using Python notebooks in Analytics for Apache Spark.
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha... (Shirshanka Das)
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity for Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop, and they explore a data abstraction layer, Dali, that can help you process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #datasciencehappiness.
Big Data Streams Architectures. Why? What? How? (Anton Nazaruk)
With the current zoo of technologies and the different ways they interact, it is a big challenge to architect a system (or adapt an existing one) that meets low-latency big data analysis requirements. Apache Kafka and the Kappa Architecture in particular are drawing more and more attention away from the classic Hadoop-centric technology stack. The new Consumer API has given a significant boost in this direction. Microservices-based stream processing and the new Kafka Streams API are proving a natural synergy in the big data world.
Advanced Analytics for Any Data at Real-Time Speed (danpotterdwch)
In this keynote presentation from Predictive Analytics World, entitled "Advanced Analytics for Any Data at Real-Time Speed", Dan Potter, CMO of Datawatch, presents a new approach to preparing, incorporating, enriching, and visualizing streaming data; such advanced visual analysis is essential for making timelier, high-impact business decisions in tough competitive markets.
BICube is a machine learning platform for big data. It provides tools for ingesting, processing, analyzing and visualizing large datasets using techniques like Apache Spark, Hadoop, and machine learning algorithms. The platform includes modules for tasks like document clustering, topic modeling, image analysis, recommendation systems and more. It aims to allow users to build customized machine learning workflows and solutions.
Hybrid Transactional/Analytics Processing with Spark and IMDGs (Ali Hodroj)
This document discusses hybrid transactional/analytical processing (HTAP) with Apache Spark and in-memory data grids. It begins by introducing the speaker and GigaSpaces. It then discusses how modern applications require both online transaction processing and real-time operational intelligence. The document presents examples from retail and IoT and the goals of minimizing latency while maximizing data analytics locality. It provides an overview of in-memory computing options and describes how GigaSpaces uses an in-memory data grid combined with Spark to achieve HTAP. The document includes deployment diagrams and discusses data grid RDDs and pushing predicates to the data grid. It describes how this was productized as InsightEdge and provides additional innovations and reference architectures.
Talk I gave at StratHadoop in Barcelona on November 21, 2014.
In this talk I discuss our experience with real-time analysis of high-volume event data streams.
In this session you will learn how H&M created a reference architecture for deploying machine learning models on Azure using Databricks, following DevOps principles. The architecture is currently used in production and has been iterated on multiple times to solve discovered pain points. The presenting team is responsible for ensuring that best practices are implemented across all H&M use cases, covering hundreds of models across the entire H&M group. This architecture not only lets data scientists use notebooks for exploration and modeling but also gives engineers a way to build robust, production-grade code for deployment. The session also covers lifecycle management, traceability, automation, scalability, and version control.
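One building block of the lifecycle-management and traceability story is MLflow's model registry; a minimal sketch, with illustrative names rather than H&M's actual models:

```python
# Minimal sketch: log metrics and register a versioned model with MLflow.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    # Registration gives each deployment a version history for traceability.
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="demand-forecaster")
```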
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...) (confluent)
Do you know who is knocking on your network’s door? Have new regulations left you scratching your head about how to handle what is happening in your network? Network flow data helps answer many questions across a multitude of use cases, including network security, performance, capacity planning, routing, operational troubleshooting, and more. Modern streaming data pipelines need tools that can scale to meet the demands of these service providers while continuing to provide responsive answers to difficult questions. In addition to stream processing, data needs to be stored in a redundant, operationally focused database to provide fast, reliable answers to critical questions. Kafka and Druid work together to create such a pipeline.
In this talk Eric Graham and Rachel Pedreschi will discuss these pipelines and cover the following topics:
-Network flow use cases and why this data is important.
-Reference architectures from production systems at a major international Bank.
-Why Kafka and Druid and other OSS tools for Network Flows.
-A demo of one such system.
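A minimal sketch of the Kafka side of such a pipeline, producing flow records with the kafka-python client; the topic name and record fields are assumptions:

```python
# Minimal sketch: producing JSON flow records into a Kafka topic for Druid.
import json
from kafka import KafkaProducer  # kafka-python package

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
flow = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
        "bytes": 1420, "ts": 1700000000}
producer.send("netflow", value=flow)  # "netflow" is a hypothetical topic name
producer.flush()
```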
How Spark Enables the Internet of Things: Efficient Integration of Multiple ... (sparktc)
IBM researchers in Haifa, together with partners from the COSMOS EU-funded project, are using Spark to analyze the new wave of IoT data and solve problems in a way that is generic, integrated, and practical.
The Potential of GPU-driven High Performance Data Analytics in Spark (Spark Summit)
This document discusses Andy Steinbach's presentation at Spark Summit Brussels on using GPUs to drive high performance data analytics in Spark. It summarizes that GPUs can help scale up compute intensive tasks and scale out data intensive tasks. Deep learning is highlighted as a new computing model that is being applied beyond just computer vision to areas like medicine, robotics, self-driving cars, and predictive analytics. GPU-powered systems like NVIDIA's DGX-1 are able to achieve superhuman performance for deep learning tasks by providing high memory bandwidth and FLOPS.
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B... (Altan Khendup)
The document discusses the Lambda architecture, which provides a common pattern for integrating real-time and batch processing systems. It describes the key components of Lambda: the batch layer, speed layer, and serving layer. The main challenge of implementing Lambda is that it requires multiple systems and technologies to be coordinated, so real-world examples are needed to guide practical application. The document also provides examples of medical and customer analytics use cases that could benefit from a Lambda approach.
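The serving layer's core idea, merging a precomputed batch view with recent speed-layer deltas, fits in a few lines; a minimal sketch with hypothetical per-user counts:

```python
# Minimal sketch: the serving layer merges batch and speed views at query time.
batch_view = {"alice": 120, "bob": 45}   # precomputed by the batch layer
speed_view = {"alice": 3, "carol": 7}    # recent deltas from the speed layer

def merged_count(user: str) -> int:
    # Authoritative-but-stale batch result plus fresh-but-partial speed result.
    return batch_view.get(user, 0) + speed_view.get(user, 0)

print(merged_count("alice"))  # 123
```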
Enabling the Bank of the Future by Ignacio Bernal (Big Data Spain)
BBVA is transforming into a digital bank through building a global cloud banking platform. The platform utilizes new technologies including a global hybrid cloud infrastructure, global data and PaaS platforms, and an embedded security platform. It integrates legacy systems through a global service layer and real-time data integration. A new operating model features DevOps, everything as a service through a single API catalog, and an Ubuntu-like global developer community. Developing great talent is also a focus through a different approach to talent development and strategic partnerships with startups and other partners.
The document discusses how Twitter data can provide insights into trends, events, and conversations happening around the world in real-time. It provides examples of how Twitter data analysis can be used for applications like customer service, marketing research, and corporate strategy. Finally, it outlines how Twitter data powers an ecosystem of analytics providers, brands, and agencies who leverage the data for insights and actions.
Case of success: Visualization as an example for exercising democratic transp... (Big Data Spain)
The document discusses creating an interactive dashboard to visualize Spain's state budget data in order to promote democratic transparency. The dashboard aims to [1] analyze and normalize available budget data files, [2] create an interactive tool for citizens to better understand budget information, and [3] empower citizens to make their own conclusions about budget spending. Big data and open source technologies would be used to extract, transform, analyze, and visualize the budget data in the dashboard.
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana... (Big Data Spain)
This document discusses Apache Flink for IoT event-time stream processing. It begins by introducing streaming architectures and Flink. It then discusses how IoT data has important properties like continuous data production and event timestamps that require event-time based processing. Examples are provided of companies like King and Bouygues Telecom using Flink for billions of events per day with challenges like out-of-order data and flexible windowing. Event-time processing in Flink is able to handle these challenges through features like watermarks.
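A minimal event-time sketch in PyFlink showing watermark-based tolerance for out-of-order data; this is an assumption for illustration, since the talk predates PyFlink and its examples were presumably in Java or Scala:

```python
# Minimal sketch: event-time processing with bounded out-of-orderness.
from pyflink.common import Duration, WatermarkStrategy
from pyflink.common.watermark_strategy import TimestampAssigner
from pyflink.datastream import StreamExecutionEnvironment

class EventTimeAssigner(TimestampAssigner):
    def extract_timestamp(self, value, record_timestamp):
        return value[1]  # records are (device_id, epoch_millis, reading)

env = StreamExecutionEnvironment.get_execution_environment()
events = env.from_collection([("sensor-1", 1700000000000, 21.5),
                              ("sensor-1", 1700000005000, 22.0)])

# Watermarks let Flink tolerate up to 5 seconds of out-of-order events.
strategy = (WatermarkStrategy
            .for_bounded_out_of_orderness(Duration.of_seconds(5))
            .with_timestamp_assigner(EventTimeAssigner()))
events.assign_timestamps_and_watermarks(strategy).print()
env.execute("event-time-demo")
```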
GPU Accelerated Natural Language Processing by Guillermo Molini (Big Data Spain)
This document discusses natural language processing (NLP) and how it can be used for tasks like searching, information extraction, and speech recognition. It explains how traditional searching works by matching keywords versus modern NLP techniques like vector embeddings that represent words as vectors in a multi-dimensional space. Vector embeddings allow determining semantic similarity between words and can be used for applications like speech recognition. The document also discusses how GPUs can accelerate NLP tasks by parallelizing computations and presents Wavecrafters' solution for providing GPU-accelerated NLP capabilities.
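The semantic-similarity idea reduces to vector arithmetic; a minimal NumPy sketch with toy three-dimensional embeddings (real models use hundreds of dimensions):

```python
# Minimal sketch: cosine similarity between toy word embeddings.
import numpy as np

emb = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.78, 0.70, 0.12]),
    "car":   np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # high: semantically close
print(cosine(emb["king"], emb["car"]))    # low: unrelated
```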
The Email Analyzer interface helps an analyst visualize the email network and identify local groups of people who frequently exchange emails amongst themselves. This interface was developed as an entry for VAST Challenge 2014.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
Concurrent Inference of Topic Models and Distributed Vector Representations (Parang Saraf)
Abstract: Topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection. Though these techniques first originated in the probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. In this paper, we propose a novel neural network based architecture that produces distributed representation of topics to capture topical themes in a dataset. Unlike many state-of-the-art techniques for generating distributed representation of words and documents that directly use neighboring words for training, we leverage the outcome of a sophisticated deep neural network to estimate the topic labels of each document. The networks, for topic modeling and generation of distributed representations, are trained concurrently in a cascaded style with better runtime without sacrificing the quality of the topics. Empirical studies reported in the paper show that the distributed representations of topics represent intuitive themes using smaller dimensions than conventional topic modeling approaches.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
This document summarizes a presentation about representation learning and deep learning. It discusses how machine learning algorithms require data to be represented in a way that captures meaningful features, and how representation learning aims to automatically discover these features from data. It also covers topics like semi-supervised learning, transfer learning, deep neural networks, convolutional neural networks, word embeddings with Word2Vec, and neural style transfer. Examples of representation learning applications include document classification using Doc2Vec embeddings and neural art style transfer.
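A minimal Word2Vec sketch with gensim, in the spirit of the word-embedding material summarized above; the toy corpus is an assumption:

```python
# Minimal sketch: training word embeddings on a toy corpus with gensim.
from gensim.models import Word2Vec

sentences = [
    ["data", "science", "uses", "statistics"],
    ["deep", "learning", "uses", "neural", "networks"],
    ["statistics", "and", "neural", "networks", "learn", "from", "data"],
]

model = Word2Vec(sentences, vector_size=50, window=3,
                 min_count=1, epochs=200, seed=1)
print(model.wv.most_similar("data", topn=3))  # nearest words in vector space
```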
Similar to Are we reaching a Data Science Singularity? How Cognitive Computing is emerging from Machine Learning Algorithms, Big Data Tools, and Cloud Services by Natalino Busa
Data science apps powered by Jupyter Notebooks (Natalino Busa)
Jupyter notebooks are transforming the way we look at computing, coding, and science. But is this the only "data scientist experience" that this technology can provide? In this presentation we will look at how to create interactive web applications for data exploration and machine learning. In the background this code is still powered by the well-understood and well-documented Jupyter Notebooks.
Code on github: https://github.com/natbusa/kernelgateway_demos
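A minimal sketch of the interactive-application idea using ipywidgets inside a notebook; note this is an illustrative assumption, as the linked demos are built on the Jupyter Kernel Gateway:

```python
# Minimal sketch: a notebook-backed interactive control with ipywidgets.
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np

@widgets.interact(frequency=(1, 10, 1))
def plot_wave(frequency=3):
    # Re-runs on every slider change, turning the notebook into a small app.
    x = np.linspace(0, 2 * np.pi, 400)
    plt.plot(x, np.sin(frequency * x))
    plt.title(f"sin({frequency}x)")
    plt.show()
```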
7 steps for highly effective deep neural networks (Natalino Busa)
Natalino will provide a fast-paced lecture on recognizing handwritten numbers with ANNs. In recent years much progress has been made on image recognition: multi-layer perceptrons, inception networks, skip and residual nets, LSTMs, attention networks.
In this talk, Natalino will go through 7 different architectures for classifying handwritten numbers using Keras and TensorFlow. For each neural network architecture, he will spend some time on the theory, the coding, and the quality of the results. Code and fun included!
original file:
https://drive.google.com/file/d/0BwNrPuGaMi8PbVhUYUVKWUhGRjQ
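A minimal Keras MLP for MNIST in the spirit of the talk's first architecture; a sketch, not Natalino's actual notebook code:

```python
# Minimal sketch: a baseline Keras MLP for handwritten-digit classification.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```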
The document provides a history of data science and artificial intelligence, discusses how the two fields intersect, and provides examples of practical coding techniques for data scientists using AI tools. It begins with a brief history of data science from the 1960s development of empirical data analysis to the modern role of data scientists. It then discusses the parallel history of AI from its origins in the 1950s to recent advances in deep learning. The document explains that data scientists today can focus on analysis or building machine learning models, and that Python is a common coding language in both fields. It offers Jupyter Notebooks, Google Colab, and Docker as tools and provides examples using sentiment analysis, Google AutoML, and AWS DeepLens.
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015 (Big Data Spain)
This document discusses trends in data science in 2016, including how data science is moving into new use cases such as medicine, politics, government, and neuroscience. It also covers trends in hardware, generalized libraries, leveraging workflows, and frameworks that could enable a big leap ahead. The document discusses learning trends like MOOCs, inverted classrooms, collaborative learning, and how O'Reilly Media is embracing Jupyter notebooks. It also covers measuring distance between learners and subject communities, and the importance of both people and automation working together.
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy... (Kamila Stępniowska)
Please find the slides from my talk "How You Can Use Open Source Materials to Learn Python & Data Science". The talk was originally given at EuroPython 2018 in Edinburgh (July 26th, 2018).
You can think of these slides as a resource library and a limited guideline.
Problem Definition muAoPS | Analytics Problem Solving | Mu Sigma (n40077943)
The Enquiry Engine (muAoPS™) combines multiple applications working in unison and helps organizations navigate complexity by provisioning structured problem definition using muPDNA and seeing interconnectedness between problems using muUniverse. Once we have embraced the complexity, muOBI and muDSC set us into action and enable us to plan and achieve systematic transformations.
Data Science at Scale - The DevOps Approach (Mihai Criveti)
DevOps Practices for Data Scientists and Engineers
1 Data Science Landscape
2 Process and Flow
3 The Data
4 Data Science Toolkit
5 Cloud Computing Solutions
6 The rise of DevOps
7 Reusable Assets and Practices
8 Skills Development
BigScience is a one-year research workshop involving over 800 researchers from 60 countries to build and study very large multilingual language models and datasets. It was granted 5 million GPU hours on the Jean Zay supercomputer in France. The workshop aims to advance AI/NLP research by creating shared models and data as well as tools for researchers. Several working groups are studying issues like bias, scaling, and engineering challenges of training such large models. The first model, T0, showed strong zero-shot performance. Upcoming work includes further model training and papers.
The document describes a 10-module data science course covering topics such as introduction to data science, machine learning techniques using R, Hadoop architecture, and Mahout algorithms. The course includes live online classes, recorded lectures, quizzes, projects, and a certificate. Each module covers specific data science topics and techniques. The document provides details on the content, objectives, and topics covered in module 1, which includes an introduction to data science, its components, use cases, and how to integrate R and Hadoop. Examples of data science applications in various domains like healthcare, retail, and social media are also presented.
Moving forward data centric sciences weaving AI, Big Data & HPC (Genoveva Vargas-Solar)
This novel and multidisciplinary data-centric and scientific movement promises new and not-yet-imagined applications that rely on massive amounts of evolving data that need to be cleaned, integrated, and analysed for modelling purposes. Yet data management issues are not usually perceived as central. In this keynote I will explore the key challenges and opportunities for data management in this new scientific world, and discuss how a data-centric artificial intelligence supported by high performance computing (HPC) can best contribute to these exciting domains. Even if the motivation is not academic, the huge sums being devoted to related applications are moving industry and academia to analyse these directions.
The document discusses big data and provides examples of how it can be collected and analyzed. It describes a master's thesis that collected 74,000 Dutch news articles over 2 months to analyze rare content. It also describes a bachelor's thesis that automated the coding of tweets to determine the tone politicians used when referring to opponents. The document outlines the typical process of collecting, storing, and analyzing big data and describes the infrastructure used in the workshop to collect Twitter tweets, news articles, and web snapshots.
Coming right from the Recommender Systems conference in San Francisco, I present some latest developments in the field of large scale recommendation engines and machine learning.
Marcel Caraciolo is a scientist and CTO who has worked with Python for 7 years. He is interested in machine learning, mobile education, and data. He is the current president of the Python Brazil Association. Caraciolo has created several scientific Python packages and taught Python online. He is now working on applying Python to bioinformatics and clinical sequencing through tools like biopandas.
This document provides an overview and objectives of a Python course for big data analytics. It discusses why Python is well-suited for big data tasks due to its libraries like PyDoop and SciPy. The course includes demonstrations of web scraping using Beautiful Soup, collecting tweets using APIs, and running word count on Hadoop using Pydoop. It also discusses how Python supports key aspects of data science like accessing, analyzing, and visualizing large datasets.
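The Pydoop word-count demo mentioned above might look roughly like this; a sketch based on Pydoop's canonical example, not the course's exact code:

```python
# Minimal sketch: Hadoop word count with Pydoop's MapReduce API.
import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes

class Mapper(api.Mapper):
    def map(self, context):
        for word in context.value.split():
            context.emit(word, 1)

class Reducer(api.Reducer):
    def reduce(self, context):
        context.emit(context.key, sum(context.values))

if __name__ == "__main__":
    pipes.run_task(pipes.Factory(Mapper, reducer_class=Reducer))
```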
The Agenda for the Webinar:
1. Introduction to Python.
2. Python and Big Data.
3. Python and Data Science.
4. Key features of Python and their usage in Business Analytics.
5. Business Analytics with Python – Real world Use Cases.
This document provides an overview of a data science training module that introduces students to key concepts like big data, data science components, types of data scientists, and use cases. The module covers topics such as big data scenarios, challenges, Hadoop, R programming, and machine learning. Students will learn how to analyze real-world datasets and implement machine learning algorithms in both Hadoop and R. The goal is for students to understand how to use tools and techniques to extract insights from large datasets.
"Big Data" is term heard more and more in industry – but what does it really mean? There is a vagueness to the term reminiscent of that experienced in the early days of cloud computing. This has led to a number of implications for various industries and enterprises. These range from identifying the actual skills needed to recruit talent to articulating the requirements of a "big data" project. Secondary implications include difficulties in finding solutions that are appropriate to the problems at hand – versus solutions looking for problems. This presentation will take a look at Big Data and offer the audience with some considerations they may use immediately to assess the use of analytics in solving their problems.
The talk begins with an idea of how big "Big Data" can be. This leads to an appreciation of how important "Management Questions" are to assessing analytic needs. The fields of data and analysis have become extremely important and impact nearly all facets of life and business. During the talk we will look at the two pillars of Big Data – Data Warehousing and Predictive Analytics. Then we will explore the open source tools and datasets available to NATO action officers to work in this domain. Use cases relevant to NATO will be explored to show where analytics lies hidden within many of the day-to-day problems of enterprises. The presentation will close with a look at the future. Advances in the area of semantic technologies continue. The much-acclaimed consultants at Gartner listed Big Data and Semantic Technologies as the first- and third-ranked top technology trends to modernize information management in the coming decade. They note there is incredible value "locked inside all this ungoverned and underused information." HQ SACT can leverage this powerful analytic approach to capture requirement trends when establishing acquisition strategies, monitor Priority Shortfall Areas, prepare solicitations, and retrieve meaningful data from archives.
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017 (Big Data Spain)
Irene Gonzálvez is a product manager at Spotify who discusses data quality. Spotify has over 140 million monthly active users and more than 30 million songs. Data enables recommendations, advertising, and payments but data quality problems cost businesses $600 billion per year. Irene discusses Spotify's focus on data quality through tools like DataMon, Data Counters, and MetriLab. She advocates building an algorithm library for anomaly detection and a Spotify-wide strategy to identify critical datasets and ensure they are high quality.
Scaling a backend for a big data and blockchain environment by Rafael Ríos at... (Big Data Spain)
This document discusses scaling the backend of a financial platform for big data and blockchain. It describes challenges integrating big data using Apache Spark and Cassandra for tasks like predictive modeling, recommendations, and credit scoring. It also covers using a microservices architecture with Spring Cloud, Docker, and Kubernetes for deployment. Blockchain integration involves a private Ethereum network on Kubernetes for tokenization and a connection to the public Ethereum mainnet using Infura for payments and transfers.
AI: The next frontier by Amparo Alonso at Big Data Spain 2017 (Big Data Spain)
AI: The next frontier
https://www.bigdataspain.org/2017/talk/ai-next-frontier
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017 (Big Data Spain)
All modern Big Data solutions, like Hadoop, Kafka or the rest of the ecosystem tools, are designed as distributed processes and as such include some sort of redundancy for High Availability.
https://www.bigdataspain.org/2017/talk/disaster-recovery-for-big-data
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha... (Big Data Spain)
Apache Ignite is an in-memory platform that can accelerate Hadoop and Spark workloads by storing data in memory. It provides a distributed in-memory file system (IGFS) that can be used as a caching layer on top of HDFS. For Spark, Ignite allows sharing RDDs across jobs by storing them in an Ignite cache, avoiding the need to write to disk between jobs. The IgniteContext class provides the main entry point for integrating Spark and Ignite, allowing Spark jobs to read RDD data from and write it directly to Ignite caches.
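For flavor, here is a minimal sketch of Ignite's cache put/get model using the pyignite thin client; note this is a substitute for illustration, since the Spark integration described above goes through the Scala IgniteContext:

```python
# Minimal sketch: cache put/get against Ignite via the pyignite thin client.
from pyignite import Client

client = Client()
client.connect("127.0.0.1", 10800)  # default Ignite thin-client port

cache = client.get_or_create_cache("shared_demo")
cache.put(1, "kept in memory between jobs")
print(cache.get(1))
client.close()
```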
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ... (Big Data Spain)
Discover the power of this new set of tools for data science. It is really easy to start applying these techniques in your current workflow (see the sketch below).
https://www.bigdataspain.org/2017/talk/data-science-for-lazy-people-automated-machine-learning
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
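The sketch referenced above, using TPOT as one popular automated-ML library (an assumption; the talk may cover other tools):

```python
# Minimal sketch: automated pipeline search with TPOT.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tpot = TPOTClassifier(generations=3, population_size=20,
                      random_state=0, verbosity=2)
tpot.fit(X_train, y_train)       # evolves preprocessing + model pipelines
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # emits the winning pipeline as plain code
```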
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ... (Big Data Spain)
This document discusses training deep learning models using multiple GPUs in the cloud. It covers the challenges of distributed training including data bottlenecks, communication bottlenecks, and scaling batch sizes and learning rates. It provides benchmarks for frameworks like MXNet and TensorFlow on AWS and discusses the impact of infrastructure like GPU type and interconnect bandwidth on training performance and efficiency. It also analyzes the costs of using different cloud platforms for deep learning training.
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D... (Big Data Spain)
Unbalanced data is a specific data configuration that appears commonly in nature. Applying machine learning techniques to this kind of data is a difficult process, usually addressed with imbalance-reduction techniques (see the sketch below).
https://www.bigdataspain.org/2017/talk/unbalanced-data-same-algorithms-different-techniques
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
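The sketch referenced in this entry, showing one common imbalance-reduction technique, SMOTE oversampling via imbalanced-learn (an assumption; the talk may cover other techniques):

```python
# Minimal sketch: rebalancing a skewed dataset with SMOTE oversampling.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print("before:", Counter(y))  # heavily skewed toward the majority class

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after:", Counter(y_res))  # balanced via synthetic minority samples
```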
State of the art time-series analysis with deep learning by Javier Ordóñez at... (Big Data Spain)
Time series related problems have traditionally been solved using engineered features obtained by heuristic processes.
https://www.bigdataspain.org/2017/talk/state-of-the-art-time-series-analysis-with-deep-learning
Big Data Spain 2017
November 16th - 17th
Trading at market speed with the latest Kafka features by Iñigo González at B... (Big Data Spain)
Not long ago, only banks and hedge funds could afford automated and high-frequency trading, that is, the ability to send buy orders for commodities at microsecond intervals.
https://www.bigdataspain.org/2017/talk/trading-at-market-speed-with-the-latest-kafka-features
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data... (Big Data Spain)
The shift to stream processing at LinkedIn has accelerated over the past few years. We now have over 200 Samza applications in production processing more than 260B events per day.
https://www.bigdataspain.org/2017/talk/apache-samza-jake-maes
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... (Big Data Spain)
IBM has built a “Data Science Experience” cloud service that exposes Notebook services at web scale.
https://www.bigdataspain.org/2017/talk/the-analytic-platform-behind-ibms-watson-data-platform
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da... (Big Data Spain)
Artificial Intelligence and Data-centric businesses.
https://www.bigdataspain.org/2017/talk/tbc
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017 (Big Data Spain)
Ten years ago there were rumours of the death of causal inference. Big data was supposed to enable us to rely on purely correlational data to predict and control the world.
https://www.bigdataspain.org/2017/talk/why-big-data-didnt-end-causal-inference
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at... (Big Data Spain)
The Meme Index will become the new standard for analyzing and predicting fads and sensations that circulate on the Internet.
https://www.bigdataspain.org/2017/talk/meme-index-analyzing-fads-and-sensations-on-the-internet
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat... (Big Data Spain)
Geotab is a leader in the expanding Internet of Things (IoT) and telematics industry, powered by big data.
https://www.bigdataspain.org/2017/talk/vehicle-big-data-that-drives-smart-city-advancement
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P... (Big Data Spain)
The talk will focus on explaining why operational databases do not scale due to limitations in legacy transactional management.
https://www.bigdataspain.org/2017/talk/end-of-the-myth-ultra-scalable-transactional-management
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart... (Big Data Spain)
In recent years Machine Learning (ML) and especially Deep Learning (DL) have achieved great success in many areas such as visual recognition, NLP or even aiding in medical research.
https://www.bigdataspain.org/2017/talk/attacking-machine-learning-used-in-antivirus-with-reinforcement
Big Data Spain 2017
16th - 17th Kinépolis Madrid
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ... (Big Data Spain)
The primary function of the banking sector is promoting economic activity, which means "commerce": exchanging what someone produces or has for something that someone consumes or desires.
https://www.bigdataspain.org/2017/talk/more-people-less-banking-blockchain
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017 (Big Data Spain)
Bol.com has been an early Hadoop user: since 2008, when its cluster was first built for a recommendation algorithm.
https://www.bigdataspain.org/2017/talk/make-the-elephant-fly-once-again
Big Data Spain 2017
16th - 17th Kinépolis Madrid
RTP Over QUIC: An Interesting Opportunity Or Wasted Time? (Lorenzo Miniero)
Slides for my "RTP Over QUIC: An Interesting Opportunity Or Wasted Time?" presentation at the Kamailio World 2025 event.
They describe my efforts studying and prototyping QUIC and RTP Over QUIC (RoQ) in a new library called imquic, and some observations on what RoQ could be used for in the future, if anything.
Original presentation from the Delhi Community Meetup, covering the following topics:
▶️ Session 1: Introduction to UiPath Agents
- What are Agents in UiPath?
- Components of Agents
- Overview of the UiPath Agent Builder.
- Common use cases for Agentic automation.
▶️ Session 2: Building Your First UiPath Agent
- A quick walkthrough of Agent Builder, Agentic Orchestration, AI Trust Layer, and Context Grounding
- Step-by-step demonstration of building your first Agent
▶️ Session 3: Healing Agents - Deep dive
- What are Healing Agents?
- How Healing Agents can improve automation stability by automatically detecting and fixing runtime issues
- How Healing Agents help reduce downtime, prevent failures, and ensure continuous execution of workflows
The FS Technology Summit
Technology increasingly permeates every facet of the financial services sector, from personal banking to institutional investment to payments.
The conference will explore the transformative impact of technology on the modern FS enterprise, examining how it can be applied to drive practical business improvement and frontline customer impact.
The programme will contextualise the most prominent trends that are shaping the industry, from technical advancements in Cloud, AI, Blockchain and Payments, to the regulatory impact of Consumer Duty, SDR, DORA & NIS2.
The Summit will bring together senior leaders from across the sector, and is geared for shared learning, collaboration and high-level networking. The FS Technology Summit will be held as a sister event to our 12th annual Fintech Summit.
UiPath Agentic Automation: Community Developer Opportunities (DianaGray10)
Please join our UiPath Agentic: Community Developer session where we will review some of the opportunities that will be available this year for developers wanting to learn more about Agentic Automation.
GyrusAI - Broadcasting & Streaming Applications Driven by AI and ML (Gyrus AI)
Gyrus AI: AI/ML for Broadcasting & Streaming
Gyrus is a Vision AI company developing Neural Network Accelerators and ready-to-deploy AI/ML Models for Video Processing and Video Analytics.
Our Solutions:
Intelligent Media Search
Semantic & contextual search for faster, smarter content discovery.
In-Scene Ad Placement
AI-powered ad insertion to maximize monetization and user experience.
Video Anonymization
Automatically masks sensitive content to ensure privacy compliance.
Vision Analytics
Real-time object detection and engagement tracking.
Why Gyrus AI?
We help media companies streamline operations, enhance media discovery, and stay competitive in the rapidly evolving broadcasting & streaming landscape.
🚀 Ready to Transform Your Media Workflow?
🔗 Visit Us: https://gyrus.ai/
📅 Book a Demo: https://gyrus.ai/contact
📝 Read More: https://gyrus.ai/blog/
🔗 Follow Us:
LinkedIn - https://www.linkedin.com/company/gyrusai/
Twitter/X - https://twitter.com/GyrusAI
YouTube - https://www.youtube.com/channel/UCk2GzLj6xp0A6Wqix1GWSkw
Facebook - https://www.facebook.com/GyrusAI
Autonomous Resource Optimization: How AI is Solving the Overprovisioning Problem
In this session, Suresh Mathew will explore how autonomous AI is revolutionizing cloud resource management for DevOps, SRE, and Platform Engineering teams.
Traditional cloud infrastructure typically suffers from significant overprovisioning—a "better safe than sorry" approach that leads to wasted resources and inflated costs. This presentation will demonstrate how AI-powered autonomous systems are eliminating this problem through continuous, real-time optimization.
Key topics include:
Why manual and rule-based optimization approaches fall short in dynamic cloud environments
How machine learning predicts workload patterns to right-size resources before they're needed
Real-world implementation strategies that don't compromise reliability or performance
Featured case study: Learn how Palo Alto Networks implemented autonomous resource optimization to save $3.5M in cloud costs while maintaining strict performance SLAs across their global security infrastructure.
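To make the right-sizing idea concrete, here is a toy sketch (my illustration, not Sedai's actual system): instead of provisioning for a guessed all-time peak, pick capacity from a high quantile of recently observed demand plus a safety margin.

```python
import statistics

def right_size(samples, headroom=1.2):
    """Toy right-sizing: capacity = P95 of recent demand * headroom.

    samples: recent utilization measurements (e.g., CPU cores in use).
    headroom: safety multiplier on the observed high quantile.
    """
    # statistics.quantiles with n=20 returns 19 cut points;
    # index 18 is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[18]
    return p95 * headroom

# A service provisioned at 32 cores "to be safe" may only need ~6:
recent_cpu = [3.1, 4.0, 2.7, 5.2, 4.8, 3.9, 4.4, 5.0, 3.3, 4.1]
print(round(right_size(recent_cpu), 1))
```

Real systems add workload forecasting and reliability guardrails on top of a rule like this.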
Bio:
Suresh Mathew is the CEO and Founder of Sedai, an autonomous cloud management platform. Previously, as Sr. MTS Architect at PayPal, he built an AI/ML platform that autonomously resolved performance and availability issues—executing over 2 million remediations annually and becoming the only system trusted to operate independently during peak holiday traffic.
The Future of Cisco Cloud Security: Innovations and AI IntegrationRe-solution Data Ltd
Stay ahead with Re-Solution Data Ltd and Cisco cloud security, featuring the latest innovations and AI integration. Our solutions leverage cutting-edge technology to deliver proactive defense and simplified operations. Experience the future of security with our expert guidance and support.
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
In the dynamic world of finance, certain individuals emerge who don’t just participate but fundamentally reshape the landscape. Jignesh Shah is widely regarded as one such figure. Lauded as the ‘Innovator of Modern Financial Markets’, he stands out as a first-generation entrepreneur whose vision led to the creation of numerous next-generation and multi-asset class exchange platforms.
Zilliz Cloud Monthly Technical Review: May 2025Zilliz
About this webinar
Join our monthly demo for a technical overview of Zilliz Cloud, a highly scalable and performant vector database service for AI applications
Topics covered
- Zilliz Cloud's scalable architecture
- Key features of the developer-friendly UI
- Security best practices and data privacy
- Highlights from recent product releases
This webinar is an excellent opportunity for developers to learn about Zilliz Cloud's capabilities and how it can support their AI projects. Register now to join our community and stay up-to-date with the latest vector database technology.
Web & Graphics Designing Training at Erginous Technologies in Rajpura offers practical, hands-on learning for students, graduates, and professionals aiming for a creative career. The 6-week and 6-month industrial training programs blend creativity with technical skills to prepare you for real-world opportunities in design.
The course covers Graphic Designing tools like Photoshop, Illustrator, and CorelDRAW, along with logo, banner, and branding design. In Web Designing, you’ll learn HTML5, CSS3, JavaScript basics, responsive design, Bootstrap, Figma, and Adobe XD.
Erginous emphasizes 100% practical training, live projects, portfolio building, expert guidance, certification, and placement support. Graduates can explore roles like Web Designer, Graphic Designer, UI/UX Designer, or Freelancer.
For more info, visit erginous.co.in, message us on Instagram at erginoustechnologies, or call directly at +91-89684-38190. Start your journey toward a creative and successful design career today!
Build with AI events are community-led, hands-on activities hosted by Google Developer Groups and Google Developer Groups on Campus across the world from February 1 to July 31, 2025. These events aim to help developers acquire and apply Generative AI skills to build and integrate applications using the latest Google AI technologies, including AI Studio, the Gemini and Gemma family of models, and Vertex AI. This particular event series includes Thematic Hands-on Workshops (guided learning on specific AI tools or topics) as well as a prequel to the Hackathon to foster innovation using Google AI tools.
Bepents tech services - a premier cybersecurity consulting firmBenard76
Introduction
Bepents Tech Services is a premier cybersecurity consulting firm dedicated to protecting digital infrastructure, data, and business continuity. We partner with organizations of all sizes to defend against today’s evolving cyber threats through expert testing, strategic advisory, and managed services.
🔎 Why You Need us
Cyberattacks are no longer a question of “if”—they are a question of “when.” Businesses of all sizes are under constant threat from ransomware, data breaches, phishing attacks, insider threats, and targeted exploits. While most companies focus on growth and operations, security is often overlooked—until it’s too late.
At Bepents Tech, we bridge that gap by being your trusted cybersecurity partner.
🚨 Real-World Threats. Real-Time Defense.
Sophisticated Attackers: Hackers now use advanced tools and techniques to evade detection. Off-the-shelf antivirus isn’t enough.
Human Error: Over 90% of breaches involve employee mistakes. We help build a "human firewall" through training and simulations.
Exposed APIs & Apps: Modern businesses rely heavily on web and mobile apps. We find hidden vulnerabilities before attackers do.
Cloud Misconfigurations: Cloud platforms like AWS and Azure are powerful but complex—and one misstep can expose your entire infrastructure.
💡 What Sets Us Apart
Hands-On Experts: Our team includes certified ethical hackers (OSCP, CEH), cloud architects, red teamers, and security engineers with real-world breach response experience.
Custom, Not Cookie-Cutter: We don’t offer generic solutions. Every engagement is tailored to your environment, risk profile, and industry.
End-to-End Support: From proactive testing to incident response, we support your full cybersecurity lifecycle.
Business-Aligned Security: We help you balance protection with performance—so security becomes a business enabler, not a roadblock.
📊 Risk is Expensive. Prevention is Profitable.
A single data breach costs businesses an average of $4.45 million (IBM, 2023).
Regulatory fines, loss of trust, downtime, and legal exposure can cripple your reputation.
Investing in cybersecurity isn’t just a technical decision—it’s a business strategy.
🔐 When You Choose Bepents Tech, You Get:
Peace of Mind – We monitor, detect, and respond before damage occurs.
Resilience – Your systems, apps, cloud, and team will be ready to withstand real attacks.
Confidence – You’ll meet compliance mandates and pass audits without stress.
Expert Guidance – Our team becomes an extension of yours, keeping you ahead of the threat curve.
Security isn’t a product. It’s a partnership.
Let Bepents tech be your shield in a world full of cyber threats.
🌍 Our Clientele
At Bepents Tech Services, we’ve earned the trust of organizations across industries by delivering high-impact cybersecurity, performance engineering, and strategic consulting. From regulatory bodies to tech startups, law firms, and global consultancies, we tailor our solutions to each client's unique needs.
AsyncAPI v3 : Streamlining Event-Driven API Designleonid54
Are we reaching a Data Science Singularity? How Cognitive Computing is emerging from Machine Learning Algorithms, Big Data Tools, and Cloud Services by Natalino Busa
Natalino Busa - @natbusa
Head of Data Science, Teradata
Learning: The Scientific Method
Ørsted's "First Introduction to General Physics" (1811)
https://en.m.wikipedia.org/wiki/History_of_scientific_method
[Diagram: Hans Christian Ørsted's cycle of observation → hypothesis → deduction → experiment → synthesis]
Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC BY 3.0
Innovation in Data Analytics: Cloud, Community, AI & ML
"We live in an age of open source datacenters, so we can stack all these things together and we have open source from the ground to ceiling."
Sam Ramji, CEO of the Cloud Foundry Foundation
https://www.youtube.com/watch?v=7oCSFcUW-Qk
Analytics in the cloud
Bare Metal: Physical Machines
IAAS: Virtual Resources
CAAS: Containers
dPAAS: Datastores, Data Engines
iPAAS: Tools Integration, Flows & Processes
DAAAS: Data Analytics as a Service
DAAAS: AI and ML APIs
Cloud Computing for Deep Neural Networks
> Models, Compute (Train, Score), and Data
AI and ML models for:
● Speech (audio)
● Language (text)
● Vision (images/video)
● Data (classification, regression, clustering, anomaly detection)
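Whatever the provider, these hosted APIs share one consumption pattern: send data to an authenticated HTTPS endpoint and get predictions back as JSON. A minimal sketch follows; the endpoint, payload shape, and response below are placeholders for illustration, not any real provider's API.

```python
import json
import urllib.request

# Hypothetical hosted vision API; substitute your provider's endpoint.
API_URL = "https://ml.example.com/v1/vision/classify"

payload = json.dumps({"image_url": "https://example.com/photo.jpg"}).encode()
req = urllib.request.Request(
    API_URL,
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
    },
)
with urllib.request.urlopen(req) as resp:
    # e.g. {"labels": [{"name": "cat", "score": 0.98}]}
    print(json.load(resp))
```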
Ephemeral Computing Clusters on a Cloud
[Timeline: create → load data → compute → store → destroy]
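In code, that lifecycle looks roughly like the sketch below. CloudClusters is a hypothetical client used only to show the shape of the flow; real SDKs (Dataproc, EMR, HDInsight) expose the same create/submit/delete steps.

```python
from hypothetical_cloud import CloudClusters  # placeholder SDK, not a real package

clusters = CloudClusters(project="analytics-sandbox")

cluster = clusters.create(name="adhoc-2017", workers=8)   # create
cluster.load("s3://datalake/events/2017-11/")             # load
result = cluster.run("jobs/sessionize.py")                # compute
result.save("s3://datalake/output/sessions/")             # store
clusters.delete("adhoc-2017")                             # destroy
```

The point of the pattern: you pay for the cluster only between create and destroy, and no state lives on it afterwards.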
dPaaS: Analytical clusters
Ephemeral: short-lived; data exploration; isolated, personal; simple access management
vs
Permanent: long-lived; production/operations; coordinated; complex access management
GPUs and Distributed Computing
GPU support is coming in Kubernetes, Mesos, Spark
https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus
http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
[Diagram: scaling up with GPUs vs scaling out with CPUs, combining R/Python, Spark, and TensorFrames]
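A minimal TensorFrames example, adapted from the project's README (the API was experimental at the time, so names may have shifted between releases): define a small TensorFlow graph and map it over the blocks of a Spark DataFrame.

```python
# Assumes a PySpark shell where sqlContext is already defined.
import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row

df = sqlContext.createDataFrame([Row(x=float(i)) for i in range(10)])

with tf.Graph().as_default():
    x = tfs.block(df, "x")          # placeholder bound to column "x"
    z = tf.add(x, 3, name="z")      # the computation to distribute
    df2 = tfs.map_blocks(z, df)     # run the graph on the executors

df2.show()  # original rows plus a new "z" column
```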
Deep Learning in Language Parsing
https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png
Ask me Anything
Dynamic Memory Networks for Natural Language Processing
Caiming Xiong, Stephen Merity, Richard Socher
https://arxiv.org/pdf/1603.01417v1.pdf
https://youtu.be/oGk1v1jQITw
Ask me Anything
http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial
Dynamic Memory Networks for Natural Language Processing
https://arxiv.org/pdf/1603.01417v1.pdf
http://www.socher.org/
[Diagram: attention over local vs. wider context; NLP attention masks; semantic embeddings from text and images]
Network Intrusion Detection
http://billsdata.net/?p=105
The dataset contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release). Each record consists of: time (to the nearest second), duration, source and destination computer ids, source and destination ports, protocol, number of packets, and number of bytes.
Techniques: TDA, Dimensionality Reduction
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
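As a toy illustration of the dimensionality-reduction step (my sketch, not the analysis in the linked post), project per-host flow aggregates into 2-D with scikit-learn; hosts behaving unusually tend to fall away from the main manifold.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-host aggregates of the flow records:
# columns = [flows, distinct peers, mean duration, bytes, packets]
rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=1.0, size=(500, 5))

# Nonlinear embedding to 2-D for visual inspection.
X2 = Isomap(n_neighbors=10, n_components=2).fit_transform(
    StandardScaler().fit_transform(X))
print(X2.shape)  # (500, 2)
```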
Approaching (Almost) Any Machine Learning Problem
- Abhishek Thakur, Kaggle Grandmaster -
[Pipeline diagram: raw data (tables, files) → data munging → useful data → feature engineering → tabular data ready for ML, with labels alongside]
http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/
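The same munging-to-features flow maps naturally onto a scikit-learn Pipeline. A minimal sketch with hypothetical column names (Thakur's post goes much further):

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["age", "balance"]            # hypothetical columns
categorical = ["country", "segment"]    # hypothetical columns

# Data munging + feature engineering in one composable step.
features = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])
# model.fit(df[numeric + categorical], df["label"])
```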
AutoML challenge
- based on scikit-learn
- 15 classifiers
- 14 feature preprocessing methods
- 4 data preprocessing methods
- 110 hyperparameters
- Supervised classification challenge: 100 different datasets
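The numbers on this slide match the auto-sklearn system (Feurer et al., 2015), which searches that space automatically. A minimal usage sketch, assuming the autosklearn package is installed:

```python
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Searches classifiers, preprocessors, and hyperparameters
# within the given time budget (seconds).
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300)
automl.fit(X_train, y_train)
print(accuracy_score(y_test, automl.predict(X_test)))
```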
Human cognitive biases:
- Too much information
- Not enough meaning
- What should we remember?
- Need to act fast
https://en.wikipedia.org/wiki/List_of_cognitive_biases
Man vs Machine cognitive limits
[Diagram: machine limits (model generation, explanation, unsupervised learning, planning) vs human limits (too much information, not enough meaning, need to act quickly, memory limits)]
"Theorems often tell us complex truths about the simple things, but only rarely tell us simple truths about the complex ones."
Marvin Minsky, "K-Lines: A Theory of Memory" (1980)
Data Science: wear the AI/ML Lenses
We are entering a new era of intelligent machines
Boost our understanding of data
Focus on higher level analyses
Intelligent Data Systems: Long live the "database"
Wikipedia: A database is an organized collection of data.
[Diagram: a data core wrapped in successive layers: SQL and New-SQL; ML and AI (Python, Scala, R); NLP, speech, and UX; cognitive (COG) interfaces]
The Database is never going to be the same.