SlideShare a Scribd company logo
Big Data with Azure
Big Data with Azure
Big Data with Azure
Big Data with Azure
Big Data with Azure
Big Data with Azure
Big Data with Azure
Big Data with Azure
What is Big Data?
What is Big Data?
Big Data = All Data!
Big Data = All Data!
Unstructured
Big Data = All Data!
Audio, video, images. Meaningless
without adding some structure
Unstructured
Big Data = All Data!
Audio, video, images. Meaningless
without adding some structure
Unstructured
Semi-Structured
Big Data = All Data!
Audio, video, images. Meaningless
without adding some structure
Unstructured
JSON, XML, sensor data, social media,
device data, web logs. Flexible data
model structure
Semi-Structured
Big Data = All Data!
Audio, video, images. Meaningless
without adding some structure
Unstructured
JSON, XML, sensor data, social media,
device data, web logs. Flexible data
model structure
Semi-Structured
Structured
Big Data = All Data!
Audio, video, images. Meaningless
without adding some structure
Unstructured
JSON, XML, sensor data, social media,
device data, web logs. Flexible data
model structure
Semi-Structured
Structured CSV, Columnar Storage (Parquet,
ORC). Strict data model structure
Why is Processing Big Data Challenging ?
• Variety: It can be structured, semi-structured, or
unstructured
Why is Processing Big Data Challenging ?
• Variety: It can be structured, semi-structured, or
unstructured
• Velocity: It can be streaming, near real-time or batch
Why is Processing Big Data Challenging ?
• Variety: It can be structured, semi-structured, or
unstructured
• Velocity: It can be streaming, near real-time or batch
• Volume: It can be 1GB or 1PB
Why is Processing Big Data Challenging ?
• Variety: It can be structured, semi-structured, or
unstructured
• Velocity: It can be streaming, near real-time or batch
• Volume: It can be 1GB or 1PB
Why is Processing Big Data Challenging ?
Big Data with Azure
Big Data with Azure
TrustedProductive IntelligentHybrid
Azure. Cloud for all.
>80%
of Fortune 500 use
the Microsoft Cloud
Big Data with Azure
Azure Big Data Processing Pipeline Ingest
Azure Event Hubs
Compose, orchestrate & monitor data services at scale
• Fully managed service
• Any data on-premises or in the cloud
• Single pane of glass management
• Global service infrastructure
• Cost Effective
Azure Data Factory
BI & analytics
Stored Procedures
Hadoop on Azure
Data Lake Analytics
Custom Code
Machine Learning
Trusted data
Azure Big Data Processing Pipeline Store
A Z U R E B L O B S T O R A G E
• A highly scalable object storage for unstructured data
 Serverless Azure Service.
 Can store billions of Images, Videos, Audio,
Documents etc.
 Automatically scales as more data is uploaded.
 Four Replication Options: LRS, GRS, ZRS and
RA-GRS
A Z U R E D A T A L A K E S T O R E
• A highly scalable, parallel, file system in the cloud specifically optimized for big data Analytics
 No limits on: data types, number of files, size of
individual files, total amount of data stored, how
long data can be stored or ingestion throughput
 Low latency and high throughput workloads can be
used for ingesting streaming data.
 Is Hadoop-compatible (via WebHDFS REST API).
Supported by leading Hadoop distros and
HDInsight. Backend Storage in Azure
Data Node Data Node Data Node Data Node Data NodeData Node
Sha
rd
Sha
rd
Sha
rd
Sha
rd
Sha
rd
Sha
rd
Sha
rd
Sha
rd
Sha
rd
Sha
rd
Sha
rd
Sha
rdBlock Block Block Block Block Block
Block 1 Block 2 Block n…
Azure Data Lake Store File
Azure Big Data Processing Pipeline Process
Optimized Databricks Runtime Engine
DATABRICKS I/O SERVERLESS
Collaborative Workspace
Cloud storage
Data warehouses
Hadoop storage
IoT / streaming data
Rest APIs
Machine learning models
BI tools
Data exports
Data warehouses
Azure Databricks
Enhance Productivity
Deploy Production Jobs & Workflows
APACHE SPARK
MULTI-STAGE PIPELINES
DATA ENGINEER
JOB SCHEDULER NOTIFICATION & LOGS
DATA SCIENTIST BUSINESS ANALYST
Build on secure & trusted cloud Scale without limits
Azure Databricks
A Z U R E D A T A B R I C K S N O T E B O O K S O V E R V I E W
• Notebooks are a popular way to develop, and run, Spark Applications
 Notebooks are not only for authoring Spark applications but
can be run/executed directly on clusters
• Shift+Enter
•
•
 Notebooks support fine grained permissions—so they can be
securely shared with colleagues for collaboration (see
following slide for details on permissions and abilities)
 Notebooks are well-suited for prototyping, rapid
development, exploration, discovery and iterative
development Notebooks typically consist of code, data, visualization, comments and notes
Big Data Processing Pipeline
Azure
Machine
Learning
SQL
MongoDB
Table API
Turnkey global
distribution
Elastic scale out
of storage & throughput
Guaranteed low latency
at the 99th percentile
Comprehensive
SLAs
Five well-defined
consistency models
Azure Cosmos DB
DocumentColumn-family
Key-value Graph
A globally distributed, massively scalable, multi-model database service
No SQL Decision Tree
Azure Data Explorer Kusto
(Developed in Israel)
Azure Data Explorer Kusto
(Developed in Israel)
Azure Data Explorer Kusto
(Developed in Israel)
Azure Data Explorer
• Perform near real-time queries on terabytes of data
• A lightning-fast indexing and querying service for complex analytics.
• Allows you to quickly identify trends, patterns, or anomalies in all
data types inclusive of structured, semi structured and unstructured
data.
Big Data Processing Pipeline
Visualize
Azure
Machine
Learning
Big Data with Azure
Big Data with Azure
Big Data with Azure
Big Data with Azure
DEMO
Big Data with Azure
Ad

More Related Content

What's hot (20)

Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
John Archer
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
Koray Kocabas
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
boorad
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
Data Con LA
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
Rajesh Nadipalli
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
DataWorks Summit/Hadoop Summit
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
Vincent Terrasi
 
Introduction to Azure HDInsight
Introduction to Azure HDInsightIntroduction to Azure HDInsight
Introduction to Azure HDInsight
Stéphane Fréchette
 
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
Lace Lofranco
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics Suite
James Serra
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
Mark Rittman
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
James Serra
 
Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
Venkatesh Narayanan
 
Ai & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientistAi & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientist
Alberto Diaz Martin
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azure
Eyal Ben Ivri
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
Global AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksGlobal AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure Databricks
Alberto Diaz Martin
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Michael Rys
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
John Archer
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
boorad
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
Data Con LA
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
Rajesh Nadipalli
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
Vincent Terrasi
 
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
Lace Lofranco
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics Suite
James Serra
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
Mark Rittman
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
James Serra
 
Ai & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientistAi & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientist
Alberto Diaz Martin
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azure
Eyal Ben Ivri
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
Global AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksGlobal AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure Databricks
Alberto Diaz Martin
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Michael Rys
 

Similar to Big Data with Azure (20)

Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
It's All About the Data - Tia Dubuisson
It's All About the Data - Tia DubuissonIt's All About the Data - Tia Dubuisson
It's All About the Data - Tia Dubuisson
Catalina Arango
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
Trivadis
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
Trivadis
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
James Serra
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
Nilesh Shah
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
James Serra
 
Azure Databricks - An Introduction 2019 Roadshow.pptx
Azure Databricks - An Introduction 2019 Roadshow.pptxAzure Databricks - An Introduction 2019 Roadshow.pptx
Azure Databricks - An Introduction 2019 Roadshow.pptx
pascalsegoul
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release
Jen Stirrup
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Dataconomy Media
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
Azure Data Engineer Course | Azure Data Engineer Trainin
Azure Data Engineer Course | Azure Data Engineer TraininAzure Data Engineer Course | Azure Data Engineer Trainin
Azure Data Engineer Course | Azure Data Engineer Trainin
Accentfuture
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
Matthew W. Bowers
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Trivadis
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Rakesh Jayaram
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
Microsoft
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Azure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsAzure Data Lake Store and Analytics
Azure Data Lake Store and Analytics
Sergio Zenatti Filho
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
It's All About the Data - Tia Dubuisson
It's All About the Data - Tia DubuissonIt's All About the Data - Tia Dubuisson
It's All About the Data - Tia Dubuisson
Catalina Arango
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
Trivadis
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
Trivadis
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
James Serra
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
Nilesh Shah
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
James Serra
 
Azure Databricks - An Introduction 2019 Roadshow.pptx
Azure Databricks - An Introduction 2019 Roadshow.pptxAzure Databricks - An Introduction 2019 Roadshow.pptx
Azure Databricks - An Introduction 2019 Roadshow.pptx
pascalsegoul
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release
Jen Stirrup
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Dataconomy Media
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
Azure Data Engineer Course | Azure Data Engineer Trainin
Azure Data Engineer Course | Azure Data Engineer TraininAzure Data Engineer Course | Azure Data Engineer Trainin
Azure Data Engineer Course | Azure Data Engineer Trainin
Accentfuture
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
Matthew W. Bowers
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Trivadis
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
Microsoft
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Azure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsAzure Data Lake Store and Analytics
Azure Data Lake Store and Analytics
Sergio Zenatti Filho
 
Ad

More from Aaron (Ari) Bornstein (13)

The Importance of Developing Interpretable AI Applications
The Importance of Developing Interpretable AI ApplicationsThe Importance of Developing Interpretable AI Applications
The Importance of Developing Interpretable AI Applications
Aaron (Ari) Bornstein
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at Scale
Aaron (Ari) Bornstein
 
Microsoft Breeze CA AI Workshop
Microsoft Breeze CA AI WorkshopMicrosoft Breeze CA AI Workshop
Microsoft Breeze CA AI Workshop
Aaron (Ari) Bornstein
 
What startups need to know about NLP, AI, & ML on the cloud.
What startups need to know about NLP, AI, & ML on the cloud.What startups need to know about NLP, AI, & ML on the cloud.
What startups need to know about NLP, AI, & ML on the cloud.
Aaron (Ari) Bornstein
 
PyConIL 2019 Beyond word Embeddings Slides
PyConIL 2019 Beyond word Embeddings SlidesPyConIL 2019 Beyond word Embeddings Slides
PyConIL 2019 Beyond word Embeddings Slides
Aaron (Ari) Bornstein
 
Best practices for DevRel in Israel
Best practices for DevRel in IsraelBest practices for DevRel in Israel
Best practices for DevRel in Israel
Aaron (Ari) Bornstein
 
NLP in the Industry
NLP in the Industry NLP in the Industry
NLP in the Industry
Aaron (Ari) Bornstein
 
Democratizing AI Istanbul Open Source Summit
Democratizing AI Istanbul Open Source SummitDemocratizing AI Istanbul Open Source Summit
Democratizing AI Istanbul Open Source Summit
Aaron (Ari) Bornstein
 
Beyond word embeddings
Beyond word embeddingsBeyond word embeddings
Beyond word embeddings
Aaron (Ari) Bornstein
 
Data Hack 2018 Microsoft Math Teacher Challenge
Data Hack 2018 Microsoft Math Teacher ChallengeData Hack 2018 Microsoft Math Teacher Challenge
Data Hack 2018 Microsoft Math Teacher Challenge
Aaron (Ari) Bornstein
 
A walk through Azure IoT
A walk through Azure IoTA walk through Azure IoT
A walk through Azure IoT
Aaron (Ari) Bornstein
 
PyconIL 2017 Realtime Sensor Anomaly Detection with Scikit-Learn and the Azu...
PyconIL 2017  Realtime Sensor Anomaly Detection with Scikit-Learn and the Azu...PyconIL 2017  Realtime Sensor Anomaly Detection with Scikit-Learn and the Azu...
PyconIL 2017 Realtime Sensor Anomaly Detection with Scikit-Learn and the Azu...
Aaron (Ari) Bornstein
 
DLD TLV Cognitive Services: The Brains Behind Your Bot
DLD TLV Cognitive Services:The Brains Behind Your BotDLD TLV Cognitive Services:The Brains Behind Your Bot
DLD TLV Cognitive Services: The Brains Behind Your Bot
Aaron (Ari) Bornstein
 
The Importance of Developing Interpretable AI Applications
The Importance of Developing Interpretable AI ApplicationsThe Importance of Developing Interpretable AI Applications
The Importance of Developing Interpretable AI Applications
Aaron (Ari) Bornstein
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at Scale
Aaron (Ari) Bornstein
 
What startups need to know about NLP, AI, & ML on the cloud.
What startups need to know about NLP, AI, & ML on the cloud.What startups need to know about NLP, AI, & ML on the cloud.
What startups need to know about NLP, AI, & ML on the cloud.
Aaron (Ari) Bornstein
 
PyConIL 2019 Beyond word Embeddings Slides
PyConIL 2019 Beyond word Embeddings SlidesPyConIL 2019 Beyond word Embeddings Slides
PyConIL 2019 Beyond word Embeddings Slides
Aaron (Ari) Bornstein
 
Democratizing AI Istanbul Open Source Summit
Democratizing AI Istanbul Open Source SummitDemocratizing AI Istanbul Open Source Summit
Democratizing AI Istanbul Open Source Summit
Aaron (Ari) Bornstein
 
Data Hack 2018 Microsoft Math Teacher Challenge
Data Hack 2018 Microsoft Math Teacher ChallengeData Hack 2018 Microsoft Math Teacher Challenge
Data Hack 2018 Microsoft Math Teacher Challenge
Aaron (Ari) Bornstein
 
PyconIL 2017 Realtime Sensor Anomaly Detection with Scikit-Learn and the Azu...
PyconIL 2017  Realtime Sensor Anomaly Detection with Scikit-Learn and the Azu...PyconIL 2017  Realtime Sensor Anomaly Detection with Scikit-Learn and the Azu...
PyconIL 2017 Realtime Sensor Anomaly Detection with Scikit-Learn and the Azu...
Aaron (Ari) Bornstein
 
DLD TLV Cognitive Services: The Brains Behind Your Bot
DLD TLV Cognitive Services:The Brains Behind Your BotDLD TLV Cognitive Services:The Brains Behind Your Bot
DLD TLV Cognitive Services: The Brains Behind Your Bot
Aaron (Ari) Bornstein
 
Ad

Recently uploaded (20)

AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 

Big Data with Azure

  • 9. What is Big Data?
  • 10. What is Big Data? Big Data = All Data!
  • 11. Big Data = All Data! Unstructured
  • 12. Big Data = All Data! Audio, video, images. Meaningless without adding some structure Unstructured
  • 13. Big Data = All Data! Audio, video, images. Meaningless without adding some structure Unstructured Semi-Structured
  • 14. Big Data = All Data! Audio, video, images. Meaningless without adding some structure Unstructured JSON, XML, sensor data, social media, device data, web logs. Flexible data model structure Semi-Structured
  • 15. Big Data = All Data! Audio, video, images. Meaningless without adding some structure Unstructured JSON, XML, sensor data, social media, device data, web logs. Flexible data model structure Semi-Structured Structured
  • 16. Big Data = All Data! Audio, video, images. Meaningless without adding some structure Unstructured JSON, XML, sensor data, social media, device data, web logs. Flexible data model structure Semi-Structured Structured CSV, Columnar Storage (Parquet, ORC). Strict data model structure
  • 17. Why is Processing Big Data Challenging ?
  • 18. • Variety: It can be structured, semi-structured, or unstructured Why is Processing Big Data Challenging ?
  • 19. • Variety: It can be structured, semi-structured, or unstructured • Velocity: It can be streaming, near real-time or batch Why is Processing Big Data Challenging ?
  • 20. • Variety: It can be structured, semi-structured, or unstructured • Velocity: It can be streaming, near real-time or batch • Volume: It can be 1GB or 1PB Why is Processing Big Data Challenging ?
  • 21. • Variety: It can be structured, semi-structured, or unstructured • Velocity: It can be streaming, near real-time or batch • Volume: It can be 1GB or 1PB Why is Processing Big Data Challenging ?
  • 25. >80% of Fortune 500 use the Microsoft Cloud
  • 27. Azure Big Data Processing Pipeline Ingest
  • 29. Compose, orchestrate & monitor data services at scale • Fully managed service • Any data on-premises or in the cloud • Single pane of glass management • Global service infrastructure • Cost Effective Azure Data Factory BI & analytics Stored Procedures Hadoop on Azure Data Lake Analytics Custom Code Machine Learning Trusted data
  • 30. Azure Big Data Processing Pipeline Store
  • 31. A Z U R E B L O B S T O R A G E • A highly scalable object storage for unstructured data  Serverless Azure Service.  Can store billions of Images, Videos, Audio, Documents etc.  Automatically scales as more data is uploaded.  Four Replication Options: LRS, GRS, ZRS and RA-GRS
  • 32. A Z U R E D A T A L A K E S T O R E • A highly scalable, parallel, file system in the cloud specifically optimized for big data Analytics  No limits on: data types, number of files, size of individual files, total amount of data stored, how long data can be stored or ingestion throughput  Low latency and high throughput workloads can be used for ingesting streaming data.  Is Hadoop-compatible (via WebHDFS REST API). Supported by leading Hadoop distros and HDInsight. Backend Storage in Azure Data Node Data Node Data Node Data Node Data NodeData Node Sha rd Sha rd Sha rd Sha rd Sha rd Sha rd Sha rd Sha rd Sha rd Sha rd Sha rd Sha rdBlock Block Block Block Block Block Block 1 Block 2 Block n… Azure Data Lake Store File
  • 33. Azure Big Data Processing Pipeline Process
  • 34. Optimized Databricks Runtime Engine DATABRICKS I/O SERVERLESS Collaborative Workspace Cloud storage Data warehouses Hadoop storage IoT / streaming data Rest APIs Machine learning models BI tools Data exports Data warehouses Azure Databricks Enhance Productivity Deploy Production Jobs & Workflows APACHE SPARK MULTI-STAGE PIPELINES DATA ENGINEER JOB SCHEDULER NOTIFICATION & LOGS DATA SCIENTIST BUSINESS ANALYST Build on secure & trusted cloud Scale without limits Azure Databricks
  • 35. A Z U R E D A T A B R I C K S N O T E B O O K S O V E R V I E W • Notebooks are a popular way to develop, and run, Spark Applications  Notebooks are not only for authoring Spark applications but can be run/executed directly on clusters • Shift+Enter • •  Notebooks support fine grained permissions—so they can be securely shared with colleagues for collaboration (see following slide for details on permissions and abilities)  Notebooks are well-suited for prototyping, rapid development, exploration, discovery and iterative development Notebooks typically consist of code, data, visualization, comments and notes
  • 36. Big Data Processing Pipeline Azure Machine Learning
  • 37. SQL MongoDB Table API Turnkey global distribution Elastic scale out of storage & throughput Guaranteed low latency at the 99th percentile Comprehensive SLAs Five well-defined consistency models Azure Cosmos DB DocumentColumn-family Key-value Graph A globally distributed, massively scalable, multi-model database service
  • 39. Azure Data Explorer Kusto (Developed in Israel)
  • 40. Azure Data Explorer Kusto (Developed in Israel)
  • 41. Azure Data Explorer Kusto (Developed in Israel)
  • 42. Azure Data Explorer • Perform near real-time queries on terabytes of data • A lightning-fast indexing and querying service for complex analytics. • Allows you to quickly identify trends, patterns, or anomalies in all data types inclusive of structured, semi structured and unstructured data.
  • 43. Big Data Processing Pipeline Visualize Azure Machine Learning
  • 48. DEMO