SlideShare a Scribd company logo
Database Choices
Lynn Langit

Jan 2014 – Startup Code Camp in the OC
Data Expertise / Lynn Langit
• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10Gen – Master for MongoDB

• Practicing Architect
• Technical author / trainer
–
–
–
–

Pluralsight – Google Cloud Series
DevelopMentor – SQL Server 2012 Series
2 books on SQL Server BI
Cloudera trainer (certified)

• Former MSFT FTE
– 4 years
Databases Now
a Menu of
Choices
Data Pipeline

Process All

Acquire
New
Clean
Existing

Store
Some

Query &
Mine
Is Big Data = NoSQL and just Hadoop?
HUGE Hype factor since 2011

Apache Hadoop
• a software framework that supports data-intensive distributed
applications
• under a free license enables applications to work with thousands of
nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS)
papers
Hadoop in the Enterprise
How you ‘get’ Hadoop
Open source
• roll your own
Commercial distribution
•
•
•
•

Cloudera
MapR
Hortonworks
More…

Rent it via the cloud
• AWS
• HDInsight
Demo – AWS MapReduce
Working with Hadoop
About Hadoop MapReduce

Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
The Hadoop on
premises
Market Leader
Is
Cloudera
Example Comparison: RDBMS vs. Hadoop
Traditional RDBMS

Hadoop / MapReduce

Data Size

Gigabytes (Terabytes)

Petabytes and greater

Access

Interactive and Batch

Batch – NOT Interactive

Updates

Read / Write many times

Write once, Read many times

Structure

Static Schema

Dynamic Schema

Integrity

High (ACID)

Low

Scaling

Nonlinear

Linear

Query
Response
Time

Can be near immediate

Has latency (due to batch
processing)
“Small” BigData vs. “Big” BigData
On Premises

In the Cloud

Hadoop

Hadoop

NoSQL

NoSQL

RDBMS

RDBMS
But wait…
is there a
relational database
that scales
that is cheap
that runs in the cloud?
DEMO - AWS Redshift
• About $1k per Terabyte per year - relational
Cloud-hosted NoSQL up to 50x CHEAPER
So many NoSQL options
• More than just the Elephant in the room
• Over 150+ types of NoSQL databases
Flavors of NoSQL
Key/Value
Volatile

Key/value
Persistent

Wide-Column

Document

Graph
Key / Value Database
• Just keys and values
– No schema

• Persistent or Volatile
• Examples
– AWS Dynamo DB
– Riak
DEMO - AWS DynamoDB
• Key/Value store on the AWS cloud
File (BLOB) Storage Buckets in the Cloud
• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
DEMO - Battle of the Buckets
• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3  (Archiving) in to AWS Glacier
Column Database
• Wide, sparse column sets
• Schema-light

• Examples:
– HBase w/Hadoop
– Google Cloud Datastore
– SQL Server Columnstore Indexes or SSAS Tabular Models
Types of Column Databases
• Column-families
– Non-relational
– Sparse
– Examples:
• HBase
• Cassandra
• xVelocity (SQL 2012 Tabular)

• Column-stores
– Relational
– Dense
– Example:
• SQL Server 2012
– Columnstore index
DEMO – Google Cloud Datastore
DEMO – SQL Server ‘NoSQL’
• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)
Document Database (Mongo DB)
• document-oriented (collection of
JSON documents) w/semi structured
data
– Encodings include BSON, JSON, XML…

• binary forms
– PDF, Microsoft Office documents -Word, Excel…)

• Examples:
– MongoDB
– Couchbase
Demo - Mongo DB
Graph Databases
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly
finding connections, patterns and
relationships between the objects
within lots of data
• Examples:
– Neo4J
– Google Freebase
DEMO – Neo4J
“Small” BigData vs. “Big” BigData
Hadoop

Key/Value or
Column

Document or
Graph

RDBMS

On Premise or
In the Cloud
Cloud-hosted RDBMS
• AWS RDS – SQL Server,
mySQL, Oracle
– Medium cost
– Solid feature set, i.e.
backup, snapshot
– Use existing tooling

• Google – mySQL
– Lowest cost
– Most limited RDBMS
functionality

• Microsoft – SQLAzure
– Highest cost
DEMO - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models
Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png
Document
MongoDB

Graph
Neo4j

RDBMS
SQL Server

Line-of-Business

DynamoDB

Social aggregators

Key/Value

Social Games

HBase

Product Catalogs

Columnstore

Log Files

NoSQL Applied
Cloud Offerings– RDBMS AND NoSQL
AWS

Google

Microsoft

RDBMS

RDS – all major

mySQL

SQL Azure

NoSQL buckets

S3 or Glacier

Cloud Storage

Azure Blobs

NoSQL Key-Value

DynamoDB

Cloud Datastore

Azure Tables

Streaming ML or
(Mahout)

Custom EC2

Prospective Search
&
Prediction API

StreamInsight

NoSQL Document or MongoDB on EC2
Graph

Freebase

MongoDB on
Windows Azure

NoSQL – Column
Hadoop (HBase)

Elastic MapReduce
using S3 & EC2

none

HDInsight

Dremel/Warehousi
ng

RedShift

BigQuery

none
But wait…
how do I query
NoSQL data?
Always MapReduce?
Can Excel help?
Connector to
Hadoop

Data Explorer

Data Quality
Services

Master Data
Services

Integration
with Azure
Data Market

Visualize with
PowerView

Data Mining
w/Predixion
Demo - Hadoop Connector to Excel
Other types of cloud data services
Hosting public datasets
• Pay to read
• Earn revenue by offering for
read

Cleaning / matching
(your) data
• ETL – Microsoft Data
Explorer, Google Refine
• Data Quality – Windows
Azure Data Market,
InfoChimps, DataMarket.com
Collecting for “BigData”
• Sensors everywhere
• Structured, Semi-structured, Unstructured vs. Data
Standards
• M2M
• Public Datasets
– Freebase
– Azure DataMarket
– Hillary Mason’s list

42
NoSQL To-Do List
Understand types of NoSQL databases
• Use NoSQL when business needs designate
• Use the right type of NoSQL for your business problem

Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, i.e. dev, test , training environments

Learn NoSQL access technologies & services
• New query languages, i.e. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon
Karmasphere, Microsoft Excel connectors, etc…
• Windows Azure Data Market, other public data markets
• recipes)

www.TeachingKidsProgramming.org
•
•

Free Courseware (Java, Small Basic or C# [on Pluralsight])
Do a Recipe  Teach a Kid (Ages 10 ++)
Keep Learning
• Twitter: @LynnLangit
• YouTube:
http://www.youtube.com/user/SoCalDevGal

• Hire me
– To help build your BI/Big Data solution
– To teach your team next gen BI
– To learn more about using NoSQL
solutions
Ad

More Related Content

What's hot (20)

Using Premium Data - for Business Analysts
Using Premium Data - for Business AnalystsUsing Premium Data - for Business Analysts
Using Premium Data - for Business Analysts
Lynn Langit
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
Dremio Corporation
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
MSAdvAnalytics
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data Lake
Tom Kerkhove
 
Building Data Lakes with Apache Airflow
Building Data Lakes with Apache AirflowBuilding Data Lakes with Apache Airflow
Building Data Lakes with Apache Airflow
Gary Stafford
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
BizTalk360
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
Michael Rys
 
Clinical Suspecting at Scale Using PySpark
Clinical Suspecting at Scale Using PySparkClinical Suspecting at Scale Using PySpark
Clinical Suspecting at Scale Using PySpark
Databricks
 
Introduction to basic data analytics tools
Introduction to basic data analytics toolsIntroduction to basic data analytics tools
Introduction to basic data analytics tools
Nascenia IT
 
How R Developers Can Build and Share Data and AI Applications that Scale with...
How R Developers Can Build and Share Data and AI Applications that Scale with...How R Developers Can Build and Share Data and AI Applications that Scale with...
How R Developers Can Build and Share Data and AI Applications that Scale with...
Databricks
 
R in Power BI
R in Power BIR in Power BI
R in Power BI
Eric Bragas
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
Grant Fritchey
 
Using Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksUsing Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on Databricks
Databricks
 
Introducing Azure Databases
Introducing Azure DatabasesIntroducing Azure Databases
Introducing Azure Databases
Grant Fritchey
 
MLflow on and inside Azure
MLflow on and inside AzureMLflow on and inside Azure
MLflow on and inside Azure
Databricks
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Eric Bragas
 
BTUG - Dec 2014 - Hybrid Connectivity Options
BTUG - Dec 2014 - Hybrid Connectivity OptionsBTUG - Dec 2014 - Hybrid Connectivity Options
BTUG - Dec 2014 - Hybrid Connectivity Options
Michael Stephenson
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data Factory
BizTalk360
 
Marketing vs Technology
Marketing vs TechnologyMarketing vs Technology
Marketing vs Technology
Nguyen Ngoc Hoai Aan
 
ETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft Azure
Mark Kromer
 
Using Premium Data - for Business Analysts
Using Premium Data - for Business AnalystsUsing Premium Data - for Business Analysts
Using Premium Data - for Business Analysts
Lynn Langit
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
Dremio Corporation
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
MSAdvAnalytics
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data Lake
Tom Kerkhove
 
Building Data Lakes with Apache Airflow
Building Data Lakes with Apache AirflowBuilding Data Lakes with Apache Airflow
Building Data Lakes with Apache Airflow
Gary Stafford
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
BizTalk360
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
Michael Rys
 
Clinical Suspecting at Scale Using PySpark
Clinical Suspecting at Scale Using PySparkClinical Suspecting at Scale Using PySpark
Clinical Suspecting at Scale Using PySpark
Databricks
 
Introduction to basic data analytics tools
Introduction to basic data analytics toolsIntroduction to basic data analytics tools
Introduction to basic data analytics tools
Nascenia IT
 
How R Developers Can Build and Share Data and AI Applications that Scale with...
How R Developers Can Build and Share Data and AI Applications that Scale with...How R Developers Can Build and Share Data and AI Applications that Scale with...
How R Developers Can Build and Share Data and AI Applications that Scale with...
Databricks
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
Grant Fritchey
 
Using Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksUsing Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on Databricks
Databricks
 
Introducing Azure Databases
Introducing Azure DatabasesIntroducing Azure Databases
Introducing Azure Databases
Grant Fritchey
 
MLflow on and inside Azure
MLflow on and inside AzureMLflow on and inside Azure
MLflow on and inside Azure
Databricks
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Eric Bragas
 
BTUG - Dec 2014 - Hybrid Connectivity Options
BTUG - Dec 2014 - Hybrid Connectivity OptionsBTUG - Dec 2014 - Hybrid Connectivity Options
BTUG - Dec 2014 - Hybrid Connectivity Options
Michael Stephenson
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data Factory
BizTalk360
 
ETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft Azure
Mark Kromer
 

Viewers also liked (17)

Hadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouseHadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouse
Asis Mohanty
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
Csaba Toth
 
Spark overview
Spark overviewSpark overview
Spark overview
Lisa Hua
 
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Cloudera, Inc.
 
Improving MySQL performance with Hadoop
Improving MySQL performance with HadoopImproving MySQL performance with Hadoop
Improving MySQL performance with Hadoop
Sagar Jauhari
 
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Edureka!
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL Database
Steve Min
 
Big data insights with Red Hat JBoss Data Virtualization
Big data insights with Red Hat JBoss Data VirtualizationBig data insights with Red Hat JBoss Data Virtualization
Big data insights with Red Hat JBoss Data Virtualization
Kenneth Peeples
 
Big Data and Data Virtualization
Big Data and Data VirtualizationBig Data and Data Virtualization
Big Data and Data Virtualization
Kenneth Peeples
 
NoSQL 간단한 소개
NoSQL 간단한 소개NoSQL 간단한 소개
NoSQL 간단한 소개
Wonchang Song
 
Information Virtualization: Query Federation on Data Lakes
Information Virtualization: Query Federation on Data LakesInformation Virtualization: Query Federation on Data Lakes
Information Virtualization: Query Federation on Data Lakes
DataWorks Summit
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
Cloudera, Inc.
 
NoSQL 모델링
NoSQL 모델링NoSQL 모델링
NoSQL 모델링
Hoyong Lee
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
King Julian
 
NoSQL Guide & Sample
NoSQL Guide &  SampleNoSQL Guide &  Sample
NoSQL Guide & Sample
Sangon Lee
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
pcherukumalla
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Hadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouseHadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouse
Asis Mohanty
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
Csaba Toth
 
Spark overview
Spark overviewSpark overview
Spark overview
Lisa Hua
 
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Cloudera, Inc.
 
Improving MySQL performance with Hadoop
Improving MySQL performance with HadoopImproving MySQL performance with Hadoop
Improving MySQL performance with Hadoop
Sagar Jauhari
 
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Edureka!
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL Database
Steve Min
 
Big data insights with Red Hat JBoss Data Virtualization
Big data insights with Red Hat JBoss Data VirtualizationBig data insights with Red Hat JBoss Data Virtualization
Big data insights with Red Hat JBoss Data Virtualization
Kenneth Peeples
 
Big Data and Data Virtualization
Big Data and Data VirtualizationBig Data and Data Virtualization
Big Data and Data Virtualization
Kenneth Peeples
 
NoSQL 간단한 소개
NoSQL 간단한 소개NoSQL 간단한 소개
NoSQL 간단한 소개
Wonchang Song
 
Information Virtualization: Query Federation on Data Lakes
Information Virtualization: Query Federation on Data LakesInformation Virtualization: Query Federation on Data Lakes
Information Virtualization: Query Federation on Data Lakes
DataWorks Summit
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
Cloudera, Inc.
 
NoSQL 모델링
NoSQL 모델링NoSQL 모델링
NoSQL 모델링
Hoyong Lee
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
King Julian
 
NoSQL Guide & Sample
NoSQL Guide &  SampleNoSQL Guide &  Sample
NoSQL Guide & Sample
Sangon Lee
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
pcherukumalla
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Ad

Similar to Not only SQL - Database Choices (20)

AWS for the SQL Server Pro
AWS for the SQL Server ProAWS for the SQL Server Pro
AWS for the SQL Server Pro
Lynn Langit
 
AWS for Big Data Experts
AWS for Big Data ExpertsAWS for Big Data Experts
AWS for Big Data Experts
Lynn Langit
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
Lynn Langit
 
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxBuilding Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
thando80
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
Lynn Langit
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
Jesus Rodriguez
 
NoSQL: An Analysis
NoSQL: An AnalysisNoSQL: An Analysis
NoSQL: An Analysis
Andrew Brust
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
PolarSeven Pty Ltd
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Precisely
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
Andrew Brust
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data Services
CCG
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Andrew Brust
 
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
Naoki (Neo) SATO
 
AWS Certified Cloud Practitioner Course S11-S17
AWS Certified Cloud Practitioner Course S11-S17AWS Certified Cloud Practitioner Course S11-S17
AWS Certified Cloud Practitioner Course S11-S17
Neal Davis
 
NoSQL
NoSQLNoSQL
NoSQL
dbulic
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Travis Wright
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
balwinders
 
Ecosistema MLOps: Gestión y Despliegue de Modelos
Ecosistema MLOps: Gestión y Despliegue de ModelosEcosistema MLOps: Gestión y Despliegue de Modelos
Ecosistema MLOps: Gestión y Despliegue de Modelos
ManuelGarcia99558
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Rakesh Jayaram
 
AWS for the SQL Server Pro
AWS for the SQL Server ProAWS for the SQL Server Pro
AWS for the SQL Server Pro
Lynn Langit
 
AWS for Big Data Experts
AWS for Big Data ExpertsAWS for Big Data Experts
AWS for Big Data Experts
Lynn Langit
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
Lynn Langit
 
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxBuilding Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
thando80
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
Lynn Langit
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
Jesus Rodriguez
 
NoSQL: An Analysis
NoSQL: An AnalysisNoSQL: An Analysis
NoSQL: An Analysis
Andrew Brust
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Precisely
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
Andrew Brust
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data Services
CCG
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Andrew Brust
 
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
Naoki (Neo) SATO
 
AWS Certified Cloud Practitioner Course S11-S17
AWS Certified Cloud Practitioner Course S11-S17AWS Certified Cloud Practitioner Course S11-S17
AWS Certified Cloud Practitioner Course S11-S17
Neal Davis
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Travis Wright
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
balwinders
 
Ecosistema MLOps: Gestión y Despliegue de Modelos
Ecosistema MLOps: Gestión y Despliegue de ModelosEcosistema MLOps: Gestión y Despliegue de Modelos
Ecosistema MLOps: Gestión y Despliegue de Modelos
ManuelGarcia99558
 
Ad

More from Lynn Langit (20)

VariantSpark on AWS
VariantSpark on AWSVariantSpark on AWS
VariantSpark on AWS
Lynn Langit
 
Serverless Architectures
Serverless ArchitecturesServerless Architectures
Serverless Architectures
Lynn Langit
 
10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming
Lynn Langit
 
Blastn plus jupyter on Docker
Blastn plus jupyter on DockerBlastn plus jupyter on Docker
Blastn plus jupyter on Docker
Lynn Langit
 
Testing in Ballerina Language
Testing in Ballerina LanguageTesting in Ballerina Language
Testing in Ballerina Language
Lynn Langit
 
Teaching Kids to create Alexa Skills
Teaching Kids to create Alexa SkillsTeaching Kids to create Alexa Skills
Teaching Kids to create Alexa Skills
Lynn Langit
 
Practical cloud
Practical cloudPractical cloud
Practical cloud
Lynn Langit
 
Understanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examplesUnderstanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examples
Lynn Langit
 
Genome-scale Big Data Pipelines
Genome-scale Big Data PipelinesGenome-scale Big Data Pipelines
Genome-scale Big Data Pipelines
Lynn Langit
 
Teaching Kids Programming
Teaching Kids ProgrammingTeaching Kids Programming
Teaching Kids Programming
Lynn Langit
 
Practical Cloud
Practical CloudPractical Cloud
Practical Cloud
Lynn Langit
 
Serverless Reality
Serverless RealityServerless Reality
Serverless Reality
Lynn Langit
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data Pipelines
Lynn Langit
 
VariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomicsVariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomics
Lynn Langit
 
Bioinformatics Data Pipelines built by CSIRO on AWS
Bioinformatics Data Pipelines built by CSIRO on AWSBioinformatics Data Pipelines built by CSIRO on AWS
Bioinformatics Data Pipelines built by CSIRO on AWS
Lynn Langit
 
Serverless Reality
Serverless RealityServerless Reality
Serverless Reality
Lynn Langit
 
New AWS Services for Bioinformatics
New AWS Services for BioinformaticsNew AWS Services for Bioinformatics
New AWS Services for Bioinformatics
Lynn Langit
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline Patterns
Lynn Langit
 
Scaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformScaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud Platform
Lynn Langit
 
SQL Server on Google Cloud Platform
SQL Server on Google Cloud PlatformSQL Server on Google Cloud Platform
SQL Server on Google Cloud Platform
Lynn Langit
 
VariantSpark on AWS
VariantSpark on AWSVariantSpark on AWS
VariantSpark on AWS
Lynn Langit
 
Serverless Architectures
Serverless ArchitecturesServerless Architectures
Serverless Architectures
Lynn Langit
 
10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming
Lynn Langit
 
Blastn plus jupyter on Docker
Blastn plus jupyter on DockerBlastn plus jupyter on Docker
Blastn plus jupyter on Docker
Lynn Langit
 
Testing in Ballerina Language
Testing in Ballerina LanguageTesting in Ballerina Language
Testing in Ballerina Language
Lynn Langit
 
Teaching Kids to create Alexa Skills
Teaching Kids to create Alexa SkillsTeaching Kids to create Alexa Skills
Teaching Kids to create Alexa Skills
Lynn Langit
 
Understanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examplesUnderstanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examples
Lynn Langit
 
Genome-scale Big Data Pipelines
Genome-scale Big Data PipelinesGenome-scale Big Data Pipelines
Genome-scale Big Data Pipelines
Lynn Langit
 
Teaching Kids Programming
Teaching Kids ProgrammingTeaching Kids Programming
Teaching Kids Programming
Lynn Langit
 
Serverless Reality
Serverless RealityServerless Reality
Serverless Reality
Lynn Langit
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data Pipelines
Lynn Langit
 
VariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomicsVariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomics
Lynn Langit
 
Bioinformatics Data Pipelines built by CSIRO on AWS
Bioinformatics Data Pipelines built by CSIRO on AWSBioinformatics Data Pipelines built by CSIRO on AWS
Bioinformatics Data Pipelines built by CSIRO on AWS
Lynn Langit
 
Serverless Reality
Serverless RealityServerless Reality
Serverless Reality
Lynn Langit
 
New AWS Services for Bioinformatics
New AWS Services for BioinformaticsNew AWS Services for Bioinformatics
New AWS Services for Bioinformatics
Lynn Langit
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline Patterns
Lynn Langit
 
Scaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformScaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud Platform
Lynn Langit
 
SQL Server on Google Cloud Platform
SQL Server on Google Cloud PlatformSQL Server on Google Cloud Platform
SQL Server on Google Cloud Platform
Lynn Langit
 

Recently uploaded (20)

AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 

Not only SQL - Database Choices

  • 1. Database Choices Lynn Langit Jan 2014 – Startup Code Camp in the OC
  • 2. Data Expertise / Lynn Langit • Industry awards – Microsoft – MVP for SQL Server – Google – GDE for Cloud Platform – 10Gen – Master for MongoDB • Practicing Architect • Technical author / trainer – – – – Pluralsight – Google Cloud Series DevelopMentor – SQL Server 2012 Series 2 books on SQL Server BI Cloudera trainer (certified) • Former MSFT FTE – 4 years
  • 3. Databases Now a Menu of Choices
  • 5. Is Big Data = NoSQL and just Hadoop? HUGE Hype factor since 2011 Apache Hadoop • a software framework that supports data-intensive distributed applications • under a free license enables applications to work with thousands of nodes and petabytes of data • was inspired by Google's MapReduce and Google File System (GFS) papers
  • 6. Hadoop in the Enterprise
  • 7. How you ‘get’ Hadoop Open source • roll your own Commercial distribution • • • • Cloudera MapR Hortonworks More… Rent it via the cloud • AWS • HDInsight
  • 8. Demo – AWS MapReduce
  • 10. About Hadoop MapReduce Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
  • 11. The Hadoop on premises Market Leader Is Cloudera
  • 12. Example Comparison: RDBMS vs. Hadoop Traditional RDBMS Hadoop / MapReduce Data Size Gigabytes (Terabytes) Petabytes and greater Access Interactive and Batch Batch – NOT Interactive Updates Read / Write many times Write once, Read many times Structure Static Schema Dynamic Schema Integrity High (ACID) Low Scaling Nonlinear Linear Query Response Time Can be near immediate Has latency (due to batch processing)
  • 13. “Small” BigData vs. “Big” BigData On Premises In the Cloud Hadoop Hadoop NoSQL NoSQL RDBMS RDBMS
  • 14. But wait… is there a relational database that scales that is cheap that runs in the cloud?
  • 15. DEMO - AWS Redshift • About $1k per Terabyte per year - relational
  • 16. Cloud-hosted NoSQL up to 50x CHEAPER
  • 17. So many NoSQL options • More than just the Elephant in the room • Over 150+ types of NoSQL databases
  • 19. Key / Value Database • Just keys and values – No schema • Persistent or Volatile • Examples – AWS Dynamo DB – Riak
  • 20. DEMO - AWS DynamoDB • Key/Value store on the AWS cloud
  • 21. File (BLOB) Storage Buckets in the Cloud • Amazon – S3 or Glacier • Google – Cloud Storage • Microsoft Azure BLOBS
  • 22. DEMO - Battle of the Buckets • Google Cloud Storage VS. • Windows Azure BLOBS VS. • AWS S3  (Archiving) in to AWS Glacier
  • 23. Column Database • Wide, sparse column sets • Schema-light • Examples: – HBase w/Hadoop – Google Cloud Datastore – SQL Server Columnstore Indexes or SSAS Tabular Models
  • 24. Types of Column Databases • Column-families – Non-relational – Sparse – Examples: • HBase • Cassandra • xVelocity (SQL 2012 Tabular) • Column-stores – Relational – Dense – Example: • SQL Server 2012 – Columnstore index
  • 25. DEMO – Google Cloud Datastore
  • 26. DEMO – SQL Server ‘NoSQL’ • SQL Server 2012 Columnstore Index • SQL Server 2012 Tabular Model (SSAS)
  • 27. Document Database (Mongo DB) • document-oriented (collection of JSON documents) w/semi structured data – Encodings include BSON, JSON, XML… • binary forms – PDF, Microsoft Office documents -Word, Excel…) • Examples: – MongoDB – Couchbase
  • 29. Graph Databases • a lot of many-to-many relationships • recursive self-joins • when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data • Examples: – Neo4J – Google Freebase
  • 31. “Small” BigData vs. “Big” BigData Hadoop Key/Value or Column Document or Graph RDBMS On Premise or In the Cloud
  • 32. Cloud-hosted RDBMS • AWS RDS – SQL Server, mySQL, Oracle – Medium cost – Solid feature set, i.e. backup, snapshot – Use existing tooling • Google – mySQL – Lowest cost – Most limited RDBMS functionality • Microsoft – SQLAzure – Highest cost
  • 33. DEMO - AWS RDS • SQL Server, MySQL or Oracle • Essential to understand pricing models
  • 36. Cloud Offerings– RDBMS AND NoSQL AWS Google Microsoft RDBMS RDS – all major mySQL SQL Azure NoSQL buckets S3 or Glacier Cloud Storage Azure Blobs NoSQL Key-Value DynamoDB Cloud Datastore Azure Tables Streaming ML or (Mahout) Custom EC2 Prospective Search & Prediction API StreamInsight NoSQL Document or MongoDB on EC2 Graph Freebase MongoDB on Windows Azure NoSQL – Column Hadoop (HBase) Elastic MapReduce using S3 & EC2 none HDInsight Dremel/Warehousi ng RedShift BigQuery none
  • 37. But wait… how do I query NoSQL data?
  • 39. Can Excel help? Connector to Hadoop Data Explorer Data Quality Services Master Data Services Integration with Azure Data Market Visualize with PowerView Data Mining w/Predixion
  • 40. Demo - Hadoop Connector to Excel
  • 41. Other types of cloud data services Hosting public datasets • Pay to read • Earn revenue by offering for read Cleaning / matching (your) data • ETL – Microsoft Data Explorer, Google Refine • Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
  • 42. Collecting for “BigData” • Sensors everywhere • Structured, Semi-structured, Unstructured vs. Data Standards • M2M • Public Datasets – Freebase – Azure DataMarket – Hillary Mason’s list 42
  • 43. NoSQL To-Do List Understand types of NoSQL databases • Use NoSQL when business needs designate • Use the right type of NoSQL for your business problem Try out NoSQL on the cloud • Quick and cheap for behavioral data • Mashup cloud datasets • Good for specialized use cases, i.e. dev, test , training environments Learn NoSQL access technologies & services • New query languages, i.e. MapReduce, R, Infer.NET • New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc… • Windows Azure Data Market, other public data markets
  • 44. • recipes) www.TeachingKidsProgramming.org • • Free Courseware (Java, Small Basic or C# [on Pluralsight]) Do a Recipe  Teach a Kid (Ages 10 ++)
  • 45. Keep Learning • Twitter: @LynnLangit • YouTube: http://www.youtube.com/user/SoCalDevGal • Hire me – To help build your BI/Big Data solution – To teach your team next gen BI – To learn more about using NoSQL solutions

Editor's Notes

  • #4: http://pragprog.com/book/rwdata/seven-databases-in-seven-weeks
  • #6: http://hadoop.apache.org/http://en.wikipedia.org/wiki/Apache_Hadoop
  • #7: Hadoop on Azure -- http://msdn.microsoft.com/en-us/magazine/jj190805.aspxhttp://www.oracle.com/technetwork/bdc/hadoop-loader/overview/index.htmlhttp://www.microsoft.com/download/en/details.aspx?id=27584
  • #10: http://hortonworks.com/technology/hortonworksdataplatform/More about Hbase, from the O’Reilly ‘Getting Ready for BigData’ report“Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase.In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”http://www.cloudera.com/
  • #11: http://hortonworks.com/technology/hortonworksdataplatform/More about Hbase, from the O’Reilly ‘Getting Ready for BigData’ report“Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase.In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”http://www.cloudera.com/
  • #12: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
  • #13: Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • #17: http://lynnlangit.wordpress.com/2011/11/09/relational-cloud-storage-is-50x-more-expensive-than-nosql/
  • #18: http://nosql-database.org/http://hadoop.apache.org/ & http://www.mongodb.org/Wikipedia - http://en.wikipedia.org/wiki/NoSQLList of noSQL databases – http://nosql-database.org/The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
  • #19: http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • #20: http://en.wikipedia.org/wiki/Project_Voldemorthttp://aws.amazon.com/http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  • #22: http://code.google.comAccess via REST APIsVery Cheap, but not much functionality includedLots of code to write for application developmentBut…can be a good backup solution
  • #24: http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.htmlhttp://stage.hypertable.com/index.php/documentation/architecture/http://code.google.com/appengine/http://code.google.com/appengine/articles/datastore/overview.html
  • #25: http://cwebbbi.wordpress.com/2012/02/14/so-what-is-the-bi-semantic-model/http://www.databasejournal.com/features/mssql/understanding-new-column-store-index-of-sql-server-2012.htmlhttp://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.htmlhttp://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
  • #26: https://developers.google.com/datastore/docs/concepts/overviewhttp://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html
  • #28: http://en.wikipedia.org/wiki/MongoDBhttp://www.mongodb.org/downloadshttp://www.mongodb.org/display/DOCS/Drivers
  • #29: http://en.wikipedia.org/wiki/MongoDB & http://try.mongodb.org/http://www.mongodb.org/downloadshttp://www.mongodb.org/display/DOCS/Drivers
  • #30: http://www.infinitegraph.com/what-is-a-graph-database.html and http://www.neo4j.org/http://en.wikipedia.org/wiki/Graph_databasehttp://www.freebase.com/
  • #31: http://www.neo4j.org/learn/try
  • #33: For Google - http://code.google.comFor AWS - https://console.aws.amazon.com/console/home
  • #37: Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  • #39: http://rickosborne.org/download/SQL-to-MongoDB.pdf
  • #41: http://www.microsoft.com/en-us/bi/default.aspxhttp://dennyglee.com/Demos -   http://www.youtube.com/watch?v=djfpPsGwm6Aand http://www.youtube.com/watch?v=uh9bKWO1K7U
  • #42: DataMarkets – InfoChimps, Factual, DataMarket, Windows Azure Data Marketplace, Wolfram Alpha, Datasifthttp://www.microsoft.com/en-us/sqlazurelabs/default.aspx andhttp://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspxhttps://datamarket.azure.com/http://www.freebase.com/http://code.google.com/p/google-refine/
  • #43: http://www.inboundlogistics.com/cms/article/m2m-101/http://www.freebase.com/Hilary Mason’s datasets - https://bitly.com/bundles/hmason/1
  • #45: Lynn