INTRODUCTION TO HADOOP



  Cindy Gross | @SQLCindy | SQLCAT PM
    http://blogs.msdn.com/cindygross
Introduction to Microsoft Hadoop
SELECT deviceplatform, state, country
FROM hivesampletable
LIMIT 200;
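The speaker notes describe HiveQL as being compiled into (possibly multiple) MapReduce jobs. The sketch below is written in plain Python and is not Hive's actual query planner. It shows how an aggregate over the same table could be expressed as map, shuffle and reduce phases. The query it models is `SELECT deviceplatform, COUNT(*) FROM hivesampletable GROUP BY deviceplatform`, which in classic Hive would typically run as a MapReduce job. The sample rows and function names are hypothetical and were chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical rows standing in for hivesampletable: (deviceplatform, state, country).
Row = Tuple[str, str, str]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (deviceplatform, 1) for every input row."""
    for deviceplatform, _state, _country in rows:
        yield deviceplatform, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the 1s per key, i.e. COUNT(*) ... GROUP BY deviceplatform."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_platform(rows: Iterable[Row]) -> Dict[str, int]:
    """Chain the three phases end to end."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    sample = [
        ("Android", "California", "United States"),
        ("iPhone OS", "Ontario", "Canada"),
        ("Android", "Texas", "United States"),
        ("Windows Phone", "Quebec", "Canada"),
    ]
    print(count_by_platform(sample))
```

On a real cluster, many mapper and reducer tasks run these phases in parallel across nodes. Here they run in one process so that the data flow is easy to follow.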
Sqoop to/from relational
Hadoop
Not fully pre-structured
Hadoop Ecosystem Snapshot
Open Source Apache Hadoop
Hadoop: The Definitive Guide by Tom White
SQL Server Sqoop http://bit.ly/rulsjX
JavaScript http://bit.ly/wdaTv6
Twitter https://twitter.com/#!/search/%23bigdata
Hive http://hive.apache.org
Excel to Hadoop via Hive ODBC http://tinyurl.com/7c4qjjj
Hadoop On Azure Videos http://tinyurl.com/6munnx2
Klout http://tinyurl.com/6qu9php
Microsoft Big Data http://microsoft.com/bigdata
Denny Lee http://dennyglee.com/category/bigdata/
Carl Nolan http://tinyurl.com/6wbfxy9
Cindy Gross http://tinyurl.com/SmallBitesBigData
@SQLCindy / @SQLCATWoman
http://blogs.msdn.com/cindygross

Editor's Notes

  • #4: Hadoop is part of NoSQL (Not Only SQL), and it’s a bit wild. You explore in and with Hadoop. You learn new things. You test hypotheses on unstructured jungle data. You eliminate noise. Then you take the best learnings and share them with the world via a relational or multidimensional database. Relational databases use atomicity, consistency, isolation, and durability (ACID) to ensure immediate consistency. But what if eventual consistency is good enough? In stomps BASE: basically available, soft state, eventual consistency. Scale up or scale out? Pay up front or pay as you go? Which IT skills do you use?
  • #5: Hive is a data warehouse system that sits on top of Hadoop. HiveQL (HQL) queries are compiled into (possibly multiple) MapReduce jobs that execute the joins, filters, aggregates, etc. The language is very SQL-like, perhaps closest to MySQL, and will feel very familiar to SQL users.
  • #6: Get your data from anywhere. There’s a data explosion, and we can now use more of it than ever before. The HadoopOnAzure.com portal provides an easy interface for pulling in data from sources including secure FTP, Amazon S3, Azure Blob storage, and the Azure DataMarket. Use Sqoop to move data between Hadoop and SQL Server, PDW, or SQL Azure. The Hive ODBC driver lets you display Hive data in Excel or in other applications.
  • #7: Many people equate big data with MapReduce, and in particular with Hadoop. However, other applications such as streaming, machine learning, and PDW-type systems can also be described as big data solutions. Big data is unstructured, flows fast, has many formats, and/or has quickly changing formats. How big “big” is depends on what is too big or complex for your environment (hardware, people, software, processes). Big data is typically handled by scaling out on commodity (low-end, enterprise-class) hardware.
  • #8: Big data solutions consist of matching the right set of tools to the right set of problems (architectures are compositional, not monolithic). You need to select appropriate combinations of storage, analytics, and consumers.
  • #9: For demo steps, see: http://blogs.msdn.com/b/cindygross/archive/2012/05/07/load-data-from-the-azure-datamarket-to-hadoop-on-azure-small-bites-of-big-data.aspx
  • #11: Big data is often described as problems that have one or more of the 3 (or 4) Vs: volume, velocity, variety, and variability. Think about big data when you describe a problem with terms like tame the chaos, reduce the complexity, explore, I don’t know what I don’t know, unknown unknowns, unstructured, changing quickly, too much for what my environment can handle now, or unused data.
      Volume = more data than the current environment can handle with vertical scaling; you need to make use of data that is currently too expensive to use.
      Velocity = a small decision window compared to the data change rate; ask how quickly you need to analyze the data and how quickly it arrives.
      Variety = many different formats that are expensive to integrate, probably from many data sources/feeds.
      Variability = many possible interpretations of the data.
  • #12: Hadoop is not the hammer for every problem, and it’s not the answer to every large store of data. It does not replace relational or multidimensional databases; it solves a different sort of problem. It’s a new, specialized type of database for certain scenarios, and it will feed other types of databases.
  • #13: Microsoft takes what is already there, makes it run on Windows, and offers the option of full control or simplification. Hadoop in the cloud simplifies management. Hadoop on Windows lets you reuse existing skills. JavaScript opens up more hiring options. The Hive ODBC Driver and Excel add-in let you combine and move data. Sqoop moves data: a Linux-based version that moves data to/from SQL Server is available now, and a Windows-based version is coming soon.
  • #14: Demo 2: Mashup
      1) Hive Pane
         a. Excel: blank worksheet, Data tab.
         b. Use your HadoopOnAzure cluster.
         c. Object = Gender2007 or whatever table you pre-loaded in Hive (select * from gender2007 limit 200).
         d. KEY POINT: pulled data from multiple files across many nodes and displayed it via ODBC in a user-friendly format, which is not easy in the Hadoop world.
      2) PowerPivot
         a. KEY POINTS: uses local memory, pulls data from multiple data sources (structured and unstructured), can be stored/scheduled in SharePoint, and creates relationships to add value. MASHUP!
         b. Excel file DeviceAnalysisByRegion.xlsx (a worksheet with region/country data, with a relationship defined between the Gender2007 country and this country data): click on the PowerPivot tab and open a blank tab.
         c. Click on PowerPivot Window: show that each tab is data from a different source, hivesampletable (Hadoop/unstructured) and regions (could be anything/structured).
         d. Click on Diagram View: show the relationships (rich value).
         e. Pivot table.pivotchart.new
         f. Close the Hive query window.
         g. Values = count of platform, axis = platform, zoom to selection.
         h. Slicers Vertical = regions hierarchy.
         i. Region = North America, country = Canada == Windows Phone jokes.
         j. KEY POINT: load to SharePoint, schedule refreshes, use for Power View.
  • #15: Expand your audience of decision makers by making BI easier with self-service and visualization. Our products interact and work together, and there is one company for questions and issues. Use existing hardware and employees. Expand options for hiring, training, and re-training with familiar tools; familiar tools mean less ramp-up time. Cloud = elasticity, easy scale up/down, pay for what you use. It’s easier to move data to/from HDFS.
  • #16: It’s about separating the signal from the noise so you have the insight to make decisions and take action. Discover, explore, gain insight.
  • #17: Familiar tools, new tools, ease of use
  • #18: Take action! All the exploring doesn’t help if you don’t do something! That something might be starting another round of exploring, but eventually DO SOMETHING!
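Note #4 contrasts ACID's immediate consistency with BASE's eventual consistency. The sketch below illustrates the difference using a last-writer-wins (LWW) merge rule. LWW is one common convergence strategy; the deck does not prescribe it. In the sketch:

* Two hypothetical replicas each accept writes locally without waiting for the other (basically available).
* The replicas temporarily disagree (soft state).
* Once they exchange state, the replicas converge (eventual consistency).

The class and function names are illustrative only and do not belong to any Hadoop API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """One copy of a hypothetical key-value store; each value carries the logical timestamp of its write."""
    name: str
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        # Accept the write locally right away, even if peer replicas are unreachable.
        self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other: "Replica") -> None:
        # Last-writer-wins: keep the entry with the higher (timestamp, value) pair.
        # Comparing the value as well breaks timestamp ties, so every replica picks the same winner.
        for key, (ts, value) in other.data.items():
            mine = self.data.get(key)
            if mine is None or (ts, value) > mine:
                self.data[key] = (ts, value)


def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange state in both directions; afterwards both replicas hold identical data."""
    a.merge_from(b)
    b.merge_from(a)


if __name__ == "__main__":
    east, west = Replica("east"), Replica("west")
    east.write("status", "exploring", ts=1)
    west.write("status", "sharing insights", ts=2)
    print(east.read("status"), "|", west.read("status"))  # soft state: the replicas disagree
    anti_entropy(east, west)
    print(east.read("status"), "|", west.read("status"))  # converged: both show the later write
```

An ACID system would instead block or reject the second write until all copies agreed. BASE trades that guarantee for availability and tolerates a window of disagreement.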
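Demo 2 in note #14 works as follows:

* PowerPivot relates the country column of the Hive data to a separate region/country worksheet.
* Platforms are then counted per region, for example North America and Canada.

Outside Excel, that mashup is essentially a lookup join followed by a grouped count. The sketch below assumes that interpretation. The rows, the region mapping, and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical "unstructured side" rows: (deviceplatform, country), e.g. as pulled from Hive via ODBC.
HiveRow = Tuple[str, str]


def platform_counts_by_region(
    hive_rows: Iterable[HiveRow],
    country_to_region: Dict[str, str],
) -> Dict[Tuple[str, str], int]:
    """Join each row to its region via country, then count rows per (region, platform).

    Rows whose country is missing from the lookup are counted under "Unknown"
    rather than dropped (an arbitrary choice made for this sketch).
    """
    counts: Counter = Counter()
    for platform, country in hive_rows:
        region = country_to_region.get(country, "Unknown")
        counts[(region, platform)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Structured side: a small, made-up region lookup like the worksheet in the demo.
    regions = {"Canada": "North America", "United States": "North America", "France": "Europe"}
    rows = [
        ("Windows Phone", "Canada"),
        ("Android", "France"),
        ("Windows Phone", "United States"),
        ("iPhone OS", "Brazil"),
    ]
    for (region, platform), n in sorted(platform_counts_by_region(rows, regions).items()):
        print(region, platform, n)
```

In the demo, the relationship and the aggregation are defined interactively in PowerPivot. The dictionary lookup here simply stands in for that relationship.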