MapReduce on Azure
A CONCRETE EXAMPLE
Connecting to our instance
• ssh pbury@168.63.111.90
• git clone https://gitlab.com/pbury/m2isf_hadoop.git
• cd m2isf_hadoop
• sudo docker-compose -f docker-compose-local.yml build
• sudo docker-compose -f docker-compose-local.yml up
• sudo docker ps
• sudo docker exec -it resourcemanager bash
• That's it, we are inside the container!
• exit
Without Hadoop
cat /usr/local/src/ascii_5000.txt | /usr/local/src/mapper.py | sort -k1,1 | /usr/local/src/reducer.py
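The pipeline above imitates the three MapReduce phases on a single machine: the mapper emits key/value lines, `sort -k1,1` brings identical keys together (the shuffle), and the reducer aggregates each group. The scripts themselves are not shown in these slides, so the following is only a minimal sketch, assuming they implement a word count over tab-separated `word<TAB>count` lines. The names `map_words`, `reduce_counts`, and `run_local` are hypothetical.

```python
from itertools import groupby


def map_words(lines):
    """Map phase: emit one 'word<TAB>1' record per word (assumed format)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_counts(sorted_records):
    """Reduce phase: sum the counts of consecutive records sharing a key.

    Like a streaming reducer, this relies on the input being sorted by key.
    """
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


def run_local(lines):
    """In-process stand-in for `cat file | mapper | sort -k1,1 | reducer`."""
    shuffled = sorted(map_words(lines), key=lambda r: r.split("\t", 1)[0])
    return list(reduce_counts(shuffled))


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    sample = ["the quick brown fox", "the lazy dog", "the end"]
    for word, total in run_local(sample):
        print(f"{word}\t{total}")
```

Sorting by key before reducing is what lets the reducer work in a single pass with constant memory per key; Hadoop performs the same grouping between the map and reduce phases.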
Hadoop: preparing the job
Copy the data into HDFS
hdfs dfs -mkdir /gutemberg
hdfs dfs -put /usr/local/src/ascii_5000.txt /gutemberg
Enter the Docker container
docker exec -it resourcemanager bash
Launching MapReduce
hadoop jar /opt/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
    -file /usr/local/src/mapper.py \
    -mapper /usr/local/src/mapper.py \
    -file /usr/local/src/reducer.py \
    -reducer /usr/local/src/reducer.py \
    -input /gutemberg \
    -output /gutemberg-output5
Results
hdfs dfs -ls /gutemberg-output5/
Found 2 items
-rw-r--r-- 3 root supergroup 0 2019-01-29 07:24 /gutemberg-output5/_SUCCESS
-rw-r--r-- 3 root supergroup 340155 2019-01-29 07:24 /gutemberg-output5/part-00000
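Each entry in this listing has eight fields: permissions, replication factor, owner, group, size in bytes, modification date, modification time, and path. `_SUCCESS` is the empty marker file written when the job completes successfully, and `part-00000` holds the reducer output. As a small sketch, the listing can be split into those fields as follows; `LsEntry`, `parse_ls_line`, and `parse_listing` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class LsEntry(NamedTuple):
    permissions: str
    replication: str  # kept as text: directories show "-" here
    owner: str
    group: str
    size: int
    modified: str
    path: str


def parse_ls_line(line):
    """Split one `hdfs dfs -ls` entry into its eight fields."""
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return LsEntry(perms, repl, owner, group, int(size), f"{date} {time}", path)


def parse_listing(text):
    """Parse a whole listing, skipping the 'Found N items' header line."""
    lines = [l for l in text.splitlines() if l and not l.startswith("Found ")]
    return [parse_ls_line(l) for l in lines]


# The listing shown on the slide, copied verbatim.
LISTING = """\
Found 2 items
-rw-r--r-- 3 root supergroup 0 2019-01-29 07:24 /gutemberg-output5/_SUCCESS
-rw-r--r-- 3 root supergroup 340155 2019-01-29 07:24 /gutemberg-output5/part-00000
"""

if __name__ == "__main__":
    entries = parse_listing(LISTING)
    job_ok = any(e.path.endswith("/_SUCCESS") for e in entries)
    print("job succeeded:", job_ok)
    for e in entries:
        print(e.path, e.size)
```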
Results
hdfs dfs -cat /gutemberg-output5/part-00000
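The output file is fairly large (340155 bytes in the listing), so it is often more practical to look only at the most frequent keys. The following is a minimal sketch, assuming each output line has the form `word<TAB>count`, which the slides do not confirm. `top_words` is a hypothetical helper, and the sample lines are made up for illustration.

```python
import heapq


def top_words(lines, n=10):
    """Return the n (word, count) pairs with the highest counts."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the count is always the final field.
        word, _, count = line.rpartition("\t")
        pairs.append((word, int(count)))
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up sample; real input would be the lines of part-00000.
    sample = ["the\t3\n", "fox\t1\n", "dog\t2\n"]
    for word, count in top_words(sample, n=2):
        print(f"{word}\t{count}")
```

To use it on the real output, pass `sys.stdin` instead of the sample and pipe the `hdfs dfs -cat` command into the script.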
References
• https://gitlab.com/pbury/m2isf_hadoop
• Creating an SSH key for Azure:
• https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/virtual-machines/linux/ssh-from-windows.md
• PowerShell: ssh-keygen