Increase computational power with distributed processing
Neil Stein 03 Nov 2012
Distributed processing
A Discussion Example
Getting the data, and ordering it as needed
Familiar with grep and sort?

- "grep" extracts all the matching lines
- "sort" sorts all the lines
grep "some_record_parameters" hl7_transfer.data-file | sort
[2012/02/25/ 9:15] records sent to healthcare-1
[2012/02/28/ 6:15] records sent to healthcare-2
[2012/03/12/ 10:30] records sent to healthcare-3
A Discussion Example
- As the amount of data increases, the process requires more and more resources

- What if hl7_transfer.data-file is 500 GB or bigger?
- What if there are hundreds or thousands of data files?
- What if there are multiple types of data files?
grep "provider 1" hl7_transfer.data-file | sort

- Ignoring the process for a moment, how do we write all the data to disk in the first place?

Need to rethink the process
Distributed processing
Distributed File-System – "the cloud"
- Files can be stored across many machines
- Files can be replicated across many machines
- Files can be in a hybrid-cloud model
- Share the file-system transparently
- You simply see the usual file structure
- Opportunity to leverage private and public cloud environments
Distributed processing
Map-Reduce – the cloud
- A way of processing large amounts of data across many machines
- Must be able to split up the data into chunks for processing (Map)
- The chunks are recombined after processing (Reduce)
- Requires a constant flow of data from one simple state to another
- Allows for a simple way of breaking down a large task into smaller, manageable tasks

- Increases the available computational power
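The split, process, and recombine idea can be sketched in plain Python without any cluster. This is only an illustration under assumptions: the function names `map_chunk` and `reduce_pairs`, the two in-memory chunks, and the choice of counting records per destination are invented here, and the sample lines merely follow the log format shown earlier in the deck.

```python
from collections import defaultdict


def map_chunk(lines):
    """Map: turn each matching line into a (destination, 1) pair."""
    pairs = []
    for line in lines:
        if "records sent to" in line:
            destination = line.strip().rsplit(" ", 1)[-1]
            pairs.append((destination, 1))
    return pairs


def reduce_pairs(pairs):
    """Reduce: recombine the mapped pairs into one total per destination."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Two chunks standing in for pieces of a file held on different machines
# (made-up data for illustration).
chunk_a = [
    "[2012/02/25/ 9:15] records sent to healthcare-1",
    "[2012/02/28/ 6:15] records sent to healthcare-2",
]
chunk_b = ["[2012/03/12/ 10:30] records sent to healthcare-1"]

# Each chunk is mapped independently, then the results are combined.
mapped = map_chunk(chunk_a) + map_chunk(chunk_b)
totals = reduce_pairs(mapped)
print(totals)
```

Because each chunk is mapped on its own, the map step could run on many machines at once; only the reduce step needs to see the combined pairs.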
A look at Hadoop
What is Hadoop?
- A Map-Reduce framework
- Designed to run applications on clusters of local and remote systems

- HDFS
- The file system of Hadoop (Hadoop Distributed File System)
- Designed to access clusters of local and remote systems
Putting the pieces together
First, we need some code
Map

Reduce
Map

Hadoop streams the input to us on STDIN
Separate each output value with a newline (for Hadoop)
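The code from this slide is not part of the transcript, so the following is a hypothetical mapper rather than the original one. It assumes the log format shown earlier and emits one tab-separated `destination<TAB>1` record per matching line, which is the key/value layout Hadoop Streaming expects by default. The name `run_mapper` and the in-memory sample are assumptions made for illustration.

```python
import io
import sys


def run_mapper(stream, out=None):
    """Read lines from `stream` and write one 'key<TAB>1' record per match."""
    if out is None:
        out = sys.stdout
    for line in stream:
        line = line.strip()
        if "records sent to" not in line:
            continue  # skip non-matching lines, much as grep would
        destination = line.rsplit(" ", 1)[-1]
        out.write(f"{destination}\t1\n")  # one record per line


# An in-memory stream stands in for STDIN in this demonstration.
sample_input = io.StringIO(
    "[2012/02/25/ 9:15] records sent to healthcare-1\n"
    "some unrelated line\n"
    "[2012/02/28/ 6:15] records sent to healthcare-2\n"
)
run_mapper(sample_input)
```

In a real mapper script the last lines would be replaced by `run_mapper(sys.stdin)`, so that the script consumes whatever Hadoop streams to it.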
Reduce

Hadoop streams the mapped records back to us on STDIN
Output the aggregated records
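As with the mapper, the reducer shown on the slide is not in the transcript; this is a hypothetical counterpart. It assumes tab-separated `key<TAB>count` records that arrive sorted by key, which is how Hadoop Streaming delivers mapper output to a reducer, and it sums the counts per key. The name `run_reducer` and the sample input are assumptions.

```python
import io
import sys
from itertools import groupby


def run_reducer(stream, out=None):
    """Sum the counts for each key; the input must already be sorted by key."""
    if out is None:
        out = sys.stdout
    # Split every non-empty line into a [key, count] pair.
    records = (
        line.rstrip("\n").split("\t", 1) for line in stream if line.strip()
    )
    # groupby only merges adjacent equal keys, hence the sorted-input requirement.
    for key, group in groupby(records, key=lambda record: record[0]):
        total = sum(int(count) for _, count in group)
        out.write(f"{key}\t{total}\n")


# An in-memory stream stands in for STDIN in this demonstration.
sorted_input = io.StringIO(
    "healthcare-1\t1\n"
    "healthcare-1\t1\n"
    "healthcare-2\t1\n"
)
run_reducer(sorted_input)
```

In a real reducer script the demonstration lines would give way to `run_reducer(sys.stdin)`.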
Sanity Checking
Command

Results
This should work with small data-sets
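The command and results from this slide are not in the transcript. One way to run such a check, sketched here under assumptions, is to chain a mapper, a sort, and a reducer on a small sample in a single process, mirroring the order in which Hadoop applies them. The functions `mapper` and `reducer` and the sample lines are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit a (destination, 1) pair for every line that records a transfer."""
    for line in lines:
        line = line.strip()
        if "records sent to" in line:
            yield line.rsplit(" ", 1)[-1], 1


def reducer(sorted_pairs):
    """Add up the counts per destination; expects pairs sorted by key."""
    for key, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


# A handful of made-up lines in the deck's log format.
sample = [
    "[2012/02/25/ 9:15] records sent to healthcare-1",
    "[2012/02/28/ 6:15] records sent to healthcare-2",
    "[2012/03/12/ 10:30] records sent to healthcare-1",
    "a line that should be ignored",
]

# map, then sort, then reduce: the same order used on the cluster.
results = list(reducer(sorted(mapper(sample))))
for destination, total in results:
    print(f"{destination}\t{total}")
```

If the totals look right on a small sample, the same map and reduce logic can be handed to the cluster with more confidence.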
Push file to “the distributed file system”

Put file on the DFS

Check that the file is in the cloud
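The exact commands on this slide are not in the transcript. A likely shape, shown here as a sketch, uses the `hadoop fs -put` and `hadoop fs -ls` shell commands; the snippet only assembles and prints them. The target directory `/user/demo/input` is a made-up placeholder, and the helper names are assumptions.

```python
import shlex


def dfs_put_command(local_path, dfs_dir):
    """Build the command that copies a local file into the distributed file system."""
    return ["hadoop", "fs", "-put", local_path, dfs_dir]


def dfs_ls_command(dfs_dir):
    """Build the command that lists a directory in the distributed file system."""
    return ["hadoop", "fs", "-ls", dfs_dir]


local_file = "hl7_transfer.data-file"
dfs_dir = "/user/demo/input"  # hypothetical target directory

commands = [dfs_put_command(local_file, dfs_dir), dfs_ls_command(dfs_dir)]
for command in commands:
    print(shlex.join(command))
```

On a machine with a configured Hadoop client, each list could be passed to `subprocess.run(command, check=True)` instead of being printed.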
Running in “the distributed environment”

Call the Hadoop streaming command
Pass the appropriate parameters
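The parameters used on the slide are not in the transcript, so this sketch only assembles a typical Hadoop Streaming invocation. The options `-files`, `-input`, `-output`, `-mapper`, and `-reducer` are standard streaming options; the jar location, the HDFS paths, and the script names `mapper.py` and `reducer.py` are placeholders that depend on the installation and the job.

```python
import shlex


def streaming_command(jar, input_path, output_path):
    """Assemble the argument list for a Hadoop Streaming job."""
    return [
        "hadoop", "jar", jar,
        # Generic options such as -files go before the streaming options.
        "-files", "mapper.py,reducer.py",
        "-input", input_path,
        "-output", output_path,
        "-mapper", "python3 mapper.py",
        "-reducer", "python3 reducer.py",
    ]


command = streaming_command(
    jar="/path/to/hadoop-streaming.jar",  # placeholder, varies by install
    input_path="/user/demo/input",        # hypothetical HDFS input directory
    output_path="/user/demo/output",      # hypothetical HDFS output directory
)
print(shlex.join(command))
```

Hadoop refuses to start a job whose output directory already exists, so a fresh output path is needed for each run.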
Checking Status
- Cluster Summary
- Running Jobs
- Completed Jobs
- Failed Jobs
- Job Statistics
- Detailed Job Logs
Checking Distributed Cluster Health
- List Data-Nodes
- Dead Nodes
- Node Heart-beat information
- Failed Jobs
- Job Statistics
- Detailed Job Logs
Conclusion
- A different paradigm for solving large-scale problems
- Designed to solve specific problems that can be defined in a focused map-reduce manner