SlideShare a Scribd company logo
High Performance Computing
          Cloud point of view


                     Alexey Ragozin
         alexey.ragozin@gmail.com
                          Apr 2012
Massive parallel computing

 I/O bound workload
  • Data mining / machine learning / indexing
  • Focus: Do not move data, process in place
 CPU bound
  • complex simulations / complex math models
  • Focus: Keep all cores busy
 Latency bound
  • Physical process simulations
    (e.g. weather forecast)
  • Focus: Minimize communication latencies
CPU bound task

 Stream like workload
  • Independent tasks
  • Random continuous stream of tasks
  • E.g. video conversion, crawling
 Structured batch jobs
  •   Single batch is split into subtasks for parallel execution
  •   Task may have data dependency on each other
  •   Task may be generated during batch execution
  •   E.g. portfolio risk calculation
Handling task stream in Cloud

               Worker pool
                                                                   incoming
                                   in g
                                           Task queue
                             po ll                                 tasks
                                             queue metrics


                                                Controler

                                             adjusts pool size
                                          based on queue metrics




  Simple pattern. Exploiting “elasticy” of cloud. Cost effective.
Structured batch jobs in cloud

Batches are usually more sporadic
 e.g. end of day risk calculations
Task may have cross dependencies
 scheduler should be “cloud-aware”
Supplying tasks with data
 data delivery delay is critical
 worker pool is generally very large
 data sets also could be very large
Data delivery strategy
Push approach
 scheduler controls data delivery
 worker expects data to be available locally
 more opportunities for optimization
 complex
Pull approach
 worker pulls required data from central service
 scheduler is unaware about data sets
 requires scalable data service
 much simpler
What kind of data do we have?

 Working set
 • working set is divided between jobs
 • each portion of working set processed by single job
 • often jobs are producing working set for next
   computation stage
 Reference data
 • exactly same data shared by multiple/all jobs
 • usually static data set
Data distribution problem

Working set
• Spiky work load – especially at the start
• Hard to predict there piece of data will be required
• Caching is ineffective
Reference data set
• Naïve approach will produce huge volume of
  redundant transfers – smart caching required
• Spiky work load
Private grid practice

     HPC Grid
                                    RDBMS
                                      or
                                Data Warehouse




                    Data grid
Data grid, what is it?
• Key/Value storage
• Data distributed across cluster of servers
• RAM is usually used as storage
• Redundant copies provide level of fault tolerant /
  durability
• No single point of failure
• Automatic rebalancing of data when servers
  added/removed from grid
• Capacity and throughput are scaling linearly
Data service for cloud HPC

• Block storage service
  Azure drive / Amazon EBS
  – Lack of shared access to data
• Key / Value storage
  Azure Tables / Amazon Simple DB
  – Pricing: volume + usage
• Blob store
  Azure Tables (blobs) / Amazon S3
  – Pricing: volume + transactions
  – Good read scalability
Use case for caching

 Avoid storage of data in cloud
  • Upload data once per batch and cache in cloud
 Reduce storage cost by reducing number of
  operations
 Save IO bandwidth for shared data
  • Edge caching
  • Routing overlays
Routing overlays
• Each node knows (communicates) only subset
  of network.
• A request to unknown node is routed via one
  of neighbors which a closest to destination.
Task stealing

Task steeling – alternative scheduling approach
Task steeling in widely used for in-process multi-core concurrency

Why use it for cluster task scheduling
• Stochastic and adaptive
• Can use cost models accounting internal cloud
  topology
• Decently solves problem of data delivery, without
  additional caching
• Unproven for cluster computation, so far
Task stealing
       Worker 1

                     Work backlog is organized as
                      stack
                     Tasks are generated recursively
                     Top of stack – fine grained tasks
fork                 Bottom of stack – coarse
                      grained tasks
fork                 Execution from top of stack
fork                 Stealing – bottom of stack
       processing
Task stealing

       Worker 1             Worker 2

                    steal



                     fork

                     fork
                            processing
fork




fork
       processing
fork
         done
IO bound workload in cloud

Dawn of Map/Reduce
- high bandwidth interconnects are expensive
- network storage is expensive
- cheap serves and local processing for keeping costs
  low
“Cloud” reality
- network bandwidth is cheap
- disks are already “networked”
- RAM is abundant
Hadoop is cloud unfriendly

Assume I have 50 nodes Hadoop cluster in cloud
What will I gain by adding another 50 nodes?
- Not much, until they are populated with data.
What if I will shut these 50 afterward?
- Effort to populate them with data will be wasted.

Hadoop is coupling execution and storage services
  together – you have pay for both even if you use one.
How cloud M/R should look?
• Use cloud storage service and persistent storage
• Streaming M/R processing
• Aggressive use of memory for intermediate data

Peregrine – storeless M/R framework
  http://peregrine_mapreduce.bitbucket.org/
Spark – in-memory M/R framework
  http://www.spark-project.org/
Looking into future

Highly anticipated features
 Scheduler as a Service
  Azure HPC
 Simple middleware for organizing caches and
  routing overlays
  Existing solutions are far from simple
 Cloud friendly map/reduce frameworks
Thank you
http://aragozin.blogspot.com
- my articles


                              Alexey Ragozin
                   alexey.ragozin@gmail.com
Ad

More Related Content

What's hot (20)

Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
Ruben Badaró
 
Caching In The Cloud
Caching In The CloudCaching In The Cloud
Caching In The Cloud
Alex Miller
 
Hazelcast
HazelcastHazelcast
Hazelcast
Jeevesh Pandey
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
Santanu Dey
 
ORM and distributed caching
ORM and distributed cachingORM and distributed caching
ORM and distributed caching
aragozin
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
DataWorks Summit/Hadoop Summit
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Spark Summit
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
DataWorks Summit
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
elliando dias
 
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Performance Tuning -  Memory leaks, Thread deadlocks, JDK toolsPerformance Tuning -  Memory leaks, Thread deadlocks, JDK tools
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Haribabu Nandyal Padmanaban
 
TeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage DevicesTeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage Devices
Databricks
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
DataWorks Summit
 
Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8
Speedment, Inc.
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
Santanu Dey
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_final
Ramya Sunil
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
DataStax Academy
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
Taras Matyashovsky
 
Realtime classroom analytics powered by apache druid
Realtime classroom analytics powered by apache druid Realtime classroom analytics powered by apache druid
Realtime classroom analytics powered by apache druid
Karthik Deivasigamani
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
PostgreSQL-Consulting
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
Ruben Badaró
 
Caching In The Cloud
Caching In The CloudCaching In The Cloud
Caching In The Cloud
Alex Miller
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
Santanu Dey
 
ORM and distributed caching
ORM and distributed cachingORM and distributed caching
ORM and distributed caching
aragozin
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Spark Summit
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
DataWorks Summit
 
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Performance Tuning -  Memory leaks, Thread deadlocks, JDK toolsPerformance Tuning -  Memory leaks, Thread deadlocks, JDK tools
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Haribabu Nandyal Padmanaban
 
TeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage DevicesTeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage Devices
Databricks
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
DataWorks Summit
 
Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8
Speedment, Inc.
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
Santanu Dey
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_final
Ramya Sunil
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
DataStax Academy
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
Taras Matyashovsky
 
Realtime classroom analytics powered by apache druid
Realtime classroom analytics powered by apache druid Realtime classroom analytics powered by apache druid
Realtime classroom analytics powered by apache druid
Karthik Deivasigamani
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
PostgreSQL-Consulting
 

Similar to High Performance Computing - Cloud Point of View (20)

Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
Olga Lavrentieva
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Andrii Vozniuk
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
Hyderabad Scalability Meetup
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
Antonios Katsarakis
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
Lee Stott
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
elliando dias
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
Bikas Saha
 
Applications in the Cloud
Applications in the CloudApplications in the Cloud
Applications in the Cloud
Eberhard Wolff
 
Architecture Best Practices on Windows Azure
Architecture Best Practices on Windows AzureArchitecture Best Practices on Windows Azure
Architecture Best Practices on Windows Azure
Nuno Godinho
 
Hadoop
HadoopHadoop
Hadoop
Girish Khanzode
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
Hortonworks
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
hitesh1892
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 
Cloud computing
Cloud computingCloud computing
Cloud computing
Aaron Tushabe
 
Cloud Computing - Geektalk
Cloud Computing - GeektalkCloud Computing - Geektalk
Cloud Computing - Geektalk
Malisa Ncube
 
Workflowsim escience12
Workflowsim escience12Workflowsim escience12
Workflowsim escience12
Weiwei Chen
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Hortonworks
 
try
trytry
try
Lamha Agarwal
 
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
Mohamed Sayed
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
bigdatagurus_meetup
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
Olga Lavrentieva
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Andrii Vozniuk
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
Hyderabad Scalability Meetup
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
Antonios Katsarakis
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
Lee Stott
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
elliando dias
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
Bikas Saha
 
Applications in the Cloud
Applications in the CloudApplications in the Cloud
Applications in the Cloud
Eberhard Wolff
 
Architecture Best Practices on Windows Azure
Architecture Best Practices on Windows AzureArchitecture Best Practices on Windows Azure
Architecture Best Practices on Windows Azure
Nuno Godinho
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
Hortonworks
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
hitesh1892
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 
Cloud Computing - Geektalk
Cloud Computing - GeektalkCloud Computing - Geektalk
Cloud Computing - Geektalk
Malisa Ncube
 
Workflowsim escience12
Workflowsim escience12Workflowsim escience12
Workflowsim escience12
Weiwei Chen
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Hortonworks
 
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
Mohamed Sayed
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
bigdatagurus_meetup
 
Ad

More from aragozin (20)

Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and ops
aragozin
 
I know why your Java is slow
I know why your Java is slowI know why your Java is slow
I know why your Java is slow
aragozin
 
Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)
aragozin
 
Java black box profiling JUG.EKB 2016
Java black box profiling JUG.EKB 2016Java black box profiling JUG.EKB 2016
Java black box profiling JUG.EKB 2016
aragozin
 
Распределённое нагрузочное тестирование на Java
Распределённое нагрузочное тестирование на JavaРаспределённое нагрузочное тестирование на Java
Распределённое нагрузочное тестирование на Java
aragozin
 
What every Java developer should know about network?
What every Java developer should know about network?What every Java developer should know about network?
What every Java developer should know about network?
aragozin
 
Java profiling Do It Yourself
Java profiling Do It YourselfJava profiling Do It Yourself
Java profiling Do It Yourself
aragozin
 
DIY Java Profiler
DIY Java ProfilerDIY Java Profiler
DIY Java Profiler
aragozin
 
Java black box profiling
Java black box profilingJava black box profiling
Java black box profiling
aragozin
 
Блеск и нищета распределённых кэшей
Блеск и нищета распределённых кэшейБлеск и нищета распределённых кэшей
Блеск и нищета распределённых кэшей
aragozin
 
JIT compilation in modern platforms – challenges and solutions
JIT compilation in modern platforms – challenges and solutionsJIT compilation in modern platforms – challenges and solutions
JIT compilation in modern platforms – challenges and solutions
aragozin
 
Casual mass parallel computing
Casual mass parallel computingCasual mass parallel computing
Casual mass parallel computing
aragozin
 
Nanocloud cloud scale jvm
Nanocloud   cloud scale jvmNanocloud   cloud scale jvm
Nanocloud cloud scale jvm
aragozin
 
Java GC tuning and monitoring (by Alexander Ashitkin)
Java GC tuning and monitoring (by Alexander Ashitkin)Java GC tuning and monitoring (by Alexander Ashitkin)
Java GC tuning and monitoring (by Alexander Ashitkin)
aragozin
 
Garbage collection in JVM
Garbage collection in JVMGarbage collection in JVM
Garbage collection in JVM
aragozin
 
Virtualizing Java in Java (jug.ru)
Virtualizing Java in Java (jug.ru)Virtualizing Java in Java (jug.ru)
Virtualizing Java in Java (jug.ru)
aragozin
 
Filtering 100M objects in Coherence cache. What can go wrong?
Filtering 100M objects in Coherence cache. What can go wrong?Filtering 100M objects in Coherence cache. What can go wrong?
Filtering 100M objects in Coherence cache. What can go wrong?
aragozin
 
Cборка мусора в Java без пауз (HighLoad++ 2013)
Cборка мусора в Java без пауз  (HighLoad++ 2013)Cборка мусора в Java без пауз  (HighLoad++ 2013)
Cборка мусора в Java без пауз (HighLoad++ 2013)
aragozin
 
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
aragozin
 
Performance Test Driven Development (CEE SERC 2013 Moscow)
Performance Test Driven Development (CEE SERC 2013 Moscow)Performance Test Driven Development (CEE SERC 2013 Moscow)
Performance Test Driven Development (CEE SERC 2013 Moscow)
aragozin
 
Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and ops
aragozin
 
I know why your Java is slow
I know why your Java is slowI know why your Java is slow
I know why your Java is slow
aragozin
 
Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)
aragozin
 
Java black box profiling JUG.EKB 2016
Java black box profiling JUG.EKB 2016Java black box profiling JUG.EKB 2016
Java black box profiling JUG.EKB 2016
aragozin
 
Распределённое нагрузочное тестирование на Java
Распределённое нагрузочное тестирование на JavaРаспределённое нагрузочное тестирование на Java
Распределённое нагрузочное тестирование на Java
aragozin
 
What every Java developer should know about network?
What every Java developer should know about network?What every Java developer should know about network?
What every Java developer should know about network?
aragozin
 
Java profiling Do It Yourself
Java profiling Do It YourselfJava profiling Do It Yourself
Java profiling Do It Yourself
aragozin
 
DIY Java Profiler
DIY Java ProfilerDIY Java Profiler
DIY Java Profiler
aragozin
 
Java black box profiling
Java black box profilingJava black box profiling
Java black box profiling
aragozin
 
Блеск и нищета распределённых кэшей
Блеск и нищета распределённых кэшейБлеск и нищета распределённых кэшей
Блеск и нищета распределённых кэшей
aragozin
 
JIT compilation in modern platforms – challenges and solutions
JIT compilation in modern platforms – challenges and solutionsJIT compilation in modern platforms – challenges and solutions
JIT compilation in modern platforms – challenges and solutions
aragozin
 
Casual mass parallel computing
Casual mass parallel computingCasual mass parallel computing
Casual mass parallel computing
aragozin
 
Nanocloud cloud scale jvm
Nanocloud   cloud scale jvmNanocloud   cloud scale jvm
Nanocloud cloud scale jvm
aragozin
 
Java GC tuning and monitoring (by Alexander Ashitkin)
Java GC tuning and monitoring (by Alexander Ashitkin)Java GC tuning and monitoring (by Alexander Ashitkin)
Java GC tuning and monitoring (by Alexander Ashitkin)
aragozin
 
Garbage collection in JVM
Garbage collection in JVMGarbage collection in JVM
Garbage collection in JVM
aragozin
 
Virtualizing Java in Java (jug.ru)
Virtualizing Java in Java (jug.ru)Virtualizing Java in Java (jug.ru)
Virtualizing Java in Java (jug.ru)
aragozin
 
Filtering 100M objects in Coherence cache. What can go wrong?
Filtering 100M objects in Coherence cache. What can go wrong?Filtering 100M objects in Coherence cache. What can go wrong?
Filtering 100M objects in Coherence cache. What can go wrong?
aragozin
 
Cборка мусора в Java без пауз (HighLoad++ 2013)
Cборка мусора в Java без пауз  (HighLoad++ 2013)Cборка мусора в Java без пауз  (HighLoad++ 2013)
Cборка мусора в Java без пауз (HighLoad++ 2013)
aragozin
 
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
aragozin
 
Performance Test Driven Development (CEE SERC 2013 Moscow)
Performance Test Driven Development (CEE SERC 2013 Moscow)Performance Test Driven Development (CEE SERC 2013 Moscow)
Performance Test Driven Development (CEE SERC 2013 Moscow)
aragozin
 
Ad

Recently uploaded (20)

tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 

High Performance Computing - Cloud Point of View

  • 1. High Performance Computing Cloud point of view Alexey Ragozin [email protected] Apr 2012
  • 2. Massive parallel computing  I/O bound workload • Data mining / machine learning / indexing • Focus: Do not move data, process in place  CPU bound • complex simulations / complex math models • Focus: Keep all cores busy  Latency bound • Physical process simulations (e.g. weather forecast) • Focus: Minimize communication latencies
  • 3. CPU bound task  Stream like workload • Independent tasks • Random continuous stream of tasks • E.g. video conversion, crawling  Structured batch jobs • Single batch is split into subtasks for parallel execution • Task may have data dependency on each other • Task may be generated during batch execution • E.g. portfolio risk calculation
  • 4. Handling task stream in Cloud Worker pool incoming in g Task queue po ll tasks queue metrics Controler adjusts pool size based on queue metrics Simple pattern. Exploiting “elasticy” of cloud. Cost effective.
  • 5. Structured batch jobs in cloud Batches are usually more sporadic  e.g. end of day risk calculations Task may have cross dependencies  scheduler should be “cloud-aware” Supplying tasks with data  data delivery delay is critical  worker pool is generally very large  data sets also could be very large
  • 6. Data delivery strategy Push approach  scheduler controls data delivery  worker expects data to be available locally  more opportunities for optimization  complex Pull approach  worker pulls required data from central service  scheduler is unaware about data sets  requires scalable data service  much simpler
  • 7. What kind of data do we have? Working set • working set is divided between jobs • each portion of working set processed by single job • often jobs are producing working set for next computation stage Reference data • exactly same data shared by multiple/all jobs • usually static data set
  • 8. Data distribution problem Working set • Spiky work load – especially at the start • Hard to predict there piece of data will be required • Caching is ineffective Reference data set • Naïve approach will produce huge volume of redundant transfers – smart caching required • Spiky work load
  • 9. Private grid practice HPC Grid RDBMS or Data Warehouse Data grid
  • 10. Data grid, what is it? • Key/Value storage • Data distributed across cluster of servers • RAM is usually used as storage • Redundant copies provide level of fault tolerant / durability • No single point of failure • Automatic rebalancing of data when servers added/removed from grid • Capacity and throughput are scaling linearly
  • 11. Data service for cloud HPC • Block storage service Azure drive / Amazon EBS – Lack of shared access to data • Key / Value storage Azure Tables / Amazon Simple DB – Pricing: volume + usage • Blob store Azure Tables (blobs) / Amazon S3 – Pricing: volume + transactions – Good read scalability
  • 12. Use case for caching  Avoid storage of data in cloud • Upload data once per batch and cache in cloud  Reduce storage cost by reducing number of operations  Save IO bandwidth for shared data • Edge caching • Routing overlays
  • 13. Routing overlays • Each node knows (communicates) only subset of network. • A request to unknown node is routed via one of neighbors which a closest to destination.
  • 14. Task stealing Task steeling – alternative scheduling approach Task steeling in widely used for in-process multi-core concurrency Why use it for cluster task scheduling • Stochastic and adaptive • Can use cost models accounting internal cloud topology • Decently solves problem of data delivery, without additional caching • Unproven for cluster computation, so far
  • 15. Task stealing Worker 1  Work backlog is organized as stack  Tasks are generated recursively  Top of stack – fine grained tasks fork  Bottom of stack – coarse grained tasks fork  Execution from top of stack fork  Stealing – bottom of stack processing
  • 16. Task stealing Worker 1 Worker 2 steal fork fork processing fork fork processing fork done
  • 17. IO bound workload in cloud Dawn of Map/Reduce - high bandwidth interconnects are expensive - network storage is expensive - cheap serves and local processing for keeping costs low “Cloud” reality - network bandwidth is cheap - disks are already “networked” - RAM is abundant
  • 18. Hadoop is cloud unfriendly Assume I have 50 nodes Hadoop cluster in cloud What will I gain by adding another 50 nodes? - Not much, until they are populated with data. What if I will shut these 50 afterward? - Effort to populate them with data will be wasted. Hadoop is coupling execution and storage services together – you have pay for both even if you use one.
  • 19. How cloud M/R should look? • Use cloud storage service and persistent storage • Streaming M/R processing • Aggressive use of memory for intermediate data Peregrine – storeless M/R framework http://peregrine_mapreduce.bitbucket.org/ Spark – in-memory M/R framework http://www.spark-project.org/
  • 20. Looking into future Highly anticipated features  Scheduler as a Service Azure HPC  Simple middleware for organizing caches and routing overlays Existing solutions are far from simple  Cloud friendly map/reduce frameworks