Hadoop + GPU

Oct 1, 2013

A brief introduction to the problem and prospects of OpenCL and distributed heterogeneous computing with Hadoop. Presented at Big Data Dive 2013 (Belarus Java User Group).

© ALTOROS Systems | CONFIDENTIAL
“The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that ‘nobody ever got fired for using Hadoop on a cluster’!”

Breaking News
IBM Keynote at JavaOne 2013: Java Flies in Blue Skies and Open Clouds
Java and GPUs open up a world of new opportunities for GPU accelerators and Java programmers alike.

Breaking News
Duimovich showed an example of GPU acceleration of sorting using standard NVIDIA CUDA libraries that are already available!
The speedups are phenomenal, ranging from 2x to 48x faster!

Breaking News?

Breaking Hadoop
10,000x faster

Hadoop vs GPU
Hadoop & GPU
Hadoop + GPU
HPC
Big Data
GPGPU in Java
Heterogeneous systems
Horizontal and vertical scalability

Hadoop horizontal scalability
[Diagram: input files file01, file02 and file03 are split into blocks 01 to 10, which are distributed across Node 1, Node 2 and Node 3.]

Hadoop horizontal scalability
[Diagram: blocks 01 to 10 spread across six nodes (Node 1 to Node 6); per-node figures shown: 2, 2, 1, 1, 2, 2.]

Use GPU to scale vertically
[Diagram: the same six-node layout (Node 1 to Node 6) with GPUs added; per-node figures shown: 2, 2, 1, 1, 2, 2 and 0.5, 1, 1, 0.5, 1, 1.]
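To make the horizontal vs. vertical distinction concrete, here is a minimal sketch under a deliberately simplified model (an assumption, not taken from the talk): a map stage finishes when its slowest node finishes, so its duration is the maximum over nodes of blocks on that node times seconds per block. Adding nodes (horizontal scaling) lowers the block count per node; adding a GPU (vertical scaling) lowers the time per block. The class, method names and all numbers are hypothetical; scheduling overhead, data locality and stragglers are ignored.

```java
import java.util.Arrays;

// Simplified model (assumption): a map stage ends when the slowest node
// finishes its blocks; scheduling, locality and stragglers are ignored.
class ScalingEstimate {
    public static void main(String[] args) {
        // Hypothetical numbers: 10 blocks, 1 s per block on a CPU-only node.
        double threeNodes = makespan(new int[]{4, 3, 3}, uniform(3, 1.0));
        double sixNodes = makespan(new int[]{2, 2, 1, 1, 2, 2}, uniform(6, 1.0));
        // Vertical scaling: same six nodes, but a GPU is assumed to halve the time per block.
        double sixNodesWithGpu = makespan(new int[]{2, 2, 1, 1, 2, 2}, uniform(6, 0.5));
        System.out.println(threeNodes + " " + sixNodes + " " + sixNodesWithGpu); // 4.0 2.0 1.0
    }

    // makespan = max over nodes of (blocks on node * seconds per block on that node)
    static double makespan(int[] blocksPerNode, double[] secondsPerBlock) {
        if (blocksPerNode.length != secondsPerBlock.length) {
            throw new IllegalArgumentException("one time per node is required");
        }
        double slowest = 0.0;
        for (int i = 0; i < blocksPerNode.length; i++) {
            slowest = Math.max(slowest, blocksPerNode[i] * secondsPerBlock[i]);
        }
        return slowest;
    }

    // Helper: the same per-block time on every node.
    static double[] uniform(int nodes, double secondsPerBlock) {
        double[] times = new double[nodes];
        Arrays.fill(times, secondsPerBlock);
        return times;
    }
}
```

Under this model a GPU on only some nodes helps little if the slowest node still lacks one, which is why the per-node view matters.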

Profit estimation
“Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU” (Intel)
NVIDIA GTX280 vs. Intel Core i7-960

How to use OpenCL?
Hadoop streaming
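Hadoop Streaming runs any executable as a mapper: it feeds input records as lines on stdin and reads output lines from stdout, treating text up to the first tab as the key and the rest as the value. A minimal sketch of such a mapper in Java, assuming a hypothetical input format of `key<TAB>number` lines. The `computeBatch` method is a hypothetical placeholder for the data-parallel step; in an OpenCL setup that is where a batch would be copied to the device and back. Records are grouped into batches (the batch size is an assumption) because per-record device calls would be dominated by transfer overhead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StreamingSquareMapper {
    // Hypothetical batch size: larger batches amortize per-call overhead.
    static final int BATCH_SIZE = 4096;

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        String line;
        while ((line = in.readLine()) != null) {
            batch.add(line);
            if (batch.size() == BATCH_SIZE) {
                emit(mapBatch(batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            emit(mapBatch(batch)); // flush the final, partial batch
        }
        System.out.flush();
    }

    static void emit(List<String> records) {
        for (String record : records) {
            System.out.println(record);
        }
    }

    // Parses "key<TAB>number" lines, runs the numeric step on the whole batch,
    // and returns "key<TAB>result" lines (Streaming's default key/value layout).
    static List<String> mapBatch(List<String> lines) {
        List<String> keys = new ArrayList<>(lines.size());
        double[] values = new double[lines.size()];
        int n = 0;
        for (String l : lines) {
            int tab = l.indexOf('\t');
            if (tab < 0) {
                continue; // skip records without a key/value separator
            }
            keys.add(l.substring(0, tab));
            values[n++] = Double.parseDouble(l.substring(tab + 1).trim());
        }
        double[] results = computeBatch(Arrays.copyOf(values, n));
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(keys.get(i) + "\t" + results[i]);
        }
        return out;
    }

    // Hypothetical placeholder for the GPU/OpenCL step; squares on the CPU here.
    static double[] computeBatch(double[] xs) {
        double[] r = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            r[i] = xs[i] * xs[i];
        }
        return r;
    }
}
```

Such a program would be handed to the Hadoop Streaming jar through its `-mapper` option; the jar's location depends on the Hadoop distribution.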

Aparapi
Expands Java's “Write Once, Run Anywhere” to APU and GPU devices: a data-parallel algorithm is expressed by extending the Kernel base class.
[Diagram: MyKernel.class]
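The Kernel pattern can be illustrated without the Aparapi jar. In Aparapi you subclass `Kernel`, override `run()`, read the work-item index with `getGlobalId()`, and call `execute(...)`; Aparapi then generates OpenCL from the kernel's bytecode, falling back to a Java thread pool when no suitable device is available. The sketch below uses a hypothetical stand-in class, `MiniKernel` (not the Aparapi API), that mimics that shape and simply runs `run()` once per index on CPU threads, so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class VectorAdd {
    public static void main(String[] args) {
        float[] sum = add(new float[]{1f, 2f, 3f}, new float[]{10f, 20f, 30f});
        System.out.println(Arrays.toString(sum)); // [11.0, 22.0, 33.0]
    }

    static float[] add(float[] a, float[] b) {
        final float[] sum = new float[a.length];
        // Anonymous subclass, mirroring how an Aparapi kernel overrides run().
        MiniKernel kernel = new MiniKernel() {
            @Override
            public void run() {
                int gid = getGlobalId();    // which element this work item handles
                sum[gid] = a[gid] + b[gid]; // no dependency on other iterations
            }
        };
        kernel.execute(a.length);
        return sum;
    }
}

// Hypothetical stand-in for Aparapi's Kernel base class, NOT the Aparapi API:
// it runs run() once per global id on CPU threads instead of on an OpenCL device.
abstract class MiniKernel {
    // Per-thread current id, because the parallel stream uses several threads.
    private final ThreadLocal<Integer> globalId = ThreadLocal.withInitial(() -> 0);

    protected int getGlobalId() {
        return globalId.get();
    }

    // Body executed once per work item.
    public abstract void run();

    // Runs run() for ids 0..range-1, in no particular order.
    public void execute(int range) {
        IntStream.range(0, range).parallel().forEach(id -> {
            globalId.set(id);
            run();
        });
    }
}
```

Because the order of `run()` calls is unspecified, the kernel body must not depend on other iterations; that is exactly the constraint the next slide lists.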

Aparapi
Characteristics of an ideal data-parallel workload
Code which iterates over large arrays of primitives
- 32/64-bit data types preferred
- where the order of iterations is not critical (avoid data dependencies between iterations)
- each iteration contains sequential code (few branches)
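The checklist above can be seen in two small loops. A minimal sketch (class and method names are illustrative): `scale` iterates over a primitive array with a branch-free body and independent iterations, so it can run in any order; `prefixSum` carries a dependency from each iteration to the next, so it cannot simply be split across work items as written (parallel scan algorithms exist, but they require restructuring the computation).

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class DataParallelDemo {
    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f, 4f};
        System.out.println(Arrays.toString(scale(x, 2f))); // [2.0, 4.0, 6.0, 8.0]
        System.out.println(Arrays.toString(prefixSum(x))); // [1.0, 3.0, 6.0, 10.0]
    }

    // Good fit: primitive array, straight-line body, each out[i] depends only
    // on in[i], so iterations may run in any order (here, in parallel).
    static float[] scale(float[] in, float factor) {
        float[] out = new float[in.length];
        IntStream.range(0, in.length).parallel().forEach(i -> out[i] = in[i] * factor);
        return out;
    }

    // Poor fit as written: out[i] depends on the running total from
    // iteration i - 1 (a loop-carried dependency), so order matters.
    static float[] prefixSum(float[] in) {
        float[] out = new float[in.length];
        float running = 0f;
        for (int i = 0; i < in.length; i++) {
            running += in[i];
            out[i] = running;
        }
        return out;
    }
}
```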

HadoopCL
Rice University, AMD

HadoopCL
2 six-core Intel Xeon X5660 (48 GB mem)
2 NVIDIA Tesla M2050 (2 × 2.5 GB mem)
AMD A10-5800K APU (16 GB mem)
WHY?

Back to OpenCL, Aparapi and heterogeneous computing

OpenCL, Aparapi and heterogeneous computing
[Chart: GPU cache, GPU GDDR5, CPU cache, SATA 3.0 (HDD), SATA 2.0 (SSD), 1 Gbit network]
Formula in terms of time:
(CPU calc1) + disk read + disk write
>
(CPU calc2 + GPU calc + GPU-write + GPU-read) + disk read + disk write
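The inequality can be evaluated directly. Disk read and disk write appear on both sides, so they cancel, and offloading pays off only when CPU calc1 > CPU calc2 + GPU calc + GPU-write + GPU-read. A minimal sketch, reading "GPU-write" as copying input into GPU memory and "GPU-read" as copying results back (an interpretation of the slide), and estimating each copy as bytes divided by link bandwidth. The class, method names and every number are hypothetical illustrations, not measurements from the talk.

```java
class OffloadEstimate {
    public static void main(String[] args) {
        // Hypothetical inputs for illustration only.
        double bytes = 1.2e9;              // data copied to the GPU, and the same amount back
        double linkBytesPerSecond = 6.0e9; // assumed host-to-GPU bandwidth
        double copy = transferSeconds(bytes, linkBytesPerSecond); // 0.2 s each way
        boolean pays = offloadPays(10.0, 1.0, 2.0, copy, copy);
        System.out.println("offload pays: " + pays); // offload pays: true
    }

    // Time to move `bytes` over a link of `bytesPerSecond`.
    static double transferSeconds(double bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond;
    }

    // The slide's inequality with the common disk read/write terms cancelled:
    // CPU calc1 > CPU calc2 + GPU calc + GPU-write + GPU-read
    static boolean offloadPays(double cpuCalc1, double cpuCalc2, double gpuCalc,
                               double gpuWrite, double gpuRead) {
        return cpuCalc1 > cpuCalc2 + gpuCalc + gpuWrite + gpuRead;
    }
}
```

The transfer terms grow with data size while a fast kernel shrinks GPU calc, so the same job can flip from "pays" to "does not pay" as the compute-to-data ratio falls.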

OpenCL future
http://streamcomputing.eu/

Questions?
Big Data Experts FB group


Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...Chris Fregly

This document discusses TensorFrames, which bridges Spark and TensorFlow to enable data-parallel model training. TensorFrames allows Spark datasets to be used as input to TensorFlow models, and distributes the model training across Spark workers. The workers train on partitioned data in parallel and periodically aggregate results. This combines the benefits of Spark's distributed processing with TensorFlow's capabilities for neural networks and other machine learning models. A demo is provided of using TensorFrames in Python and Scala to perform distributed deep learning on Spark clusters.

IS-4011, Accelerating Analytics on HADOOP using OpenCL, by Zubin Dowlaty and ...AMD Developer Central

PG-Strom - GPGPU meets PostgreSQL, PGcon2015Kohei KaiGai

PG-StromKohei KaiGai

GPUs in Big Data - StampedeCon 2014StampedeCon

GPU EcosystemOfer Rosenberg

SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingMark Kilgard

PG-Strom - GPU Accelerated AsyncrKohei KaiGai

Deep learning on sparkSatyendra Rana

Computational Techniques for the Statistical Analysis of Big Data in Rherbps10

GTC 2012: GPU-Accelerated Path RenderingMark Kilgard

Accelerating Machine Learning Applications on Spark Using GPUsIBM

Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...odsc

Heterogeneous System Architecture Overviewinside-BigData.com

PyData Amsterdam - Name Matching at ScaleGoDataDriven

Deep Learning on HadoopDataWorks Summit

From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...Spark Summit

DeepLearning4J and Spark: Successes and Challenges - François Garillotsparktc

How to Solve Real-Time Data ProblemsIBM Power Systems

Containerizing GPU Applications with Docker for Scaling to the CloudSubbu Rama

Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...Chris Fregly

Similar to Hadoop + GPU (20)

HPC and cloud distributed computing, as a journeyPeter Clapham

Simplify IT: Oracle SuperCluster Fran Navarro

The document summarizes Oracle's SuperCluster engineered system. It provides consolidated application and database deployment with in-memory performance. Key features include Exadata intelligent storage, Oracle M6 and T5 servers, a high-speed InfiniBand network, and Oracle VM virtualization. The SuperCluster enables database as a service with automated provisioning and security for multi-tenant deployment across industries.

Introduction to Distributed Computing & Distributed DatabasesShankar Iyer

OpenStack Preso: DevOps on Hybrid Infrastructurerhirschfeld

2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructuredevopsdaysaustin

The document discusses the need for hybrid infrastructure and hybrid DevOps to manage different cloud platforms and physical infrastructure in a consistent way. It notes that while no single API or platform can meet all needs, AWS dominance means its operational patterns have become the benchmark. The key is developing composable infrastructure modules that can be orchestrated together to provide portability across environments using a common operational process.

AMD It's Time to ROCinside-BigData.com

In this video from the HPC User Forum in Tucson, Gregory Stoner from AMD presents: It's Time to ROC. "With the announcement of the Boltzmann Initiative and the recent releases of ROCK and ROCR, AMD has ushered in a new era of Heterogeneous Computing. The Boltzmann initiative exposes cutting edge compute capabilities and features on targeted AMD/ATI Radeon discrete GPUs through an open source software stack. The Boltzmann stack is comprised of several components based on open standards, but extended so important hardware capabilities are not hidden by the implementation." Learn more: http://gpuopen.com/getting-started-with-boltzmann-components-platforms-installation/ and http://hpcuserforum.com Watch the video presentation: http://wp.me/p3RLHQ-fcJ Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter

Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp

How to Win When Migrating to AzureKellyn Pot'Vin-Gorman

This document provides an overview of how to successfully migrate Oracle workloads to Microsoft Azure. It begins with an introduction of the presenter and their experience. It then discusses why customers might want to migrate to the cloud and the different Azure database options available. The bulk of the document outlines the key steps in planning and executing an Oracle workload migration to Azure, including sizing, deployment, monitoring, backup strategies, and ensuring high availability. It emphasizes adapting architectures for the cloud rather than directly porting on-premises systems. The document concludes with recommendations around automation, education resources, and references for Oracle-Azure configurations.

Cloud comparison - AWS vs Azure vs GooglePatrick Pierson

Oracle Cloud InfrastructureMarketingArrowECS_CZ

Oracle provides a comprehensive cloud infrastructure platform with compute, storage, networking and database services. Key features include fast NVMe SSD storage both locally and network attached, high performance bare metal and VM instances with GPU and AMD EPYC options, autonomous database services, and advanced networking capabilities like low latency and RDMA. Oracle's regional architecture and dedicated fast interconnects enable high availability across availability domains and regions.

Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central

Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Harris Gasparakis, AMD, at the Embedded Vision Alliance Summit, May 2014. Harris Gasparakis, Ph.D., is AMD’s OpenCV manager. In addition to enhancing OpenCV with OpenCL acceleration, he is engaged in AMD’s Computer Vision strategic planning, ISVs, and AMD Ventures engagements, including technical leadership and oversight in the AMD Gesture product line. He holds a Ph.D. in theoretical high energy physics from YITP at SUNYSB. He is credited with enabling real-time volumetric visualization and analysis in Radiology Information Systems (Terarecon), including the first commercially available virtual colonoscopy system (Vital Images). He was responsible for cutting edge medical technology (Biosense Webster, Stereotaxis, Boston Scientific), incorporating image and signal processing with AI and robotic control.

Migrating enterprise workloads to AWS Tom Laszewski

Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham

Flexible computePeter Clapham

Deep Learning Frameworks Using Spark on YARN by Vartika SinghData Con LA

Abstract:- Traditional machine learning and feature engineering algorithms are not efficient enough to extract complex and nonlinear patterns hallmarks of big data. Deep learning, on the other hand, helps translate the scale and complexity of the data into solutions like molecular interaction in drug design, the search for subatomic particles and automatic parsing of microscopic images. Co-locating a data processing pipeline with a deep learning framework makes data exploration/algorithm and model evolution much simpler, while streamlining data governance and lineage tracking into a more focused effort. In this talk, we will discuss and compare the different deep learning frameworks on Spark in a distributed mode, ease of integration with the Hadoop ecosystem, and relative comparisons in terms of feature parity.

Helix core on aws webinar Perforce

This document provides guidance on deploying Helix Core, a version control system, on Amazon Web Services (AWS). It discusses example architectures like using multiple availability zones for high performance and reliability. It also covers setting up virtual private clouds (VPCs) for security, using placement groups for continuous integration/delivery, storage options, and advanced topologies. The document concludes by discussing hybrid deployments, compliance, and options for managing Helix Core deployments through Assembla which specializes in hosting version control systems on AWS.

MySQL Fabric - High Availability & Automated Sharding for MySQLTed Wennmark

The document discusses MySQL Fabric, which provides an extensible framework for high availability and sharding of MySQL databases. It allows clustering of MySQL servers for transparent failover and scale-out through sharding. MySQL Fabric handles shard mapping, global transactions and rebalancing shards across server groups. It provides connectors for applications to access the sharded and replicated database infrastructure with normal SQL queries.

Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani

High Performance Computing Pitch DeckNicholas Vossburg

This document discusses high performance computing (HPC) on Microsoft Azure. It begins with an overview of the HPC opportunity in the cloud, highlighting how the cloud provides elasticity and scale to accommodate variable computing demands. It then outlines Azure's value proposition for HPC, including its productive, trusted and hybrid capabilities. The document reviews the various HPC resources available on Azure like VMs, GPUs, and Cray supercomputers. It also discusses solutions for HPC like Azure Batch, Azure Machine Learning Compute, Azure CycleCloud and Avere vFXT. Example industry use cases are provided for automotive, financial services, manufacturing, media/entertainment and oil/gas. The summary reiterates that Azure is uniquely positioned

Introduction to HPC & Supercomputing in AITyrone Systems

HPC and cloud distributed computing, as a journeyPeter Clapham

Simplify IT: Oracle SuperCluster Fran Navarro

Introduction to Distributed Computing & Distributed DatabasesShankar Iyer

OpenStack Preso: DevOps on Hybrid Infrastructurerhirschfeld

2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructuredevopsdaysaustin

AMD It's Time to ROCinside-BigData.com

Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp

How to Win When Migrating to AzureKellyn Pot'Vin-Gorman

Cloud comparison - AWS vs Azure vs GooglePatrick Pierson

Oracle Cloud InfrastructureMarketingArrowECS_CZ

Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central

Migrating enterprise workloads to AWS Tom Laszewski

Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham

Flexible computePeter Clapham

Deep Learning Frameworks Using Spark on YARN by Vartika SinghData Con LA

Helix core on aws webinar Perforce

MySQL Fabric - High Availability & Automated Sharding for MySQLTed Wennmark

Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani

High Performance Computing Pitch DeckNicholas Vossburg

Introduction to HPC & Supercomputing in AITyrone Systems

Recently uploaded (20)

Metamorphosis: Life's Transformative JourneyArshad Shaikh

Quality Contril Analysis of Containers.pdfDr. Bindiya Chauhan

SPRING FESTIVITIES - UK AND USA -Colégio Santa Teresinha

Stein, Hunt, Green letter to Congress April 2025Mebane Rash

Operations Management (Dr. Abdulfatah Salem).pdfArab Academy for Science, Technology and Maritime Transport

Biophysics Chapter 3 Methods of Studying Macromolecules.pdfPKLI-Institute of Nursing and Allied Health Sciences Lahore , Pakistan.

This chapter provides an in-depth overview of the viscosity of macromolecules, an essential concept in biophysics and medical sciences, especially in understanding fluid behavior like blood flow in the human body. Key concepts covered include: ✅ Definition and Types of Viscosity: Dynamic vs. Kinematic viscosity, cohesion, and adhesion. ⚙️ Methods of Measuring Viscosity: Rotary Viscometer Vibrational Viscometer Falling Object Method Capillary Viscometer 🌡️ Factors Affecting Viscosity: Temperature, composition, flow rate. 🩺 Clinical Relevance: Impact of blood viscosity in cardiovascular health. 🌊 Fluid Dynamics: Laminar vs. turbulent flow, Reynolds number. 🔬 Extension Techniques: Chromatography (adsorption, partition, TLC, etc.) Electrophoresis (protein/DNA separation) Sedimentation and Centrifugation methods.

How to Subscribe Newsletter From Odoo 18 WebsiteCeline George

LDMMIA Reiki Master Spring 2025 Mini UpdatesLDM Mia eStudios

To study Digestive system of insect.pptxArshad Shaikh

New Microsoft PowerPoint Presentation.pptxmilanasargsyan5

Political History of Pala dynasty Pala Rulers NEP.pptxArya Mahila P. G. College, Banaras Hindu University, Varanasi, India.

The Pala kings were people-protectors. In fact, Gopal was elected to the throne only to end Matsya Nyaya. Bhagalpur Abhiledh states that Dharmapala imposed only fair taxes on the people. Rampala abolished the unjust taxes imposed by Bhima. The Pala rulers were lovers of learning. Vikramshila University was established by Dharmapala. He opened 50 other learning centers. A famous Buddhist scholar named Haribhadra was to be present in his court. Devpala appointed another Buddhist scholar named Veerdeva as the vice president of Nalanda Vihar. Among other scholars of this period, Sandhyakar Nandi, Chakrapani Dutta and Vajradatta are especially famous. Sandhyakar Nandi wrote the famous poem of this period 'Ramcharit'.

How to Customize Your Financial Reports & Tax Reports With Odoo 17 AccountingCeline George

YSPH VMOC Special Report - Measles Outbreak Southwest US 4-30-2025.pptxYale School of Public Health - The Virtual Medical Operations Center (VMOC)

A measles outbreak originating in West Texas has been linked to confirmed cases in New Mexico, with additional cases reported in Oklahoma and Kansas. The current case count is 795 from Texas, New Mexico, Oklahoma, and Kansas. 95 individuals have required hospitalization, and 3 deaths, 2 children in Texas and one adult in New Mexico. These fatalities mark the first measles-related deaths in the United States since 2015 and the first pediatric measles death since 2003. The YSPH Virtual Medical Operations Center Briefs (VMOC) were created as a service-learning project by faculty and graduate students at the Yale School of Public Health in response to the 2010 Haiti Earthquake. Each year, the VMOC Briefs are produced by students enrolled in Environmental Health Science Course 581 - Public Health Emergencies: Disaster Planning and Response. These briefs compile diverse information sources – including status reports, maps, news articles, and web content– into a single, easily digestible document that can be widely shared and used interactively. Key features of this report include: - Comprehensive Overview: Provides situation updates, maps, relevant news, and web resources. - Accessibility: Designed for easy reading, wide distribution, and interactive use. - Collaboration: The “unlocked" format enables other responders to share, copy, and adapt seamlessly. The students learn by doing, quickly discovering how and where to find critical information and presenting it in an easily understood manner.

Geography Sem II Unit 1C Correlation of Geography with other school subjectsProfDrShaikhImran

World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...larencebapu132

P-glycoprotein pamphlet: iteration 4 of 4 finalbs22n2s

Presentation of the MIPLM subject matter expert Erdem KayaMIPLM

To study the nervous system of insect.pptxArshad Shaikh

The *nervous system of insects* is a complex network of nerve cells (neurons) and supporting cells that process and transmit information. Here's an overview: Structure 1. *Brain*: The insect brain is a complex structure that processes sensory information, controls behavior, and integrates information. 2. *Ventral nerve cord*: A chain of ganglia (nerve clusters) that runs along the insect's body, controlling movement and sensory processing. 3. *Peripheral nervous system*: Nerves that connect the central nervous system to sensory organs and muscles. Functions 1. *Sensory processing*: Insects can detect and respond to various stimuli, such as light, sound, touch, taste, and smell. 2. *Motor control*: The nervous system controls movement, including walking, flying, and feeding. 3. *Behavioral responThe *nervous system of insects* is a complex network of nerve cells (neurons) and supporting cells that process and transmit information. Here's an overview: Structure 1. *Brain*: The insect brain is a complex structure that processes sensory information, controls behavior, and integrates information. 2. *Ventral nerve cord*: A chain of ganglia (nerve clusters) that runs along the insect's body, controlling movement and sensory processing. 3. *Peripheral nervous system*: Nerves that connect the central nervous system to sensory organs and muscles. Functions 1. *Sensory processing*: Insects can detect and respond to various stimuli, such as light, sound, touch, taste, and smell. 2. *Motor control*: The nervous system controls movement, including walking, flying, and feeding. 3. *Behavioral responses*: Insects can exhibit complex behaviors, such as mating, foraging, and social interactions. Characteristics 1. *Decentralized*: Insect nervous systems have some autonomy in different body parts. 2. *Specialized*: Different parts of the nervous system are specialized for specific functions. 3. 
*Efficient*: Insect nervous systems are highly efficient, allowing for rapid processing and response to stimuli. The insect nervous system is a remarkable example of evolutionary adaptation, enabling insects to thrive in diverse environments. The insect nervous system is a remarkable example of evolutionary adaptation, enabling insects to thrive

CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetSritoma Majumder

Introduction All the materials around us are made up of elements. These elements can be broadly divided into two major groups: Metals Non-Metals Each group has its own unique physical and chemical properties. Let's understand them one by one. Physical Properties 1. Appearance Metals: Shiny (lustrous). Example: gold, silver, copper. Non-metals: Dull appearance (except iodine, which is shiny). 2. Hardness Metals: Generally hard. Example: iron. Non-metals: Usually soft (except diamond, a form of carbon, which is very hard). 3. State Metals: Mostly solids at room temperature (except mercury, which is a liquid). Non-metals: Can be solids, liquids, or gases. Example: oxygen (gas), bromine (liquid), sulphur (solid). 4. Malleability Metals: Can be hammered into thin sheets (malleable). Non-metals: Not malleable. They break when hammered (brittle). 5. Ductility Metals: Can be drawn into wires (ductile). Non-metals: Not ductile. 6. Conductivity Metals: Good conductors of heat and electricity. Non-metals: Poor conductors (except graphite, which is a good conductor). 7. Sonorous Nature Metals: Produce a ringing sound when struck. Non-metals: Do not produce sound. Chemical Properties 1. Reaction with Oxygen Metals react with oxygen to form metal oxides. These metal oxides are usually basic. Non-metals react with oxygen to form non-metallic oxides. These oxides are usually acidic. 2. Reaction with Water Metals: Some react vigorously (e.g., sodium). Some react slowly (e.g., iron). Some do not react at all (e.g., gold, silver). Non-metals: Generally do not react with water. 3. Reaction with Acids Metals react with acids to produce salt and hydrogen gas. Non-metals: Do not react with acids. 4. Reaction with Bases Some non-metals react with bases to form salts, but this is rare. Metals generally do not react with bases directly (except amphoteric metals like aluminum and zinc). Displacement Reaction More reactive metals can displace less reactive metals from their salt solutions. 
Uses of Metals Iron: Making machines, tools, and buildings. Aluminum: Used in aircraft, utensils. Copper: Electrical wires. Gold and Silver: Jewelry. Zinc: Coating iron to prevent rusting (galvanization). Uses of Non-Metals Oxygen: Breathing. Nitrogen: Fertilizers. Chlorine: Water purification. Carbon: Fuel (coal), steel-making (coke). Iodine: Medicines. Alloys An alloy is a mixture of metals or a metal with a non-metal. Alloys have improved properties like strength, resistance to rusting.

Unit 6_Introduction_Phishing_Password Cracking.pdfKanchanPatil34

Metamorphosis: Life's Transformative JourneyArshad Shaikh

Quality Contril Analysis of Containers.pdfDr. Bindiya Chauhan

SPRING FESTIVITIES - UK AND USA -Colégio Santa Teresinha

Stein, Hunt, Green letter to Congress April 2025Mebane Rash

Operations Management (Dr. Abdulfatah Salem).pdfArab Academy for Science, Technology and Maritime Transport

Biophysics Chapter 3 Methods of Studying Macromolecules.pdfPKLI-Institute of Nursing and Allied Health Sciences Lahore , Pakistan.

How to Subscribe Newsletter From Odoo 18 WebsiteCeline George

LDMMIA Reiki Master Spring 2025 Mini UpdatesLDM Mia eStudios

To study Digestive system of insect.pptxArshad Shaikh

New Microsoft PowerPoint Presentation.pptxmilanasargsyan5

Political History of Pala dynasty Pala Rulers NEP.pptxArya Mahila P. G. College, Banaras Hindu University, Varanasi, India.

How to Customize Your Financial Reports & Tax Reports With Odoo 17 AccountingCeline George

YSPH VMOC Special Report - Measles Outbreak Southwest US 4-30-2025.pptxYale School of Public Health - The Virtual Medical Operations Center (VMOC)

Geography Sem II Unit 1C Correlation of Geography with other school subjectsProfDrShaikhImran

World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...larencebapu132

P-glycoprotein pamphlet: iteration 4 of 4 finalbs22n2s

Presentation of the MIPLM subject matter expert Erdem KayaMIPLM

To study the nervous system of insect.pptxArshad Shaikh

CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetSritoma Majumder

Unit 6_Introduction_Phishing_Password Cracking.pdfKanchanPatil34

Hadoop + GPU

1. © ALTOROS Systems | CONFIDENTIAL “The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that ‘nobody ever got fired for using Hadoop on a cluster’!”

2. Breaking News. IBM Keynote at JavaOne 2013: Java Flies in Blue Skies and Open Clouds. Java and GPUs open up a world of new opportunities for GPU accelerators and Java programmers alike.

3. Breaking News. Duimovich showed an example of GPU acceleration of sorting using standard NVIDIA CUDA libraries that are already available! The speedups are phenomenal, ranging from 2x to 48x faster!

9. Hadoop vs GPU; Hadoop & GPU; Hadoop + GPU; HPC; Big Data; GPGPU in Java; heterogeneous systems; horizontal and vertical scalability

12. Hadoop horizontal scalability [Diagram: file01, file02 and file03 stored as blocks 01 to 10 on Node 1, Node 2 and Node 3]

13. Hadoop horizontal scalability [Diagram: as on slide 12; per-node load 3, 4, 3]

14. Hadoop horizontal scalability [Diagram: as on slide 13, plus blocks 01 to 10 spread over Node 1 to Node 6]

15. Hadoop horizontal scalability [Diagram: as on slide 14; per-node load with six nodes 2, 2, 1, 1, 2, 2]

16. Hadoop horizontal scalability [Diagram: blocks 01 to 10 on Node 1 to Node 6; per-node load 2, 2, 1, 1, 2, 2]

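The build-up in slides 12 to 16 can be modelled with a few lines of Java. This is a minimal sketch under simplifying assumptions of our own: every block costs one unit of work, blocks are dealt out round-robin, and replication and data locality are ignored (real HDFS placement follows its own block placement policy). The names `BlockSpread`, `spread` and `makespan` are hypothetical. The job finishes only when the busiest node finishes.

```java
import java.util.Arrays;

// Hypothetical model: each block costs one unit of work, no replication, no locality.
class BlockSpread {

    // Deal numBlocks blocks round-robin over numNodes nodes; returns blocks per node.
    static int[] spread(int numBlocks, int numNodes) {
        int[] load = new int[numNodes];
        for (int block = 0; block < numBlocks; block++) {
            load[block % numNodes]++;
        }
        return load;
    }

    // The job is done only when the busiest node is done.
    static int makespan(int[] load) {
        return Arrays.stream(load).max().orElse(0);
    }

    public static void main(String[] args) {
        int[] threeNodes = spread(10, 3);
        int[] sixNodes = spread(10, 6);
        System.out.println(Arrays.toString(threeNodes) + " -> " + makespan(threeNodes));
        System.out.println(Arrays.toString(sixNodes) + " -> " + makespan(sixNodes));
    }
}
```

Running it prints `[4, 3, 3] -> 4` and `[2, 2, 2, 2, 1, 1] -> 2`: the same per-node loads as the slides' 3, 4, 3 and 2, 2, 1, 1, 2, 2 up to ordering, and doubling the nodes halves the job time in this model.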

18. Use GPU to scale vertically [Diagram: blocks 01 to 10 on Node 1 to Node 6; per-node figures 2, 2, 1, 1, 2, 2 and, with GPUs, 0.5, 1, 1, 0.5, 1, 1]
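One way to read slide 18 is that a GPU makes individual nodes faster instead of adding more nodes. The sketch below is our own simplified model, not Hadoop's scheduler: each block is handed to whichever node would finish it earliest, and `speed[n]` is a hypothetical throughput in blocks per time unit. The value 4 for a GPU-equipped node is an illustrative assumption, not a measurement from the deck, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical model of vertical scaling: some nodes process blocks faster than others.
class HeteroSchedule {

    // Greedy list scheduling: each block goes to the node that would finish it earliest.
    // speed[n] = blocks per time unit on node n (illustrative values only).
    static int[] assign(int numBlocks, double[] speed) {
        double[] finish = new double[speed.length];
        int[] count = new int[speed.length];
        for (int block = 0; block < numBlocks; block++) {
            int best = 0;
            for (int n = 1; n < speed.length; n++) {
                if (finish[n] + 1.0 / speed[n] < finish[best] + 1.0 / speed[best]) {
                    best = n;
                }
            }
            finish[best] += 1.0 / speed[best];
            count[best]++;
        }
        return count;
    }

    // Job time = time taken by the slowest node for its share of blocks.
    static double makespan(int[] count, double[] speed) {
        double max = 0;
        for (int n = 0; n < count.length; n++) {
            max = Math.max(max, count[n] / speed[n]);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] cpuOnly = {1, 1, 1, 1, 1, 1};
        double[] withGpu = {4, 1, 1, 4, 1, 1}; // hypothetical GPUs on Node 1 and Node 4
        int[] a = assign(10, cpuOnly);
        int[] b = assign(10, withGpu);
        System.out.println(Arrays.toString(a) + " -> " + makespan(a, cpuOnly));
        System.out.println(Arrays.toString(b) + " -> " + makespan(b, withGpu));
    }
}
```

With all six nodes at speed 1 the job takes 2.0 time units. Giving Node 1 and Node 4 the hypothetical 4x speedup lets each absorb 4 blocks, and the job time drops to 1.0, even though Node 5 and Node 6 end up idle.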

19. Profit estimation: “Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU” by Intel; NVIDIA GTX280 vs Intel Core i7-960

20. Profit estimation: “Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU” by Intel; “OpenCL: the advantages of heterogeneous approach” by Intel; NVIDIA GTX280 vs Intel Core i7-960

24. Aparapi expands Java's “Write Once Run Anywhere” to include APU and GPU devices: a data-parallel algorithm is expressed by extending the Kernel base class. [Diagram: MyKernel.class]

25. Aparapi, continued. [Flowchart: MyKernel.class → Platform supports OpenCL?]

26. Aparapi, continued. [Flowchart: MyKernel.class → Platform supports OpenCL? → if not: execute using Java thread pool]

27. Aparapi, continued. [Flowchart: MyKernel.class → Platform supports OpenCL? → if yes: can the bytecode be converted to OpenCL?; if not: execute using Java thread pool]

28. Aparapi, continued. [Flowchart: MyKernel.class → Platform supports OpenCL? → if yes: can the bytecode be converted to OpenCL? → if yes: convert it and execute the OpenCL kernel on the device; if either answer is no: execute using Java thread pool]
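The flowchart on slide 28 boils down to a two-step check with a Java fallback. The sketch below mirrors only that control flow in plain Java; it is not Aparapi code. `FallbackDispatch`, `choose` and `runOnThreadPool` are hypothetical names, the two boolean inputs stand in for Aparapi's real platform and bytecode analysis, and there is no OpenCL backend here, so only the thread-pool path actually runs. The `gid` argument plays the role of the global id that an Aparapi kernel reads with `getGlobalId()` inside its `run()` method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

// Hypothetical sketch of the slide's decision flow; not Aparapi's implementation.
class FallbackDispatch {

    enum Mode { OPENCL_DEVICE, JAVA_THREAD_POOL }

    // Both inputs are placeholders for real platform and bytecode checks.
    static Mode choose(boolean platformSupportsOpenCL, boolean bytecodeConvertible) {
        if (platformSupportsOpenCL && bytecodeConvertible) {
            return Mode.OPENCL_DEVICE;
        }
        return Mode.JAVA_THREAD_POOL;
    }

    // Fallback path: run body.accept(gid) for every gid in [0, range) on a fixed thread pool.
    static void runOnThreadPool(int range, IntConsumer body) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int first = t;
                // Task t handles gids first, first + threads, first + 2 * threads, ...
                tasks.add(() -> {
                    for (int gid = first; gid < range; gid += threads) {
                        body.accept(gid);
                    }
                    return null;
                });
            }
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any exception raised inside a task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        float[] in = {1, 2, 3, 4};
        float[] out = new float[in.length];
        Mode mode = choose(false, true); // pretend no OpenCL platform is present
        if (mode == Mode.JAVA_THREAD_POOL) {
            runOnThreadPool(in.length, gid -> out[gid] = in[gid] * in[gid]);
        }
        System.out.println(mode + " " + Arrays.toString(out));
    }
}
```

The point of the design is that the kernel body is written once, as a function of its global id, and the same body can be handed either to a device or to a pool of ordinary Java threads.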

29. Aparapi expands Java's “Write Once Run Anywhere” to include APU and GPU devices: a data-parallel algorithm is expressed by extending the Kernel base class.

32. Aparapi, continued. [Diagram: lambda]

33. Aparapi, continued. [Diagram: lambda, HSA]

35. Aparapi: characteristics of an ideal data-parallel workload
- Code which iterates over large arrays of primitives
  - 32/64-bit data types preferred
  - where the order of iterations is not critical (avoid data dependencies between iterations)
  - each iteration contains sequential code (few branches)

36. Aparapi: characteristics of an ideal data-parallel workload
- Code which iterates over large arrays of primitives
  - 32/64-bit data types preferred
  - where the order of iterations is not critical (avoid data dependencies between iterations)
  - each iteration contains sequential code (few branches)
- Balance between data size (low) and compute (high)
  - data transfer to/from the GPU can be costly
  - trivial compute is not worth the transfer cost
  - may still benefit by freeing up the CPU for other work(?)
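As a rough illustration of the first criterion on slides 35 and 36, the sketch below contrasts a loop whose iterations are independent with one that carries a dependency from iteration to iteration. Class and method names are our own, and the parallel stream only demonstrates that iteration order does not matter for the first loop; it says nothing about how Aparapi or OpenCL would schedule the work.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustrative names; not part of Aparapi.
class DataParallelShapes {

    // Good fit: iteration i reads only x[i] and y[i] and writes only out[i],
    // so the iterations can run in any order, or all at once.
    static float[] saxpy(float a, float[] x, float[] y) {
        float[] out = new float[x.length];
        IntStream.range(0, x.length).parallel().forEach(i -> out[i] = a * x[i] + y[i]);
        return out;
    }

    // Poor fit as written: iteration i needs `running` from iteration i - 1
    // (a loop-carried data dependency), so the order of iterations matters.
    static float[] prefixSum(float[] x) {
        float[] out = new float[x.length];
        float running = 0f;
        for (int i = 0; i < x.length; i++) {
            running += x[i];
            out[i] = running;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] x = {1, 2, 3, 4};
        float[] y = {10, 20, 30, 40};
        System.out.println(Arrays.toString(saxpy(2f, x, y)));
        System.out.println(Arrays.toString(prefixSum(x)));
    }
}
```

Both loops walk a primitive float array; the difference is that `saxpy` never reads a value produced by another iteration, while `prefixSum` does. A prefix sum can still be parallelized, but only after restructuring it into a parallel scan, which takes more than simply handing iterations out to workers.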

41. HadoopCL: 2 six-core Intel X5660 (48 GB mem); 2 NVIDIA Tesla M2050 (2 × 2.5 GB mem); AMD A10-5800K APU (16 GB mem)

42. HadoopCL: 2 six-core Intel X5660 (48 GB mem); 2 NVIDIA Tesla M2050 (2 × 2.5 GB mem); AMD A10-5800K APU (16 GB mem). WHY?

45. OpenCL, Aparapi and heterogeneous computing [Diagram: GPU cache; GPU GDDR5; CPU cache; SATA 3.0 (HDD); SATA 2.0 (SSD); 1 GBit network]
Formula in terms of time:
(CPU calc1) + disk read + disk write > (CPU calc2 + GPU calc + GPU-write + GPU-read) + disk read + disk write
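The inequality on slide 45 can be checked mechanically. Because disk read and disk write appear on both sides, they cancel, and offloading pays off only when the CPU-only time exceeds the remaining CPU time plus the GPU kernel time plus both transfers. The sketch below is hypothetical: it reads GPU-write and GPU-read as the host-to-device and device-to-host transfer times, adds the CPU and GPU parts serially exactly as the formula does (so any overlap between them is ignored), and uses illustrative numbers rather than measurements.

```java
// Hypothetical helper for the slide's time formula; all times in the same unit (e.g. seconds).
class OffloadCheck {

    // cpuCalc1: whole computation on the CPU
    // cpuCalc2: work that stays on the CPU after offloading
    // gpuCalc:  kernel time on the GPU
    // gpuWrite: copying input to GPU memory; gpuRead: copying results back
    // Disk read/write terms are identical on both sides of the inequality and cancel.
    static boolean gpuPaysOff(double cpuCalc1, double cpuCalc2, double gpuCalc,
                              double gpuWrite, double gpuRead) {
        return cpuCalc1 > cpuCalc2 + gpuCalc + gpuWrite + gpuRead;
    }

    public static void main(String[] args) {
        // Heavy compute: 10 s on the CPU vs 1 + 2 + 3 + 3 = 9 s with a GPU.
        System.out.println(gpuPaysOff(10.0, 1.0, 2.0, 3.0, 3.0));
        // Trivial compute: 2 s on the CPU vs 0.5 + 0.2 + 1 + 1 = 2.7 s with a GPU.
        System.out.println(gpuPaysOff(2.0, 0.5, 0.2, 1.0, 1.0));
    }
}
```

The second case is the "trivial compute not worth the transfer cost" situation from slide 36: the GPU kernel itself is fast, but the two transfers alone already cost as much as doing the whole job on the CPU.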
