Presentation CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S... (AMD Developer Central)
This document discusses debugging and profiling challenges with OpenCL and how AMD CodeXL addresses them. It provides an overview of CodeXL's debugging and profiling capabilities for OpenCL, including API-level debugging, kernel source debugging, profiling views for APIs, objects, and kernel variables, and integrated support in Visual Studio. Demo code is included to illustrate pinpointing OpenCL errors and optimizing work item loads.
PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander (AMD Developer Central)
Presentation PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander, at the AMD Developer Summit (APU13) November 11-13, 2013.
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat... (AMD Developer Central)
Presentation HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization, by Huming Zhu at the AMD Developer Summit (APU13) November 11-13, 2013.
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ... (AMD Developer Central)
This document discusses the simulation, compilation, and debugging of OpenCL on the AMD Southern Islands GPU architecture using the Multi2Sim simulator. It describes Multi2Sim's methodology for full-system simulation and emulation of CPU and GPU architectures. It provides details on Multi2Sim's x86 emulator, OpenCL runtime on the host, and emulation of the Southern Islands GPU including its disassembler, emulator, and timing simulator.
The document discusses the specifications and architecture of the AMD Radeon R9-290X graphics processing unit (GPU). Some key points:
- The R9-290X contains 44 compute units with a total of 2816 stream processors. It has a 512-bit GDDR5 memory interface providing 320 GB/sec of memory bandwidth.
- The GPU uses AMD's Graphics Core Next (GCN) architecture. This includes improvements to geometry processing, new local data share memory operations, and enhanced media processing instructions.
- The GCN architecture includes compute units containing vector units and a local data store. Compute units provide computational power through 2816 stream processors.
- New features include support for flat
A graphics processing unit or GPU (also occasionally called a visual processing unit or VPU) is a specialized microprocessor that offloads and accelerates graphics rendering from the central (micro) processor. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms. In a CPU, only a fraction of the chip does computations, whereas the GPU devotes more transistors to data processing.
GPGPU is a programming methodology based on modifying algorithms to run on existing GPU hardware for increased performance. Unfortunately, GPGPU programming is significantly more complex than traditional programming for several reasons.
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ... (AMD Developer Central)
Presentation HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated Processing Units, by Robert Engel at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu... (AMD Developer Central)
Presentation CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distributed Platforms, by Max Grossman at the AMD Developer Summit (APU13) November 11-13, 2013.
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by... (AMD Developer Central)
Keynote presentation, Is There Anything New in Heterogeneous Computing, by Mike Muller, Chief Technology Officer, ARM, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac... (AMD Developer Central)
Presentation MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Achievements, by Joseph Hsieh at the AMD Developer Summit, November 11-13, 2013.
Resource: LCU13
Name: GPGPU on ARM Experience Report
Date: 30-10-2013
Speaker: Tom Gall
Video: http://www.youtube.com/watch?v=57PrMlF17gQ
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W... (AMD Developer Central)
Presentation HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael Wootton at the AMD Developer Summit (APU13) November 11-13, 2013.
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU... (AMD Developer Central)
Presentation, HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU," by Mayank Daga and Mark Nutter at the AMD Developer Summit (APU13) Nov. 11-13.
1) The document discusses shortest path algorithms and their application to traffic assignment problems, comparing the performance of CPU vs GPU implementations.
2) It finds that GPU implementations can be 45x faster than CPU implementations for massively parallelizable problems such as traffic simulations.
3) However, GPU programming requires more specialized knowledge and hardware restrictions limit accessibility, while CPU remains more flexible but less optimized for large datasets.
Direct3D12 aims to address issues with existing APIs by providing a more direct mapping to hardware capabilities. It features command buffers that allow work to be built in parallel threads and scheduled more efficiently. Pipeline state objects avoid runtime compilation overhead. Descriptor tables provide bindless resources through pointers and reduce state changes. While this gives more control and efficiency, it also means applications have more responsibility to avoid errors. Overall, Direct3D12 is designed to better expose the capabilities of modern graphics hardware.
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant... (AMD Developer Central)
This presentation discusses the Mantle API, what it is, why choose it, and abstraction level, small batch performance and platform efficiency.
Download the presentation from the AMD Developer website here: http://bit.ly/TrEUeC
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ... (AMD Developer Central)
The document discusses porting and optimizing OpenMP applications to AMD APUs using CAPS tools. It provides an overview of CAPS Enterprise, which develops compilers and tools to help customers leverage the performance of multi-core and many-core processors. It then discusses CAPS' OpenACC and OpenMP compilers, which can generate code for AMD GPUs and APUs from directive-based programming models. The document demonstrates how the CAPS OpenMP compiler can analyze OpenMP applications and generate optimized code for execution on AMD APUs, showing speedups for the HydroC benchmark application.
Bolt C++ Standard Template Library for HSA by Ben Sanders, AMD (HSA Foundation)
The document introduces Bolt, a C++ template library for heterogeneous system architecture (HSA) that aims to improve developer productivity for GPU programming. Bolt provides optimized library routines for common GPU operations using open standards like OpenCL and C++ AMP. It resembles the familiar C++ Standard Template Library. Bolt allows programming GPUs as easily as CPUs, handles workload distribution across devices, and provides a single source code base for both CPU and GPU. Examples show how Bolt can be used with C++ AMP and OpenCL, including passing user-defined functors. An example video enhancement application demonstrates Bolt's use in a commercial product.
This document summarizes improvements to the TressFX hair rendering and simulation technology. TressFX 2.0 features improved performance through deferred lighting and shadowing, continuous LOD, and code restructuring. Rendering is faster through optimizations to the anti-aliasing, self-shadowing, and transparency techniques. The simulation is formulated with general constraints and solved using a tridiagonal matrix approach for better stability under various hair conditions like wet, dry, or with wind. Overall, TressFX 2.0 provides over 2x performance increases for hair rendering compared to the previous version.
Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture (mohamedragabslideshare)
This document summarizes research on revisiting co-processing techniques for hash joins on coupled CPU-GPU architectures. It discusses three co-processing mechanisms: off-loading, data dividing, and pipelined execution. Off-loading involves assigning entire operators like joins to either the CPU or GPU. Data dividing partitions data between the processors. Pipelined execution aims to schedule workloads adaptively between the CPU and GPU to maximize efficiency on the coupled architecture. The researchers evaluate these approaches for hash join algorithms, which first partition, build hash tables, and probe tables on the input relations.
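The partition, build, and probe phases mentioned above can be sketched on a single processor as follows. This is a minimal illustration only, not the researchers' implementation: the function name `hash_join`, the tuple row layout, and the `num_partitions` parameter are assumptions made for the example.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key, num_partitions=4):
    """Partitioned hash join over lists of tuples (hypothetical helper).

    build_key / probe_key are the tuple positions holding the join key.
    """
    # Partition phase: rows with equal keys land in the same partition index.
    build_parts = [[] for _ in range(num_partitions)]
    probe_parts = [[] for _ in range(num_partitions)]
    for row in build_rows:
        build_parts[hash(row[build_key]) % num_partitions].append(row)
    for row in probe_rows:
        probe_parts[hash(row[probe_key]) % num_partitions].append(row)

    results = []
    for b_part, p_part in zip(build_parts, probe_parts):
        # Build phase: hash table over the build side of this partition.
        table = defaultdict(list)
        for row in b_part:
            table[row[build_key]].append(row)
        # Probe phase: look up each probe row's key in the table.
        for row in p_part:
            for match in table.get(row[probe_key], []):
                results.append((match, row))
    return results

customers = [(2, "x"), (3, "y")]
orders = [(1, "a"), (2, "b"), (2, "c")]
joined = hash_join(customers, orders, build_key=0, probe_key=0)
```

Because each partition pair is processed independently, the per-partition loop is the natural place where work could be divided between processors, which is the idea behind the data-dividing mechanism described in the summary.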
1) The document discusses implementing and evaluating deep neural networks (DNNs) on mainstream heterogeneous systems like CPUs, GPUs, and APUs.
2) Preliminary results show that an APU achieves the highest performance per watt compared to CPUs and GPUs for DNN models like MLP and autoencoders.
3) Data transfers between the CPU and GPU are identified as a bottleneck, but APUs can help avoid this issue through efficient data sharing and zero-copy techniques between the CPU and GPU.
AMD’s math libraries can support a range of programmers from hobbyists to ninja programmers. Kent Knox from AMD’s library team introduces you to OpenCL libraries for linear algebra, FFT, and BLAS, and shows you how to leverage the speed of OpenCL through the use of these libraries.
Review the material presented in the AMD Math libraries webinar in this deck.
For more:
Visit the AMD Developer Forums: http://devgurus.amd.com/welcome
Watch the replay: www.youtube.com/user/AMDDevCentral
Follow us on Twitter: https://twitter.com/AMDDevCentral
1) Betweenness centrality measures the number of shortest paths between pairs of nodes that pass through each edge. Communities can be detected by repeatedly removing the edge with the highest betweenness centrality.
2) The GN algorithm calculates betweenness centrality for a graph. Other community detection methods find dense subgraphs by expanding cores or looking for complete bi-partite subgraphs.
3) Normalized cut seeks to minimize edges between communities while maximizing edges within communities. The eigenvectors of the Laplacian matrix can also be used to partition graphs for community detection.
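The edge-removal idea in point 1 can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not code from the summarized document: the graph is unweighted and undirected, it is stored as a dict of neighbour sets, and the helper names `edge_betweenness`, `components`, and `split_once` are hypothetical. Edge betweenness is computed with a breadth-first, Brandes-style accumulation.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (dict: node -> set)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path counts over edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every node pair was counted once from each endpoint, so halve the totals.
    return {edge: value / 2.0 for edge, value in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def split_once(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Two triangles joined by a single bridge edge (2, 3).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = edge_betweenness(graph)
communities = split_once(graph)
```

In this small graph every shortest path between the two triangles crosses the bridge, so the bridge receives the highest score and removing it separates the two triangles.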
This document discusses network analysis. It defines what a network is and describes common network features like nodes, edges, and centrality measures. It also covers network representations, using the NetworkX library to analyze networks, detecting communities within networks, and analyzing how information spreads through networks. A variety of network analysis tools are also listed.
This document provides an overview of social network analysis (SNA) including concepts, methods, and applications. It begins with background on how SNA originated from social science and network analysis/graph theory. Key concepts discussed include representing social networks as graphs, identifying strong and weak ties, central nodes, and network cohesion. Practical applications of SNA are also outlined, such as in business, law enforcement, and social media sites. The document concludes by recommending when and why to use SNA.
This document summarizes a lecture on social information retrieval. It discusses social search, which takes social networks into account. One study examined questions people ask their social networks on Facebook and Twitter. It found questions were short, directed to "anyone", and about acceptable topics like relationships. Fast responses were considered helpful. Centrality measures like degree, closeness, and betweenness are used to determine important nodes in social networks. Strong and weak ties play different roles in information diffusion. Tie strength can be estimated using topology, neighborhood overlap, and profile/interaction data.
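Two of the measures named above, degree centrality and neighborhood overlap as a topology-based estimate of tie strength, are simple enough to sketch directly. The function names and the dict-of-sets graph layout are assumptions for illustration; the overlap formula used here (shared neighbours divided by all neighbours of either endpoint, excluding the endpoints themselves) is one common definition and may differ from the one used in the lecture.

```python
def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}

def neighborhood_overlap(adj, a, b):
    """Shared neighbours of a and b over all neighbours of either endpoint."""
    common = adj[a] & adj[b]
    either = (adj[a] | adj[b]) - {a, b}
    return len(common) / len(either) if either else 0.0

# Two triangles joined by a single bridge edge (2, 3).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
centrality = degree_centrality(graph)
strong = neighborhood_overlap(graph, 0, 1)  # edge inside a triangle
weak = neighborhood_overlap(graph, 2, 3)    # the bridge between the triangles
```

An overlap near 1 suggests a strong tie embedded in a tight group, while an overlap of 0 marks a bridge-like weak tie, which matches the different roles of strong and weak ties described in the summary.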
Social network analysis and visualization is a technique for analyzing social networks using graph theory. The technique represents human interaction as a graph consisting of nodes and edges. The graph can be directed or undirected, depending on the direction of the relationship. The measures used include degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality. RStudio and Gephi can be used for visual
This chapter provides an overview of National Bank Limited (NBL). It discusses NBL's history, vision, mission, organizational structure, and products/services. Some key points:
- NBL was established in 1983 as Bangladesh's first privately owned commercial bank. It aimed to revive the country's economy after a period of recession.
- The bank's vision is to be the best private commercial bank in Bangladesh and its mission is to provide high quality financial services through innovation.
- NBL has a hierarchical organizational structure with different departments overseen by a Managing Director.
- It offers various banking services including deposits, loans, credit cards, remittance etc. to individual and institutional customers.
05 optimization of cocos2d-x games on x86 architecture (乐费 胡)
The document provides a legal disclaimer for information related to Intel products. It states that no license is granted to any intellectual property, and Intel assumes no liability for products. It also requires customers to indemnify Intel for any issues arising from mission critical applications using Intel products. The document notes that product specifications and descriptions are subject to change without notice.
Storyline - for #newsHACK 2013 - Jeremy Tarling (BBC News Labs)
The document describes the Storyline Ontology, a collaborative model developed by BBC News to model relationships between news articles and events. The ontology represents how storylines are fragmented across articles and can be used to annotate news content with topics, storylines, and relationships between storylines and topics. It aims to address the problem of storytelling being fragmented by linking related articles and providing a structured way to represent full narratives and events.
Design of a novel controller to increase the frequency response of an aerospace (IAEME Publication)
This document discusses the design of a novel controller called a Piecewise Predictive Estimator (PPE) to increase the frequency response of an aerospace electro-mechanical actuator. The PPE technique is combined with existing controllers like PID and LQR to reduce phase lag and increase bandwidth without increasing noise. Simulation results show the bandwidth increased from 22Hz to 25Hz with PPE. The PPE works by making piecewise predictions of the system state to reduce phase lag in a finite prediction horizon.
July Clojure Users Group Meeting: "Using Cascalog with Palo Alto Open Data" (Paco Nathan)
Cascading is an open source data workflow framework that allows programmers to define data pipelines and complex multi-step workflows using functional programming concepts. It originated from the need to leverage Hadoop and big data technologies using languages like Java that developers were already familiar with. Cascading integrates with various data sources and targets and can be used with languages like Java, Clojure, and Scala to define declarative workflows at scale.
This document outlines the architecture of a web portfolio for PT. Pertamina Drilling Services Indonesia. The portfolio includes sections for corporate information, digital information like photos and videos, news, and interactive maps with video streaming to showcase the company's coverage areas.
Zara is able to move fashion trends from design to stores in just two weeks, far faster than the industry average of six months. It launches around 10,000 new designs annually. Zara also resists outsourcing production and instead keeps it in-house and close to its Spain headquarters. This allows it to rapidly prototype and distribute new designs. Zara further differentiates itself by investing in new store openings rather than advertising, driving rapid international expansion.
The document provides details on social media posts from February 17, 2015, including a post to Facebook about an innovative operation in the Arctic Ocean by the Navy, a post to Twitter thanking support for Navy sailors and retweeting a photo of Naval Academy midshipmen helping shovel snow in Annapolis, and a post to Instagram also thanking support for Navy sailors.
This document summarizes three books related to Marxism:
1) The Revolutionary Ideas of Karl Marx by Alex Callinicos provides a comprehensive introduction to Marx's life, thought, and intellectual influences in a clear manner.
2) Chocolate Nations by Órla Ryan examines the political and economic factors that have led to poverty for cocoa farmers in Ghana and Cote d'Ivoire, but fails to adequately analyze the role of structural adjustment policies.
3) A book review criticizes Chocolate Nations for its omission of discussion on the negative impacts of IMF and World Bank policies and argues this undermines the book's explanatory power.
Machine Learning in the Cloud with GraphLab (Danny Bickson)
The document discusses machine learning in the cloud using GraphLab. It introduces the need for machine learning with big data and the shift towards parallelism using GPUs, multicore processors, clusters and clouds. It describes GraphLab as providing high-level abstractions for parallel and distributed machine learning through its data representation as a graph and use of update functions. Examples of algorithms it supports include PageRank, collaborative filtering, and label propagation.
Large-scale Recommendation Systems on Just a PC (Aapo Kyrölä)
Aapo Kyrölä presented on running large-scale recommender systems on a single PC using GraphChi, a framework for graph computation on disk. GraphChi uses parallel sliding windows to efficiently process graphs that do not fit in memory by only loading subsets of the graph into RAM at a time. Kyrölä demonstrated training recommender models like ALS matrix factorization and item-based collaborative filtering on large graphs like Twitter using GraphChi on a single laptop. He concluded that very large recommender algorithms can now be run on a single machine and that GraphChi and similar frameworks hide the low-level optimizations needed for efficient single machine graph computation.
How Graph Databases used in Police Department? (Samet KILICTAS)
This presentation introduces the basics of graphs and graph databases. It explains how graph databases are used, with sample use cases from industry, and how they can be applied in police departments. It answers questions such as "When to use a graph DB?" and "Should I solve a problem with Graph DB?".
A general introduction to Spring Data / Neo4J (Florent Biville)
Spring Data Neo4j provides a framework for mapping graph data to Java objects and interacting with Neo4j from Spring applications. It allows defining entities as nodes and relationships and provides repositories with built-in CRUD operations. Queries can be written using Cypher or the template API. This reduces boilerplate code and provides a familiar Spring programming model for graph databases.
This document provides an introduction and overview of graph analytics and graph structured data. It discusses how graph data arises naturally from many real-world domains such as social networks, web graphs, and biological networks. It also outlines some common properties of graph data derived from natural phenomena, such as power-law degree distributions and community structure. Finally, it introduces common graph algorithms, graph processing systems, and the GraphX graph computation framework in Apache Spark.
This document discusses challenges and opportunities in parallel graph processing for big data. It describes how graphs are ubiquitous but processing large graphs at scale is difficult due to their huge size, complex correlations between data entities, and skewed distributions. Current computation models have problems with ghost vertices, too much interaction between partitions, and lack of support for iterative graph algorithms. New frameworks are needed to handle these graphs in a scalable way with low memory usage and balanced computation and communication.
The document discusses machine learning techniques for graphs and graph-parallel computing. It describes how graphs can model real-world data with entities as vertices and relationships as edges. Common machine learning tasks on graphs include identifying influential entities, finding communities, modeling dependencies, and predicting user behavior. The document introduces the concept of graph-parallel programming models that allow algorithms to be expressed by having each vertex perform computations based on its local neighborhood. It presents examples of graph algorithms like PageRank, product recommendations, and identifying leaders that can be implemented in a graph-parallel manner. Finally, it discusses challenges of analyzing large real-world graphs and how systems like GraphLab address these challenges through techniques like vertex-cuts and asynchronous execution.
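The graph-parallel idea, where each vertex computes a new value from its local neighbourhood only, can be illustrated with PageRank. The sketch below is plain sequential Python, not GraphLab code; it uses synchronous rounds rather than the asynchronous execution mentioned above, and it assumes every vertex has at least one outgoing edge. The names `pagerank` and `vertex_update` are hypothetical.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """Vertex-centric PageRank sketch (dict: vertex -> list of targets).

    Assumes every vertex has at least one outgoing edge.
    """
    vertices = list(out_edges)
    n = len(vertices)
    # Precompute in-neighbours so each vertex can gather from them.
    in_edges = {v: [] for v in vertices}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    rank = {v: 1.0 / n for v in vertices}

    def vertex_update(v, current):
        # The "vertex program": reads only the ranks of v's in-neighbours.
        gathered = sum(current[u] / len(out_edges[u]) for u in in_edges[v])
        return (1.0 - damping) / n + damping * gathered

    for _ in range(iterations):
        # Synchronous round: every vertex reads the previous round's ranks.
        # The updates are independent, so they could run in parallel.
        rank = {v: vertex_update(v, rank) for v in vertices}
    return rank

web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(web)
```

Because `vertex_update` touches only a vertex and its neighbours, a graph-parallel system is free to schedule these updates across cores or machines; partitioning and asynchronous scheduling are outside the scope of this sketch.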
Facebook leverages machine learning extensively for applications like News Feed, ads, search, and language translation. It designs its own server hardware optimized for different ML workloads and data centers globally. It develops AI frameworks for both production stability and research flexibility. Its ML execution flows involve large-scale data processing, distributed training, and high-volume inference. Supporting ML at Facebook's massive scale presents infrastructure challenges in storage, networking, and computing resources.
Morpheus - SQL and Cypher in Apache SparkHenning Kropp
Morpheus allows querying graphs stored in Apache Spark using the Cypher query language. It represents property graphs as compositions of DataFrames and supports operations like importing/exporting data between Spark graphs and Neo4j graphs. Morpheus also provides a catalog for managing multiple named graphs from different data sources and allows constructing new graphs using graph views and queries across multiple input graphs.
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup MunichMartin Junghanns
Extending Apache Spark Graph for the Enterprise with Morpheus and Neo4j
The talk covers:
* Neo4j, Property Graph Model and Cypher
* Cypher query exectution in Apache Spark
* Neo4j graph algorithms
* Example Code
From flat files to deconstructed databaseJulien Le Dem
From flat files to deconstructed databases:
- Originally, Hadoop used flat files and MapReduce which was flexible but inefficient for queries.
- The database world used SQL and relational models with optimizations but were inflexible.
- Now components like storage, processing, and machine learning can be mixed and matched more efficiently with standards like Apache Calcite, Parquet, Avro and Arrow.
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Till Rohrmann
How to scale recommendations to extremely large scale using Apache Flink. We use matrix factorization to calculate a latent factor model which can be used for collaborative filtering. The implemented alternating least squares algorithm is able to deal with data sizes on the scale of Netflix.
VerticaPy allows users to perform machine learning and data science tasks using Python directly in the Vertica database. It provides tools for data exploration, preparation, modeling, evaluation and visualization. Models can be built and stored within Vertica for scalable deployment and management. VerticaPy aims to bring analytics to the next level by allowing users to leverage Vertica's in-database capabilities while working with Python.
Flink provides unified batch and stream processing. It natively supports streaming dataflows, long batch pipelines, machine learning algorithms, and graph analysis through its layered architecture and treatment of all computations as data streams. Flink's optimizer selects efficient execution plans such as shipping strategies and join algorithms. It also caches loop-invariant data to speed up iterative algorithms and graph processing.
Continuous Intelligence - Intersecting Event-Based Business Logic and MLParis Carbone
Continuous intelligence involves integrating real-time analytics within business operations to prescribe actions in response to events based on current and historical data. It represents a paradigm shift from retrospective querying of data to providing real-time answers using stream processing as a 24/7 execution model. Technologies like Apache Flink enable this through scalable, fault-tolerant stream processing with stream SQL, complex event processing, and other abstractions.
Near real-time anomaly detection at Lyftmarkgrover
Near real-time anomaly detection at Lyft, by Mark Grover and Thomas Weise at Strata NY 2018.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/69155
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkFlink Forward
The document discusses Gradoop, a framework for scalable graph analytics using Apache Flink. It provides an overview of the Gradoop architecture, which includes an extended property graph data model, graph operators implemented as Flink operators, and a DSL for declaring analytical workflows. The architecture supports end-to-end graph analytics, including data integration, operator execution, and result representation.
6. The Big Question of Big Learning
How will we design and implement parallel learning systems?
7. MapReduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (MapReduce):
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics
Graph-Parallel (Is there more to Machine Learning?):
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Collaborative Filtering: Tensor Factorization
• Semi-Supervised Learning: Label Propagation, CoEM
• Graph Analysis: PageRank, Triangle Counting
8. Estimate Political Bias
Semi-Supervised & Transductive Learning
[Figure: graph whose nodes are marked Post, Liberal, Conservative, or unknown (?)]
9. Flashback to 1998
First Google advantage: a Graph Algorithm & a System to Support it!
14. Collaborative Filtering: Exploiting Dependencies
Latent Factor Models
Matrix Completion/Factorization Models
Movies shown: Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita
15. Topic Modeling
Latent Dirichlet Allocation, etc.
Words shown: Cat, Apple, Growth, Hat, Plant
17. Machine Learning Pipeline
Data → Extract Features → Graph Formation → Structured Machine Learning Algorithm → Value from Data
Examples shown on the slide: images, docs, movie ratings, social activity; face labels, doc topics; movie recommend, sentiment analysis
20. PageRank
What’s the rank of this user? Rank?
Depends on rank of who follows her
Depends on rank of who follows them…
Loops in graph → Must iterate!
21. PageRank Iteration
Iterate until convergence:
“My rank is weighted average of my friends’ ranks”
$R[i] = \alpha + (1 - \alpha) \sum_{(j,i) \in E} w_{ji} R[j]$
• $\alpha$ is the random reset probability
• $w_{ji}$ is the prob. transitioning (similarity) from j to i
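The iteration above can be sketched in a few lines of Python. This is a minimal synchronous sketch, not GraphLab code: the function name `pagerank`, the `in_edges` layout, the default `alpha=0.15`, the starting rank of 1.0, and the stopping tolerance are all assumptions made for illustration.

```python
def pagerank(in_edges, alpha=0.15, tol=1e-10, max_iters=1000):
    """Iterate R[i] = alpha + (1 - alpha) * sum_j w_ji * R[j] until ranks stop changing.

    in_edges maps each vertex i to a list of (j, w_ji) pairs, one per edge (j, i).
    Every vertex must appear as a key, even if it has no in-edges.
    """
    ranks = {i: 1.0 for i in in_edges}  # assumed starting point
    for _ in range(max_iters):
        # Recompute every rank from the previous sweep's values.
        new_ranks = {
            i: alpha + (1 - alpha) * sum(w_ji * ranks[j] for j, w_ji in nbrs)
            for i, nbrs in in_edges.items()
        }
        delta = max(abs(new_ranks[i] - ranks[i]) for i in ranks)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


# Toy graph: vertex 0 links to 1 and 2 (weight 0.5 each); 1 and 2 link back to 0.
toy_in_edges = {0: [(1, 1.0), (2, 1.0)], 1: [(0, 0.5)], 2: [(0, 0.5)]}
toy_ranks = pagerank(toy_in_edges)
```

Because the graph contains loops, no single pass gives the answer; the loop keeps sweeping until the largest change in any rank falls below `tol`.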
22. Properties of Graph Parallel Algorithms
• Dependency Graph
• Local Updates
• Iterative Computation
[Figure labels: My Rank, Friends Rank]
23. The Need for a New Abstraction
• Need: Asynchronous, Dynamic Parallel Computations
Data-Parallel (Map Reduce):
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics
Graph-Parallel:
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Collaborative Filtering: Tensor Factorization
• Semi-Supervised Learning: Label Propagation, CoEM
• Data-Mining: PageRank, Triangle Counting
24. The GraphLab Goals
Know how to solve ML problem on 1 machine
Efficient parallel predictions
26. Data Graph
Data associated with vertices and edges
Graph:
• Social Network
Vertex Data:
• User profile text
• Current interests estimates
Edge Data:
• Similarity weights
27. How do we program graph computation?
“Think like a Vertex.” -Malewicz et al. [SIGMOD’10]
28. Update Functions
User-defined program: applied to vertex transforms data in scope of vertex

    pagerank(i, scope){
      // Get Neighborhood data
      (R[i], w_ji, R[j]) ← scope;
      // Update the vertex data
      R[i] ← α + (1 − α) Σ_{j ∈ N[i]} w_ji × R[j];
      // Reschedule Neighbors if needed
      if R[i] changes then
        reschedule_neighbors_of(i);
    }

• Update function applied (asynchronously) in parallel until convergence
• Many schedulers available to prioritize computation
• Dynamic computation
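The dynamic part of the update function (reschedule neighbors only when a rank changes) can be sketched with a work queue. This is a sequential Python simulation, not GraphLab's scheduler: `dynamic_pagerank`, the FIFO queue, the data layout, and the tolerance are assumptions chosen for illustration.

```python
from collections import deque


def dynamic_pagerank(in_edges, out_nbrs, alpha=0.15, tol=1e-12):
    """Apply the PageRank update only to scheduled vertices.

    in_edges[i] is a list of (j, w_ji) pairs; out_nbrs[i] lists the vertices
    whose rank depends on i. Returns the ranks and the number of updates run.
    """
    ranks = {v: 1.0 for v in in_edges}
    queue = deque(in_edges)  # schedule every vertex once to start
    queued = set(in_edges)
    updates = 0
    while queue:
        i = queue.popleft()
        queued.discard(i)
        new_rank = alpha + (1 - alpha) * sum(w_ji * ranks[j] for j, w_ji in in_edges[i])
        changed = abs(new_rank - ranks[i]) > tol
        ranks[i] = new_rank
        updates += 1
        if changed:
            # Reschedule only the vertices that read R[i].
            for k in out_nbrs[i]:
                if k not in queued:
                    queue.append(k)
                    queued.add(k)
    return ranks, updates


dyn_in_edges = {0: [(1, 1.0), (2, 1.0)], 1: [(0, 0.5)], 2: [(0, 0.5)]}
dyn_out_nbrs = {0: [1, 2], 1: [0], 2: [0]}
dyn_ranks, dyn_updates = dynamic_pagerank(dyn_in_edges, dyn_out_nbrs)
```

A FIFO queue is only one possible scheduler; swapping in a priority queue keyed on the size of the last change would be one way to prioritize computation.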
29. The GraphLab Framework
• Graph Based Data Representation
• Scheduler
• Update Functions: User Computation
• Consistency Model
36. Achilles Heel: Idealized Graph Assumption
Assumed…
• Small degree → Easy to partition
But, Natural Graphs…
• Many high degree vertices (power-law degree distribution) → Very hard to partition
38. High Degree Vertices are Common
• Netflix (Users, Movies): Popular Movies
• “Social” People
• LDA (Hyper Parameters, Docs, Words): Common Words, e.g. Obama
[Figure: LDA graphical model with variables α, β, θ, Z, w]
40. Problem: High Degree Vertices → High Communication for Distributed Updates
• Data transmitted across network: O(# cut edges)
• Natural graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
• Popular partitioning tools (Metis, Chaco, …) perform poorly [Abou-Rjeili et al. 06]: extremely slow and require substantial memory
[Figure labels: Y, Machine 1, Machine 2]
41. Random Partitioning
• GraphLab 1, Pregel, Twitter, Facebook, … all rely on Random (hashed) partitioning for Natural Graphs
• For p Machines: $E\left[\frac{|\text{Edges Cut}|}{|E|}\right] = 1 - \frac{1}{p}$
• 10 Machines → 90% of edges cut
• 100 Machines → 99% of edges cut!
• All data is communicated… Little advantage over MapReduce
[Figure: truncated excerpt of a result numbered 5.1: “If vertices are randomly assign…”, “…then the expected fraction of edges cu…”, “…if just two machines are used, …”, “…will be cut requiring order |E|/2 commu…”; labels Machine 1, Machine 2]
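The 1 − 1/p figure follows from a short argument: with uniform random placement, the two endpoints of an edge land on the same machine with probability 1/p, so the edge is cut with probability 1 − 1/p. The Python sketch below checks this against a simulation; the function names, the synthetic random graph, and the seeds are assumptions for illustration only.

```python
import random


def expected_cut_fraction(p):
    """Expected fraction of edges cut when vertices are placed uniformly at random on p machines."""
    return 1 - 1 / p


def simulated_cut_fraction(edges, p, seed=0):
    """Place each vertex on a random machine and measure the fraction of edges that cross machines."""
    rng = random.Random(seed)
    machine = {}
    cut = 0
    for u, v in edges:
        for x in (u, v):
            if x not in machine:
                machine[x] = rng.randrange(p)
        if machine[u] != machine[v]:
            cut += 1
    return cut / len(edges)


# Synthetic graph: 20,000 random edges over 2,000 vertices (no self-loops).
_graph_rng = random.Random(1)
sample_edges = [tuple(_graph_rng.sample(range(2000), 2)) for _ in range(20000)]
measured = simulated_cut_fraction(sample_edges, p=10)
```

The formula does not depend on the graph's structure, which is why random placement cuts nearly every edge once p is large.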
42. In Summary
GraphLab 1 and Pregel are not well suited for natural graphs
• Poor performance on high-degree vertices
• Low Quality Partitioning
44. Common Pattern for Update Fncs.

    GraphLab_PageRank(i)
      // Gather Information About Neighborhood
      // Compute sum over neighbors
      total = 0
      foreach (j in in_neighbors(i)):
        total = total + R[j] * wji
      // Apply Update to Vertex
      // Update the PageRank
      R[i] = 0.1 + total
      // Scatter Signal to Neighbors & Modify Edge Data
      // Trigger neighbors to run again
      if R[i] not converged then
        foreach (j in out_neighbors(i))
          signal vertex-program on j

[Figure labels: R[j], wji, R[i]]
45. GAS Decomposition
• Gather (Reduce): Accumulate information about neighborhood (Parallel “Sum” Σ over the neighbors of Y)
• Apply: Apply the accumulated value to center vertex (Y and Σ give Y’)
• Scatter: Update adjacent edges and vertices
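The three phases can be sketched as a small generic engine that takes the gather, merge, apply, and scatter steps as separate functions. This is an illustrative Python sketch, not the PowerGraph API: `run_gas`, its parameters, and the constants are hypothetical, and the engine runs one vertex at a time rather than in parallel.

```python
from functools import reduce


def run_gas(in_edges, out_nbrs, init, gather, merge, apply, scatter):
    """Run a vertex program, one active vertex at a time, until no vertex is signalled."""
    value = {v: init for v in in_edges}
    active = set(in_edges)
    while active:
        i = active.pop()
        # Gather: one contribution per in-edge, combined with an associative merge.
        parts = [gather(value[j], w_ji) for j, w_ji in in_edges[i]]
        total = reduce(merge, parts, 0.0)
        # Apply: compute the new value of the center vertex.
        old, value[i] = value[i], apply(total)
        # Scatter: decide whether to signal the out-neighbors.
        if scatter(old, value[i]):
            active.update(out_nbrs[i])
    return value


ALPHA, TOL = 0.15, 1e-12  # assumed constants for the example

gas_ranks = run_gas(
    in_edges={0: [(1, 1.0), (2, 1.0)], 1: [(0, 0.5)], 2: [(0, 0.5)]},
    out_nbrs={0: [1, 2], 1: [0], 2: [0]},
    init=1.0,
    gather=lambda r_j, w_ji: w_ji * r_j,
    merge=lambda a, b: a + b,
    apply=lambda total: ALPHA + (1 - ALPHA) * total,
    scatter=lambda old, new: abs(new - old) > TOL,
)
```

Keeping `merge` associative and commutative is what would let a real system compute partial sums on different machines and combine them in any order.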
46. Many ML Algorithms fit into GAS Model
graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, …
47. Minimizing Communication in GL2 PowerGraph: Vertex Cuts
• Communication linear in # spanned machines
• A vertex-cut minimizes # machines per vertex
• GL2 PowerGraph includes novel vertex cut algorithms
• Provides order of magnitude gains in performance
• Percolation theory suggests Power Law graphs can be split by removing only a small set of vertices [Albert et al. 2000] → Small vertex cuts possible!
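To make "# machines per vertex" concrete, the Python sketch below places edges (not vertices) on machines and counts how many machines each vertex spans. The round-robin placement rule and the function names are hypothetical stand-ins; the slide does not describe the actual PowerGraph placement algorithms.

```python
from collections import defaultdict


def vertex_cut_spans(edges, p):
    """Assign each edge to a machine and return, per vertex, the number of machines it spans."""
    spans = defaultdict(set)
    for idx, (u, v) in enumerate(edges):
        m = idx % p  # hypothetical placement rule: round-robin over machines
        spans[u].add(m)
        spans[v].add(m)
    return {v: len(ms) for v, ms in spans.items()}


def replication_factor(spans):
    """Average number of machines spanned per vertex."""
    return sum(spans.values()) / len(spans)


# Star graph: one high-degree hub connected to 100 leaves, split over 4 machines.
star_edges = [("hub", f"leaf{k}") for k in range(100)]
star_spans = vertex_cut_spans(star_edges, p=4)
star_replication = replication_factor(star_spans)
```

In this example the hub is replicated on at most p machines regardless of its degree, while an edge cut with random vertex placement would be expected to cut about three quarters of its 100 edges (1 − 1/4).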
48. From the Abstraction to a System
49. Triangle Counting on Twitter Graph
34.8 Billion Triangles
Hadoop [WWW’11]: 1636 Machines, 423 Minutes
64 Machines, 15 Seconds
Why? Wrong Abstraction → Broadcast O(degree²) messages per Vertex
S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
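The slide does not show how triangles are counted in GraphLab. As a point of reference only, the following sketch uses the standard neighbor-set intersection approach on a single machine: for every edge, the common neighbors of its two endpoints each close one triangle. The function name and the in-memory adjacency sets are assumptions.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                total += len(adj[u] & adj[v])
    # Every triangle is found once per each of its three edges.
    return total // 3


if __name__ == "__main__":
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(count_triangles(k4))
```

Each edge needs only the two adjacency sets of its endpoints, so the work per vertex is tied to its edges and avoids broadcasting a message for every pair of neighbors.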
50. Topic Modeling (LDA)
English language Wikipedia: 2.6M Documents, 8.3M Words, 500M Tokens
Computationally intensive algorithm
[Bar chart: Million Tokens Per Second, axis 0 to 160]
Smola et al.: 100 Yahoo! Machines, specifically engineered for this task
GL2 PowerGraph: 64 cc2.8xlarge EC2 Nodes, 200 lines of code & 4 human hours
51. How well does GraphLab scale?
Yahoo Altavista Web Graph (2002): One of the largest publicly available webgraphs
1.4B Webpages, 6.7 Billion Links
64 HPC Nodes: 7 seconds per iter., 1B links processed per second
30 lines of user code
52. GraphChi: Going small with GraphLab
Solve huge problems on small or embedded devices?
Key: Exploit non-volatile memory (starting with SSDs and HDs)
53. GraphChi – disk-based GraphLab
Challenge: Random Accesses
Novel GraphChi solution: Parallel sliding windows method → minimizes number of random accesses
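The slide only names the parallel sliding windows method. The sketch below follows the shard layout usually described for GraphChi, simplified to in-memory lists: vertices are split into intervals, shard k holds the edges whose destination falls in interval k, and each shard is sorted by source. All function names are hypothetical. With that layout, the out-edges of an interval form one contiguous block (a window) in every shard, so each shard needs a single sequential read per interval.

```python
from bisect import bisect_left, bisect_right


def build_shards(edges, intervals):
    """Shard k holds edges whose destination lies in intervals[k], sorted by source."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for k, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[k].append((src, dst))
                break
    for shard in shards:
        shard.sort()
    return shards


def window(shard, interval):
    """Index range of the contiguous block of edges whose source lies in the interval."""
    lo, hi = interval
    sources = [src for src, _ in shard]
    return bisect_left(sources, lo), bisect_right(sources, hi)


def edges_for_interval(shards, intervals, p):
    """In-edges come from shard p as a whole; out-edges from one window per shard."""
    in_edges = shards[p]
    out_edges = []
    for shard in shards:
        start, end = window(shard, intervals[p])
        out_edges.extend(shard[start:end])
    return in_edges, out_edges


if __name__ == "__main__":
    edges = [(0, 3), (1, 2), (2, 0), (3, 1), (4, 5), (5, 0), (1, 4), (2, 5)]
    intervals = [(0, 1), (2, 3), (4, 5)]
    shards = build_shards(edges, intervals)
    for p in range(len(intervals)):
        print(p, edges_for_interval(shards, intervals, p))
```

As the interval index advances, the window in each shard slides forward through the sorted edge list, which is where the method's name comes from.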
55. ML algorithms as vertex programs; asynchronous execution and consistency models
Natural graphs change the nature of computation; vertex cuts and gather/apply/scatter model
56. GL2 PowerGraph focused on Scalability at the loss of Usability
57. GraphLab 1
PageRank(i, scope) {
  acc = 0
  for (j in InNeighbors) {
    acc += pr[j] * edge[j].weight
  }
  pr[i] = 0.15 + 0.85 * acc
}
Explicitly described operations
Code is intuitive
59. Scalability, but very rigid abstraction (many contortions needed to implement SVD++, Restricted Boltzmann Machines)
Great flexibility, but hit scalability wall
What now?
61. GL3 WarpGraph Goals
Program Like GraphLab 1
Run Like GraphLab 2
[Diagram: Machine 1, Machine 2]
62. Fine-Grained Primitives
Expose Neighborhood Operations through Parallel Iterators
$$R[i] = 0.15 + 0.85 \sum_{(j,i) \in E} w[j,i] \cdot R[j]$$
PageRankUpdateFunction(Y) {
  Y.pagerank = 0.15 + 0.85 *
    MapReduceNeighbors(
      lambda nbr: nbr.pagerank * nbr.weight,
      lambda (a, b): a + b
    )
}
(aggregate sum over neighbors)
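The update function above can be mimicked sequentially in Python. This is a sketch under stated assumptions and not the WarpGraph API: the `Vertex` class, `map_reduce_neighbors`, and the choice to store the edge weight next to each in-neighbor are hypothetical, and the real system would evaluate the map and the combine in parallel across machines.

```python
from functools import reduce


class Vertex:
    def __init__(self, vid, pagerank=1.0):
        self.id = vid
        self.pagerank = pagerank
        self.in_edges = []  # list of (neighbor Vertex, edge weight w[j, i])


def map_reduce_neighbors(vertex, map_fn, combine_fn, initial=0.0):
    """Map every in-neighbor to a value, then fold the values with combine_fn."""
    mapped = (map_fn(nbr, weight) for nbr, weight in vertex.in_edges)
    return reduce(combine_fn, mapped, initial)


def pagerank_update(vertex):
    """R[i] = 0.15 + 0.85 * sum over in-edges (j, i) of w[j, i] * R[j]."""
    vertex.pagerank = 0.15 + 0.85 * map_reduce_neighbors(
        vertex,
        lambda nbr, weight: nbr.pagerank * weight,
        lambda a, b: a + b,
    )


if __name__ == "__main__":
    a, b, c = Vertex("a", 1.0), Vertex("b", 2.0), Vertex("c")
    c.in_edges = [(a, 0.5), (b, 0.25)]
    pagerank_update(c)
    print(c.pagerank)
```

Passing the map and combine steps as functions keeps the update function as short as the GraphLab 1 version while leaving the engine free to decide how the neighborhood is traversed.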
63. Expressive, Extensible Neighborhood API
MapReduce over Neighbors: Parallel Sum (Y + Y + … + Y)
Parallel Transform Adjacent Edges: Modify adjacent edges
Broadcast: Schedule a selected subset of adjacent vertices
64. Can express every GL2 PowerGraph program (more easily) in GL3 WarpGraph
But GL3 is more expressive:
- Multiple gathers
- Scatter before gather
- Conditional execution
UpdateFunction(v) {
  if (v.data == 1)
    accum = MapReduceNeighs(g, m)
  else ...
}
66. ML algorithms as vertex programs; asynchronous execution and consistency models
Natural graphs change the nature of computation; vertex cuts and gather/apply/scatter model
Usability is key; access neighborhood through parallelizable iterators and latency hiding
69. Exciting Time to Work in ML
“With Big Data, I’ll take over the world!!!”
“We met because of Big Data”
“Why won’t Big Data read my mind???”
Unique opportunities to change the world!! ☺
But, every deployed system is a one-off solution, and requires PhDs to make work…
70. But…
Even basics of scalable ML can be challenging
ML key to any new service we want to build
6 months from R/Matlab to production, at best
State-of-art ML algorithms trapped in research papers
Goal of GraphLab 3: Make huge-scale machine learning accessible to all!
78. Now with GraphLab: Learn/Prototype/Deploy
Even basics of scalable ML can be challenging → Learn ML with GraphLab Notebook
6 months from R/Matlab to production, at best → pip install graphlab, then deploy on EC2/Cluster
State-of-art ML algorithms trapped in research papers → Fully integrated via GraphLab Toolkits