The document describes Onyx, a new flexible and extensible data processing system. It discusses limitations of existing frameworks in new resource environments like resource disaggregation and transient resources. The Onyx architecture includes a compiler that transforms dataflow programs into optimized physical execution plans using passes, and a runtime that executes the plans across cluster resources. It provides examples of compiling and running MapReduce and ALS jobs, and handling dynamic data skew through runtime optimization.
Agenda
* Intro Grammarly (Umayah Abdennabi, 5 mins)
* Meetup Updates and Announcements (Chris, 5 mins)
* Custom Functions in Spark SQL (30 mins)
Speaker: Umayah Abdennabi
Spark comes with a rich Expression library that can be extended to make custom expressions. We will look into custom expressions and why you would want to use them.
* TF 2.0 + Keras (30 mins)
Speaker: Francesco Mosconi
TensorFlow 2.0 was announced at the March TF Dev Summit, and it brings many changes and upgrades. The most significant change is the inclusion of Keras as the default model-building API. In this talk, we'll review the main changes introduced in TF 2.0 and highlight the differences between open-source Keras and tf.keras.
* SQUAD Deep-Dive: Question & Answer with Context (45 mins)
Speaker: Brett Koonce (https://quarkworks.co)
SQuAD (Stanford Question Answering Dataset) is an NLP challenge based around answering questions by reading Wikipedia articles, designed to be a real-world machine learning benchmark. We will look at several different ways to tackle the SQuAD problem, building up to state-of-the-art approaches in terms of time, complexity, and accuracy.
https://rajpurkar.github.io/SQuAD-explorer/
https://dawn.cs.stanford.edu/benchmark/#squad
Food and drinks will be provided. The event will be held at Grammarly's office at One Embarcadero Center on the 9th floor. When you arrive at One Embarcadero, take the escalator to the second floor where you will find the lobby and elevators to the office suites. Come on up to the 9th floor (no need to check in at security), and ring the Grammarly doorbell.
JVM and OS Tuning for Accelerating Spark Applications - Tatsuhiro Chiba
1) The document discusses optimizing Spark applications through JVM and OS tuning. Tuning aspects covered include JVM heap sizing, garbage collection options, process affinity, and large memory pages.
2) Benchmark results show that after applying these optimizations, execution time was reduced by 30-50% for Kmeans clustering and TPC-H queries compared to the default configuration.
3) Dividing the application across multiple smaller JVMs instead of a single large JVM helped reduce garbage collection overhead and resource contention, improving performance by up to 16%.
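In that spirit, a hedged sketch of what such a configuration might look like for a Spark-on-YARN job. The values and the application class are illustrative placeholders, not the benchmark's actual settings:

```shell
# Illustrative spark-submit flags in the spirit of the tuning above.
# Several smaller executor JVMs instead of one large one, modest heaps to keep
# GC pauses short, parallel GC, and large pages enabled. Values are common
# starting points only; com.example.KMeansApp and app.jar are hypothetical.
spark-submit \
  --master yarn \
  --num-executors 8 \
  --executor-memory 6g \
  --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC -XX:+UseLargePages" \
  --class com.example.KMeansApp \
  app.jar
```

The right executor count and heap size depend on the workload and hardware; the point of the benchmark is that many mid-sized JVMs often beat one giant heap.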
This document summarizes Kazuaki Ishizaki's keynote presentation at the Fourth International Symposium on Computing and Networking (CANDAR'16) on transparent GPU exploitation for Java. The presentation covered Ishizaki's research history developing compilers and optimizing code for GPUs. It described a Java just-in-time compiler that can generate optimized GPU code from parallel loops in Java programs without requiring programmers to manage low-level GPU operations like data transfers and memory allocation themselves. The compiler implements optimizations like array alignment, read-only caching, and reducing data copying to improve GPU performance. The goal is to make GPU programming easier and more portable across hardware for Java programmers.
Vector and ListBuffer have similar performance for random reads: benchmarking showed no significant difference in throughput, average time, or sample times between reading randomly from a Vector and from a ListBuffer. Vector is generally faster than List for random access because it is backed by a shallow tree of arrays (effectively constant-time indexing), whereas List is a linked list with linear-time access.
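The asymptotics behind that result can be reproduced in a stdlib-only Python sketch, with Python's array-backed list standing in for Vector and a hand-rolled linked list standing in for scala.List:

```python
import time

# Array-backed sequence: constant-time random access (the role Vector plays).
arr = list(range(100_000))

# Singly linked list: linear-time random access (the role scala.List plays).
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

head = None
for v in range(99_999, -1, -1):
    head = Node(v, head)  # prepend, so node i holds value i

def linked_get(head, i):
    node = head
    for _ in range(i):  # must walk i links to reach element i
        node = node.next
    return node.value

start = time.perf_counter(); _ = arr[90_000]; t_array = time.perf_counter() - start
start = time.perf_counter(); _ = linked_get(head, 90_000); t_linked = time.perf_counter() - start
assert t_array < t_linked  # indexing beats walking 90,000 links
```

Scala's Vector is actually a wide, shallow tree rather than a flat array, but for random access it behaves like the array case here, not the linked-list case.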
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Chris Fregly
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs @ Strata London, May 24 2017
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Advanced Spark and TensorFlow Meetup May 23 2017 @ Hotels.com London
We'll discuss how to deploy TensorFlow, Spark, and Scikit-learn models on GPUs with Kubernetes across multiple cloud providers including AWS, Google, and Azure, as well as on-premises.
In addition, we'll discuss how to optimize TensorFlow models for high-performance inference using the latest TensorFlow XLA (Accelerated Linear Algebra) framework including the JIT and AOT Compilers.
Github Repo (100% Open Source!)
https://github.com/fluxcapacitor/pipeline
http://pipeline.io
This document summarizes a presentation about Netflix's big data platform and Spark. The key points are:
1. Netflix uses Apache Spark on YARN and Mesos clusters to process batch and streaming data from sources like Cassandra and Kafka.
2. Netflix has contributed improvements to Spark's dynamic resource allocation, predicate pushdown, and support for S3 filesystems.
3. A use case showed Spark outperforming Pig for an iterative job that duplicated and aggregated data in multiple steps.
Stream processing is designed for continuously processing unbounded data streams. It handles unbounded inputs with continuous processing, unlike batch processing, which requires bounded, finite data sets. The key challenges of stream processing include out-of-order arrival and the need to relate events that occur close together in event time but may be processed far apart. To address this, stream processing systems use watermarks to indicate processing progress, triggers to determine when to emit output, and accumulation modes to handle refinements from late data.
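The three mechanisms named above can be illustrated with a toy, stdlib-only sketch. The window size, lateness allowance, and max-event-time watermark heuristic are all assumptions for illustration, not any engine's actual semantics:

```python
from collections import defaultdict

WINDOW = 10  # window size in event-time units (an assumption for the demo)

windows = defaultdict(int)   # window start -> running count (accumulation)
emitted = {}                 # window start -> last emitted result

def window_of(ts):
    return (ts // WINDOW) * WINDOW

def process(events):
    """events: iterable of (event_time, value); may arrive out of order."""
    watermark = float("-inf")
    for ts, value in events:
        windows[window_of(ts)] += value
        # Toy watermark: max event time seen, minus an allowed lateness of 3.
        watermark = max(watermark, ts - 3)
        # Trigger: emit every window whose end the watermark has passed.
        for start, total in windows.items():
            if start + WINDOW <= watermark:
                emitted[start] = total  # late data refines earlier output

stream = [(1, 1), (4, 1), (12, 1), (3, 1), (15, 1), (27, 1)]  # (3, 1) is late
process(stream)
assert emitted == {0: 3, 10: 2}  # window [0,10) includes the late event
```

Real systems derive watermarks from their sources rather than a per-record heuristic, but the trigger-when-watermark-passes-window-end logic is the same idea.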
The document discusses challenges in implementing a dynamic language like JavaScript on the Java Virtual Machine (JVM). Some key points:
- Nashorn is a JavaScript runtime written in Java that generates JVM bytecode, aiming to be 2-10x faster than previous solutions like Rhino.
- Compiling JavaScript to JVM bytecode is difficult as JavaScript has dynamic types, runtime changes, and number representations that don't map cleanly to Java's static types.
- Nashorn uses static analysis to infer types where possible and optimize for primitive number representations, but this only goes so far with JavaScript's dynamic nature.
- As JavaScript code changes, Nashorn may need to transition to more dynamic, adaptive, and optimistic techniques.
The document summarizes a presentation given by Chris Fregly on end-to-end real-time analytics using Apache Spark. It discusses Spark streaming, machine learning, and tuning Spark for performance, and includes live demos of sorting, matrix multiplication, and thread synchronization optimized for CPU caches. The presentation emphasizes techniques like cache-friendly data layouts, prefetching, and lock-free algorithms to improve Spark performance.
Apache Spark 2.0 includes improvements that provide considerable speedups for CPU-intensive queries through techniques like code generation. Profiling tools like flame graphs can help analyze where CPU cycles are spent by visualizing stack traces. Flame graphs are useful for performance troubleshooting but have limitations. Testing Spark applications locally and through unit tests allows faster iteration compared to running on clusters and saves resources. It is also important to test with local approximations of distributed components like HDFS and Hive.
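In that spirit, the cheapest test of all needs no cluster and no SparkContext: factor the record-level logic out of the job and test it as plain functions. The function names here are hypothetical, not from the talk:

```python
def parse_line(line):
    """Per-record transformation that would normally run inside a Spark map()."""
    fields = line.strip().split(",")
    return (fields[0], int(fields[1]))

def aggregate(records):
    """Reduce-side logic: sum values per key, as a reduceByKey would."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals

# A fast local test: instant feedback, no cluster resources consumed.
lines = ["a,1", "b,2", "a,3"]
result = aggregate(parse_line(l) for l in lines)
assert result == {"a": 4, "b": 2}
```

Once the logic is verified this way, wiring the same functions into RDD or DataFrame operations (or into a local-mode SparkSession test) exercises only the distribution layer.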
Building Google Cloud ML Engine From Scratch on AWS with PipelineAI - ODSC Lo... - Chris Fregly
http://pipeline.ai
Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements from Kubernetes, Istio, and TensorFlow.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment.
This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production.
Following the Successful Netflix Culture that I lived and breathed (https://www.slideshare.net/reed2001/culture-1798664/2-Netflix_CultureFreedom_Responsibility2), I give Data Scientists the Freedom and Responsibility to extend their ML / AI pipelines and experiments safely into production.
Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong.
Learn to be fast and strong by attending this talk.
http://pipeline.ai
Hyper-Parameter Tuning Across the Entire AI Pipeline - GPU Tech Conference San ... - Chris Fregly
Chris Fregly, Founder @ PipelineAI, will walk you through a real-world, complete end-to-end Pipeline-optimization example. We highlight hyper-parameters - and model pipeline phases - that have never been exposed until now.
While most hyperparameter optimizers stop at the training phase (i.e., learning rate, tree depth, EC2 instance type, etc.), we extend model validation and tuning into a new post-training optimization phase, including 8-bit reduced-precision weight quantization and neural-network layer fusing, among many other framework- and hardware-specific optimizations.
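As a rough sketch of what 8-bit reduced-precision weight quantization means (plain-Python linear quantization for illustration; not PipelineAI's or any framework's actual implementation):

```python
def quantize(weights):
    """Map float weights to 8-bit codes with a linear (affine) scale."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return [c * scale + lo for c in codes]

weights = [-1.2, -0.4, 0.0, 0.7, 2.3]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Each code fits in one byte; reconstruction error is at most half a step.
assert all(0 <= c <= 255 for c in codes)
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The storage win is 4x versus float32; whether the half-step rounding error is acceptable is exactly what the post-training validation phase described above has to check.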
Next, we introduce hyperparameters at the prediction phase including request-batch sizing and chipset (CPU v. GPU v. TPU).
Lastly, we determine a PipelineAI Efficiency Score of our overall Pipeline including Cost, Accuracy, and Time. We show techniques to maximize this PipelineAI Efficiency Score using our massive PipelineDB along with the Pipeline-wide hyper-parameter tuning techniques mentioned in this talk.
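The talk does not give the scoring formula; one plausible shape for a combined Cost/Accuracy/Time score, purely as an illustrative assumption, is a weighted sum of normalized terms:

```python
def efficiency_score(cost_usd, accuracy, latency_s,
                     w_cost=0.3, w_acc=0.5, w_lat=0.2,
                     cost_budget=100.0, latency_budget=1.0):
    """Hypothetical score in [0, 1]: higher is better. The weights, budgets,
    and the formula itself are illustrative assumptions, not PipelineAI's
    actual Efficiency Score."""
    cost_term = max(0.0, 1.0 - cost_usd / cost_budget)     # cheaper is better
    lat_term = max(0.0, 1.0 - latency_s / latency_budget)  # faster is better
    return w_cost * cost_term + w_acc * accuracy + w_lat * lat_term

# A cheaper, faster pipeline can outscore a slightly more accurate one.
a = efficiency_score(cost_usd=80.0, accuracy=0.95, latency_s=0.5)
b = efficiency_score(cost_usd=20.0, accuracy=0.93, latency_s=0.3)
assert b > a
```

Any such composite metric encodes a trade-off policy in its weights; tuning pipeline-wide hyper-parameters against it is what turns the score into an optimization target.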
Bio
Chris Fregly is Founder and Applied AI Engineer at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco.
He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series "High Performance TensorFlow in Production with Kubernetes and GPUs."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ... - Chris Fregly
This document provides an overview of a presentation on optimizing TensorFlow models for high performance and production with GPUs. The presentation covers optimizing both TensorFlow model training and model serving. For model training, topics include using GPUs with TensorFlow, feeding and debugging models, distributed training, and optimizing with XLA compiler. For model serving, topics are post-processing, TensorFlow Serving, and Ahead-of-Time compilation. The code and materials from the presentation are available in an open source GitHub repository.
This session covers how to unit-test Spark applications and recommends best practices, including writing unit tests with and without Spark Testing Base, a Spark package containing base classes to use when writing tests with Spark.
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016 - MLconf
Comparing TensorFlow NLP Options: word2Vec, gloVe, RNN/LSTM, SyntaxNet, and Penn Treebank: Through code samples and demos, we’ll compare the architectures and algorithms of the various TensorFlow NLP options. We’ll explore both feed-forward and recurrent neural networks such as word2vec, gloVe, RNN/LSTM, SyntaxNet, and Penn Treebank using the latest TensorFlow libraries.
1. The document discusses using Deeplearning4j and Kafka together for machine learning workflows. It describes how Deeplearning4j can be used to build, train, and deploy neural networks on the JVM and Spark, while Kafka can be used to stream data for training and inference.
2. An example application is described that performs anomaly detection on log files from a CDN by aggregating the data to reduce the number of data points. This allows the model to run efficiently on available GPU hardware.
3. The document provides a link to a GitHub repository with a code example that uses Kafka to stream data, Keras to train a model, and Deeplearning4j to perform inference in Java and deploy the model.
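The aggregation idea in point 2 can be sketched in stdlib Python; the log format (timestamp, URL) and one-minute windows are assumptions for illustration:

```python
from collections import Counter

WINDOW_S = 60  # aggregate raw log lines into one data point per minute

def aggregate_requests(log_lines):
    """log_lines: (unix_timestamp, url) pairs -> requests per minute window."""
    counts = Counter(ts // WINDOW_S for ts, _url in log_lines)
    return [counts[w] for w in sorted(counts)]

raw = [(0, "/a"), (5, "/b"), (59, "/a"),
       (61, "/c"),
       (130, "/a"), (131, "/a"), (133, "/b")]
series = aggregate_requests(raw)
assert series == [3, 1, 3]  # 7 raw events reduced to 3 model inputs
```

At CDN scale the same reduction turns billions of lines into a time series short enough for an anomaly-detection model to score on a single GPU.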
[212] Big Models Without Big Data: Using Domain-Specific Deep Networks in Data-... - NAVER D2
The document discusses techniques for using deep learning with limited data. It presents methods for data synthesis, domain adaptation, and data cleaning. For data synthesis, it describes using a game engine to procedurally generate synthetic videos with automatic annotations for action recognition training. For domain adaptation, it applies a model trained on mouse tracking saliency data to eye tracking data. For data cleaning, it introduces a technique to prune noisy images from a landmark dataset to obtain reliable training annotations. The techniques aim to leverage limited data to train deep networks for tasks like saliency mapping, image retrieval, and action recognition.
This document compares and summarizes several deep learning frameworks: Caffe, Chainer, CNTK, DL4J, Keras, MXNet, TensorFlow, and Theano. It describes who created each framework, when it was released, example applications, design motivations, and key features from technical, design, and programming perspectives.
This document introduces Failsafe, which provides latency and fault tolerance capabilities for distributed systems. It discusses how Failsafe compares to Hystrix and how it is used at Coupang. Key features of Failsafe include retry policies, circuit breakers, fallbacks, and asynchronous execution. Event listeners and policy composition patterns are also covered.
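To make two of those features concrete, here is a minimal stdlib-Python sketch of a retry policy and a circuit breaker. It mirrors the concepts, not Failsafe's actual Java API:

```python
import time

def with_retry(fn, attempts=3, delay_s=0.0):
    """Retry policy: re-invoke fn up to `attempts` times before giving up."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay_s)

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; an open circuit fails fast
    instead of hammering a struggling dependency."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
            self.failures = 0  # success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            raise

# Flaky call that succeeds on the third try: the retry absorbs the failures.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient")
    return "ok"

assert with_retry(flaky) == "ok"
```

Failsafe's value is composing these policies (plus fallbacks and async execution) declaratively; the sketch shows only the core state machines.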
unassert - encourage reliable programming by writing assertions in production - Takuto Wada
unassert - Encourage Design by Contract (DbC) by writing assertions in production code and compiling them away in release builds.
Takuto Wada
2015/11/07 @nodefest Tokyo 2015
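Python has the same mechanism built in: `assert` statements express contracts during development, and running with `python -O` strips them, which is the unassert workflow in miniature. A small sketch (the function is a made-up example):

```python
def apply_discount(price, rate):
    # Design-by-Contract style preconditions: active in development runs,
    # compiled away entirely under `python -O` (like unassert at build time).
    assert price >= 0, "price must be non-negative"
    assert 0 <= rate <= 1, "rate must be a fraction between 0 and 1"
    return price * (1 - rate)

print(apply_discount(100.0, 0.5))  # 50.0 in both modes; only the checks differ
```

In a normal run, `apply_discount(-1.0, 0.5)` raises `AssertionError`; under `-O` the checks vanish and impose zero runtime cost, exactly the release-build behavior unassert provides for JavaScript.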
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra... - Chris Fregly
http://pipeline.ai
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) framework, the JIT/AOT compilers, and the Graph Transform Tool, I'll demonstrate how to optimize, profile, and deploy TensorFlow models, and the TensorFlow runtime itself, in a GPU-based production environment. This talk is 100% demo-based on open source tools and completely reproducible through Docker on your own GPU cluster.
Bio
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series "High-Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
This document compares the batch and streaming capabilities of Spark and Storm. Spark supports both batch and micro-batch processing, while Storm supports micro-batch and real-time stream processing. Spark has been in production use since 2013 and is implemented in Scala, while Storm has been in use since 2011 and is implemented in Clojure and Java. Spark includes libraries for SQL, streaming, and machine learning, while Storm uses spouts to read data streams and bolts to filter and join data within topologies. Both integrate with Hadoop and support fault tolerance, and Spark's reliability improves when used with YARN. Performance tests show Spark Streaming can process more records per second than Storm.
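The micro-batch versus per-record distinction at the heart of that comparison can be sketched in stdlib Python (a toy model of the two execution styles, not either engine's API):

```python
def micro_batch(stream, batch_size, handler):
    """Spark Streaming style: buffer records and hand over whole small batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            handler(batch)
            batch = []
    if batch:
        handler(batch)  # flush the final partial batch

def per_record(stream, handler):
    """Storm style: each record flows through the topology as it arrives."""
    for record in stream:
        handler([record])

batches, singles = [], []
micro_batch(range(7), 3, batches.append)
per_record(range(7), singles.append)
assert batches == [[0, 1, 2], [3, 4, 5], [6]]
assert len(singles) == 7
```

Batching amortizes per-invocation overhead (hence Spark Streaming's higher records-per-second), while per-record delivery minimizes the latency of any single event.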
This is an introduction to polyaxon and why I use polyaxon.
Polyaxon enables me to leverage kubernetes to achieve the objectives:
- Make the lead time of experiments as short as possible.
- Make the financial cost to train models as cheap as possible.
- Make the experiments reproducible.
Datacratic is the leader in real-time machine learning and decisioning and the creator of the RTBkit Open-Source Project. Mark Weiss, head of client solutions at Datacratic, shares some of the challenges companies and developers face today as they move into Real-Time Bidding. In this presentation he does a developer deep dive into design and implementation choices, technologies, and plugins, and provides some real-world RTB customer use cases. You will also learn how you can join the RTBkit community and get support for your upcoming RTBkit initiatives.
DeviceAtlas - 6 Ways Ad Platforms Can Harness Device Data - Martin Clancy
How device intelligence can unlock the mobile advertising opportunity in a multi-screen world: six methods for targeting advertising using knowledge of the requesting device, and how ad platforms can use them.
Why it pays to look after application performance regularly, from the very start of development, and how the Gatling tool can be used for this and what capabilities it offers.
Real-time bidding (RTB) is a programmatic advertising method in which online display ad space is bought and sold impression by impression, through real-time auctions that complete within milliseconds of a user visiting a webpage. Advertisers bid on ad impressions for specific audiences in real time based on user data, so they can target more valuable users and potentially achieve better results, such as a lower cost per acquisition than bulk-buying impressions for large audiences.
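Most RTB exchanges clear each impression with a second-price auction: the highest bidder wins but pays the runner-up's bid (or the floor price). A stdlib-Python sketch with hypothetical bidder names and prices:

```python
def second_price_auction(bids, floor=0.0):
    """bids: {bidder: bid in dollars CPM}. Winner pays max(second bid, floor).
    Returns (winner, clearing_price), or None if no bid meets the floor."""
    eligible = {b: v for b, v in bids.items() if v >= floor}
    if not eligible:
        return None  # impression goes unsold
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner, _top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else floor
    return winner, max(runner_up, floor)

# Three DSPs bid on one impression (illustrative numbers).
result = second_price_auction({"dsp_a": 2.10, "dsp_b": 3.40, "dsp_c": 1.75},
                              floor=1.00)
assert result == ("dsp_b", 2.10)  # dsp_b wins but pays dsp_a's price
```

Second pricing encourages truthful bidding: overstating a bid cannot lower the price you pay, only expose you to winning impressions you valued less.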
CIKM 2013 Tutorial: Real-time Bidding: A New Frontier of Computational Advert...Shuai Yuan
Computational advertising has been an important topical area in information retrieval and knowledge management. This tutorial focuses on real-time advertising, also known as Real-Time Bidding (RTB), a fundamental shift in the field of computational advertising. It is strongly related to CIKM areas such as user log analysis and modelling, information retrieval, text mining, knowledge extraction and management, behavioural targeting, recommender systems, personalization, and data management platforms.
This tutorial aims to provide not only a comprehensive and systemic introduction to RTB and computational advertising in general, but also the emerging research challenges and research tools and datasets in order to facilitate the research. Compared to previous Computational Advertising tutorials in relevant top-tier conferences, this tutorial takes a fresh, neutral, and the latest look of the field and focuses on the fundamental changes brought by RTB.
We will begin by giving a brief overview of the history of online advertising and present the current eco-system in which RTB plays an increasingly important part. Based on our field study and the DSP optimisation contest organised by iPinyou, we analyse optimization problems both from the demand side (advertisers) and the supply side (publishers), as well as the auction mechanism design challenges for Ad exchanges. We discuss how IR, DM and ML techniques have been applied to these problems. In addition, we discuss why game theory is important in this area and how it could be extended beyond the auction mechanism design.
CIKM is an ideal venue for this tutorial because RTB is an area of multiple disciplines, including information retrieval, data mining, knowledge discovery and management, and game theory, most of which are traditionally the key themes of the conference. As an illustration of practical application in the real world, we shall cover algorithms in the iPinyou global DSP optimisation contest on a production platform; for the supply side, we also report experiments of inventory management, reserve price optimisation, etc. in production systems.
We expect the audience, after attending the tutorial, to understand the real-time online advertising mechanisms and the state of the art techniques, as well as to grasp the research challenges in this field. Our motivation is to help the audience acquire domain knowledge and obtain relevant datasets, and to promote research activities in RTB and computational advertising in general.
Incremental development is easy when we are talking about functionality. Story splitting has become quite popular as a technique lately.
But what about those cases when you need to do an architectural refactoring? Could incremental development be applied?
(Talk delivered during I T.A.K.E. Unconference 2015)
Flavius Ștef: Big Rewrites Without Big Risks at I T.A.K.E. Unconference (Mozaic Works)
This document discusses strategies for incrementally rewriting software architecture without big risks. It suggests splitting large rewrites into smaller chunks and using techniques like faking functionality until it is implemented, incremental refactoring, and rewriting in small bites. Specific strategies discussed include researching prerequisites, creating proofs of concept, adding unit and performance tests incrementally, and removing dependencies one by one. Choosing the right refactoring is important, and catalogues of refactorings can provide guidance. An incremental process of analyzing, planning, refactoring, testing and integrating changes in small batches is recommended to validate progress toward goals at each step.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
The document discusses emerging trends in software and services including:
1) Software as a Service and cloud computing which allows software to be delivered and consumed "as a service" with service level agreements.
2) The growth of massive data centers which are becoming large physical assets requiring significant capital expenditures.
3) The rise of "Dev-signers" or designer-developers who are combining development and design skills.
4) The integration of software and services will be key as local software interacts with internet services to provide combined capabilities.
Maximize Big Data ROI via Best of Breed Patterns and Practices (Jeff Bertman)
This presentation discusses maximizing ROI from big data technologies and architectures. It introduces the concept of a fitness technology landscape (FiTL) to evaluate different data platform options based on factors like cost. The presentation advocates using a polyglot or best-of-breed approach using multiple technologies to address diverse use cases. This includes using different technologies for extraction, loading, and transformation of data in integrated architectures. Maximizing ROI requires balancing factors like functionality, cost, scalability and other considerations for each specific use case.
The document provides a checklist for front-end performance optimization. It includes recommendations to establish performance metrics and goals, optimize assets like images, videos, fonts and JavaScript, choose frameworks and CDNs wisely, and set priorities to optimize the core experience for all users. Key metrics to target include a Time to Interactive under 5 seconds on 3G and First Input Delay below 100ms.
Highway to heaven - Microservices Meetup Munich (Christian Deger)
The document summarizes AutoScout24's transition from a monolithic architecture to microservices in the cloud. Some key points:
- They moved from an on-premise Microsoft-based stack to AWS and a microservices architecture using JVM and Linux.
- This was a major technical transformation to become "mobile first" and reduce costs and time to market while attracting new talent.
- They established architectural principles like event sourcing, autonomous teams, infrastructure as code, and shared-nothing.
- DynamoDB is now used as the "atom feed" between services while eliminating tight coupling.
- Teams are organized around business capabilities rather than projects to improve agility.
Questions Log: Dynamic Cubes – Set to Retire Transformer? (Senturus)
This document contains a questions log from a webinar about optimizing Cognos performance. It includes questions from webinar attendees about topics like using virtual cubes and dynamic cubes to address large data volumes, optimizing in-memory aggregates, hardware sizing requirements for dynamic cubes, and configuration considerations when using dynamic cubes. The questions are answered in detail to help attendees understand how to best implement and optimize dynamic cubes in Cognos.
The document discusses deep learning techniques for financial technology (FinTech) applications. It begins with examples of current deep learning uses in FinTech like trading algorithms, fraud detection, and personal finance assistants. It then covers topics like specialized compute hardware for deep learning training and inference, optimization techniques for CPUs and GPUs, and distributed training approaches. Finally, it discusses emerging areas like FPGA and quantum computing and provides resources for practitioners to start with deep learning for FinTech.
Dataiku - Productive application to production - PAPIs May 2015 (Dataiku)
This document discusses the development of predictive applications and outlines a vision for a platform called "Blue Box" that could help address many of the challenges in building and deploying these applications at scale. It notes that building predictive applications currently requires integrating multiple separate components. The document then describes desired features for the Blue Box platform, such as data cleansing, external data integration, model updating, decision logic, auditing, and serving predictions in real-time. It poses questions about how such a platform could be created, whether through open source or a commercial offering.
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m..." (Flink Forward)
“Customer experience is the next big battle ground for telcos,” Amit Akhelikar, Global Director of Lynx Analytics, recently proclaimed at TM Forum Live! Asia in Singapore. But how do you fight in this battle? A common approach has been to keep “under control” some well-known network quality indicators, like dropped calls, radio access congestion, availability, and so on; but this has proven not to be enough to keep customers happy, just as a siege weapon is not enough to conquer a city. But what if it were possible to know how customers perceive services, at least the most demanded ones, like web browsing or video streaming? That would be like a squad of archers ready for battle. And even with that, how do you extract value from it and take action in no time, giving our skilled archers the right targets? Meet CANVAS (Customer And Network Visualization and AnalyticS), one of the first LATAM implementations of a Flink-based stream processing use case for a telco, which successfully combines leading and innovative technologies like Apache Hadoop, YARN, Kafka, Nifi, Druid and advanced visualizations with Flink core features like non-trivial stateful stream processing (joins, windows and aggregations on event time) and CEP capabilities for alarm generation, delivering a next-generation tool for SOC (Service Operation Center) teams.
The document provides an overview of the key topics and objectives covered in a computer architecture course. It discusses trends in technology that have led to changes in computer systems over time, including exponential increases in processor performance, memory capacity and speeds, and network bandwidth. It introduces various performance metrics and challenges in benchmarking systems. Important principles of computer design are outlined, such as Amdahl's law, exploiting parallelism, and making common cases fast. Quantitative analysis tools like simulations and queueing theory are also summarized.
This document provides an overview of the key topics and objectives covered in a computer architecture course. The course aims to evaluate instruction set design tradeoffs, advanced pipelining techniques, solutions to increasing memory latency, and qualitative and quantitative tradeoffs in modern computer system design. Key areas covered include instruction set architecture, memory hierarchies, pipelining, parallelism, and performance evaluation metrics.
Capacity Planning Infrastructure for Web Applications (Drupal), by Ricardo Amaro
In this session we will try to solve a couple of recurring problems:
Site Launch and User expectations
Imagine a customer that provides a set of hardware requirements, sets a date and launches the site, but then forgets to mention that they have sent out some (thousands of) emails to half the world announcing their new website launch! What do you think will happen?
Of course, launching a Drupal site involves a lot of preparation steps, and there are plenty of guides out there with common Drupal launch-readiness checklists, so that part is not a problem anymore.
What we are really missing here is a Plan for Capacity.
Data Engineer's Lunch #85: Designing a Modern Data Stack (Anant Corporation)
What are the design considerations that go into architecting a modern data warehouse? This presentation will cover some of the requirements analysis, design decisions, and execution challenges of building a modern data lake/data warehouse.
ClickHouse Paris Meetup. ClickHouse at ContentSquare, by Christophe Kalenzaga... (Altinity Ltd)
Christophe Kalenzaga and Vianney Foucault from ContentSquare summarize their experience using ClickHouse as their new backend database. They were seeking to replace their ElasticSearch clusters due to high storage, compute and scaling costs. Through a series of benchmarks, they found ClickHouse to be significantly faster than ElasticSearch or other alternatives, with response times up to 85% better. It also helped reduce their infrastructure costs. While ClickHouse requires effort to master and lacks some tooling, they found it stable and fast enough to recommend it for other companies seeking to replace expensive ElasticSearch deployments.
Enterprise application performance - Understanding & Learnings (Dhaval Shah)
This document discusses enterprise application performance, including:
- Performance basics like response time, throughput, and availability
- Common metrics like response time, transactions per second, and concurrent users
- Factors that affect performance such as software issues, configuration settings, and hardware resources
- Case studies where the author analyzed memory leaks, optimized services, and addressed an inability to meet non-functional requirements
- Learnings around heap dump analysis, hotspot identification, and database monitoring
Discover why Wi-Fi 7 is set to transform wireless networking and how Router Architects is leading the way with next-gen router designs built for speed, reliability, and innovation.
Societal challenges of AI: biases, multilinguism and sustainability (Jordi Cabot)
Towards a fairer, inclusive and sustainable AI that works for everybody.
Reviewing the state of the art on these challenges and what we're doing at LIST to test current LLMs and help you select the one that works best for you
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily? (steaveroggers)
Migrating from Lotus Notes to Outlook can be a complex and time-consuming task, especially when dealing with large volumes of NSF emails. This presentation provides a complete guide on how to batch export Lotus Notes NSF emails to Outlook PST format quickly and securely. It highlights the challenges of manual methods, the benefits of using an automated tool, and introduces eSoftTools NSF to PST Converter Software — a reliable solution designed to handle bulk email migrations efficiently. Learn about the software’s key features, step-by-step export process, system requirements, and how it ensures 100% data accuracy and folder structure preservation during migration. Make your email transition smoother, safer, and faster with the right approach.
Read More:- https://www.esofttools.com/nsf-to-pst-converter.html
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ... (Eric D. Schabell)
It's time you stopped letting your telemetry data pressure your budgets and get in the way of solving issues with agility! No more I say! Take back control of your telemetry data as we guide you through the open source project Fluent Bit. Learn how to manage your telemetry data from source to destination using the pipeline phases covering collection, parsing, aggregation, transformation, and forwarding from any source to any destination. Buckle up for a fun ride as you learn by exploring how telemetry pipelines work, how to set up your first pipeline, and exploring several common use cases that Fluent Bit helps solve. All this backed by a self-paced, hands-on workshop that attendees can pursue at home after this session (https://o11y-workshops.gitlab.io/workshop-fluentbit).
How can one start with crypto wallet development.pptx (laravinson24)
This presentation is a beginner-friendly guide to developing a crypto wallet from scratch. It covers essential concepts such as wallet types, blockchain integration, key management, and security best practices. Ideal for developers and tech enthusiasts looking to enter the world of Web3 and decentralized finance.
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi... (Egor Kaleynik)
This case study explores how we partnered with a mid-sized U.S. healthcare SaaS provider to help them scale from a successful pilot phase to supporting over 10,000 users—while meeting strict HIPAA compliance requirements.
Faced with slow, manual testing cycles, frequent regression bugs, and looming audit risks, their growth was at risk. Their existing QA processes couldn’t keep up with the complexity of real-time biometric data handling, and earlier automation attempts had failed due to unreliable tools and fragmented workflows.
We stepped in to deliver a full QA and DevOps transformation. Our team replaced their fragile legacy tests with Testim’s self-healing automation, integrated Postman and OWASP ZAP into Jenkins pipelines for continuous API and security validation, and leveraged AWS Device Farm for real-device, region-specific compliance testing. Custom deployment scripts gave them control over rollouts without relying on heavy CI/CD infrastructure.
The result? Test cycle times were reduced from 3 days to just 8 hours, regression bugs dropped by 40%, and they passed their first HIPAA audit without issue—unlocking faster contract signings and enabling them to expand confidently. More than just a technical upgrade, this project embedded compliance into every phase of development, proving that SaaS providers in regulated industries can scale fast and stay secure.
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate (Maxim Salnikov)
Imagine if apps could think, plan, and team up like humans. Welcome to the world of AI agents and agentic user interfaces (UI)! In this session, we'll explore how AI agents make decisions, collaborate with each other, and create more natural and powerful experiences for users.
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025), by Andre Hora
Exceptions allow developers to handle error cases expected to occur infrequently. Ideally, good test suites should test both normal and exceptional behaviors to catch more bugs and avoid regressions. While current research analyzes exceptions that propagate to tests, it does not explore other exceptions that do not reach the tests. In this paper, we provide an empirical study to explore how frequently exceptional behaviors are tested in real-world systems. We consider both exceptions that propagate to tests and the ones that do not reach the tests. For this purpose, we run an instrumented version of test suites, monitor their execution, and collect information about the exceptions raised at runtime. We analyze the test suites of 25 Python systems, covering 5,372 executed methods, 17.9M calls, and 1.4M raised exceptions. We find that 21.4% of the executed methods do raise exceptions at runtime. In methods that raise exceptions, on the median, 1 in 10 calls exercise exceptional behaviors. Close to 80% of the methods that raise exceptions do so infrequently, but about 20% raise exceptions more frequently. Finally, we provide implications for researchers and practitioners. We suggest developing novel tools to support exercising exceptional behaviors and refactoring expensive try/except blocks. We also call attention to the fact that exception-raising behaviors are not necessarily “abnormal” or rare.
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI (danshalev)
If we were building a GenAI stack today, we'd start with one question: Can your retrieval system handle multi-hop logic?
Trick question, b/c most can’t. They treat retrieval as nearest-neighbor search.
Today, we discussed scaling #GraphRAG at AWS DevOps Day, and the takeaway is clear: VectorRAG is naive, lacks domain awareness, and can’t handle full dataset retrieval.
GraphRAG builds a knowledge graph from source documents, allowing for a deeper understanding of the data + higher accuracy.
Why Orangescrum Is a Game Changer for Construction Companies in 2025 (Orangescrum)
Orangescrum revolutionizes construction project management in 2025 with real-time collaboration, resource planning, task tracking, and workflow automation, boosting efficiency, transparency, and on-time project delivery.
Salesforce Data Cloud - Hyperscale data platform, built for Salesforce (Dele Amefo)
Adtech x Scala x Performance tuning
1. Ad Tech × Scala × Performance Tuning
∼ Best Practice for Better Performance ∼
Scala Days 2015 San Francisco Un-conference
2015-03-19 @mogproject
2. Agenda
About Demand Side Science
Introduction to Performance Tuning
Best Practice in Development
Japanese language version here:
http://www.slideshare.net/mogproject/scala-41799241
3. Yosuke Mizutani (@mogproject)
Joined Demand Side Science in April 2013
(thanks to Scala Conference in Japan 2013)
Full-stack engineer (want to be…)
Background: 9-year infrastructure engineer
About Me
10. Advertiser’s side of real-time ad bidding (RTB)
What is DSP
(vs. Supply Side Platform, the publisher’s side)
11. Dec 2013
Became part of the Opt group, the e-marketing agency
Oct 2014
Released dynamic creative tool unis
Brief History of DSS
12. unis is a third-party ad server which creates
dynamic and/or personalized ads according to rules.
http://www.opt.ne.jp/news/pr/detail/id=2492
unis
(rule examples: items on sale, most popular items, fixed items, re-targeting)
13. With venture mind + advantage of Opt group …
Future of DSS
Demand × Side × Science
14. We will create various products based on Science!
Future of DSS
??? × ??? × Science
22. Application goes wrong under high load
Bad latency under specific conditions
Batch execution slower than expected
Slow development tools
Resolve an Issue
23. Very important, especially in the ad tech industry
Cost tends to grow bigger and bigger
High traffic
Need to respond within a few milliseconds
Big database, big log data
Business requires:
Benefit from mass delivery > Infra investment
Reduce Infrastructure Cost
24. You need to care about
cost (≒ engineer's time) and
risk (possibility of causing new trouble)
of performance tuning itself.
Don't lose your goal
Scaling up/out the infra can, naively,
be the best solution
Don't try to be perfect
25. Basics of Performance Tuning
We iterate:
Measure metrics × Find bottleneck × Try with hypothesis
Don't take erratic steps.
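The measure → find-bottleneck → try-with-hypothesis loop starts with repeatable measurement. As a minimal sketch (not from the deck; sbt-jmh, shown later, is the serious tool), a tiny Scala timing helper could look like this:

```scala
object Timing {
  // Run `body` once; return (result, elapsed nanoseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  // Discard `warmup` runs (JIT, caches), then take the median of `trials` runs.
  def medianNanos(warmup: Int, trials: Int)(body: => Unit): Long = {
    (1 to warmup).foreach(_ => body)
    val samples = (1 to trials).map(_ => timed(body)._2).sorted
    samples(samples.length / 2)
  }
}
```

Warm-up runs and taking the median rather than the mean reduce noise from JIT compilation and GC pauses, which is one way to avoid the "erratic steps" the slide warns about.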
27. ※CAUTION: This is my own impression
Bottleneck in My Experience
Database (RDBMS/NoSQL): 50%
Async / Thread: 15%
Scala: 10%
OS: 10%
Library: 5%
JVM parameters: 5%
Network: 4%
Others: 1%
29. Approximate timing for various operations
http://norvig.com/21-days.html#answers
execute typical instruction 1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory 0.5 nanosec
branch misprediction 5 nanosec
fetch from L2 cache memory 7 nanosec
Mutex lock/unlock 25 nanosec
fetch from main memory 100 nanosec
send 2K bytes over 1Gbps network 20,000 nanosec
read 1MB sequentially from memory 250,000 nanosec
fetch from new disk location (seek) 8,000,000 nanosec
read 1MB sequentially from disk 20,000,000 nanosec
send packet US to Europe and back 150 milliseconds = 150,000,000 nanosec
30. If Typical Instruction Takes 1 second…
https://www.coursera.org/course/reactive week3-2
execute typical instruction 1 second
fetch from L1 cache memory 0.5 seconds
branch misprediction 5 seconds
fetch from L2 cache memory 7 seconds
Mutex lock/unlock ½ minute
fetch from main memory 1½ minute
send 2K bytes over 1Gbps network 5½ hours
read 1MB sequentially from memory 3 days
fetch from new disk location (seek) 13 weeks
read 1MB sequentially from disk 6½ months
send packet US to Europe and back 5 years
31. A batch
reads 1,000,000 files of 10KB
from disk
on every run.
Data size:
10KB × 1,000,000 ≒ 10GB
Horrible and True Story
32. Assuming 1,000,000 seeks are needed,
Estimated time:
8ms × 1,000,000 + 20ms × 10,000 ≒ 8,200 sec ≒ 2.5 h
If there is one file of 10GB and only one seek is
needed,
Estimated time:
8ms × 1 + 20ms × 10,000 ≒ 200 sec ≒ 3.5 min
Horrible and True Story
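Slide 32's estimate follows directly from the slide 29 numbers (about 8 ms per disk seek, about 20 ms per sequentially read MB). A quick Scala check of the arithmetic:

```scala
object DiskEstimate {
  // Figures from slide 29 (approximate timing for various operations).
  val seekMs = 8.0        // fetch from new disk location (seek)
  val readMbMs = 20.0     // read 1MB sequentially from disk
  val totalMb = 10000     // 10GB of data = 10,000 MB

  // 1,000,000 small files: roughly one seek per file, plus the sequential reads.
  val manyFilesSec = (seekMs * 1000000 + readMbMs * totalMb) / 1000.0 // 8,200 sec
  // One 10GB file: a single seek, plus the same sequential reads.
  val oneFileSec = (seekMs * 1 + readMbMs * totalMb) / 1000.0         // ~200 sec
}
```

The seeks, not the data volume, dominate the first case: hours versus minutes for the same 10GB.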
33. Have Respect for the Disk Head
http://en.wikipedia.org/wiki/Hard_disk_drive
35. In other words…
JVM Performance Triangle
Compactness
Throughput Responsiveness
36. C × T × R = a
JVM Performance Triangle
Tuning: vary C, T, R for fixed a
Optimization: increase a
Reference:
Everything I ever learned about JVM performance tuning
@twitter by Attila Szegedi
http://www.beyondlinux.com/files/pub/qconhangzhou2011/Everything%20I%20ever%20learned
%20about%20JVM%20performance%20tuning%20@twitter%28Attila%20Szegedi%29.pdf
37. Agenda
About Demand Side Science
Introduction to Performance Tuning
Best Practice in Development
38. 1. Requirement Definition / Feasibility
2. Basic Design
3. Detailed Design
4. Building Infrastructure / Coding
5. System Testing
6. System Operation / Maintenance
Development Process
Only topics related to performance will be covered.
39. Make an agreement with stakeholders
about performance requirements
Requirement Definition / Feasibility
How many user IDs
internet users in Japan: 100 million
unique browsers: 200 ~ x00 million
will they increase?
data expiration cycle?
type of devices / browsers?
opt-out rate?
40. Requirement Definition / Feasibility
Number of deliver requests for ads
Number of impressions per month
In case 1 billion / month
=> mean: 400 QPS (Query Per Second)
=> if peak rate = 250%, then 1,000 QPS
For RTB, bid rate? win rate?
Goal response time? Content size?
Plans for increasing?
How about Cookie Sync?
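The QPS figures on slide 40 are simple arithmetic; a sketch of the calculation (month approximated as 30 days):

```scala
object QpsEstimate {
  // Load figure from the slide: 1 billion impressions per month.
  val impressionsPerMonth = 1000000000L
  val secondsPerMonth = 30L * 24 * 60 * 60 // month approximated as 30 days

  val meanQps = impressionsPerMonth.toDouble / secondsPerMonth // ≒ 386, rounded to ~400 on the slide
  val peakQps = meanQps * 2.5                                  // peak rate 250% => ~1,000 QPS
}
```

Back-of-envelope rounding (≒ 386 up to 400) is fine here; the point is to have a number to design and test against.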
41. Requirement Definition / Feasibility
Number of receiving trackers
Timing of firing tracker
Click rate?
Conversion(*) rate?
* A conversion occurs when the user performs the
specific action that the advertiser has defined as the
campaign goal.
e.g. buying a product in an online store
42. Requirement Definition / Feasibility
Requirement for aggregation
Indicators to be aggregated
Is unique counting needed?
Any exception rules?
Who and when
secondary processing by ad agency?
Update interval
Storage period
43. Requirement Definition / Feasibility
Hard limit by business side
Sales plan
Christmas selling?
Annual sales target?
Total budget
44. The most important thing is to provide numbers,
although it is extremely difficult to approximate
precisely in the turbulent world of ad tech.
Requirement Definition / Feasibility
Architecture design needs assumed values
Performance testing needs a numeric goal
46. Threading model design
Reduce blocking
Future based
Callback & function composition
Actor based
Message passing
Thread pool design
We can’t know the appropriate thread pool
size unless we complete performance
testing in production.
Basic Design
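A minimal sketch of the "Future based: callback & function composition" style from slide 46, with hypothetical stage names (lookupUser and scoreBid are illustrative, not from the talk):

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object BidPipeline {
  // Hypothetical non-blocking stages of handling one bid request.
  def lookupUser(id: String): Future[String] = Future(s"profile-of-$id")
  def scoreBid(profile: String): Future[Double] = Future(profile.length * 0.01)

  // Function composition instead of blocking: no thread waits between stages.
  def bid(id: String): Future[Double] = lookupUser(id).flatMap(scoreBid)
}
```

`Await.result(BidPipeline.bid("u1"), 1.second)` blocks only at the edge (e.g. in a test); inside the pipeline no thread is held while waiting, which is the point of reducing blocking.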
47. Database design
Access pattern / number of lookups
Data size per record
Create a model of the distribution when the size
is not constant
Number of records
Rate of growth / retention period
Memory usage
At first, measure the performance of the
database itself
Detailed Design
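The sizing items on slide 47 reduce to back-of-envelope arithmetic. A sketch with assumed figures (record size, record count and growth rate are illustrative only, not from the deck):

```scala
object CapacityEstimate {
  // All figures below are assumptions for illustration.
  val bytesPerRecord = 512L   // modeled data size per record
  val records = 200000000L    // e.g. 200 million unique browsers
  val growthPerYear = 1.3     // assumed +30% records per year

  val rawGb = bytesPerRecord * records / math.pow(1024, 3) // ~95 GB today
  val nextYearGb = rawGb * growthPerYear                   // capacity to plan for
}
```

When record size is not constant, replace `bytesPerRecord` with a modeled distribution (e.g. a weighted mean plus a tail allowance), as the slide suggests.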
48. Log design
Consider compression ratio for disk usage
Cache design
Some software needs double the capacity
for processing backups (e.g. Redis)
Detailed Design
49. Simplicity and clarity come first
“It is far, far easier to make a correct
program fast than it is to make a fast
program correct”
— C++ Coding Standards: 101 Rules, Guidelines, and Best Practices (C++ In-Depth Series)
Building Infrastructure / Coding
52. Avoid algorithms worse than linear
whenever possible
Measure, don’t guess
http://en.wikipedia.org/wiki/Unix_philosophy
Building Infrastructure / Coding
53. SBT Plugin for running OpenJDK JMH
(Java Microbenchmark Harness: Benchmark tool for Java)
https://github.com/ktoso/sbt-jmh
Micro Benchmark: sbt-jmh
54. addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.1.6")
Micro Benchmark: sbt-jmh
plugins.sbt
jmhSettings
build.sbt
import org.openjdk.jmh.annotations.Benchmark
class YourBench {
@Benchmark
def yourFunc(): Unit = ??? // write code to measure
}
YourBench.scala
Just put an annotation
55. > run -i 3 -wi 3 -f 1 -t 1
Micro Benchmark: sbt-jmh
Run the benchmark in the sbt console
-i: number of measurement iterations to do
-wi: number of warmup iterations to do
-f: how many times to fork a single benchmark
-t: number of worker threads to run with
56. [info] Benchmark Mode Samples Score Score error Units
[info] c.g.m.u.ContainsBench.listContains thrpt 3 41.033 25.573 ops/s
[info] c.g.m.u.ContainsBench.setContains thrpt 3 6.810 1.569 ops/s
Micro Benchmark: sbt-jmh
Result (excerpted)
By default, throughput score
will be displayed.
(larger is better)
http://mogproject.blogspot.jp/2014/10/micro-benchmark-in-scala-using-sbt-jmh.html
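The gap between listContains and setContains above comes from the data structures themselves: List#contains is a linear scan, while Set#contains is (effectively) a constant-time hash lookup. A plain, non-JMH sketch of the two operations being compared (the deck does not show the benchmark source):

```scala
object ContainsDemo {
  val n = 100000
  val xs: List[Int] = (1 to n).toList // List#contains: linear scan, O(n)
  val s: Set[Int] = xs.toSet          // Set#contains: hash lookup, effectively O(1)

  def listContains(key: Int): Boolean = xs.contains(key)
  def setContains(key: Int): Boolean = s.contains(key)
}
```

Querying the worst case (the last element, or a missing key) makes the asymptotic difference visible even without a benchmark harness.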
57. Scala Optimization Example
Use Scala collection correctly
Prefer recursion to function call
by Prof. Martin Odersky in Scala Matsuri 2014
Try optimization libraries
58. Horrible and True Story pt.2
Group a List[Int] into chunks of 4 elements,
then calculate the sum of each chunk

def f(xs: List[Int], acc: List[Int] = Nil): List[Int] = {
  if (xs.length < 4) {
    (xs.sum :: acc).reverse
  } else {
    val (y, ys) = xs.splitAt(4)
    f(ys, y.sum :: acc)
  }
}

Example:
scala> f((1 to 10).toList)
res1: List[Int] = List(10, 26, 19)
Example
59. Horrible and True Story pt.2
List#length takes time proportional to the
length of the sequence
When the length of the parameter xs is n,
time complexity of List#length is O(n)
Implemented in LinearSeqOptimized#length
https://github.com/scala/scala/blob/v2.11.4/src/library/scala/collection/LinearSeqOptimized.scala#L35-43
60. Horrible and True Story pt.2
In function f, xs.length is evaluated
n / 4 + 1 times, each taking time
proportional to the remaining length
Therefore,
the time complexity of function f is O(n²)
It becomes too slow with big n
61. Horrible and True Story pt.2
For your information, the following one-liner does
the same work using built-in methods
scala> (1 to 10).grouped(4).map(_.sum).toList
res2: List[Int] = List(10, 26, 19)
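As a sketch not taken from the original slides: the quadratic behavior can also be fixed with a minimal change to f itself, replacing the O(n) xs.length call with List#lengthCompare, which stops traversing after at most 4 elements. The name f2 is my own.

```scala
// Linear-time variant of f (f2 is a hypothetical name, not from the slides).
// List#lengthCompare(4) traverses at most 5 elements, so the length check
// becomes O(1) per call instead of O(n), making f2 O(n) overall.
def f2(xs: List[Int], acc: List[Int] = Nil): List[Int] = {
  if (xs.lengthCompare(4) < 0) {
    (xs.sum :: acc).reverse
  } else {
    val (y, ys) = xs.splitAt(4)
    f2(ys, y.sum :: acc)
  }
}

println(f2((1 to 10).toList)) // List(10, 26, 19)
```

This keeps the exact shape of the original function, so it is easy to diff against the slow version; the grouped(4) one-liner above remains the idiomatic choice when the original structure need not be preserved.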
63. Library for optimizing Scala collections
(by using macros)
http://scala-blitz.github.io/
Presentation in Scala Days 2014
https://parleys.com/play/53a7d2c6e4b0543940d9e549/chapter0/about
ScalaBlitz
64. System feature testing
Interface testing
Performance testing
Reliability testing
Security testing
Operation testing
System Testing
65. Simple load testing
Scenario load testing
mixed load with typical user operations
Aging test (continuously running test)
Performance Testing
66. ab - Apache Bench
Simple benchmark tool bundled with Apache
http://httpd.apache.org/docs/2.2/programs/ab.html
Adequate for simple requirements
Latest version recommended
(a bug in the version pre-installed on Amazon Linux caused me trouble)
Example:
ab -C <CookieName=Value> -n <NumberOfRequests> -c <Concurrency> "<URL>"
67. Result example (excerpted)
ab - Apache Bench
Benchmarking example.com (be patient)
Completed 1200 requests
Completed 2400 requests
(omitted)
Completed 10800 requests
Completed 12000 requests
Finished 12000 requests
(omitted)
Concurrency Level: 200
Time taken for tests: 7.365 seconds
Complete requests: 12000
Failed requests: 0
Write errors: 0
Total transferred: 166583579 bytes
HTML transferred: 160331058 bytes
Requests per second: 1629.31 [#/sec] (mean)
Time per request: 122.751 [ms] (mean)
Time per request: 0.614 [ms] (mean, across all concurrent requests)
Transfer rate: 22087.90 [Kbytes/sec] received
(omitted)
Percentage of the requests served within a certain time (ms)
50% 116
66% 138
75% 146
80% 150
90% 161
95% 170
98% 185
99% 208
100% 308 (longest request)
Requests per second
= QPS
68. Load testing tool written in Scala
http://gatling.io
Gatling
69. The era of Apache JMeter is over
Say goodbye to building scenarios with a GUI
With Gatling,
you write load scenarios in a Scala DSL
Gatling
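The slides do not show what the Scala DSL looks like; as a rough sketch only (class name, URL, and request names are placeholders, and the exact API varies between Gatling versions), a minimal simulation has this shape:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Minimal Gatling simulation sketch (Gatling 3.x-style API;
// the target URL and all names here are placeholders).
class BasicSimulation extends Simulation {
  val httpProtocol = http.baseUrl("http://example.com") // system under test
  val scn = scenario("Basic")
    .exec(http("home").get("/")) // one GET request, reported as "home"
  setUp(scn.inject(atOnceUsers(10))).protocols(httpProtocol) // 10 virtual users at once
}
```

Because the scenario is ordinary Scala code, it can be versioned, reviewed, and parameterized like any other source file, which is the point of the "goodbye to GUI scenario making" remark above.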
70. Care for the resources on the stressor
(load-generator) side
Resources of the server (or PC) generating load
Network router (CPU) can be a bottleneck
Don't tune two or more parameters at a
time
Keep a change log and the log files
Days for Testing and Tuning
72. Day-to-day logging and monitoring
Application log
GC log
Profiler
Anomaly detection from several metrics
Server resource (CPU, memory, disk, etc.)
Abnormal response codes
Latency
Trends visualization from several metrics
System Operation / Maintenance
76. stdout / stderr
Should be redirected to a file
Should NOT be thrown away to /dev/null
The result of a thread dump
(kill -3 <PROCESS_ID>) is written
here
JVM Settings
78. SLF4J + Profiler
Example:
log the result of the profiler when
a timeout occurs
Profiler
Output example:
+ Profiler [BASIC]
|-- elapsed time [A] 220.487 milliseconds.
|-- elapsed time [B] 2499.866 milliseconds.
|-- elapsed time [OTHER] 3300.745 milliseconds.
|-- Total [BASIC] 6022.568 milliseconds.
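The output above is the format printed by the Profiler class in the slf4j-ext module; a minimal usage sketch follows (the task names match the sample output, while doWorkA/doWorkB are hypothetical placeholders for the code being measured):

```scala
import org.slf4j.profiler.Profiler

// Sketch of slf4j-ext's Profiler producing output like the sample above.
// Requires the org.slf4j:slf4j-ext dependency; doWorkA/doWorkB are placeholders.
val profiler = new Profiler("BASIC")
profiler.start("A")  // begin timing task A
doWorkA()
profiler.start("B")  // implicitly stops A, begins timing B
doWorkB()
profiler.stop()      // stops the last task and the profiler itself
profiler.print()     // prints the elapsed-time tree shown above
```

In the timeout scenario described on this slide, the print() call (or logging via profiler.setLogger) would be guarded by the timeout check, so the breakdown is emitted only for slow requests.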
79. For catching trends, not for anomaly detection
Operations must also take care not to
overlook signs of change
Not only infrastructure / application metrics,
but also business indicators
Who uses the console?
System user
System administrator
Application developer
Business manager
Trends Visualization