Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016 - Matthew Broberg
Joel Cooklin and I cover the usage and design of Snap, an open telemetry framework. I also refer to Jason Dixon a few times because I'm a fan of his presentation that influenced my view of monitoring. If you love monitoring and don't mind profanity, check this out first: https://speakerdeck.com/obfuscurity/the-state-of-open-source-monitoring
Intro to open source telemetry linux con 2016 - Matthew Broberg
Abstract
As part of the team delivering Snap, an open telemetry framework, I've run through dozens of use cases where gathering disparate metrics from services can roll up into meaningful diagrams for operations engineers and developers alike. We will use Snap's plugin model to collect, process and publish these measurements into meaningful graphs using open source tools. By joining this session, you can follow along and install industry-standard open source projects, deploy them and then use Snap to collect, process and visualize these metrics.
Audience
Anyone with an operations background (or an operations future ahead of them) who wants to see the breadth of available open source tooling around telemetry. This proposal is designed for the hands-on user who is comfortable running containers or virtual machines locally.
Experience Level
Intermediate
Benefits to the Ecosystem
By joining this session, you can follow along and install industry-standard open source projects, deploy them and then use Snap to collect, process and visualize these metrics. This empowers users within the Linux ecosystem to see their knowledge as powerful when visualized next to other layers of the datacenter.
As part of the team delivering Snap, an open telemetry framework, I've run through dozens of use cases where gathering disparate metrics from services can roll up into meaningful diagrams for operations engineers and developers alike. I will introduce you to the concept of telemetry by talking through the basics then using Snap's plugin model to collect, process and publish these measurements into meaningful graphs using open source tools.
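The collect, process and publish flow described above can be sketched in a few lines of Python. This is a toy illustration of the idea only, not Snap's plugin API: the function names (`collect_load`, `process_average`, `publish_lines`, `run_task`), the metric names and the sample values are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List

Metric = Dict[str, float]


def collect_load() -> Metric:
    # Hypothetical collector: returns fixed sample values so the sketch is deterministic.
    return {"/example/load/1min": 0.5, "/example/load/5min": 1.5}


def process_average(metrics: Metric) -> Metric:
    # Hypothetical processor: adds one derived metric, the mean of the collected values.
    out = dict(metrics)
    out["/example/load/mean"] = mean(metrics.values())
    return out


def publish_lines(metrics: Metric) -> List[str]:
    # Hypothetical publisher: renders "name value" lines, sorted by metric name.
    return [f"{name} {value}" for name, value in sorted(metrics.items())]


def run_task(
    collector: Callable[[], Metric],
    processors: List[Callable[[Metric], Metric]],
    publisher: Callable[[Metric], List[str]],
) -> List[str]:
    # Chain the three stages: collect, then each processor in order, then publish.
    metrics = collector()
    for process in processors:
        metrics = process(metrics)
    return publisher(metrics)


lines = run_task(collect_load, [process_average], publish_lines)
```

Keeping each stage as an independent callable is what lets a collector be paired with any processor or publisher, which is the appeal of a plugin model.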
The document discusses the Telemetry performance testing framework in Chrome. It provides an overview of Telemetry and its goals. The methodology section describes running a smoothness benchmark case as an example. Key concepts in Telemetry like benchmarks, measurements, page sets and options are explained. The document details how these concepts are connected in code and outlines the overall workflow when running a benchmark, from starting the browser to collecting and outputting results. It describes where results are located and how the performance metrics in results are generated and interpreted.
Tempest is the OpenStack integration test suite. It uses unittest and nosetest frameworks to run API calls against OpenStack services like Nova, Glance, Keystone, etc. and validate the responses. Tempest tests include smoke, positive, negative, stress and white box tests. It has a modular structure with common, services, and tests directories. Tempest plays an important role in OpenStack continuous integration by running on proposed code changes to check for regressions.
(1) Security testing should be integrated into continuous delivery pipelines to test applications as part of each build. (2) Pre-processing and grouping scan results reduces noise and false positives, saving developer time on analysis. (3) Leveraging existing automated tests within security scanners finds more accurate vulnerabilities than traditional scans alone.
SREcon 2016 Performance Checklists for SREs - Brendan Gregg
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
Talk for USENIX LISA17: "Containers pose interesting challenges for performance monitoring and analysis, requiring new analysis methodologies and tooling. Resource-oriented analysis, as is common with systems performance tools and GUIs, must now account for both hardware limits and soft limits, as implemented using cgroups. A reverse diagnosis methodology can be applied to identify whether a container is resource constrained, and by which hard or soft resource. The interaction between the host and containers can also be examined, and noisy neighbors identified or exonerated. Performance tooling can need special usage or workarounds to function properly from within a container or on the host, to deal with different privilege levels and name spaces. At Netflix, we're using containers for some microservices, and care very much about analyzing and tuning our containers to be as fast and efficient as possible. This talk will show you how to identify bottlenecks in the host or container configuration, in the applications by profiling in a container environment, and how to dig deeper into kernel and container internals."
The document discusses the evolution of Ceilometer, an OpenStack project that collects measurements from deployed clouds and persists the data for later retrieval and analysis. It describes how Ceilometer has scaled out its data collection capabilities over time by adding agents, partitioning workloads, and integrating with Gnocchi to provide more efficient time-series storage. The document also provides best practices for Ceilometer deployment and configuration to optimize data collection, storage and querying.
Rapid Application Design in Financial Services - Aerospike
Applying internet NoSQL design patterns to fraud detection and risk scoring, including when to use SQL and when to use NoSQL. The state of NAND Flash and NVMe is also discussed, as well as storage class memory futures with Intel's 3D Xpoint technology.
This talk was presented in LA at the following meetup:
http://www.meetup.com/scalela/events/233396111/
MIT researchers have developed highly efficient quadruped robots like the Cheetah that can run at speeds up to 6 m/s. The Cheetah uses a proprioceptive actuation system with high torque density motors to achieve high force control bandwidth over 120 Hz. Its parallelized control system with multicore CPUs and FPGAs allows control frequencies up to 4 kHz. Design principles for efficient legged locomotion include energy regeneration, low transmission impedance, and low leg inertia. The researchers are continuing their work with robots like Cheetah 2 and Hermes.
The document discusses best practices for testing Spark applications including setting up test environments for unit and integration testing Spark in both batch and streaming modes. It also covers performance testing Spark applications using Gatling, including developing a sample performance test for a word count Spark job run on Spark job server. Key steps for testing, code coverage, continuous integration and analyzing performance test results are provided.
One of the most common complaints about Python in a data analysis context is the Global Interpreter Lock, or GIL. At its core, it means that a given Python program cannot easily utilize more than one core of a multi-core machine to do computation in parallel. However, fear not! To beat the GIL, you just need to be willing to adopt a little magic -- and this talk will tell you how.
Percona XtraDB Cluster before every release: Glimpse into CI testing - Raghavendra Prabhu
This document discusses the continuous integration testing process used by Percona for releases of Percona XtraDB Cluster (PXC). It describes how Jenkins is used to automatically run a suite of tests on multiple platforms after every code change, including unit, performance, replication, and end-to-end clustering tests. These automated tests help find bugs early and ensure PXC works as intended as a clustered database system before each release. The document also outlines areas for further improving the testing approach over time.
An Introduction to Prometheus (GrafanaCon 2016) - Brian Brazil
Often what you monitor and get alerted on is defined by your tools, rather than what makes the most sense to you and your organisation. Alerts on metrics such as CPU usage are noisy and rarely spot real problems, while outages go undetected. Monitoring systems can also be challenging to maintain, and overall provide a poor return on investment.
In the past few years several new monitoring systems have appeared with more powerful semantics and which are easier to run, which offer a way to vastly improve how your organisation operates and prepare you for a Cloud Native environment. Prometheus is one such system. This talk will look at the monitoring ideal and how whitebox monitoring with a time series database, multi-dimensional labels and a powerful querying/alerting language can free you from midnight pages.
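To make "multi-dimensional labels" concrete, here is a minimal Python sketch of a counter whose samples are keyed by label sets and can be aggregated along any one label. It is a toy model of the idea, not the Prometheus client library or its query language; the `LabeledCounter` class and the label names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


class LabeledCounter:
    """Toy counter keyed by label name/value pairs (hypothetical, not a Prometheus API)."""

    def __init__(self) -> None:
        self._values: Dict[LabelSet, float] = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        # Sort the labels so the same label set always maps to the same key.
        key = tuple(sorted(labels.items()))
        self._values[key] += amount

    def sum_by(self, label: str) -> Dict[str, float]:
        # Aggregate every series that shares a value for the given label.
        totals: Dict[str, float] = defaultdict(float)
        for key, value in self._values.items():
            totals[dict(key).get(label, "")] += value
        return dict(totals)


requests = LabeledCounter()
requests.inc(method="GET", status="200")
requests.inc(method="GET", status="500")
requests.inc(2, method="POST", status="200")

by_status = requests.sum_by("status")
by_method = requests.sum_by("method")
```

The same three recorded series answer two different questions (errors by status, load by method) without the metric having been designed around either one in advance.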
A key feature when monitoring and debugging any Cloud infrastructure is to provide the ability to trace, track, and collate all the individual, discrete steps that compose an event. A typical resource action in OpenStack is often a combination of smaller tasks -- which given the distributed nature of OpenStack -- can fail at unpredictable points in the workflow. By collecting the appropriate events, operators can view all events within Ceilometer, filter on a failed action and trace back the history of related events to spot anomalies or errors. In this talk, we provide an overview of the recent enhancements made in Ceilometer to support the collection of event notifications from OpenStack services. We will describe: how events are processed, transformed and stored in Ceilometer; how you can derive metrics from events; and how it’s possible to track the events of a resource and analyse where errors occur.
Network Test Automation - Net Ops Coding 2015 - Hiroshi Ota
1. The document discusses network test automation using tools like Serverspec, Infrataster, Lbspec, and Rspec-ssltls to test network configurations and connectivity. These tools use Ruby and RSpec to test servers, DNS, firewalls, load balancers, and SSL/TLS without requiring changes to production systems.
2. Examples are provided showing how to test server reachability, DNS entries, firewall rules, load balancer behavior, and SSL/TLS settings using the different tools. Tests can be run to check configurations without affecting live networks.
3. Running the RSpec tests produces results indicating how many examples passed and failed, allowing engineers to test network changes with confidence before deploying.
Video: https://www.facebook.com/atscaleevents/videos/1693888610884236/ . Talk by Brendan Gregg from Facebook's Performance @Scale: "Linux performance analysis has been the domain of ancient tools and metrics, but that's now changing in the Linux 4.x series. A new tracer is available in the mainline kernel, built from dynamic tracing (kprobes, uprobes) and enhanced BPF (Berkeley Packet Filter), aka, eBPF. It allows us to measure latency distributions for file system I/O and run queue latency, print details of storage device I/O and TCP retransmits, investigate blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. This talk will summarize this new technology and some long-standing issues that it can solve, and how we intend to use it at Netflix."
Linux Performance Analysis: New Tools and Old Secrets - Brendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
From USENIX LISA 2010, San Jose.
Visualizations that include heat maps can be an effective way to present performance data: I/O latency, resource utilization, and more. Patterns can emerge that would be difficult to notice from columns of numbers or line graphs, which are revealing previously unknown behavior. These visualizations are used in a product as a replacement for traditional metrics such as %CPU and are allowing end users to identify more issues much more easily (and some issues are becoming nearly impossible to identify with tools such as vmstat(1)). This talk covers what has been learned, crazy heat map discoveries, and thoughts for future applications beyond performance analysis.
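As a rough illustration of how a latency heat map is built, the Python sketch below bins (time, latency) samples into a two-dimensional grid and counts the samples in each cell; a renderer would then map each count to a color. The sample data, bucket sizes and units (seconds and milliseconds) are made up for the example and do not come from the talk.

```python
from collections import Counter
from typing import Iterable, Tuple


def heat_map(
    samples: Iterable[Tuple[float, float]],
    time_step: float,
    latency_step: float,
) -> Counter:
    """Count samples per (time bucket, latency bucket) cell."""
    cells: Counter = Counter()
    for t, latency in samples:
        # Floor division assigns each sample to the cell that contains it.
        cells[(int(t // time_step), int(latency // latency_step))] += 1
    return cells


# Hypothetical samples: (seconds since start, I/O latency in milliseconds).
samples = [(0.2, 1.0), (0.7, 1.5), (0.9, 9.0), (1.1, 1.2), (1.8, 9.5), (1.9, 9.9)]
cells = heat_map(samples, time_step=1.0, latency_step=2.0)
```

In this toy data set an average would report a latency of around 5 ms in each second, while the grid shows two separate clusters (fast and slow I/O), which is the kind of pattern a line graph hides.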
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years are, in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including giving the Linux Performance Tools talk originally at SCALE. This is an anti-version of that talk, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Surge 2014: From Clouds to Roots: root cause performance analysis at Netflix. Brendan Gregg.
At Netflix, high scale and fast deployment rule. The possibilities for failure are endless, and the environment excels at handling this, regularly tested and exercised by the simian army. But, when this environment automatically works around systemic issues that aren’t root-caused, they can grow over time. This talk describes the challenge of not just handling failures of scale on the Netflix cloud, but also new approaches and tools for quickly diagnosing their root cause in an ever changing environment.
[231] the simplicity of cluster apps with circuit - NAVER D2
This document discusses Circuit, a lightweight cluster operating system. It provides a real-time API to view and control hosts, processes, and containers. The API allows traversal and manipulation of the cluster as a unified namespace. The document outlines the API, including command line usage and a Go client package. It then describes how to build a job scheduler service using the Circuit API, including designing the state, handling events, and running jobs on hosts. The vision is for Circuit to enable easy sharing of systems and for any program to take on different roles by executing as a recursive process tree on the cluster.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
This document discusses using data logging and telemetry tools to optimize glider performance. It describes logging launch parameters like camber, elevator preset, and towhook position to measure altitude, airspeed, and towline tension over time. Analyzing these parameters helped identify ways to extract maximum energy in launches, such as reducing wing "ringing" by moving the towhook. Telemetry was also used in team training to provide real-time feedback on trim settings to find minimum sink rates. Overall, measuring and collecting data over time through logging and telemetry allows performance to be compared, understood, and improved.
Logging and exception handling are among the easiest tools to use when debugging, but how can you take those massive logs and thousands of errors and effortlessly use them to build a better product? This presentation shares our development team's lessons learned to expedite releases and fix app issues faster. It discusses best practices that will help your dev team build a culture of logging, such as what to log, how to log it, and how to proactively put it to use.
JANOG39 Traffic Visualization BoF presentation materials
Japanese - https://www.janog.gr.jp/meeting/janog39/program/traffic
English - https://www.janog.gr.jp/meeting/janog39/en/programs/y-bof-traffic
DataEngConf SF16 - Collecting and Moving Data at Scale - Hakka Labs
This document summarizes Sada Furuhashi's presentation on Fluentd, an open source data collector. Fluentd provides a centralized way to collect, filter, and output log data from various sources like applications, servers, and databases. It addresses challenges with typical log collection architectures, which suffer from high latency, complex parsing, and a combinatorial explosion of connections. Fluentd uses a plugin-based architecture with input, filter, and output components to flexibly collect, transform, and deliver log data at scale to targets like files, databases and visualization tools. Many large companies like Microsoft, Atlassian and Amazon use Fluentd for log collection and analytics in production environments.
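The explosion of connections mentioned above can be read as simple arithmetic: if every source ships directly to every destination, the number of integrations grows as sources times destinations, whereas routing through one central collector grows as sources plus destinations. The Python sketch below works through one case; the counts (10 sources, 5 destinations) and the function names are illustrative assumptions, not figures from the presentation.

```python
def direct_connections(sources: int, destinations: int) -> int:
    # Every source integrates with every destination separately.
    return sources * destinations


def hub_connections(sources: int, destinations: int) -> int:
    # Every source and every destination integrates once with the central collector.
    return sources + destinations


direct = direct_connections(10, 5)
via_hub = hub_connections(10, 5)
saved = direct - via_hub
```

With these assumed counts the direct layout needs 50 integrations and the centralized one needs 15, and the gap widens as either side grows.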
Ultra fast DDoS Detection with FastNetMon at Coloclue (AS 8283)Pavel Odintsov
This document discusses how Coloclue, a non-profit volunteer-driven ISP, automated the detection and mitigation of DDoS attacks through the use of FastNetMon and BIRD. FastNetMon allows for detection of attacks within 3 seconds by monitoring traffic levels. BIRD then injects selective blackhole routes within 1 second to mitigate attacks by dropping traffic for 1 IP or subnet for 60 seconds. This approach solves the DDoS problem within 4 seconds through 100% automated detection and mitigation.
This document discusses making the Norikra stream processing software more perfect. It outlines how Norikra currently works well for small to medium sites but has limitations for large deployments. The concept of a "Perfect Norikra" is introduced that would add distributed execution, high availability, and dynamic scaling capabilities. A rough design is sketched that involves a new query executor, dataflow manager, and strategies for dynamic scaling through intermediate results and merging across nodes. Challenges mentioned include resource monitoring, multi-tenancy, and supporting queries without aggregations.
Fluentd is an open source data collector that allows flexible data collection, processing, and output. It supports streaming data from sources like logs and metrics to destinations like databases, search engines, and object stores. Fluentd's plugin-based architecture allows it to support a wide variety of use cases. Recent versions of Fluentd have added features like improved plugin APIs, nanosecond time resolution, and Windows support to make it more suitable for containerized environments and low-latency applications.
DDoS detection at small ISP by Wardner MaiaPavel Odintsov
Este documento trata sobre la detección y mitigación de ataques distribuidos de denegación de servicio (DDoS) en un pequeño proveedor de servicios de Internet (ISP). Explica conceptos básicos sobre DDoS, incluidos tipos de ataques y arquitectura. Luego, discute buenas prácticas de red para minimizar ataques, como la implementación de BCP-38 y la eliminación de amplificadores y bucles estáticos. Finalmente, cubre técnicas de mitigación como blackholing remoto y sol
WebRTC Conference Japan 2016 (2016年2月16日) の講演資料です。
発表者は中蔵聡哉と大津谷亮祐 http://www.slideshare.net/rotsuya です。
“Telexistence Robot controlled with WebRTC”
It's the presentation slides at WebRTC Conference Japan on Feb 16, 2016.
The presenters were Toshiya Nakakura and Ryosuke Otsuya http://www.slideshare.net/rotsuya .
Ripe71 FastNetMon open source DoS / DDoS mitigationPavel Odintsov
This document describes FastNetMon, an open source DDoS mitigation toolkit. It provides concise summaries of network traffic and detects DDoS attacks in real-time. It can block malicious traffic through methods like BGP announcements. FastNetMon supports many Linux distributions and can integrate with hardware/cloud solutions. It detects attacks faster than traditional hardware/service approaches through optimized packet capture using tools like Netmap and PF_RING.
The Science of Fun - Data-driven Game Developmentalex_turcan
Games are crafted to provide unique experiences, but players don't always behave as you would expect. In this presentation, Alexandra Turcan and Ruan Pearce-Authers from Dambuster Studios will explain how they combine UX methods with telemetry and biometrics to quantify player in-game behaviour.
How to Become a Thought Leader in Your NicheLeslie Samuel
Are bloggers thought leaders? Here are some tips on how you can become one. Provide great value, put awesome content out there on a regular basis, and help others.
This document discusses malware analysis collaboration and automation. It describes setting up a virtualized malware analysis environment using QEMU/KVM with light-weight, copy-on-write disk clones for consistency and efficiency. It also covers automating tasks like provisioning new virtual machines, inserting and extracting files from guests, and capturing and replaying virtual machine sessions for collaborative training.
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)Igalia
By Andy Wingo.
Snabb is an open-source toolkit for building fast, flexible network functions. Since its beginnings in 2012, Snabb has seen some modest deployment success ranging from simple one-off diagnosis tools to border routers that process all IPv4 traffic for entire countries. This talk will give an introduction to Snabb. After going over Snabb's fundamental components and how they combine, the talk will move on to examples of how network engineers are taking advantage of Snabb in practice, mentioning a few of the many open-source network functions built on Snabb.
(c) RIPE 77
15 - 19 October 2018
Amsterdam, Netherlands
https://ripe77.ripe.net
NML is a project for out-of-band server management that allows for extremely configurable OS installation with minimal human intervention. It aims to build an open-source matrix of server hardware and OS distribution combinations. The current status is that it is hosted on GitHub and has two main members. NML encapsulates intelligence in HTTP and uses technologies like iPXE, DHCP, and preseeding/kickstarting to remotely install and configure operating systems on servers. It focuses on flexibility and independence from specific OSes or hardware.
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...Nagios
Dan Wittenberg's presentation on using Nagios at a Fortune 50 Company
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
LibOS as a regression test framework for Linux networking #netdev1.1Hajime Tazaki
This document describes using the LibOS framework to build a regression testing system for Linux networking code. LibOS allows running the Linux network stack in a library, enabling deterministic network simulation. Tests can configure virtual networks and run network applications and utilities to identify bugs in networking code by detecting changes in behavior across kernel versions. Example tests check encapsulation protocols like IP-in-IP and detect past kernel bugs. Results are recorded in JUnit format for integration with continuous integration systems.
Five cool ways the JVM can run Apache Spark fasterTim Ellison
The IBM JVM runs Apache Spark fast! This talk explains some of the findings and optimizations from our experience of running Spark workloads.
The talk was originally presented at the SparkEU Summit 2015 in Amsterdam.
This document discusses container technologies including App Container (appc) and rkt. It provides an overview of appc components like the image format, discovery, and executor. It then discusses rkt, an implementation of appc, describing its modular architecture with stages 0-2 and use of systemd and cgroups for isolation. It also touches on rkt security, networking, and integration with systemd and user namespaces.
The Varnish Roadshow is a performance focused presentation on Varnish, an open source HTTP accelerator. It discusses how Varnish was created to address the shortcomings of traditional caching solutions like Squid that do not take advantage of modern computing architectures. Varnish uses a custom configuration language called VCL, has a split manager/worker process design for high performance, and provides real-time statistics and management via shared memory and command line tools.
Talk from Embedded Linux Conference, http://elcabs2015.sched.org/event/551ba3cdefe2d37c478810ef47d4ca4c?iframe=no&w=i:0;&sidebar=yes&bg=no#.VRUCknSQQQs
The document describes steps to build and train an image classification model using Lazarus, the neural-api library, and Google Colab. It clones the neural-api GitHub repository, installs dependencies like FPC and Lazarus, builds and trains a simple image classifier on the CIFAR-10 dataset, and exports the trained model weights and training logs. The process demonstrates how to leverage Google Colab's GPUs to train deep learning models using Lazarus and Pascal.
Linux Server Deep Dives (DrupalCon Amsterdam)Amin Astaneh
Over the past few years the Linux kernel has gained features that allow us to learn more about what's really happening on our servers and the applications that run on them.
This talk will explore how these new features, particularly perf_events and ebpf, enable us to answer questions about what a Drupal site is doing in real time beyond what the standard logs, server performance tools, and even strace will reveal. Attendees will be provided a brief introduction to example uses of these tools to diagnose performance problems.
This talk is intended for attendees that are familiar with Linux, the command line, and have used host observability tools in the past (top, netstat, etc).
This summary provides an overview of the key points from the OpenStack security document:
1. OpenStack is an open source cloud computing platform consisting of several interrelated components like Nova, Swift, Keystone, etc. Each component has its own REST API and is responsible for a certain functionality like compute, storage, identity, etc.
2. The document discusses various security aspects and pain points related to different OpenStack components like authentication tokens, message buses, REST APIs, volumes, and intrusion detection.
3. It also covers strategies for incident response, forensics, and reporting vulnerabilities in OpenStack. Maintaining chain of custody for evidence and providing forensic access to tenants are highlighted.
4. Finally, the
Hacktivity2014: Virtual Machine Introspection to Detect and ProtectTamas K Lengyel
Virtual machine introspection (VMI) allows security tools to be run externally to virtual machines for improved isolation, visibility, and control. VMI provides full interpretation of the virtual hardware and memory to detect malware. It can actively monitor VM events and memory through techniques like EPT trap interposition. The speaker demonstrated these capabilities using open source tools like LibVMI and DRAKVUF for dynamic malware analysis. VMI is presented as an important approach for cloud and mobile security going forward.
This document discusses performance optimization for data centers on multi-core platforms and provides a case study analysis. It introduces Intel software tuning tools, describes a methodology for data center performance tuning involving system, application, and microarchitecture levels, and analyzes a case study where thread synchronization overhead was identified and reduced through the use of NPTL in Linux, improving CPU utilization and throughput.
Android boot time optimization involves measuring boot times, analyzing the results, and reducing times. Key areas of focus include the bootloader, kernel initialization, zygote class preloading, and system service startup. Hibernation technologies like QuickBoot and Fast-On can improve resume speeds by saving a system image to flash. The "R-Loader" concept aims to minimize hardware re-initialization on resume by directly loading a suspended kernel image.
1. Go write a plugin!
THE STORY OF AN OPEN TELEMETRY FRAMEWORK CALLED SNAP
2. What snap is
snap is an open framework for metrics
snap is NOT an analytics alternative
We have a list of maintained plugins.
Curious what it takes to write one? See our Plugin Authoring documentation.
/intel/server/cpu/load
/intel/server/cpu/ipc
/intel/server/cpu/l2cache
/intel/server/mem/free
/intel/server/mem/used
/intel/server/nic/eth0/bytes_rec
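The namespaces above read like paths in a tree, so a metric catalog can be browsed by prefix. A minimal sketch of that idea in Go, assuming nothing about snap's internals: `splitNamespace` and `groupByPrefix` are hypothetical helpers written for this illustration, not part of snap.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// splitNamespace breaks a slash-delimited metric namespace into its elements.
// Hypothetical helper for illustration; not a snap API.
func splitNamespace(ns string) []string {
	return strings.Split(strings.Trim(ns, "/"), "/")
}

// groupByPrefix groups namespaces by their first `depth` elements,
// skipping any namespace that has fewer elements than that.
func groupByPrefix(namespaces []string, depth int) map[string][]string {
	groups := make(map[string][]string)
	for _, ns := range namespaces {
		parts := splitNamespace(ns)
		if len(parts) < depth {
			continue
		}
		key := "/" + strings.Join(parts[:depth], "/")
		groups[key] = append(groups[key], ns)
	}
	return groups
}

func main() {
	// The example namespaces from the slide.
	catalog := []string{
		"/intel/server/cpu/load",
		"/intel/server/cpu/ipc",
		"/intel/server/cpu/l2cache",
		"/intel/server/mem/free",
		"/intel/server/mem/used",
		"/intel/server/nic/eth0/bytes_rec",
	}
	groups := groupByPrefix(catalog, 3)

	// Sort the keys so the listing is stable across runs.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s (%d metrics)\n", k, len(groups[k]))
	}
}
```

Grouping at depth 3 separates the cpu, mem and nic subtrees; a different depth gives a coarser or finer view of the same catalog.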
PSUtil, Facter,
CollectD, Ohai
Node, DCM,
NIC, Disk Encryption, OSLO,
Machine Learning,
Filtering
RabbitMQ, HANA,
Ceilometer, InfluxDB
Mosquito, File,
PostgreSQL, MySQL
3. Demo.
The simple one.
Environment:
1 instance of snapd running locally
Load collectors, view metrics via snapctl
Pre-work:
gvm use go1.5.3
go get github.com/intelsdi-x/snap
make
4. Why this is cool
Write this plugin (or others!)
(Diagram labels, datacenter stack: Server Silicon; Storage Silicon; Network Silicon; Operating System, Libraries; Virtualization Software; Orchestration Software; Developer Environment; Cloud Software Stack; Applications & Services; Telemetry and Datacenter Analytics.)
(Diagram labels, example TASK: Collector, Processor, Publisher; CPU, NIC and MEM; Bayesian Filter; Cassandra Publisher; Grafana.)
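The task on this slide chains collectors into a processor and then a publisher. A rough sketch of that shape in Go follows. Every type here (`Metric`, `Collector`, `Processor`, `Publisher`, `Task`) is a hypothetical stand-in invented for the illustration and is not snap's plugin API; a simple threshold filter takes the place of the Bayesian filter, and an in-memory slice takes the place of Cassandra.

```go
package main

import "fmt"

// Metric is a hypothetical data point: a namespace plus a value.
type Metric struct {
	Namespace string
	Value     float64
}

// The three plugin roles, reduced to one method each (assumed shapes).
type Collector interface{ Collect() []Metric }
type Processor interface{ Process([]Metric) []Metric }
type Publisher interface{ Publish([]Metric) error }

// staticCollector returns fixed readings instead of sampling hardware.
type staticCollector struct{ metrics []Metric }

func (c staticCollector) Collect() []Metric { return c.metrics }

// thresholdFilter drops metrics below min; it stands in for a real processor.
type thresholdFilter struct{ min float64 }

func (f thresholdFilter) Process(in []Metric) []Metric {
	var out []Metric
	for _, m := range in {
		if m.Value >= f.min {
			out = append(out, m)
		}
	}
	return out
}

// memoryPublisher keeps published metrics in memory instead of a database.
type memoryPublisher struct{ published []Metric }

func (p *memoryPublisher) Publish(in []Metric) error {
	p.published = append(p.published, in...)
	return nil
}

// Task wires collectors, an optional processor and a publisher together.
type Task struct {
	Collectors []Collector
	Processor  Processor
	Publisher  Publisher
}

// Run performs one collect, process, publish pass.
func (t Task) Run() error {
	var batch []Metric
	for _, c := range t.Collectors {
		batch = append(batch, c.Collect()...)
	}
	if t.Processor != nil {
		batch = t.Processor.Process(batch)
	}
	return t.Publisher.Publish(batch)
}

func main() {
	pub := &memoryPublisher{}
	task := Task{
		Collectors: []Collector{
			staticCollector{[]Metric{{"/intel/server/cpu/load", 0.9}}},
			staticCollector{[]Metric{{"/intel/server/mem/free", 0.1}}},
		},
		Processor: thresholdFilter{min: 0.5},
		Publisher: pub,
	}
	if err := task.Run(); err != nil {
		fmt.Println("task failed:", err)
		return
	}
	fmt.Println("published:", pub.published)
}
```

Keeping each role behind a small interface is what lets a different collector or publisher be dropped in without touching the task wiring.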
5. Key Features
(Diagram: workflows chaining Collect, Process and Publish plugins.)
• Plugin load
• No restart
• Extends the metric catalog
• Plugin unload
• Removes metrics from catalog
• Plugin swap
• Newer version swapped in a single transaction
LIFECYCLE MANAGEMENT | FLEXIBLE DEPLOYMENT | AUTOMATION
TRIBE
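The lifecycle bullets on this slide (load extends the catalog, unload shrinks it, swap happens in one transaction) can be sketched with a mutex-guarded map. This is an illustration under assumptions, not snap's implementation: `Plugin`, `Catalog` and their methods are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Plugin is a hypothetical record of a loaded plugin and the metrics it exposes.
type Plugin struct {
	Name    string
	Version int
	Metrics []string
}

// Catalog tracks loaded plugins; the mutex makes each operation atomic.
type Catalog struct {
	mu      sync.Mutex
	plugins map[string]Plugin
}

func NewCatalog() *Catalog {
	return &Catalog{plugins: make(map[string]Plugin)}
}

// Load adds a plugin while the process keeps running, extending the catalog.
func (c *Catalog) Load(p Plugin) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.plugins[p.Name]; ok {
		return errors.New("plugin already loaded: " + p.Name)
	}
	c.plugins[p.Name] = p
	return nil
}

// Unload removes a plugin and, with it, its metrics.
func (c *Catalog) Unload(name string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.plugins[name]; !ok {
		return errors.New("plugin not loaded: " + name)
	}
	delete(c.plugins, name)
	return nil
}

// Swap replaces a loaded plugin with a newer version while holding the lock,
// so readers never observe the catalog with the plugin missing.
func (c *Catalog) Swap(p Plugin) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	old, ok := c.plugins[p.Name]
	if !ok {
		return errors.New("plugin not loaded: " + p.Name)
	}
	if p.Version <= old.Version {
		return fmt.Errorf("version %d is not newer than %d", p.Version, old.Version)
	}
	c.plugins[p.Name] = p
	return nil
}

// Metrics returns every metric currently in the catalog, sorted.
func (c *Catalog) Metrics() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var out []string
	for _, p := range c.plugins {
		out = append(out, p.Metrics...)
	}
	sort.Strings(out)
	return out
}

func main() {
	c := NewCatalog()
	_ = c.Load(Plugin{Name: "cpu", Version: 1, Metrics: []string{"/intel/server/cpu/load"}})
	fmt.Println("after load:", c.Metrics())

	_ = c.Swap(Plugin{Name: "cpu", Version: 2, Metrics: []string{
		"/intel/server/cpu/load", "/intel/server/cpu/ipc",
	}})
	fmt.Println("after swap:", c.Metrics())

	_ = c.Unload("cpu")
	fmt.Println("after unload:", c.Metrics())
}
```

Doing the swap under one lock, rather than as an unload followed by a load, is the design point: there is no window in which the plugin's metrics vanish from the catalog.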
6. Demo.
Administration and automation at scale.
Environment:
4 instances of snapd running locally
Initiate in tribe mode
Load a collector, processor & publisher
Load a task via snapctl
Watch that task via snapctl
Pre-work:
gvm use go1.5.3
go get github.com/intelsdi-x/snap
make
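The demo relies on tribe mode so that something loaded on one snapd instance reaches the others. A toy sketch of that idea in Go, under loud assumptions: `Node` and `Agreement` are hypothetical types, and propagation here is a plain in-process loop, whereas a real cluster would need a membership and gossip layer that this sketch does not attempt to model.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for one running agent instance.
type Node struct {
	Name    string
	Plugins map[string]bool
}

func NewNode(name string) *Node {
	return &Node{Name: name, Plugins: make(map[string]bool)}
}

// Agreement is a hypothetical group of nodes that should share the same plugins.
type Agreement struct {
	Name    string
	Members []*Node
	Plugins []string
}

// Join adds a node and replays plugins loaded before it joined.
func (a *Agreement) Join(n *Node) {
	a.Members = append(a.Members, n)
	for _, p := range a.Plugins {
		n.Plugins[p] = true
	}
}

// LoadPlugin records the plugin once and applies it to every member.
func (a *Agreement) LoadPlugin(name string) {
	a.Plugins = append(a.Plugins, name)
	for _, n := range a.Members {
		n.Plugins[name] = true
	}
}

func main() {
	agreement := &Agreement{Name: "all-nodes"}
	for i := 1; i <= 4; i++ {
		agreement.Join(NewNode(fmt.Sprintf("node%d", i)))
	}

	// One administrative action; four nodes end up with the plugin.
	agreement.LoadPlugin("collector-cpu")
	for _, n := range agreement.Members {
		fmt.Printf("%s has collector-cpu: %v\n", n.Name, n.Plugins["collector-cpu"])
	}
}
```

Replaying earlier plugins on `Join` means a node that arrives late still converges to the same state as the rest of the group.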
7. Why Go?
<3 this community slide.
Great toolset
• Logrus
• Gomit*
• Memberlist
Strong opinions
• fmt
• test
Datacenter Momentum
9. Get involved
• Download, install and run snap right now
• Read the extensive documentation on GitHub
• Report your experiences (features and bugs) through GitHub
• Talk to the developers of snap on Gitter
• See the public roadmap on GitHub
• Reach out to me or any of the maintainers =>
• on Twitter|GitHub|Gitter
Thanks!
@mjbrender
Editor's Notes
#3: A powerful telemetry agent framework designed to
Improve deployment model and flexibility of the telemetry tools ecosystem
Provide dynamic control of collection for small or large clusters of systems
Allow flexible processing of telemetry data on agent (e.g. machine learning)
Simplify disseminating data to telemetry ingesting systems
Provide operational innovation for collecting across clusters of machines
Support emerging API consumption models
#8: Gomit provides facilities for defining, emitting, and handling events within a Go program. - https://github.com/intelsdi-x/gomit
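To make "defining, emitting, and handling events" concrete, here is a small sketch of that pattern in Go. It does not use or reproduce Gomit's API: `Event`, `Handler`, `Emitter`, `Register` and `Emit` are hypothetical names chosen for this illustration only.

```go
package main

import "fmt"

// Event is a hypothetical event definition: anything that can name itself.
type Event interface {
	Namespace() string
}

// Handler reacts to an emitted event.
type Handler func(Event)

// Emitter routes events to the handlers registered for their namespace.
type Emitter struct {
	handlers map[string][]Handler
}

func NewEmitter() *Emitter {
	return &Emitter{handlers: make(map[string][]Handler)}
}

// Register attaches a handler to a namespace.
func (e *Emitter) Register(namespace string, h Handler) {
	e.handlers[namespace] = append(e.handlers[namespace], h)
}

// Emit calls every handler registered for the event's namespace
// and reports how many handlers ran.
func (e *Emitter) Emit(ev Event) int {
	hs := e.handlers[ev.Namespace()]
	for _, h := range hs {
		h(ev)
	}
	return len(hs)
}

// pluginLoaded is an example event carrying the plugin's name.
type pluginLoaded struct{ Plugin string }

func (pluginLoaded) Namespace() string { return "plugin.loaded" }

func main() {
	em := NewEmitter()
	em.Register("plugin.loaded", func(ev Event) {
		if pl, ok := ev.(pluginLoaded); ok {
			fmt.Println("loaded plugin:", pl.Plugin)
		}
	})
	handled := em.Emit(pluginLoaded{Plugin: "collector-cpu"})
	fmt.Println("handlers run:", handled)
}
```

Routing by namespace keeps emitters and handlers decoupled: the code that loads a plugin does not need to know who is listening.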