ML platform meetups are quarterly events where we discuss and share advanced machine learning infrastructure technology. Companies involved include Airbnb, Databricks, Facebook, Google, LinkedIn, Netflix, Pinterest, Twitter, and Uber.
Service Discovery and Registration in a Microservices Architecture | PLUMgrid
Microservices, service discovery, and registration have been heading towards the peak of inflated expectations on the Gartner Hype Cycle for the last year or so, but there has often been a lack of clarity about what they are, why they are needed, and how to implement them well.
Service discovery and registration are key components of most distributed systems and service-oriented architectures. In this session we will talk about the what, why, and how of service registration and discovery in distributed systems in general and OpenStack in particular.
We will talk about some of the technologies that address this challenge, such as ZooKeeper, etcd, Consul, Mesos-DNS, Minuteman, SkyDNS, SmartStack, and Eureka. We will also address how these technologies, as well as existing OpenStack projects, can be used to solve this problem inside OpenStack environments.
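To make the registration and discovery idea concrete, here is a minimal in-memory sketch, assuming a TTL-plus-heartbeat model. It is not how any of the tools named above is implemented; the class name `ServiceRegistry`, its methods, and the example service name and addresses are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory registry: instances register with a TTL and must heartbeat."""

    ttl_seconds: float = 10.0
    # service name -> {address: timestamp of the last registration or heartbeat}
    _instances: dict = field(default_factory=dict)

    def register(self, service, address, now=None):
        """Record (or refresh) an instance of a service."""
        now = time.monotonic() if now is None else now
        self._instances.setdefault(service, {})[address] = now

    # In this sketch a heartbeat is simply a re-registration.
    heartbeat = register

    def deregister(self, service, address):
        """Remove an instance explicitly, for example on clean shutdown."""
        self._instances.get(service, {}).pop(address, None)

    def discover(self, service, now=None):
        """Return the addresses whose last heartbeat is within the TTL."""
        now = time.monotonic() if now is None else now
        entries = self._instances.get(service, {})
        return sorted(a for a, seen in entries.items() if now - seen <= self.ttl_seconds)


if __name__ == "__main__":
    registry = ServiceRegistry(ttl_seconds=10)
    registry.register("nova-api", "10.0.0.1:8774")
    print(registry.discover("nova-api"))
```

Expiring entries by TTL means a crashed instance drops out of discovery results without having to deregister itself, which is the main reason registries commonly pair registration with heartbeats or health checks.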
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag... | ScyllaDB
The Lightbits LightOS software-defined storage system aims to provide low latency storage access over existing Ethernet networks by pooling NVMe flash across servers. It addresses latency challenges from flash media, the network, and management operations through an intelligent flash management backend, NVMe/TCP frontend, write buffering, and isolation of management processes. Performance measurements show average read latencies of 150-170 microseconds and average write latencies of 90-125 microseconds even under high load, demonstrating its ability to keep latency low at scale.
Grafana is an open source analytics and monitoring tool that allows users to visualize time series data from various databases in customizable dashboards. It supports advanced visualizations, alerting features, and reporting. Grafana works with time series databases like InfluxDB to collect, analyze, and visualize metrics data. Users can build dashboards to monitor servers and get alert notifications. Grafana is widely used across different domains for its flexibility and rich feature set.
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021 | StreamNative
This document discusses using Apache Pulsar with MQTT for edge computing. It provides an overview of Apache Pulsar and how it enables message queuing and data streaming with features like pub-sub, geo-replication, and multi-protocol support including MQTT. It also discusses edge computing characteristics and challenges, and how running Apache Pulsar on edge devices can address these by extending data processing to the edge and integrating with sensors using the MQTT protocol. Examples are provided of ingesting IoT data into Pulsar from Python and using NVIDIA Jetson devices with Pulsar.
Saturn 2018: Managing data consistency in a microservice architecture using S... | Chris Richardson
A revised and extended version of the talk that I gave at Saturn 2018.
The services in a microservice architecture must be loosely coupled and so cannot share database tables. What’s more, two phase commit (a.k.a. a distributed transaction) is not a viable option for modern applications. Consequently, a microservices application must use the Saga pattern, which maintains data consistency using a series of local transactions.
In this presentation, you will learn how sagas work and how they differ from traditional transactions. We describe how to use sagas to develop business logic in a microservices application. You will learn effective techniques for orchestrating sagas and how to use messaging for reliability. We will describe the design of a saga framework for Java and show a sample application.
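As a rough illustration of the saga idea described above, the following sketch runs a series of local transactions and, if one fails, runs the compensating transactions of the steps that already completed, in reverse order. It is a single-process toy and not the Java saga framework from the talk; `run_saga` and the step names are hypothetical, and a real orchestrator would persist its state and communicate through messaging.

```python
def run_saga(steps):
    """Run an orchestrated saga.

    `steps` is a list of (name, action, compensation) tuples, where `action`
    and `compensation` are zero-argument callables standing in for local
    transactions in different services.

    Returns (succeeded, log), where log records what happened in order.
    """
    completed = []  # (name, compensation) for every step that committed
    log = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo the committed steps in reverse order.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensation))
    return True, log


if __name__ == "__main__":
    def ok():
        pass

    def reject():
        raise RuntimeError("insufficient credit")

    result = run_saga([
        ("create_order", ok, ok),
        ("reserve_credit", reject, ok),
        ("approve_order", ok, ok),
    ])
    print(result)
```

Unlike a distributed transaction, each step commits on its own, so other requests can observe intermediate state; compensations restore consistency semantically rather than rolling anything back atomically.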
Apache Pulsar: The Next Generation Messaging and Queuing System | Databricks
This document discusses Apache Pulsar, an open-source distributed messaging and streaming platform. It provides concise summaries of Pulsar's key capabilities:
1) Pulsar provides messaging and streaming capabilities with a unified model and supports high throughput, low latency, availability, and durability.
2) It uses Apache BookKeeper for durable log storage and supports features like geo-replication, multi-tenancy, and lightweight compute functions.
3) Pulsar has been widely adopted with over 100 customers and has a large open-source community around it.
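An explanation of SSL/TLS, the most widely used encryption-in-transit technology on the internet. As of 2022, TLS 1.3 is the latest version, and major companies and institutions recommend it for stronger security and a better user experience. See how the handshake and the technologies applied differ in each TLS version.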
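As a small practical illustration of the TLS 1.3 recommendation, here is a sketch of a client-side context that refuses anything older, using Python's standard `ssl` module. It assumes the interpreter is linked against an OpenSSL build with TLS 1.3 support; the function name `tls13_only_context` is hypothetical.

```python
import ssl


def tls13_only_context():
    """Build a client context that verifies certificates and requires TLS 1.3."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse to negotiate TLS 1.2 or older.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


if __name__ == "__main__":
    ctx = tls13_only_context()
    print(ctx.minimum_version)
```

A socket wrapped with this context fails the handshake against a server that only offers older protocol versions, so it is best rolled out after confirming that every peer supports TLS 1.3.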
Log Management
Log Monitoring
Log Analysis
Need for Log Analysis
Problem with Log Analysis
Some Log Management Tools
What is ELK Stack
ELK Stack Working
Beats
Different Types of Server Logs
Examples of Winlogbeat, Packetbeat, Apache2, and Nginx server log analysis
Mimikatz
Malicious File Detection using ELK
Practical Setup
Conclusion
Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ... | Claus Ibsen
In this session, we'll focus on:
Camel 3: Demos of how Camel 3, Camel K and Camel Quarkus all work together, plus insights into Camel’s role in the next major release of Red Hat Integration products.
Camel K: This serverless integration platform provides low-code/no-code capabilities, where integrations can be snapped together quickly using the power of integration patterns and Camel’s extensive set of connectors.
Camel Quarkus: Combining Knative, the fast Quarkus runtime, and Camel K brings awesome serverless features, such as auto-scaling, scaling to zero, and event-based communication, with great integration capabilities from Apache Camel.
You will also hear about the latest Camel sub-project Camel Kafka Connectors which makes it possible to use all the Camel components as Kafka Connect connectors.
Finally we bring details of the roadmap for what is coming up in the Camel projects.
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F... | StreamNative
The Netdata Agent is free, open source single-node monitoring software. Netdata Cloud is a free, closed source, software-as-a-service that brings together metadata from endpoints running the Netdata Agent, giving a complete view of the health and performance of an infrastructure. All the metrics remain on the Netdata Agent, making Netdata Cloud the focal point of a distributed, infinitely scalable, low cost solution.
The heart of Netdata Cloud is Pulsar. Almost every message coming from and going to the open source agents passes through Pulsar. Pulsar's infinite number of topics has given us the flexibility we needed and in some cases, every single Netdata Agent has its own unique Pulsar topic. A single message from an agent or from a service that processes a front end request can trigger several other Pulsar messages, as we also use Pulsar for communication between microservices (using a CQRS pattern with shared subscriptions for scalability).
The reliable persistence of messages has allowed us to replay old events to rebuild old and build new materialized views and debug specific production issues. It's also what will enable us to implement an event sourcing pattern, for a new set of features we want to introduce shortly.
We have had a few issues with a specific client and our shared subscriptions that we're working on resolving, but overall Pulsar has proven to be one of the most reliable parts of our infrastructure and we decided to proceed with a managed services agreement.
Debezium is a Kafka Connect plugin that performs Change Data Capture from your database into Kafka. This talk demonstrates how this can be leveraged to move your data from one database platform such as MySQL to PostgreSQL. A working example is available on GitHub (github.com/gh-mlfowler/debezium-demo).
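To show what consuming change data capture events involves, here is a minimal sketch that applies a stream of simplified change events to a target table held in a dictionary. It assumes only the `op`, `before`, and `after` fields of the Debezium event envelope (`c` create, `r` snapshot read, `u` update, `d` delete) and ignores schema and source metadata; the function `apply_change_event` and the `id` key column are hypothetical, and this is not the code from the linked demo.

```python
def apply_change_event(table, event, key="id"):
    """Apply one simplified change event to `table`, a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates all carry the new row state in "after".
        row = event["after"]
        table[row[key]] = dict(row)
    elif op == "d":
        # Deletes carry the last known row state in "before".
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return table


if __name__ == "__main__":
    target = {}
    events = [
        {"op": "c", "before": None, "after": {"id": 1, "name": "ada"}},
        {"op": "u", "before": {"id": 1, "name": "ada"}, "after": {"id": 1, "name": "Ada"}},
    ]
    for change in events:
        apply_change_event(target, change)
    print(target)
```

Because each event carries the full new row state, applying it as an upsert keeps the target correct even if an event is delivered more than once, which matters with at-least-once delivery from Kafka.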
- Microservices advocate creating a system from small, isolated services that each own their data and are independently scalable and resilient. They are inspired by biological cells that are small, single-purpose, and work together through messaging.
- The system is divided using a divide and conquer approach, decomposing it into discrete subsystems that communicate over well-defined protocols. Each microservice focuses on a single business capability and owns its own data and behavior.
- Microservices communicate asynchronously through APIs and events to maintain independence and isolation, which enables continuous delivery, failure resilience, and independent scaling of each service.
This document summarizes Netflix's use of Kafka in their data pipeline. It discusses how Netflix evolved from using S3 and EMR to introducing Kafka and Kafka producers and consumers to handle 400 billion events per day. It covers challenges of scaling Kafka clusters and tuning Kafka clients and brokers. Finally, it outlines Netflix's roadmap which includes contributing to open source projects like Kafka and testing failure resilience.
ELK is a stack consisting of the open source tools Elasticsearch, Logstash, and Kibana. Elasticsearch provides a distributed, multitenant-capable full-text search engine. Logstash is used to collect, process, and forward events and log messages. Kibana provides visualization capabilities on top of Elasticsearch. The document discusses how each tool in the ELK stack works and can be configured using inputs, filters, and outputs in Logstash or through the Elasticsearch REST API. It also provides examples of using ELK for log collection, processing, and visualization.
This document provides an overview of Grafana, an open source metrics dashboard and graph editor for Graphite, InfluxDB and OpenTSDB. It discusses Grafana's features such as rich graphing, time series querying, templated queries, annotations, dashboard search and export/import. The document also covers Grafana's history and alternatives. It positions Grafana as providing richer features than Graphite Web and highlights features like multiple y-axes, unit formats, mixing graph types, thresholds and tooltips.
BMP (BGP Monitoring Protocol) allows routers to send BGP peer route updates and statistics to external monitoring stations. It provides access to the pre-policy routing table (Adj-RIB-In) of peers on an ongoing basis. Cisco supports BMP in IOS-XE and IOS-XR routers. OpenBMP is an open-source BMP collector that stores updates in a MySQL database for analysis.
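For readers who want to see what a monitoring station receives, here is a sketch of parsing the 6-byte BMP common header (version, message length, message type) as laid out in RFC 7854. The function name `parse_bmp_common_header` is hypothetical, and this is not OpenBMP's code; per-peer headers and message bodies are left out.

```python
import struct

# Message type codes from RFC 7854.
BMP_MESSAGE_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}


def parse_bmp_common_header(data):
    """Decode the common header at the start of a BMP message."""
    if len(data) < 6:
        raise ValueError("BMP common header needs 6 bytes")
    # Network byte order: 1-byte version, 4-byte total length, 1-byte type.
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    return {
        "version": version,
        "length": length,  # includes the common header itself
        "type": BMP_MESSAGE_TYPES.get(msg_type, f"unknown({msg_type})"),
    }


if __name__ == "__main__":
    print(parse_bmp_common_header(bytes([3, 0, 0, 0, 6, 4])))
```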
The document discusses optimizing an Apache Pulsar deployment to handle 10 PB of data per day for a large customer. It estimates the initial cluster size needed using different storage options in Google Cloud Platform. It then describes four optimizations made - eliminating the journal, using direct I/O, compression, and improving the C++ client - and recalculates the cluster size after each optimization. The optimized deployment uses 200 VMs each with 24 local SSDs to meet the requirements.
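The scale of the figures above is easier to judge as a per-second rate. The sketch below converts 10 PB per day into an average ingest rate and divides it across 200 VMs with 24 SSDs each. It assumes decimal petabytes (10^15 bytes) and a perfectly even load, and it ignores replication, compression, and peak-to-average ratios, so it is back-of-the-envelope arithmetic rather than the sizing method used in the document; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rate_bytes_per_sec(petabytes_per_day):
    """Average ingest rate in bytes/second, assuming 1 PB = 10**15 bytes."""
    return petabytes_per_day * 10**15 / SECONDS_PER_DAY


def per_node_rates(total_bytes_per_sec, vms, ssds_per_vm):
    """Split an aggregate rate evenly across VMs and then across each VM's SSDs."""
    per_vm = total_bytes_per_sec / vms
    per_ssd = per_vm / ssds_per_vm
    return per_vm, per_ssd


if __name__ == "__main__":
    total = ingest_rate_bytes_per_sec(10)
    per_vm, per_ssd = per_node_rates(total, vms=200, ssds_per_vm=24)
    print(f"aggregate: {total / 1e9:.1f} GB/s")
    print(f"per VM:    {per_vm / 1e6:.0f} MB/s")
    print(f"per SSD:   {per_ssd / 1e6:.1f} MB/s")
```

Under these assumptions the aggregate works out to roughly 116 GB/s, or a little under 600 MB/s per VM before any replication factor is applied.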
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova... | confluent
Apache Kafka is changing the way we build scalable and highly available software systems. By providing a simplified path to eventual consistency and event sourcing, Kafka gives us the platform to make these patterns a reality for a much broader segment of applications and customers than was possible in the past. Cloud Events is an interoperable specification for eventing that is part of the CNCF. This session will combine open source and open standards to show how you can build highly reliable applications that scale linearly, provide interoperability, and are easily extensible, leveraging both push and pull semantics. Concrete real-world examples will show how Kafka makes event sourcing more approachable and how streams and events complement each other, including the difference between business events and technical events.
The document summarizes an international conference on Islamic microfinance in Mauritius organized by the Center of Islamic Banking & Economics. It discusses technology for Islamic finance and whether it is time to invest. The conference will discuss Oracle and Islamic finance, the need to re-architect Islamic banking technology, and the benefits of a re-architected technology platform. It concludes by outlining how a re-architected platform can provide benefits such as being designed for universal banking, end-to-end integration, and regulatory and Shariah compliance.
Presentation by Lorenzo Mangani of QXIP at the October 26 SF Bay Area ClickHouse meetup
https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup
https://qxip.net/
This document discusses various techniques for IPv6 transition and coexistence with IPv4, including:
- Dual-stack which allows simultaneous support of both IPv4 and IPv6.
- Tunnels which encapsulate IPv6 packets in IPv4 packets to provide IPv6 connectivity through IPv4 networks.
- Translation techniques like NAT64 which allow communication between IPv4-only and IPv6-only nodes.
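To illustrate the translation technique in the list above, here is a sketch of how a NAT64/DNS64 deployment can represent an IPv4 address inside an IPv6 address: the 32 IPv4 bits are embedded in the low-order bits of a /96 prefix. It assumes the well-known prefix `64:ff9b::/96` from RFC 6052; the function names are hypothetical, and other prefix lengths defined by that RFC are not handled.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6, prefix=WELL_KNOWN_PREFIX):
    """Recover the embedded IPv4 address from a synthesized IPv6 address."""
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in prefix:
        raise ValueError(f"{addr} is not inside {prefix}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    synthesized = synthesize_nat64("192.0.2.33")
    print(synthesized, extract_ipv4(synthesized))
```

An IPv6-only client sends packets to the synthesized address, and the NAT64 gateway recovers the IPv4 destination from the low 32 bits before translating the packet.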
The implementation will:
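- FORWARD-TSN (0xC0): If the chunk type is not recognized, skip this chunk and continue to process the rest of the chunks in the packet, but report it in an ERROR chunk with the 'Unrecognized Chunk Type' cause, because the two highest-order bits of the type are 11.
- ASCONF (0xC1): Same handling as FORWARD-TSN, because the two highest-order bits of the type are also 11.
- ASC: The chunk type name is truncated, so how it would be processed cannot be determined.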
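The handling of an unrecognized SCTP chunk follows from the two highest-order bits of its one-byte chunk type, as specified in RFC 4960, section 3.2. The sketch below decodes those bits; the function name `unrecognized_chunk_action` and the wording of the action strings are illustrative, not taken from any particular SCTP stack.

```python
# Action for an unrecognized chunk, keyed by the two highest-order bits
# of the chunk type (RFC 4960, section 3.2).
ACTIONS = {
    0b00: "stop processing and discard the packet",
    0b01: "stop processing, discard the packet, and report in an ERROR chunk",
    0b10: "skip this chunk and continue processing",
    0b11: "skip this chunk, continue processing, and report in an ERROR chunk",
}


def unrecognized_chunk_action(chunk_type):
    """Return the required handling for an unrecognized chunk type (0-255)."""
    if not 0 <= chunk_type <= 0xFF:
        raise ValueError("chunk type must fit in one byte")
    return ACTIONS[chunk_type >> 6]


if __name__ == "__main__":
    for name, code in [("FORWARD-TSN", 0xC0), ("ASCONF", 0xC1)]:
        print(f"{name} (0x{code:02X}): {unrecognized_chunk_action(code)}")
```

Both 0xC0 and 0xC1 have the high bits 11, so a receiver that does not recognize them skips the chunk, keeps processing the packet, and reports the chunk in an ERROR chunk.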
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee | Spark Summit
This document discusses Apache Zeppelin, an open-source notebook for interactive data analytics. It provides an overview of Zeppelin's features, including interactive notebooks, multiple backends, interpreters, and a display system. The document also covers Zeppelin's adoption timeline, from its origins as a commercial product in 2012 to becoming an Apache Incubator project in 2014. Future projects involving Zeppelin like Helium and Z-Manager are also briefly described.
Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S... | Fred Posner
Presentation from AsteriskWorld 2017 at ITEXPO. Discussion of how I started with Asterisk and Kamailio as well as how to build Reliability, Scalability, and Security into your telephony platform.
Getting Started with Spring Authorization Server | VMware Tanzu
SpringOne 2021
Title: Getting Started with Spring Authorization Server
Speakers: Joe Grandja, Spring Security Engineer at VMware; Steve Riesenberg, Software Engineer at VMware
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... | Databricks
Bighead is Airbnb's machine learning infrastructure that was created to:
- Standardize and simplify the ML development workflow;
- Reduce the time and effort to build ML models from weeks/months to days/weeks; and
- Enable more teams at Airbnb to utilize ML.
It provides shared services and tools for data management, model training/inference, and model management to make the ML process more efficient and production-ready. This includes services like Zipline for feature storage, Redspot for notebook environments, Deep Thought for online inference, and the Bighead UI for model monitoring.
Bighead is Airbnb's machine learning infrastructure that was created to:
1) Standardize and simplify the ML development workflow;
2) Reduce the time and effort to build ML models from weeks/months to days/weeks; and
3) Enable more teams at Airbnb to utilize ML.
It provides services for data management, model training/scoring, production deployment, and model management to make the ML process more efficient and consistent across teams. Bighead is built on open source technologies like Spark, TensorFlow, and Kubernetes but addresses gaps to fully support the end-to-end ML pipeline.
Bighead: Airbnb's end-to-end machine learning platform
Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python, Spark, and Kubernetes. The components include a lifecycle management service, an offline training and inference engine, an online inference service, a prototyping environment, and a Docker image customization tool. Each component can be used individually. In addition, Bighead includes a unified model building API that smoothly integrates popular libraries including TensorFlow, XGBoost, and PyTorch. Each model is reproducible and iterable through standardization of data collection and transformation, model training environments, and production deployment. This talk covers the architecture, the problems that each individual component and the overall system aims to solve, and a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, and we have a variety of models running in production. We plan to open source Bighead to allow the wider community to benefit from our work.
Speaker: Andrew Hoh
Andrew Hoh is the Product Manager for the ML Infrastructure and Applied ML teams at Airbnb. Previously, he has spent time building and growing Microsoft Azure's NoSQL distributed database. He holds a degree in computer science from Dartmouth College.
When it comes to large-scale data processing and machine learning, Apache Spark is no doubt one of the top battle-tested frameworks out there for handling batch or streaming workloads. The ease of use, built-in machine learning modules, and multi-language support make it a very attractive choice for data wonks. However, bootstrapping and getting off the ground can be difficult for most teams without a Spark cluster that is pre-provisioned and provided as a managed service in the cloud. While that is a very attractive way to get going, in the long run it can be a very expensive option if it’s not well managed.
As an alternative to this approach, our team has been exploring and working a lot with running Spark and all our Machine Learning workloads and pipelines as containerized Docker packages on Kubernetes. This provides an infrastructure-agnostic abstraction layer for us, and as a result, it improves our operational efficiency and reduces our overall compute cost. Most importantly, we can easily target our Spark workload deployment to run on any major Cloud or On-prem infrastructure (with Kubernetes as the common denominator) by just modifying a few configurations.
In this talk, we will walk you through the process our team follows to run a production deployment of our machine learning workloads and pipelines on Kubernetes, which allows us to port our implementation seamlessly from a local Kubernetes setup on a laptop during development to either an on-prem or cloud Kubernetes environment.
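One way to picture "modifying a few configurations" is a helper that builds the same `spark-submit` command for different Kubernetes targets. The sketch below uses Spark's documented `k8s://` master URL scheme and the `spark.kubernetes.namespace`, `spark.kubernetes.container.image`, and `spark.executor.instances` properties; the function name, the target URLs, the image name, and the application path are all hypothetical placeholders and not the speakers' actual setup.

```python
def spark_submit_args(k8s_api_url, image, app, executors=2, namespace="default"):
    """Build a spark-submit argument list for cluster mode on Kubernetes."""
    return [
        "spark-submit",
        "--master", f"k8s://{k8s_api_url}",
        "--deploy-mode", "cluster",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.executor.instances={executors}",
        app,
    ]


# Hypothetical targets: only the API server URL and executor count differ.
TARGETS = {
    "laptop": {"k8s_api_url": "https://127.0.0.1:6443", "executors": 1},
    "cloud": {"k8s_api_url": "https://k8s.example.com:443", "executors": 20},
}

if __name__ == "__main__":
    for name, target in TARGETS.items():
        args = spark_submit_args(
            image="registry.example.com/ml/spark-job:latest",
            app="local:///opt/spark/app/train.py",
            **target,
        )
        print(name, " ".join(args))
```

Keeping the job image and application path fixed while swapping only the target settings is what makes the same workload portable between a laptop cluster and a production cluster.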
Artem Koval presented on cloud-native MLOps frameworks. MLOps is a process for deploying and monitoring machine learning models through continuous integration and delivery. It addresses fairness, explainability, model monitoring, and human intervention. Modern MLOps frameworks focus on these areas as well as data labeling, testing, and observability. Different levels of MLOps are needed depending on an organization's size, from lightweight for small teams to enterprise-level for large companies with many models. Human-centered AI should be incorporated at all levels by involving humans throughout the entire machine learning process.
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne... | Akash Tandon
ML solutions in production start from data ingestion and extend up to the actual deployment step. We want this workflow to be scalable, portable, and simple. Containers and Kubernetes are great at the former two but not the latter if you aren't a DevOps practitioner. We'll explore how you can leverage the Kubeflow project to deploy best-of-breed open-source systems for ML to diverse infrastructures.
Simply Business is a leading insurance provider for small businesses in the UK, and we are now expanding into the USA. In this presentation, I explain how our data platform is evolving to keep delivering value and adapting to a company that changes really fast.
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E... | Henry Saputra
The Krylov Project is the key component in eBay's AI Platform initiative that provides an easy to use, open, and fast AI orchestration engine that is deployed as managed services in eBay cloud.
Using Krylov, AI scientists can access eBay's massive datasets; build and train AI models; spin up powerful compute (high-memory or GPU instances) on the Krylov compute cluster; and set up machine learning pipelines, for example by using declarative constructs that stitch together the pipeline lifecycle.
The document discusses cloud native applications and their advantages. It describes how Mark and Grace build an online store as a cloud native application using microservices, containers, and horizontal scalability. This allows their application to be easily deployed, scaled, and updated. The document outlines layers of cloud native applications like functionality, data access, and deployment. It provides an example of a machine learning recommendation service and concludes that cloud native applications allow businesses to experiment quickly and react to needs.
The document discusses designing scalable platforms for artificial intelligence (AI) and machine learning (ML). It outlines several challenges in developing AI applications, including technical debts, unpredictability, different data and compute needs compared to traditional software. It then reviews existing commercial AI platforms and common components of AI platforms, including data access, ML workflows, computing infrastructure, model management, and APIs. The rest of the document focuses on eBay's Krylov project as an example AI platform, outlining its architecture, challenges of deploying platforms at scale, and needed skill sets on the platform team.
This document discusses DevOps and MLOps practices for machine learning models. It outlines that while ML development shares some similarities with traditional software development, such as using version control and CI/CD pipelines, there are also key differences related to data, tools, and people. Specifically, ML requires additional focus on exploratory data analysis, feature engineering, and specialized infrastructure for training and deploying models. The document provides an overview of how one company structures their ML team and processes.
Real world machine learning with Java for Fumankaitori.comMathieu Dumoulin
This document summarizes a presentation about using machine learning in Java 8 at Fumankaitori.com. The presentation introduces the speaker and their company, which collects user dissatisfaction posts and rewards users with points that can be exchanged for coupons. Their goal was to automate point assignment for posts using machine learning instead of manual rules. They trained an XGBoost model in DataRobot that achieved their goal of predicting points within 5 of human labels. For production, they achieved similar performance using H2O to train a gradient boosted machine model and generate a prediction POJO for low latency predictions. The presentation emphasizes that machine learning is possible for any Java engineer and that Java 8 features like streams make it a good choice for real
From prototype to production - The journey of re-designing SmartUp.ioMáté Lang
Talk about the journey of a small tech team re-designing SmartUp.io from scratch, and the technical paths from MVP to production.
High level overview of architecture and tech stack decisions, best-practices and culture.
The document provides an overview of machine learning and artificial intelligence concepts. It discusses:
1. The machine learning pipeline, including data collection, preprocessing, model training and validation, and deployment. Common machine learning algorithms like decision trees, neural networks, and clustering are also introduced.
2. How artificial intelligence has been adopted across different business domains to automate tasks, gain insights from data, and improve customer experiences. Some challenges to AI adoption are also outlined.
3. The impact of AI on society and the workplace. While AI is predicted to help humans solve problems, some people remain wary of technologies like home health diagnostics or AI-powered education. Responsible development of explainable AI is important.
This document summarizes the development of Lore's machine learning and NLP platform using Python. It started as a monolithic Python server but evolved into a microservices architecture using Docker, Kubernetes, and Celery for parallelization. Key lessons included using DevOps tools like Docker for development and deployment, Celery to parallelize tasks, and wrapping services to improve modularity, flexibility, and performance. The platform now supports multiple products and consulting work in a scalable and maintainable way.
Serverless Functions and Machine Learning: Putting the AI in APIsNordic APIs
The document discusses using machine learning APIs and hosting machine learning models. It describes how off-the-shelf machine learning APIs work and how to host your own models. It then discusses limitations of hosting models on dedicated servers or using serverless functions and recommends a machine learning hosting platform that provides automatic scaling, discovery of models, and pay-per-use pricing. The presentation concludes with demonstrations of combining multiple machine learning models.
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...DataScienceConferenc1
Pepsico has developed an advanced machine learning platform using Kubeflow and other tools to address issues with non-reproducible models and increase efficiency. The platform enhances collaboration, focuses on core data science work, and provides scalability and standardization. It utilizes tools like Kubeflow Pipelines, Azure services, KServe, AutoML, and Datadog. Teams manage infrastructure, develop models, and provide specialized support. Transitioning to the Kubeflow-based platform from local development poses challenges but preliminary results show end-to-end project duration reduced by two-thirds, and improvements are anticipated to continue.
Day 13 - Creating Data Processing Services | Train the Trainers ProgramFIWARE
This technical session for Local Experts in Data Sharing (LEBDs) explains how to create data processing services that are key to i4Trust.
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...Andre Hora
Unittest and pytest are the most popular testing frameworks in Python. Overall, pytest provides some advantages, including simpler assertion, reuse of fixtures, and interoperability. Due to such benefits, multiple projects in the Python ecosystem have migrated from unittest to pytest. To facilitate the migration, pytest can also run unittest tests, thus, the migration can happen gradually over time. However, the migration can be time-consuming and take a long time to conclude. In this context, projects would benefit from automated solutions to support the migration process. In this paper, we propose TestMigrationsInPy, a dataset of test migrations from unittest to pytest. TestMigrationsInPy contains 923 real-world migrations performed by developers. Future research proposing novel solutions to migrate frameworks in Python can rely on TestMigrationsInPy as a ground truth. Moreover, as TestMigrationsInPy includes information about the migration type (e.g., changes in assertions or fixtures), our dataset enables novel solutions to be verified effectively, for instance, from simpler assertion migrations to more complex fixture migrations. TestMigrationsInPy is publicly available at: https://github.com/altinoalvesjunior/TestMigrationsInPy.
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Andre Hora
Exceptions allow developers to handle error cases expected to occur infrequently. Ideally, good test suites should test both normal and exceptional behaviors to catch more bugs and avoid regressions. While current research analyzes exceptions that propagate to tests, it does not explore other exceptions that do not reach the tests. In this paper, we provide an empirical study to explore how frequently exceptional behaviors are tested in real-world systems. We consider both exceptions that propagate to tests and the ones that do not reach the tests. For this purpose, we run an instrumented version of test suites, monitor their execution, and collect information about the exceptions raised at runtime. We analyze the test suites of 25 Python systems, covering 5,372 executed methods, 17.9M calls, and 1.4M raised exceptions. We find that 21.4% of the executed methods do raise exceptions at runtime. In methods that raise exceptions, on the median, 1 in 10 calls exercise exceptional behaviors. Close to 80% of the methods that raise exceptions do so infrequently, but about 20% raise exceptions more frequently. Finally, we provide implications for researchers and practitioners. We suggest developing novel tools to support exercising exceptional behaviors and refactoring expensive try/except blocks. We also call attention to the fact that exception-raising behaviors are not necessarily “abnormal” or rare.
Who Watches the Watchmen (SciFiDevCon 2025)Allon Mureinik
Tests, especially unit tests, are the developers’ superheroes. They allow us to mess around with our code and keep us safe.
We often trust them with the safety of our codebase, but how do we know that we should? How do we know that this trust is well-deserved?
Enter mutation testing – by intentionally injecting harmful mutations into our code and seeing if they are caught by the tests, we can evaluate the quality of the safety net they provide. By watching the watchmen, we can make sure our tests really protect us, and we aren’t just green-washing our IDEs to a false sense of security.
Talk from SciFiDevCon 2025
https://www.scifidevcon.com/courses/2025-scifidevcon/contents/680efa43ae4f5
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AIdanshalev
If we were building a GenAI stack today, we'd start with one question: Can your retrieval system handle multi-hop logic?
Trick question, b/c most can’t. They treat retrieval as nearest-neighbor search.
Today, we discussed scaling #GraphRAG at AWS DevOps Day, and the takeaway is clear: VectorRAG is naive, lacks domain awareness, and can’t handle full dataset retrieval.
GraphRAG builds a knowledge graph from source documents, allowing for a deeper understanding of the data + higher accuracy.
Landscape of Requirements Engineering for/by AI through Literature ReviewHironori Washizaki
Hironori Washizaki, "Landscape of Requirements Engineering for/by AI through Literature Review," RAISE 2025: Workshop on Requirements engineering for AI-powered SoftwarE, 2025.
AgentExchange is Salesforce’s latest innovation, expanding upon the foundation of AppExchange by offering a centralized marketplace for AI-powered digital labor. Designed for Agentblazers, developers, and Salesforce admins, this platform enables the rapid development and deployment of AI agents across industries.
How can one start with crypto wallet development.pptxlaravinson24
This presentation is a beginner-friendly guide to developing a crypto wallet from scratch. It covers essential concepts such as wallet types, blockchain integration, key management, and security best practices. Ideal for developers and tech enthusiasts looking to enter the world of Web3 and decentralized finance.
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDinusha Kumarasiri
AI is transforming APIs, enabling smarter automation, enhanced decision-making, and seamless integrations. This presentation explores key design principles for AI-infused APIs on Azure, covering performance optimization, security best practices, scalability strategies, and responsible AI governance. Learn how to leverage Azure API Management, machine learning models, and cloud-native architectures to build robust, efficient, and intelligent API solutions
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?steaveroggers
Migrating from Lotus Notes to Outlook can be a complex and time-consuming task, especially when dealing with large volumes of NSF emails. This presentation provides a complete guide on how to batch export Lotus Notes NSF emails to Outlook PST format quickly and securely. It highlights the challenges of manual methods, the benefits of using an automated tool, and introduces eSoftTools NSF to PST Converter Software — a reliable solution designed to handle bulk email migrations efficiently. Learn about the software’s key features, step-by-step export process, system requirements, and how it ensures 100% data accuracy and folder structure preservation during migration. Make your email transition smoother, safer, and faster with the right approach.
Read More:- https://www.esofttools.com/nsf-to-pst-converter.html
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...Egor Kaleynik
This case study explores how we partnered with a mid-sized U.S. healthcare SaaS provider to help them scale from a successful pilot phase to supporting over 10,000 users—while meeting strict HIPAA compliance requirements.
Faced with slow, manual testing cycles, frequent regression bugs, and looming audit risks, their growth was at risk. Their existing QA processes couldn’t keep up with the complexity of real-time biometric data handling, and earlier automation attempts had failed due to unreliable tools and fragmented workflows.
We stepped in to deliver a full QA and DevOps transformation. Our team replaced their fragile legacy tests with Testim’s self-healing automation, integrated Postman and OWASP ZAP into Jenkins pipelines for continuous API and security validation, and leveraged AWS Device Farm for real-device, region-specific compliance testing. Custom deployment scripts gave them control over rollouts without relying on heavy CI/CD infrastructure.
The result? Test cycle times were reduced from 3 days to just 8 hours, regression bugs dropped by 40%, and they passed their first HIPAA audit without issue—unlocking faster contract signings and enabling them to expand confidently. More than just a technical upgrade, this project embedded compliance into every phase of development, proving that SaaS providers in regulated industries can scale fast and stay secure.
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentShubham Joshi
A secure test infrastructure ensures that the testing process doesn’t become a gateway for vulnerabilities. By protecting test environments, data, and access points, organizations can confidently develop and deploy software without compromising user privacy or system integrity.
3. In 2016
● Only major models in production
● Models took on average 8 weeks to build (source: survey of ML producers)
● Everything built in Aerosolve, Spark and Scala
● No support for TensorFlow, PyTorch, scikit-learn or other popular ML packages
● Significant discrepancies between offline and online data
ML Infra was formed with the charter to:
● Enable more users to build ML products
● Reduce time and effort
● Enable easier model evaluation
Q4 2016: Formation of our ML Infra team
4. Before ML Infrastructure
ML has had a massive impact on Airbnb’s product
● Search Ranking
● Smart Pricing
● Trust
● Paid Growth
● …And a few other major models
5. After ML Infrastructure
But many other areas had high potential for ML and had realized less of that potential.
● Paid Growth - Hosts
● Classifying listing
● Experience Ranking + Personalization
● Host Availability
● Business Travel Classifier
● Room Type Categorizations
● Make Listing a Space Easier
● Customer Service Ticket Routing
● … And many more
6. Vision
Airbnb routinely ships ML-powered features throughout the product.
Mission
Equip Airbnb with shared technology to build production-ready ML applications with no incidental complexity.
(Technology = tools, platforms, knowledge, shared feature data, etc.)
7. Value of ML Infrastructure
Machine Learning Infrastructure can:
● Remove incidental complexities, by providing generic, reusable solutions
● Simplify the workflow for intrinsic complexities, by providing tooling, libraries, and environments that make ML development more efficient
And at the same time:
● Establish a standardized platform that enables cross-company sharing of feature data and model components
● “Make it easy to do the right thing” (ex: consistent training/streaming/scoring logic)
9. Learnings:
● No consistency between ML Workflows
● New teams struggle to begin using ML
● Airbnb has a wide variety in ML applications
● Existing ML workflows are slow, fragmented, and brittle
● Incidental complexity vs. intrinsic complexity
● Build and forget - ML as a linear process
Q1 2017: Figuring out what to build
12. ● Consistent environment across the stack
○ Use Docker
● Common workflow across different ML frameworks
○ Supports Scikit-learn, TF, PyTorch, etc.
● Modular components
○ Easy to customize parts
○ Easy to share data/pipelines
Key Design Decisions
18. Components
air/mlinfravision
● Data Management: Zipline
● Training: Redspot / BigQueue
● Core ML Library: Bighead libraries
● Productionization: Deep Thought (online) / ML Automator (offline)
● Model Management: Model Repo
● Monitoring: Model Repo UI
20. Zipline - Why
● Defining features (especially windowed) with Hive was complicated and error-prone
● Backfilling training sets (on inefficient Hive queries) was a major bottleneck
● No feature sharing
● Inconsistent offline and online datasets
● Warehouse is built as of end-of-day, lacked point-in-time features
● ML data pipelines lacked data quality checks or monitoring
● Ownership of pipelines was in disarray
21. A data management platform for ML
● Common (and simple) definition: Define the feature once and use it in batch and streaming
● Training data backfills: Resource efficient and point-in-time correct with scheduled updates
● Lambda updates: Features available both offline and online
● Data quality: Feature visualizations and automatic data quality monitoring
Zipline - Overview
22. Zipline - Feature definition language
[Figure: example feature definition, annotated with Primary Key, Timestamp, Owner, Operation = Sum, Time windows]
● Owner allows us to trace accountability
● Primary keys and timestamp are used to guarantee point-in-time correctness in the training set
● Operations and time windows are optional
● Spark efficiently handles aggregations (windowed and not)
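The slides do not reproduce the actual Zipline syntax, so the following is only a rough sketch of the pieces the slide lists (primary key, timestamp, owner, optional operation and time windows). The class `FeatureDefinition`, its field names, the table name, and the owner address are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical stand-in for a feature definition (not the real Zipline syntax)."""
    name: str
    source_table: str
    primary_key: str                     # entity the feature is keyed on
    timestamp_column: str                # event time, needed for point-in-time correctness
    owner: str                           # who is accountable for the feature
    operation: Optional[str] = None      # optional aggregation, e.g. "sum"
    windows_days: Tuple[int, ...] = ()   # optional time windows

    def output_columns(self):
        """Names of the feature columns this definition would produce."""
        if self.operation is None:
            return [self.name]
        if not self.windows_days:
            return [f"{self.name}_{self.operation}"]
        return [f"{self.name}_{self.operation}_{d}d" for d in self.windows_days]


bookings_by_user = FeatureDefinition(
    name="bookings_by_user",
    source_table="bookings",        # assumed table name
    primary_key="user_id",
    timestamp_column="ts",
    owner="ml-infra@example.com",   # placeholder owner
    operation="sum",
    windows_days=(7, 30),
)
print(bookings_by_user.output_columns())
```

Keeping the owner and timestamp as required fields mirrors the slide's point that accountability and point-in-time correctness are built into every definition, while the aggregation parts stay optional.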
23. Zipline - Data Quality and Collaboration
● Features can be visualized and browsed through online editor
● Gives stats on feature, and also provides info on ownership
24. Zipline - Training Data
PK1 = User ID   PK2 = Listing ID   Timestamp          bookings_by_user   bookings_by_listing
123             456                2018-01-01 23...   0                  4
234             567                2018-01-04 01...   2                  8
456             789                2018-01-02 08...   1                  0
User provides: Primary keys, timestamps, list of features
Zipline computes feature values point-in-time correct for those PKs and those timestamps, and joins them together.
[Figure labels: FeatureSet 1, FeatureSet 2]
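To make "point-in-time correct" concrete, here is a minimal pure-Python sketch of the idea: for each (primary key, timestamp) pair the user supplies, only events that happened before that timestamp are aggregated. The event data, the exact times, and the helper names are invented for illustration; the real system runs on Spark.

```python
from datetime import datetime

# Hypothetical raw events: (user_id, event_time) for each booking.
booking_events = [
    (234, datetime(2018, 1, 1, 9, 0)),
    (234, datetime(2018, 1, 3, 18, 30)),
    (234, datetime(2018, 1, 5, 12, 0)),   # happens after the query below
    (456, datetime(2018, 1, 1, 7, 0)),
]


def bookings_by_user_as_of(user_id, as_of, events):
    """Count bookings for user_id strictly before as_of (no future leakage)."""
    return sum(1 for uid, ts in events if uid == user_id and ts < as_of)


def build_training_rows(queries, events):
    """queries: list of (user_id, timestamp) pairs supplied by the user."""
    return [
        {"user_id": uid, "timestamp": ts,
         "bookings_by_user": bookings_by_user_as_of(uid, ts, events)}
        for uid, ts in queries
    ]


rows = build_training_rows(
    [(123, datetime(2018, 1, 1, 23, 0)),
     (234, datetime(2018, 1, 4, 1, 0)),
     (456, datetime(2018, 1, 2, 8, 0))],
    booking_events,
)
for row in rows:
    print(row["user_id"], row["bookings_by_user"])
```

The third booking for user 234 is excluded because it occurs after the row's timestamp; counting it would leak future information into the training set.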
25. Zipline - Training Data
Airflow integration for daily update of training data
26. Zipline - Training Data with Labels
Label logic
● Labels are often joined to features with an offset for training (60 days offset)
● But that offset does not apply to scoring data
[Figure: Features Table and Labels Table partitions for Training and Scoring; partition labels shown: ds=2017-08-16, ds=2017-10-15, ???]
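One way to read the label logic above, sketched with plain dates: for training, a feature partition is paired with the label partition 60 days later, while for scoring the latest feature partition is used with no labels. The function names are hypothetical, and the pairing of the two `ds` values from the figure is an interpretation rather than something the slide states outright.

```python
from datetime import date, timedelta

LABEL_OFFSET_DAYS = 60  # offset quoted on the slide


def training_partitions(feature_ds, offset_days=LABEL_OFFSET_DAYS):
    """Pair a feature partition with the label partition offset_days later."""
    return {
        "features_ds": feature_ds,
        "labels_ds": feature_ds + timedelta(days=offset_days),
    }


def scoring_partitions(latest_ds):
    """At scoring time no labels exist yet, so the offset does not apply."""
    return {"features_ds": latest_ds, "labels_ds": None}


train = training_partitions(date(2017, 8, 16))
score = scoring_partitions(date(2017, 10, 15))
print(train["labels_ds"])   # 2017-10-15
print(score["labels_ds"])   # None
```

The two dates in the figure are exactly 60 days apart, which is consistent with this reading.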
27. Zipline - Consistent online and offline features
● User writes one conf
● Zipline starts the streaming job
● Features served from online KV store
● Zipline schedules daily batch correction
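The combination of a streaming job plus a daily batch correction is a lambda-style design. A toy in-memory sketch of that pattern, assuming a simple counter feature, is shown below; `OnlineFeatureStore` and its methods are hypothetical and say nothing about the real KV store.

```python
class OnlineFeatureStore:
    """Toy in-memory KV store standing in for the online feature store."""

    def __init__(self):
        self._values = {}

    def apply_stream_event(self, key, delta):
        """Streaming path: update incrementally as events arrive."""
        self._values[key] = self._values.get(key, 0) + delta

    def apply_batch_correction(self, batch_values):
        """Batch path: overwrite with values recomputed from the warehouse."""
        self._values.update(batch_values)

    def get(self, key, default=0):
        return self._values.get(key, default)


store = OnlineFeatureStore()
key = ("bookings_by_user", 234)
store.apply_stream_event(key, 1)
store.apply_stream_event(key, 1)
store.apply_stream_event(key, 1)            # e.g. a duplicated event
before = store.get(key)
store.apply_batch_correction({key: 2})      # daily job restores the warehouse value
after = store.get(key)
print(before, after)
```

The streaming path keeps features fresh between batch runs, and the batch path bounds how long any streaming error can persist.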
28. ● More efficient cluster usage: Hive and Spark jobs are optimized; training data backfills went from many weeks to a few hours
● Ease of use: Can define 100s of new features in a few hours (down from many days)
● Online scoring with lambda: Features are automatically available in the online scoring environment
● Collaboration: Many features are shared!
● Management: Clear data ownership and maintenance
Zipline - Impact
31. ● Started with JupyterHub (open-source project), which manages multiple Jupyter Notebook servers (prototyping environment)
● But users were installing packages locally, and then creating a virtualenv for other parts of our infra
○ Environment was very fragile
● Users wanted to be able to use JupyterHub on larger instances or instances with GPUs
● Users also commonly wanted to share notebooks with teammates
Redspot - Why
32. Containerized environments
● Every user’s environment is containerized via docker
○ Allows customizing the notebook environment without affecting other users
■ e.g. install system/python packages
○ Easier to restore state therefore helps with reproducibility
● Support using custom docker images
○ Base images based on user’s needs
■ e.g. GPU access, pre-installed ML packages
○ Build your own image for a faster start time
33. Remote Instance Spawner
● For bigger jobs and total isolation, Redspot allows launching a dedicated instance
● Hardware resources not shared with other users
● Automatically terminates idle instances periodically
34. ● A multi-tenant notebook environment
● Makes it easy to iterate and prototype ML models, share work
○ Integrated with the rest of our infra - so one can deploy a notebook to prod
● Improved upon open source JupyterHub
○ Containerized; can bring custom Docker env
○ Remote notebook spawner for dedicated instances (P3 and X1 machines on AWS)
○ Persist notebooks in EFS and share with teams
○ Reverting to prior checkpoint
Redspot Summary
37. ● Performant, scalable execution of model inference in production is hard
○ Engineers shouldn’t build one-off solutions for every model.
○ Data scientists should be able to launch new models in production with minimal eng involvement.
● Debugging differences between online inference and training is difficult
○ We should support the exact serialized version of the model the data scientist built
○ We should be able to run the same Python transformations data scientists write for training.
○ We should be able to load data computed in the warehouse or streaming easily into online scoring.
Deep Thought - Why
38. ● Deep Thought is a shared service for online inference
○ Support for pickled sklearn models, TensorFlow models, and custom code in Python or Java
○ Add your model configuration to a file and deploy. Completely config driven so data scientists don’t have to involve engineers to launch new models.
○ Engineers can then connect to a REST API from other services to get scores.
○ Support for loading data from K/V stores
○ Standardized logging, alerting and dashboarding for monitoring and offline analysis of model performance
○ Process isolation to enable multi-tenancy without contention
○ Scalable and Reliable: 80+ models. Highest QPS service at Airbnb. Median response time: 4ms. p95: 13ms.
Deep Thought - How
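The "completely config driven" idea can be sketched as a registry that is built from a configuration document, plus a handler that a REST endpoint could call. Everything here is an assumption made for illustration: the config schema, the `linear` model type, and the handler's response shape are invented, since the slides do not show Deep Thought's actual format.

```python
import json

# Hypothetical config; the real Deep Thought format is not shown in the slides.
CONFIG = json.loads("""
{
  "models": {
    "price_tip": {"type": "linear", "weights": {"nights": 2.0, "guests": 0.5}, "bias": 1.0}
  }
}
""")


def build_linear(spec):
    """Build a scoring function from a linear-model spec."""
    weights, bias = spec["weights"], spec["bias"]
    return lambda features: bias + sum(w * features.get(k, 0.0) for k, w in weights.items())


LOADERS = {"linear": build_linear}  # one loader per supported model type


def load_models(config):
    """Instantiate every model named in the config."""
    return {name: LOADERS[spec["type"]](spec) for name, spec in config["models"].items()}


MODELS = load_models(CONFIG)


def handle_score_request(model_name, features):
    """What a REST handler might do after parsing the request body."""
    if model_name not in MODELS:
        return {"status": 404, "error": f"unknown model {model_name}"}
    return {"status": 200, "score": MODELS[model_name](features)}


print(handle_score_request("price_tip", {"nights": 3, "guests": 2}))
```

Launching a new model in this sketch means adding an entry to the config, with no change to the serving code, which is the property the slide emphasizes.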
41. Model Repo Overview
Model Repo is Bighead’s model management service
● Contains prototype and production models
● Can serve models “raw” or trained
● The source of truth on which trained models are in production
● Stores model health data
42. Model Repo Internals
We decompose Models into two components:
● Model Version - raw model code + docker image
● Model Artifact - parameters learned via training
A trained model consists of: Model Version + Model Artifact
[Diagram labels: Model Version (Code, Docker Image), Model Artifact, Production]
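The version/artifact split can be expressed as a small data model. This is a sketch under assumptions: the class and field names (`code_ref`, `docker_image`, `uri`) and the example values are hypothetical, not Model Repo's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVersion:
    """Raw model code plus the docker image it runs in."""
    code_ref: str       # e.g. a commit hash (assumed)
    docker_image: str


@dataclass(frozen=True)
class ModelArtifact:
    """Parameters learned via training."""
    uri: str            # where the serialized parameters live (assumed)


@dataclass(frozen=True)
class Model:
    version: ModelVersion
    artifact: Optional[ModelArtifact] = None

    @property
    def is_trained(self):
        """A trained model is a Model Version plus a Model Artifact."""
        return self.artifact is not None


raw = Model(ModelVersion("abc123", "ml-base:1.0"))
trained = Model(raw.version, ModelArtifact("s3://bucket/params.bin"))
print(raw.is_trained, trained.is_trained)
```

Separating the two lets the same code and image be retrained repeatedly, with each training run producing a new artifact against an unchanged version.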
43. Our built-in UI provides:
● Deployment - review changes, deploy, and roll back trained models
● Model Health - metrics, visualizations, alerting, central dashboard
● Experimentation - Ability to set up model experiments, e.g. split traffic between two or more models
Model Repo: UI
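Splitting traffic between models is commonly done by hashing a stable request key into a bucket, so the same key always reaches the same model. The sketch below shows that general technique; it is not taken from the slides, and `assign_model` and the model names are hypothetical.

```python
import hashlib


def assign_model(request_key, split):
    """Pick a model for request_key.

    split: list of (model_name, weight) pairs whose weights sum to 1.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    cumulative = 0.0
    for name, weight in split:
        cumulative += weight
        if bucket < cumulative:
            return name
    return split[-1][0]  # guard against floating-point rounding


SPLIT = [("model_a", 0.5), ("model_b", 0.5)]
assignments = [assign_model(f"user-{i}", SPLIT) for i in range(10000)]
share_a = assignments.count("model_a") / len(assignments)
print(round(share_a, 2))
```

Because the assignment depends only on the key, a user sees consistent behavior for the duration of the experiment, and no assignment table has to be stored.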
45. ● Tools and libraries for common tasks
○ Periodic training, evaluation and scoring on a model is common: Building Airflow DAGs, uploading scores to K/V stores, dashboards on scores, alert on score changes
○ Scoring on large tables is tricky to scale
ML Automator - Why
46. ● Once a model file is checked in, we generate the DAGs automatically to train/score it
● 40+ models using this feature
● Score on Spark for large datasets (we generate a virtualenv equivalent to the docker image, as Spark doesn’t run executors in the docker image)
ML Automator
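As a rough illustration of generating a pipeline from a checked-in model, the sketch below produces a plain-data description of a linear task graph. It deliberately does not use the Airflow API; the task names, the `dag_id` format, and the default schedule string are assumptions chosen to echo the tasks named on the previous slide.

```python
def generate_dag_spec(model_name, schedule="@daily"):
    """Return a plain-dict description of a train/evaluate/score pipeline."""
    tasks = ["train", "evaluate", "score", "upload_scores"]
    return {
        "dag_id": f"ml_automator_{model_name}",
        "schedule": schedule,
        "tasks": tasks,
        # Each task depends on the one before it.
        "dependencies": list(zip(tasks, tasks[1:])),
    }


spec = generate_dag_spec("listing_quality")
print(spec["dag_id"])
print(spec["dependencies"])
```

A generator like this is what removes the need to hand-write a DAG per model: only the model name (and any overrides) varies between pipelines.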
48. ML Helpers - Why
● Transformations are re-written too often
○ There are many versions of transformations for NLP, data cleaning, imputing, etc.
○ Models used to “start from scratch” and rebuild the same things
○ Model observability -- understand what features are important
49. ● Library of transformations; holds more than 50 different transformations including automated preprocessing for common input formats
● Created example notebooks to show usage of our infra
○ Example usage of ML pipelines, contains diagnostics that help people debug and improve models
○ Has been cloned and modified more than 20 times to build new models
● Improved Scikit-Learn Pipelines
○ Propagate feature metadata so we can plot feature importance at the end and connect it to feature names
○ Pipelines for data processing are reusable in other pipelines
○ Added wrappers for model libraries (XGB, etc.) so they can be serialized (robust to minor version changes)
ML Helpers and Pipelines
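Propagating feature metadata means each transformation returns the new column names along with the transformed data, so that importances from the final model can be matched to names. The sketch below shows the idea in plain Python rather than with scikit-learn; the `OneHot` and `Pipeline` classes and the importance numbers are made up for illustration.

```python
class OneHot:
    """Replace one categorical column with indicator columns."""

    def __init__(self, column, categories):
        self.column = column
        self.categories = list(categories)

    def transform(self, rows, names):
        i = names.index(self.column)
        out_names = (names[:i]
                     + [f"{self.column}={c}" for c in self.categories]
                     + names[i + 1:])
        out_rows = [
            row[:i] + [1.0 if row[i] == c else 0.0 for c in self.categories] + row[i + 1:]
            for row in rows
        ]
        return out_rows, out_names


class Pipeline:
    """Apply steps in order, threading feature names through each one."""

    def __init__(self, steps):
        self.steps = steps

    def transform(self, rows, names):
        for step in self.steps:
            rows, names = step.transform(rows, names)
        return rows, names


pipeline = Pipeline([OneHot("room_type", ["entire", "private"])])
rows, names = pipeline.transform([[2, "entire"], [1, "private"]], ["guests", "room_type"])
importances = [0.7, 0.2, 0.1]  # made-up numbers standing in for a fitted model's output
print(dict(zip(names, importances)))
```

Without the threaded names, the model would only report importances by column index, which stops being meaningful as soon as a step adds or removes columns.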
52. Dockerized Models
ML models have diverse dependency sets (tensorflow, xgboost, etc.). We allow users to provide a docker image within which model code always runs.
ML models don’t run in isolation however, so we’ve built a lightweight API to interact with the “dockerized model”
[Diagram labels: Docker Container, Model (user code), Model API, Other ML Infra Services]