Intro to Apache Kafka I gave at the Big Data Meetup in Geneva in June 2016. Covers the basics and gets into some more advanced topics. Includes demo and source code to write clients and unit tests in Java (GitHub repo on the last slides).
This document provides an introduction to Apache Kafka. It describes Kafka as a distributed messaging system with features like durability, scalability, publish-subscribe capabilities, and ordering. It discusses key Kafka concepts like producers, consumers, topics, partitions and brokers. It also summarizes use cases for Kafka and how to implement producers and consumers in code. Finally, it briefly outlines related tools like Kafka Connect and Kafka Streams that build upon the Kafka platform.
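As a minimal illustration of those concepts, here is a sketch of a Java producer; the broker address and topic name are assumptions for the example, not details from the slides:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class MinimalProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Records with the same key go to the same partition of the topic,
                // which is what gives Kafka its per-key ordering guarantee.
                producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            }
        }
    }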
This document summarizes Netflix's use of Kafka in their data pipeline. It discusses the evolution of Netflix's data pipeline to incorporate Kafka to handle 400 billion events per day. It describes how Netflix uses Kafka clusters with different priorities and configurations. It also outlines some of the challenges of using Kafka at Netflix's scale, such as Zookeeper client issues and cluster scaling, and the solutions Netflix developed to address these challenges.
This document discusses using microservices with Kafka. It describes how Kafka can be used to connect microservices for asynchronous communication. It outlines various features of Kafka like high throughput, replication, partitioning, and how it can provide reliability. Examples are given of how microservices could use Kafka for logging, filtering messages, and dispatching to different topics. Performance benefits of Kafka are highlighted like scalability and ability to handle high volumes of messages.
The document provides an introduction and overview of Apache Kafka presented by Jeff Holoman. It begins with an agenda and background on the presenter. It then covers basic Kafka concepts like topics, partitions, producers, consumers and consumer groups. It discusses efficiency and delivery guarantees. Finally, it presents some use cases for Kafka and positioning around when it may or may not be a good fit compared to other technologies.
Building Event-Driven Systems with Apache Kafka - Brian Ritchie
Event-driven systems provide simplified integration, easy notifications, inherent scalability and improved fault tolerance. In this session we'll cover the basics of building event-driven systems and then dive into utilizing Apache Kafka for the infrastructure. Kafka is a fast, scalable, fault-tolerant publish/subscribe messaging system developed by LinkedIn. We will cover the architecture of Kafka and demonstrate code that utilizes this infrastructure including C#, Spark, ELK and more.
Sample code: https://github.com/dotnetpowered/StreamProcessingSample
Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...) - HostedbyConfluent
Deploying Kafka to support multiple teams or even an entire company has many benefits. It reduces operational costs, simplifies onboarding of new applications as your adoption grows, and consolidates all your data in one place. However, this makes the applications sharing the cluster vulnerable to any one or a few of them taking all cluster resources. The combined cluster load also becomes less predictable, increasing the risk of overloading the cluster and making data unavailable.
In this talk, we will describe how to use the quota framework in Apache Kafka to ensure that a misconfigured client or an unexpected increase in client load does not monopolize broker resources. You will get a deeper understanding of bandwidth and request quotas, how they are enforced, and gain intuition for setting the limits for your use cases.
While quotas limit individual applications, there must be enough cluster capacity to support the combined application load. Onboarding new applications or scaling the usage of existing applications may require manual quota adjustments and upfront capacity planning to ensure high availability.
We will describe the steps we took toward solving this problem in Confluent Cloud, where we must immediately support unpredictable load with high availability. We implemented a custom broker quota plugin (KIP-257) to replace static per-broker quota allocation with dynamic, self-tuning quotas based on the available capacity (which we also detect dynamically). By learning from our journey, you will gain more insight into the relevant problems and the techniques to address them.
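The static, manually adjusted quotas that the talk improves upon can be set with the stock AdminClient API from KIP-546. A minimal sketch, with the client name and limit invented for illustration:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class SetClientQuota {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            try (Admin admin = Admin.create(props)) {
                // Quota entity keyed by client.id; "reporting-app" is hypothetical.
                ClientQuotaEntity entity = new ClientQuotaEntity(
                        Map.of(ClientQuotaEntity.CLIENT_ID, "reporting-app"));
                // Cap this client's produce throughput at 1 MB/s per broker.
                ClientQuotaAlteration.Op op =
                        new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0);
                admin.alterClientQuotas(
                        List.of(new ClientQuotaAlteration(entity, List.of(op))))
                     .all().get();
            }
        }
    }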
This document discusses Apache Kafka, an open-source distributed event streaming platform. It provides an introduction to Kafka's design and capabilities including:
1) Kafka is a distributed publish-subscribe messaging system that can handle high throughput workloads with low latency.
2) It is designed for real-time data pipelines and activity streaming and can be used for transporting logs, metrics collection, and building real-time applications.
3) Kafka supports distributed, scalable, fault-tolerant storage and processing of streaming data across multiple producers and consumers.
(Stephane Maarek, DataCumulus) Kafka Summit SF 2018
Security in Kafka is a cornerstone of true enterprise production-ready deployment: It enables companies to control access to the cluster and limit risks in data corruption and unwanted operations. Understanding how to use security in Kafka and exploiting its capabilities can be complex, especially as the documentation that is available is aimed at people with substantial existing knowledge on the matter.
This talk will be delivered in a “hero journey” fashion, tracing the experience of an engineer with basic understanding of Kafka who is tasked with securing a Kafka cluster. Along the way, I will illustrate the benefits and implications of various mechanisms and provide some real-world tips on how users can simplify security management.
Attendees of this talk will learn about aspects of security in Kafka, including:
-Encryption: What is SSL, what problems it solves and how Kafka leverages it. We’ll discuss encryption in flight vs. encryption at rest.
-Authentication: Without authentication, anyone would be able to write to any topic in a Kafka cluster, do anything and remain anonymous. We’ll explore the available authentication mechanisms and their suitability for different types of deployment, including mutual SSL authentication, SASL/GSSAPI, SASL/SCRAM and SASL/PLAIN.
-Authorization: How ACLs work in Kafka, ZooKeeper security (risks and mitigations) and how to manage ACLs at scale (a sample client security configuration follows this list)
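To make the encryption and authentication pieces concrete, here is a sketch of client-side settings for a SASL_SSL listener with SCRAM authentication; the hostname, credentials and truststore path are placeholders, and the talk covers the other mechanisms as well:

    import java.util.Properties;

    public class SecureClientConfig {
        static Properties secureProps() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9093");  // assumed TLS listener
            props.put("security.protocol", "SASL_SSL");      // encrypt in flight + authenticate
            props.put("sasl.mechanism", "SCRAM-SHA-256");
            props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                    + "username=\"alice\" password=\"alice-secret\";"); // placeholder credentials
            // Trust the broker's certificate chain; path and password are placeholders.
            props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
            props.put("ssl.truststore.password", "changeit");
            return props;
        }
    }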
Reducing Microservice Complexity with Kafka and Reactive Streams - jimriecken
My talk from ScalaDays 2016 in New York on May 11, 2016:
Transitioning from a monolithic application to a set of microservices can help increase performance and scalability, but it can also drastically increase complexity. Layers of inter-service network calls add latency and an increasing risk of failure where previously only local function calls existed. In this talk, I'll speak about how to tame this complexity using Apache Kafka and Reactive Streams (see the back-pressure sketch after this list) to:
- Extract non-critical processing from the critical path of your application to reduce request latency
- Provide back-pressure to handle both slow and fast producers/consumers
- Maintain high availability, high performance, and reliable messaging
- Evolve message payloads while maintaining backwards and forwards compatibility.
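The talk itself leans on Reactive Streams for back-pressure; as a rough stand-in using only the plain Java consumer, pausing partitions while a bounded buffer drains achieves a similar effect. A sketch, with the watermark and processing step invented:

    import java.time.Duration;
    import java.util.Queue;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class BackpressureLoop {
        static final int HIGH_WATERMARK = 1000; // hypothetical buffer bound

        static void run(KafkaConsumer<String, String> consumer,
                        Queue<ConsumerRecord<String, String>> buffer) {
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(100))) {
                    buffer.add(r);
                }
                if (buffer.size() >= HIGH_WATERMARK) {
                    consumer.pause(consumer.assignment()); // stop fetching, keep the session alive
                } else {
                    consumer.resume(consumer.paused());    // demand is back, fetch again
                }
                ConsumerRecord<String, String> next = buffer.poll();
                if (next != null) {
                    // downstream processing goes here
                }
            }
        }
    }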
Big data event streaming is a very common part of any big data architecture. Of the available open-source big data streaming technologies, Apache Kafka stands out because of its real-time, distributed, and reliable characteristics. This is made possible by the Kafka architecture. This talk highlights those features.
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform - confluent
Many enterprises have a large technical debt in legacy applications hosted in on-premises data centers. There is a strong desire to modernize and move to a cloud-based infrastructure, but the world won’t stop for you to transition. Existing applications need to be supported and enhanced; data from legacy platforms is required to make decisions that drive the business. On the other hand, data from cloud-based applications does not exist in a vacuum. Legacy applications need access to these cloud data sources and vice versa.
Can an enterprise have it both ways? Can new applications be built in the cloud while existing applications are maintained in a private data center?
Monsanto has adopted a cloud-first mentality—today most new development is focused on the cloud. However, this transition did not happen overnight.
Chrix Finne and Bob Lehmann share their experience building and implementing a Kafka-based cross-data-center streaming platform to facilitate the move to the cloud—in the process, kick-starting Monsanto’s transition from batch to stream processing. Details include an overview of the challenges involved in transitioning to the cloud and a deep dive into the cross-data-center stream platform architecture, including best practices for running this architecture in production and a summary of the benefits seen after deploying this architecture.
Apache Kafka is a distributed publish-subscribe messaging system that can handle high volumes of data and enable messages to be passed from one endpoint to another. It uses a distributed commit log that allows messages to be persisted on disk for durability. Kafka is fast, scalable and fault-tolerant, and can be configured to guard against data loss. It is used by companies like LinkedIn, Twitter, and Netflix to handle high volumes of real-time data and streaming workloads.
Watch this talk here: https://www.confluent.io/online-talks/how-apache-kafka-works-on-demand
Pick up best practices for developing applications that use Apache Kafka, beginning with a high level code overview for a basic producer and consumer. From there we’ll cover strategies for building powerful stream processing applications, including high availability through replication, data retention policies, producer design and producer guarantees.
We’ll delve into the details of delivery guarantees, including exactly-once semantics, partition strategies and consumer group rebalances. The talk will finish with a discussion of compacted topics, troubleshooting strategies and a security overview.
This session is part 3 of 4 in our Fundamentals for Apache Kafka series.
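A minimal sketch of the producer-side guarantee settings such a session typically covers, assuming the standard Java client; the values shown favor durability over latency:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class DurableProducerConfig {
        static Properties props() {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");   // assumed broker
            p.put("key.serializer", StringSerializer.class.getName());
            p.put("value.serializer", StringSerializer.class.getName());
            p.put("acks", "all");                // wait for all in-sync replicas to ack
            p.put("enable.idempotence", "true"); // broker de-duplicates retried sends
            p.put("retries", Integer.toString(Integer.MAX_VALUE));
            return p;
        }
    }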
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka - Guozhang Wang
To manage the ever-increasing volume and velocity of data within your company, you have successfully made the transition from single machines and one-off solutions to large distributed stream infrastructures in your data center, powered by Apache Kafka. But what if one data center is not enough? I will describe building resilient data pipelines with Apache Kafka that span multiple data centers and points of presence, and provide an overview of best practices and common patterns while covering key areas such as architecture guidelines, data replication, and mirroring as well as disaster scenarios and failure handling.
Apache Kafka is a distributed publish-subscribe messaging system that was originally created by LinkedIn and contributed to the Apache Software Foundation. It is written in Scala and provides a multi-language API to publish and consume streams of records. Kafka is useful for both log aggregation and real-time messaging due to its high performance, scalability, and ability to serve as both a distributed messaging system and log storage system with a single unified architecture. To use Kafka, one runs Zookeeper for coordination, Kafka brokers to form a cluster, and then publishes and consumes messages with a producer API and consumer API.
With Apache Kafka 0.9, the community has introduced a number of features to make data streams secure. In this talk, we’ll explain the motivation for making these changes, discuss the design of Kafka security, and explain how to secure a Kafka cluster. We will cover common pitfalls in securing Kafka, and talk about ongoing security work.
The Easiest Way to Configure Security for Clients AND Servers (Dani Traphagen...) - confluent
In this baller talk, we will be addressing the elephant in the room that no one ever wants to look at or talk about: security. We generally never want to talk about configuring security because doing so exposes the risk of penetration and exploitation. However, this silence leads to a lot of confusion around proper Kafka security best practices and how to appropriately lock down a cluster when you are starting out. In this talk we will demystify the elephant in the room without deconstructing it limb by limb. We will give you a notion of how to configure the following for BOTH clients and servers:
- TLS or Kerberos authentication
- Network traffic encryption via TLS
- Authorization via access control lists (ACLs)
We will also demonstrate the above with a GitHub repo you can try out for yourself. Lastly, we will present a reference implementation of OAuth if that suits your fancy. All in all, you should walk away with a pretty decent understanding of the aspects required for a secure Kafka environment.
[Demo session] Managed Kafka Service - Oracle Event Hub Service - Oracle Korea
Oracle Cloud offers Kafka as a managed service. In this meetup session we introduce the convenience of the managed Kafka service and run a demo of it. Kafka not only holds a key position as the infrastructure for MSA, big data and Blockchain, but is also significant as a core integration component of Oracle Cloud.
We introduce Kafka's role as an integration component of Oracle Cloud and the composition of its main services.
* This session is equally suitable for beginner, intermediate and advanced audiences.
Developing with the Go client for Apache Kafka - Joe Stein
This document summarizes Joe Stein's go_kafka_client GitHub repository, which provides a Kafka client library written in Go. It describes the motivation for creating a new Go Kafka client, how to use producers and consumers with the library, and distributed processing patterns like mirroring and reactive streams. The client aims to be lightweight with few dependencies while supporting real-world use cases for Kafka producers and high-level consumers.
This document provides guidance on scaling Apache Kafka clusters and tuning performance. It discusses expanding Kafka clusters horizontally across inexpensive servers for increased throughput and CPU utilization. Key aspects that impact performance like disk layout, OS tuning, Java settings, broker and topic monitoring, client tuning, and anticipating problems are covered. Application performance can be improved through configuration of batch size, compression, and request handling, while consumer performance relies on partitioning, fetch settings, and avoiding perpetual rebalances.
Running Galera Cluster in Microsoft Azure involves setting up virtual machines and installing Galera Cluster software. This provides more control than Azure Database for MySQL, which uses asynchronous replication. While Azure Database for MySQL is fully managed, Galera Cluster in VMs supports the virtually synchronous replication that is its core feature. Cost estimates show running three Galera Cluster nodes in VMs costs less monthly than three hosted MySQL instances in Azure Database for MySQL.
Kafka Reliability - When it absolutely, positively has to be there - Gwen (Chen) Shapira
Kafka provides reliability guarantees through replication and configuration settings. It replicates data across multiple brokers to protect against failures. Producers can ensure data is committed to all in-sync replicas through configuration settings like request.required.acks. Consumers maintain offsets and can commit after processing to prevent data loss. Monitoring is also important to detect any potential issues or data loss in the Kafka system.
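The consumer half of that advice, sketched with the modern Java client: disable auto-commit and commit only after processing, so a crash causes redelivery rather than loss. Group and topic names are invented:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class CommitAfterProcessing {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            props.put("group.id", "reliable-group");          // hypothetical group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            props.put("enable.auto.commit", "false"); // commit only after work is done
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events")); // hypothetical topic
                while (true) {
                    for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(500))) {
                        System.out.println(r.value()); // stand-in for real processing
                    }
                    consumer.commitSync(); // offsets advance only past processed records
                }
            }
        }
    }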
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013 - Christopher Curtin
Chris Curtin gave a presentation on Apache Kafka at the Atlanta Java Users Group. He discussed his background in technology and current role at Silverpop. He then provided an overview of Apache Kafka, describing its core functionality as a distributed publish-subscribe messaging system. Finally, he demonstrated how producers and consumers interact with Kafka and highlighted some use cases and performance figures from LinkedIn's deployment of Kafka.
Apache Kafka's rise in popularity as a streaming platform has demanded a revisit of its traditional at-least-once message delivery semantics.
In this talk, we present the recent additions to Kafka to achieve exactly-once semantics (EoS) including support for idempotence and transactions in the Kafka clients. The main focus will be the specific semantics that Kafka distributed transactions enable and the underlying mechanics which allow them to scale efficiently.
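The transactional producer API that underlies EoS looks roughly like this in the Java client; the topics and transactional.id are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("transactional.id", "transfer-tx-1");   // hypothetical id
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                producer.beginTransaction();
                try {
                    // Both writes become visible atomically to read_committed consumers.
                    producer.send(new ProducerRecord<>("debits", "acct-1", "-10"));
                    producer.send(new ProducerRecord<>("credits", "acct-2", "+10"));
                    producer.commitTransaction();
                } catch (Exception e) {
                    producer.abortTransaction(); // consumers never see partial results
                    throw e;
                }
            }
        }
    }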
How Apache Kafka is transforming Hadoop, Spark and Storm - Edureka!
This document provides an overview of Apache Kafka and how it is transforming Hadoop, Spark, and Storm. It begins with explaining why Kafka is needed, then defines what Kafka is and describes its architecture. Key components of Kafka like topics, producers, consumers and brokers are explained. The document also shows how Kafka can be used with Hadoop, Spark, and Storm for stream processing. It lists some companies that use Kafka and concludes by advertising an Edureka course on Apache Kafka.
This document discusses event-driven architectures and messaging. It provides an agenda that includes a case study, retrospective, definitions of event-driven systems and messaging, and an overview of JMS, RabbitMQ and Kafka. Benefits of moving to an event-driven system with RabbitMQ are outlined, including reduced I/O load, improved scalability and distribution. Key criteria for choosing a messaging system include throughput, clustering, topologies, persistence, routing and delivery guarantees. Common messaging patterns like queues, publish-subscribe and request-reply are described. The document concludes with questions.
This document discusses 101 mistakes that FINN.no learned from in running Apache Kafka. It begins with an introduction to Kafka and why FINN.no chose to use it. It then discusses FINN.no's Kafka architecture and usage over time as their implementation grew. The document outlines several common mistakes made including not distinguishing between internal and external data, lack of external schema definition, using a single configuration for all topics, defaulting to 128 partitions, and running Zookeeper on overloaded nodes. Each mistake is explained, potential consequences are given, better solutions are proposed, and what FINN.no has done to address them.
Seattle Kafka meetup Nov 2015: published Siphon - Nitin Kumar
1) Microsoft uses Kafka extensively across multiple datacenters to ingest over 1 million events per second from services like Bing, Ads and Office.
2) The presentation discusses a Kafka-based streaming solution called Siphon that was developed to reduce the latency of customer-facing reports from 4 hours to under 15 minutes.
3) Siphon uses Kafka as a distributed queue and StreamScope for distributed processing. It includes components like a collector for data ingestion, consumer APIs, and monitoring through techniques like canary tests and an audit trail.
The rise of microservices - containers and orchestration - Andrew Morgan
Organisations are building their applications around microservice architectures because of the flexibility, speed of delivery, and maintainability they deliver. In this session, the concepts behind containers and orchestration will be explained and how to use them with MongoDB.
Presented at the inaugural Kafka summit (2016) hosted by Confluent in San Francisco
Abstract:
Kafka is a backbone for various data pipelines and asynchronous messaging at LinkedIn and beyond. 2015 was an exciting year at LinkedIn in that we hit a new level of scale with Kafka: we now process more than 1 trillion published messages per day across nearly 1300 brokers. We run into some interesting production issues at this scale and I will dive into some of the most critical incidents that we encountered at LinkedIn in the past year:
Data loss: We have extremely stringent SLAs on latency and completeness that were violated on a few occasions. Some of these incidents were due to subtle configuration problems or even missing features.
Offset resets: As of early 2015, Kafka-based offset management was still a relatively new feature and we occasionally hit offset resets. Troubleshooting these incidents turned out to be extremely tricky and resulted in various fixes in offset management/log compaction as well as our monitoring.
Cluster unavailability due to high request/response latencies: Such incidents demonstrate how even subtle performance regressions and monitoring gaps can lead to an eventual cluster meltdown.
Power failures! What happens when an entire data center goes down? We experienced this first hand and it was not so pretty.
and more…
This talk will go over how we detected, investigated and remediated each of these issues and summarize some of the features in Kafka that we are working on that will help eliminate or mitigate such incidents in the future.
This document discusses tuning Kafka for performance. It covers optimizing Zookeeper configurations like using SSDs; using RAID or JBOD for Kafka broker disks with testing showing XFS performs best; scaling Kafka clusters by considering disk capacity, network capacity, and partition counts; configuring topics for retention settings and partition balancing; and tuning Mirror Maker for network locality and producer/consumer settings.
This document provides a summary of Amazon Kinesis and Apache Kafka, two platforms for processing real-time streaming data at large scale. It describes key features of each system such as durability, interfaces, processing options, and deployment. Kinesis is a fully managed cloud service that provides high durability for data across AWS availability zones. Kafka is an open source platform that offers lower latency and more flexibility in how data is processed but requires more operational overhead. The document also includes a deep dive on concepts and internals of the Kafka platform.
Learn everything you need to know to get started building a MongoDB-based app in Java. We'll explore the relationship between MongoDB and various languages on the Java Virtual Machine such as Java, Scala, and Clojure. From there, we'll examine the popular frameworks and integration points between MongoDB and the JVM including Spring Data and object-document mappers like Morphia.
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo - Joe Stein
In this talk we will walk through how Apache Kafka and Apache Accumulo can be used together to orchestrate a de-coupled, real-time distributed and reactive request/response system at massive scale. Multiple data pipelines can perform complex operations for each message in parallel at high volumes with low latencies. The final result is returned inline to the initiating call. The architecture gains are immense. They allow for the requesting system to receive a response without the need for direct integration with the data pipeline(s) that messages must go through. By utilizing Apache Kafka and Apache Accumulo, these gains sustain at scale and allow for complex operations of different messages to be applied to each response in real-time.
Powering Microservices with MongoDB, Docker, Kubernetes & Kafka – MongoDB Eur... - Andrew Morgan
Organisations are building their applications around microservice architectures because of the flexibility, speed of delivery, and maintainability they deliver.
Want to try out MongoDB on your laptop? Execute a single command and you have a lightweight, self-contained sandbox; another command removes all trace when you're done. Need an identical copy of your application stack in multiple environments? Build your own container image and then your entire development, test, operations, and support teams can launch an identical clone environment.
Containers are revolutionizing the entire software lifecycle: from the earliest technical experiments and proofs of concept through development, test, deployment, and support. Orchestration tools manage how multiple containers are created, upgraded and made highly available. Orchestration also controls how containers are connected to build sophisticated applications from multiple, microservice containers.
This presentation introduces you to technologies such as Docker, Kubernetes & Kafka which are driving the microservices revolution. Learn about containers and orchestration – and most importantly how to exploit them for stateful services such as MongoDB.
I Heart Log: Real-time Data and Apache Kafka - Jay Kreps
This presentation discusses how logs and stream-processing can form a backbone for data flow, ETL, and real-time data processing. It will describe the challenges and lessons learned as LinkedIn built out its real-time data subscription and processing infrastructure. It will also discuss the role of real-time processing and its relationship to offline processing frameworks such as MapReduce.
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
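To give a flavor of that DSL, here is the canonical word-count shape of a Kafka Streams application; the topic names and application id are assumptions:

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class WordCountApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");    // hypothetical
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> lines = builder.stream("text-input"); // assumed topic
            KTable<String, Long> counts = lines
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                .groupBy((key, word) -> word)
                .count();
            counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start(); // runs until the JVM exits
        }
    }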
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka - Lightbend
Since its stable release in 2016, Akka Streams is quickly becoming the de facto standard integration layer between various streaming systems and products. Enterprises like PayPal, Intel, Samsung and Norwegian Cruise Lines see this as a game changer in terms of designing Reactive streaming applications by connecting pipelines of back-pressured asynchronous processing stages.
This comes from the Reactive Streams initiative in part, which has been long led by Lightbend and others, allowing multiple streaming libraries to inter-operate between each other in a performant and resilient fashion, providing back-pressure all the way. But perhaps even more so thanks to the various integration drivers that have sprung up in the community and the Akka team—including drivers for Apache Kafka, Apache Cassandra, Streaming HTTP, Websockets and much more.
In this webinar for JVM Architects, Konrad Malawski explores the what and why of Reactive integrations, with examples featuring technologies like Akka Streams, Apache Kafka, and Alpakka, a new community project for building Streaming connectors that seeks to “back-pressurize” traditional Apache Camel endpoints.
* An overview of Reactive Streams and what it will look like in JDK 9, and the Akka Streams API implementation for Java and Scala.
* Introduction to Alpakka, a modern, Reactive version of Apache Camel, and its growing community of Streams connectors (e.g. Akka Streams Kafka, MQTT, AMQP, Streaming HTTP/TCP/FileIO and more).
* How Akka Streams and Akka HTTP work with Websockets, HTTP and TCP, with examples in both in Java and Scala.
An example of a successful proof of concept - ETLSolutions
In this presentation we explain how to create a successful proof of concept for software, using a real example from our work in the Oil & Gas industry.
Developing Real-Time Data Pipelines with Apache Kafka - Joe Stein
Apache Kafka is a distributed streaming platform for building real-time data pipelines and streaming applications. It provides a publish-subscribe messaging system with persistence. Producers publish data to topics, which are divided into partitions. Consumers subscribe to topics and process the streaming data. The system handles scaling and data distribution to allow for high throughput and fault tolerance.
Real time data viz with Spark Streaming, Kafka and D3.js - Ben Laird
This document discusses building a dynamic visualization of large streaming transaction data. It proposes using Apache Kafka to handle the transaction stream, Apache Spark Streaming to process and aggregate the data, MongoDB for intermediate storage, a Node.js server, and Socket.io for real-time updates. Visualization would use Crossfilter, DC.js and D3.js to enable interactive exploration of billions of records in the browser.
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015 - Monal Daxini
Keystone - Processing over half a trillion events per day, with peaks of 8 million events and 17 GB per second, and at-least-once processing semantics. We will explore in detail how we employ Kafka, Samza, and Docker at scale to implement a multi-tenant pipeline. We will also look at the evolution to its current state and where the pipeline is headed next: offering a self-service stream processing infrastructure atop the Kafka-based pipeline, with support for Spark Streaming.
This document summarizes Instaclustr's lessons learned from building a managed Apache Kafka service. It provides an overview of Kafka and how it works, details Instaclustr's offering and development process, and discusses choices around hardware, security configuration, monitoring, backups and restores. Key topics covered include benchmarking storage types, enabling SSL, managing topics and users, and exposing metrics for monitoring brokers and topics.
Apache Kafka is an open-source message broker project, written in Scala, developed by the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
This is the first part of the presentation.
Here is the second part of this presentation:
http://www.slideshare.net/knoldus/introduction-to-apache-kafka-part-2
http://www.oreilly.com/pub/e/3764
Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. Monal Daxini details how they used Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. He'll also share plans for offering Stream Processing as a Service for all of Netflix.
Instaclustr Kafka Meetup Sydney Presentation - Ben Slater
This document summarizes lessons learned from building a managed Apache Kafka service. It discusses hardware choices and benchmarking of different configurations, including disk types, encryption, number of topics, and colocating Zookeeper. It also covers topic and user management without direct Zookeeper access, broker security, monitoring approaches, and plans for backups and restore. The service is currently in preview and aims to simplify deployment and management of Kafka in the cloud.
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016 - Monal Daxini
Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. We will explore in detail how we leverage Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. We will also share our plans for offering Stream Processing as a Service for all of Netflix.
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp - José Román Martín Gil
Apache Kafka is the data streaming broker most used by companies. It can manage millions of messages easily and is the base of many architectures built on events, microservices, orchestration... and now cloud environments. OpenShift is the most widely adopted Platform as a Service (PaaS). It is based on Kubernetes and helps companies easily deploy any kind of workload in a cloud environment. Thanks to many of its features, it is the base for many architectures built on stateless applications, enabling new Cloud Native Applications. Strimzi is an open source community that implements a set of Kubernetes Operators to help you manage and deploy Apache Kafka brokers in OpenShift environments.
These slides will introduce you to Strimzi as a new component on OpenShift to manage your Apache Kafka clusters.
Slides used at OpenShift Meetup Spain:
- https://www.meetup.com/es-ES/openshift_spain/events/261284764/
Apache Kafka is a distributed streaming platform. It provides a high-throughput distributed messaging system with publish-subscribe capabilities. The document discusses Kafka producers and consumers, Kafka clients in different programming languages, and important configuration settings for Kafka brokers and topics. It also demonstrates sending messages to Kafka topics from a Java producer and consuming messages from the console consumer.
This document discusses end-to-end processing of 3.7 million telemetry events per second using a lambda architecture at Symantec. It provides an overview of Symantec's security data lake infrastructure, the telemetry data processing architecture using Kafka, Storm and HBase, tuning targets for the infrastructure components, and performance benchmarks for Kafka, Storm and Hive.
Apache Kafka is an open-source message broker project, written in Scala, developed by the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Netflix Open Source Meetup Season 4 Episode 2 - aspyker
In this episode, we will take a close look at 2 different approaches to high-throughput/low-latency data stores, developed by Netflix.
The first, EVCache, is a battle-tested distributed memcached-backed data store, optimized for the cloud. You will also hear about the road ahead for EVCache as it evolves into an L1/L2 cache over RAM and SSDs.
The second, Dynomite, is a framework to make any non-distributed data-store, distributed. Netflix's first implementation of Dynomite is based on Redis.
Come learn about the products' features and hear from Thomson Reuters, Diego Pacheco from Ilegra and other third-party speakers, internal and external to Netflix, on how these products fit in their stacks and roadmaps.
This document summarizes Netflix's use of Kafka in their data pipeline. It discusses how Netflix evolved from using S3 and EMR to introducing Kafka and Kafka producers and consumers to handle 400 billion events per day. It covers challenges of scaling Kafka clusters and tuning Kafka clients and brokers. Finally, it outlines Netflix's roadmap which includes contributing to open source projects like Kafka and testing failure resilience.
The document discusses how a company called HBC evolved their architecture from a monolithic application to a microservices architecture with streams. It describes how they introduced Kafka and Kafka Streams to share data between microservices in real-time, avoid common antipatterns, simplify development, and improve resilience and performance. The talk outlines how HBC uses Kafka Streams within their microservices to process streaming data, perform aggregations and joins, enable interactive queries, and power their search functionality.
Stream Processing with Apache Kafka and .NET - confluent
Presentation from South Bay.NET meetup on 3/30.
Speaker: Matt Howlett, Software Engineer at Confluent
Apache Kafka is a scalable streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix, Walmart, Airbnb, Goldman Sachs and LinkedIn. In this talk Matt will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics) and show you how to use Kafka from within your C#/.NET applications.
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages - LINE Corporation
Yuto Kawamura
LINE / Z Part Team
At LINE we've been operating Apache Kafka to provide a company-wide shared data pipeline that services use for storing and distributing data.
Kafka underlies many of our services in some way, not only the messaging service but also AD, Blockchain, Pay, Timeline, Cryptocurrency trading and more.
Many services feed data into our cluster, leading to over 250 billion daily messages and 3.5 GB of incoming bytes per second, one of the largest scales in the world.
At the same time, it is required to be stable and performant at all times because many important services use it as a backend.
In this talk I will give an overview of Kafka usage at LINE and how we operate it.
I'm also going to talk about some of the engineering we did to maximize its performance and to solve problems caused particularly by hosting huge volumes of data from many services, leveraging advanced techniques like kernel-level dynamic tracing.
Developing Realtime Data Pipelines With Apache Kafka - Joe Stein
Developing Realtime Data Pipelines With Apache Kafka. Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
Big Data Streams Architectures. Why? What? How? - Anton Nazaruk
With the current zoo of technologies and the different ways they interact, it's a big challenge to architect a system (or adapt an existing one) that conforms to low-latency big data analysis requirements. Apache Kafka, and Kappa Architecture in particular, are attracting more and more attention over the classic Hadoop-centric technology stack. The new Consumer API provided a significant boost in this direction. Microservices-based stream processing and the new Kafka Streams are proving to be a synergy in the big data world.
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2... - Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and educations, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
TrsLabs - Fintech Product & Business Consulting - Trs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic investments, inorganic growth and business model pivoting are critical activities that businesses don't undertake every day. In cases like this, it may benefit your business to bring in a temporary external consultant.
An unbiased plan driven by clear-cut deliverables and market dynamics, without the influence of your internal office equations, empowers business leaders to make the right choices.
Getting things done within a budget and a timeframe is key to growing a business, no matter whether you are a start-up or a big company.
Talk to us and unlock the competitive advantage.
Quantum Computing Quick Research Guide by Arthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Procurement Insights Cost To Value Guide.pptx - Jon Hansen
Procurement Insights integrated Historic Procurement Industry Archives, serves as a powerful complement — not a competitor — to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value- driven proprietary service offering here.
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just tools—they're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
AI and Data Privacy in 2025: Global TrendsInData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
-AI and data privacy: Key findings
-Statistics on AI data privacy in the today’s world
-Tips on how to overcome data privacy challenges
-Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfSoftware Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
HCL Nomad Web – Best Practices and Managing Multiuser Environmentspanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web present unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8 M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with Free Compatibility Check and help you with quick time-to-market
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Aqusag Technologies
In late April 2025, a significant portion of Europe, particularly Spain, Portugal, and parts of southern France, experienced widespread, rolling power outages that continue to affect millions of residents, businesses, and infrastructure systems.
How Can I use the AI Hype in my Business Context?Daniel Lehner
𝙄𝙨 𝘼𝙄 𝙟𝙪𝙨𝙩 𝙝𝙮𝙥𝙚? 𝙊𝙧 𝙞𝙨 𝙞𝙩 𝙩𝙝𝙚 𝙜𝙖𝙢𝙚 𝙘𝙝𝙖𝙣𝙜𝙚𝙧 𝙮𝙤𝙪𝙧 𝙗𝙪𝙨𝙞𝙣𝙚𝙨𝙨 𝙣𝙚𝙚𝙙𝙨?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know 𝗵𝗼𝘄.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungenpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-und-verwaltung-von-multiuser-umgebungen/
HCL Nomad Web wird als die nächste Generation des HCL Notes-Clients gefeiert und bietet zahlreiche Vorteile, wie die Beseitigung des Bedarfs an Paketierung, Verteilung und Installation. Nomad Web-Client-Updates werden “automatisch” im Hintergrund installiert, was den administrativen Aufwand im Vergleich zu traditionellen HCL Notes-Clients erheblich reduziert. Allerdings stellt die Fehlerbehebung in Nomad Web im Vergleich zum Notes-Client einzigartige Herausforderungen dar.
Begleiten Sie Christoph und Marc, während sie demonstrieren, wie der Fehlerbehebungsprozess in HCL Nomad Web vereinfacht werden kann, um eine reibungslose und effiziente Benutzererfahrung zu gewährleisten.
In diesem Webinar werden wir effektive Strategien zur Diagnose und Lösung häufiger Probleme in HCL Nomad Web untersuchen, einschließlich
- Zugriff auf die Konsole
- Auffinden und Interpretieren von Protokolldateien
- Zugriff auf den Datenordner im Cache des Browsers (unter Verwendung von OPFS)
- Verständnis der Unterschiede zwischen Einzel- und Mehrbenutzerszenarien
- Nutzung der Client Clocking-Funktion
4. "If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at LinkedIn." -- Todd Palino
Source: https://engineering.linkedin.com/kafka/running-kafka-scale
5. How Kafka Came To Be At LinkedIn
Source: http://www.infoq.com/presentations/kafka-big-data [Neha Narkhede]
6. How Kafka Came To Be At LinkedIn
Source: http://www.infoq.com/presentations/kafka-big-data [Neha Narkhede]
7. A Week @ LinkedIn
630,516,047 msg/day (avg per broker)
7,298 msg/sec (avg per broker)
Source: http://www.confluent.io/kafka-summit-2016-ops-some-kafkaesque-days-in-operations-at-linkedin-in-2015 [Joel Koshy]
9. Kafka Use Cases
Messaging
Web Site Activity Tracking
Metrics
Log Aggregation
Stream Processing
Source: https://kafka.apache.org/08/uses.html
● Requirements
○ Very high volume
● Track user activity
○ Page views
○ Searches
○ Actions
● Goals
○ Real time processing
○ Monitoring
○ Load into Hadoop
■ Reporting
10. Kafka Use Cases
Messaging
Web Site Activity Tracking
Metrics
Log Aggregation
Stream Processing
Source: https://kafka.apache.org/08/uses.html
● Requirements
○ Very high volumes
● Feed of operational data
○ VMs
○ Apps
11. Kafka Use Cases
Messaging
Web Site Activity Tracking
Metrics
Log Aggregation
Stream Processing
Source: https://kafka.apache.org/08/uses.html
● Log file collection on servers
● Similar to Scribe or Flume
○ Equally good performance
○ Stronger durability
○ Lower end-to-end latency
12. Kafka Use Cases
Messaging
Web Site Activity Tracking
Metrics
Log Aggregation
Stream Processing
Source: https://kafka.apache.org/08/uses.html
● Each stage of processing = a topic
● Companion stream-processing frameworks:
○ Storm
○ Spark
○ ...
13. How Other Companies Use Kafka
LinkedIn: activity streams, operational metrics
Yahoo: real time analytics (peak: 20Gbps compressed data), Kafka Manager
Twitter: Storm stream processing
Netflix: real time monitoring, event processing pipelines
Spotify: log delivery system
Airbnb: event pipelines
. . .
15. Kafka Controller
One broker takes the role of Controller, which manages:
● Partition leaders
● State of partitions
● Partition reassignments
● Replicas
28. Why a Commit Log?
● Records what happened and when
● Databases
○ Record changes to data structures (physical or logical)
○ Used for replication
● Distributed Systems
○ Update ordering
○ State machine replication principle
○ The last log timestamp defines a node's state
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
31. Kafka Guarantees
● Messages sent by a producer to a particular topic partition will be appended in the order they are sent.
● A consumer instance sees messages in the order they are stored in the log.
● For a topic with replication factor N, Kafka will tolerate up to N-1 server failures without losing any messages committed to the log.
32. Durability Guarantees
Producer can configure acknowledgements:
● <= 0.8.2: request.required.acks
● >= 0.9.0: acks
Value => impact => durability:
● 0: the producer doesn't wait for the leader => weak
● 1 (default): the producer waits for the leader; the leader acks once the message is written to its log, without waiting for followers => medium
● all (0.9.0) / -1 (0.8.2): the producer waits for the leader; the leader acks once all in-sync replicas (ISR) have acknowledged => strong
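A minimal sketch of the strongest setting, assuming the 0.9+ Java producer API; the broker address, topic name, and serializer choices are placeholders, not part of the original deck:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("acks", "all"); // strong durability: leader waits for all in-sync replicas
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is asynchronous; get() blocks until the broker acknowledges
            producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
        }
    }
}

With acks=all, send().get() returns only once every in-sync replica has the message, trading latency for durability.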
33. Consumer Offset Management
● < 0.8.2: Zookeeper only
○ Zookeeper is not meant for heavy writes => scalability issues
● >= 0.8.2: Kafka topic (__consumer_offsets)
○ Configurable: offsets.storage=kafka
● The documentation shows how to migrate offsets from Zookeeper to Kafka:
http://kafka.apache.org/082/documentation.html#offsetmigration
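For illustration, a sketch of the relevant 0.8.2 high-level consumer properties during that migration; the connection string and group id are placeholders, and dual.commit.enabled keeps Zookeeper commits on during the first rolling bounce, as described in the migration docs:

import java.util.Properties;

public class OffsetStorageConfig {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "my-group");                // placeholder
        props.put("offsets.storage", "kafka");            // commit to the __consumer_offsets topic
        props.put("dual.commit.enabled", "true");         // migration step 1: commit to both stores
        return props;
    }
}

Once every consumer in the group runs with these settings, dual.commit.enabled can be switched off in a second rolling restart.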
34. Data Retention
3 ways to configure it:
● Time based
● Size based
● Log compaction based
Broker Configuration
log.retention.bytes={-1|...}
log.retention.{ms,minutes,hours}=...
log.retention.check.interval.ms=...
log.cleanup.policy={delete|compact}
log.cleaner.enable={false|true}
log.cleaner.threads=1
log.cleaner.io.max.bytes.per.second=Double.MaxValue
log.cleaner.backoff.ms=15000
log.cleaner.delete.retention.ms=86400000 (1 day)
Topic Configuration
cleanup.policy=...
delete.retention.ms=...
...
Reconfiguring a Topic at Runtime
kafka-topics.sh --zookeeper localhost:2181 \
  --alter --topic my-topic \
  --config max.message.bytes=128000
kafka-topics.sh --zookeeper localhost:2181 \
  --alter --topic my-topic \
  --deleteConfig max.message.bytes
39. Kafka Performance - Theory
● Efficient Storage
○ Fast sequential writes and reads
○ Leverages the OS page cache (i.e. RAM)
○ Avoids storing data twice (in the JVM heap and in the OS cache); the cache stays warm across broker restarts
○ The page cache can hold 28-30GB of data on a 32GB machine
○ Zero-copy I/O using the sendfile system call (https://www.ibm.com/developerworks/library/j-zerocopy/)
● Batching of messages + compression
● The broker doesn't hold client state
● Performance depends on the persistence guarantees (request.required.acks)
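To make the zero-copy point concrete, here is a minimal Java sketch (not Kafka's actual code) of file-to-socket transfer via FileChannel.transferTo, which maps to sendfile on Linux; the file name and destination are hypothetical:

import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopyDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel file = new FileInputStream("segment.log").getChannel(); // hypothetical log segment
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9092))) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // The kernel copies file pages straight to the socket (sendfile);
                // the bytes never pass through a JVM buffer.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}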
40. Kafka Benchmark (1/5)
● 0.8.1
● Setup
○ 6 Machines
■ Intel Xeon 2.5 GHz processor with six cores
■ Six 7200 RPM SATA drives (822 MB/sec of linear disk I/O)
■ 32GB of RAM
■ 1Gb Ethernet
○ 3 nodes for brokers + 3 for ZK and clients
Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
45. Kafka Demo 2 - Java High Level API
● Start a multi-broker cluster + replicated topic
○ Write Java Partitioned Producer
○ Write Multi-Threaded Consumer
● Unit testing with Kafka
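As a sketch of what such a partitioned producer might look like on the 0.9+ API (the class name, key handling, and fallback are illustrative assumptions, not the demo's actual code):

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical partitioner: records with the same key always land in the
// same partition, preserving per-key ordering.
public class KeyHashPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (key == null) return 0; // simplistic fallback for keyless records
        return (key.hashCode() & 0x7fffffff) % numPartitions; // mask keeps the hash non-negative
    }
    @Override public void configure(Map<String, ?> configs) {}
    @Override public void close() {}
}

It would be registered on the producer with props.put("partitioner.class", KeyHashPartitioner.class.getName()).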
46. Kafka Versions
0.8.2 (2014.12)
● New Producer API
● Delete topic
● Scalable offset writes
0.9.x (2015.10)
● Security
○ Encryption
○ Kerberos
○ ACLs
● Quotas (client rate control)
● Kafka Connect
0.10.x (2016.03)
● Kafka Streams
● Rack Awareness
● More SASL features
● Timestamp in messages
● API to better manage Connectors
47. Starting with Kafka: Jay Kreps’ Recommendations
● Start with a single cluster
● Only a few non-critical, limited use cases
● Pick a single data format for a given organisation
○ Avro
■ Good language support
■ One schema per topic (message validation, documentation...)
■ Supports schema evolution
■ Data embeds schema
■ Makes data scientists' jobs easier
■ Put some thought into field naming conventions
Source: http://www.confluent.io/blog/stream-data-platform-2/
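As a small illustration of those bullets (a hypothetical schema, using the Avro Java API): one record schema per topic, snake_case field names, and a nullable field with a default that leaves room for evolution:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class AvroExample {
    // Hypothetical schema: the optional "referrer" field has a default,
    // so the schema can evolve without breaking existing readers.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":["
        + "{\"name\":\"user_id\",\"type\":\"string\"},"
        + "{\"name\":\"page_url\",\"type\":\"string\"},"
        + "{\"name\":\"referrer\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord view = new GenericData.Record(schema);
        view.put("user_id", "u-42");
        view.put("page_url", "/home");
        // "referrer" stays null, which the union type allows.
        System.out.println(view);
    }
}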
48. Conclusions
● Easy to start with for a PoC
● Maybe not so easy to build a production system from scratch
● Must have serious monitoring in place (see Yahoo, Confluent, DataDog)
● Vibrant community, fast-paced technology
● Videos of Kafka Summit are online: http://kafka-summit.org/sessions/
https://github.com/samuel-kerrien/kafka-demo