Kafka is primarily used to build real-time streaming data pipelines and applications that react to data streams. It combines messaging, storage, and stream processing to allow storage and analysis of both historical and real-time data.
Apache Kafka - Scalable Message-Processing and more! (Guido Schmutz)
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable message broker, built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present the role of Apache Kafka in a modern data/information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka into the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
In this presentation Guido Schmutz talks about Apache Kafka, Kafka Core, Kafka Connect, Kafka Streams, Kafka and "Big Data"/"Fast Data" ecosystems, the Confluent Data Platform and Kafka in architecture.
Being Ready for Apache Kafka - Apache: Big Data Europe 2015 (Michael Noll)
These are the slides of my Kafka talk at Apache: Big Data Europe in Budapest, Hungary. Enjoy! --Michael
Apache Kafka is a high-throughput distributed messaging system that has become a mission-critical infrastructure component for modern data platforms. Kafka is used across a wide range of industries by thousands of companies such as Twitter, Netflix, Cisco, PayPal, and many others.
After a brief introduction to Kafka, this talk will provide an update on the growth and status of the Kafka project community. The rest of the talk will focus on walking the audience through what's required to put Kafka in production. We’ll give an overview of the current ecosystem of Kafka, including: client libraries for creating your own apps; operational tools; and peripheral components required for running Kafka in production and for integration with other systems like Hadoop. We will cover the upcoming project roadmap, which adds key features to make Kafka even more convenient to use and more robust in production.
Building streaming data applications using Kafka*[Connect + Core + Streams] b... (Data Con LA)
Abstract: Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform for building real-time streaming data pipelines and streaming data applications without the need for other tools/clusters for data ingestion, storage and stream processing. In this talk you will learn more about: a quick introduction to Kafka Core, Kafka Connect and Kafka Streams through code examples, key concepts and key features; a reference architecture for building such Kafka-based streaming data applications; and a demo of an end-to-end Kafka-based streaming data application.
Building Streaming Data Applications Using Apache Kafka (Slim Baltagi)
Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform for building real-time streaming data pipelines and streaming data applications without the need for other tools/clusters for data ingestion, storage and stream processing.
In this talk you will learn more about:
1. A quick introduction to Kafka Core, Kafka Connect and Kafka Streams: what they are and why.
2. Code and step-by-step instructions to build an end-to-end streaming data application using Apache Kafka
Kafka's basic terminology, its architecture, its protocol and how it works.
Kafka at scale: its caveats, its guarantees and the use cases it supports.
How we use it @ZaprMediaLabs.
Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform (Kafka Core + Kafka Connect + Kafka Streams) for building streaming data pipelines and streaming data applications.
This talk, which I gave at the Chicago Java Users Group (CJUG) on June 8th 2017, focuses mainly on Kafka Streams, a lightweight open source Java library for building stream processing applications on top of Kafka, using Kafka topics as input/output.
You will learn more about the following:
1. Apache Kafka: a Streaming Data Platform
2. Overview of Kafka Streams: What came before Kafka Streams? What is Kafka Streams? Why Kafka Streams? What are Kafka Streams' key concepts? Kafka Streams APIs and code examples.
3. Writing, deploying and running your first Kafka Streams application
4. Code and Demo of an end-to-end Kafka-based Streaming Data Application
5. Where to go from here?
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features) (Kai Wähner)
High level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture and features.
World of Tanks Experience of Using Kafka (Levon Avakyan)
In this paper I speak about BigWorld technology, the WoT server, Apache Kafka, how we started to use them together, what difficulties we had and how we solved them.
Making Apache Kafka Even Faster And More Scalable (PaulBrebner2)
Introduction to the 6th Community over Code Performance Engineering track and my talk on Apache Kafka performance changes resulting from architectural changes, including KRaft and the introduction of Kafka Tiered Storage.
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story (Joan Viladrosa Riera)
This document provides an overview of Apache Kafka and Apache Spark Streaming and their integration. It discusses what Kafka and Spark Streaming are, how they work, their benefits, and semantics when used together. It also provides examples of code for using the new Kafka integration in Spark 2.0+, including getting metadata, storing offsets in Kafka, and achieving at-most-once, at-least-once, and exactly-once processing semantics. Finally, it shares some insights into how Billy Mobile uses Spark Streaming with Kafka to process large volumes of data.
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at... (Big Data Spain)
This document provides an overview of Apache Kafka and Spark Streaming and their integration. It discusses:
- What Apache Kafka is and how it works as a publish-subscribe messaging system with topics, partitions, producers, and consumers.
- What Apache Spark Streaming is and how it provides streaming data processing using micro-batching and leveraging Spark's APIs and engine.
- The evolution of the integration between Kafka and Spark Streaming, from using receivers to the direct approach without receivers in Spark 1.3+.
- Details on how to use the new direct Kafka integration in Spark 2.0+ including location strategies, consumer strategies, and committing offsets directly to Kafka.
- Considerations around at-least-once processing semantics.
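As a sketch of the direct approach described above (the spark-streaming-kafka-0-10 module), the following mirrors the documented Java pattern with location strategies, consumer strategies and offset commits; the broker address, group id and topic name are placeholders, and jssc is assumed to be an existing JavaStreamingContext:

import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");   // placeholder
kafkaParams.put("key.deserializer", org.apache.kafka.common.serialization.StringDeserializer.class);
kafkaParams.put("value.deserializer", org.apache.kafka.common.serialization.StringDeserializer.class);
kafkaParams.put("group.id", "example-group");             // placeholder
kafkaParams.put("enable.auto.commit", false);             // we commit offsets ourselves

JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        jssc,                                             // existing JavaStreamingContext
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams));

// At-least-once: commit offsets back to Kafka only after each batch is processed
stream.foreachRDD(rdd -> {
    OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    // ... process rdd ...
    ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsets);
});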
Apache Kafka - Scalable Message Processing and more! (Guido Schmutz)
After a quick overview and introduction of Apache Kafka, this session covers two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. Conversely, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out of the box, provided either by the community, by Confluent or by other vendors. You simply configure these connectors and off you go.
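As an illustration of how little configuration a connector needs, here is a file source definition modelled on the connect-file-source.properties example that ships with Kafka; the file path and topic name are placeholders:

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/var/log/app.log
topic=connect-logs

Run it with the standalone worker, e.g.: bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties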
Kafka Streams is a lightweight component which extends Kafka with stream processing functionality. With it, Kafka can not only reliably and scalably transport events and messages through the Kafka broker, but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure, and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams wherever it makes sense: inside a “normal” Java application, inside a web container, or on a more modern containerized (cloud) infrastructure such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
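To give a flavour of that SQL interface, here is an illustrative KSQL session of the kind such talks typically demo; the stream, column and topic names are invented:

-- Declare a stream over an existing Kafka topic
CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- Continuously count views per page over a 30-second tumbling window
SELECT pageid, COUNT(*)
FROM pageviews
WINDOW TUMBLING (SIZE 30 SECONDS)
GROUP BY pageid;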
Real time Messages at Scale with Apache Kafka and Couchbase (Will Gardella)
Kafka is a scalable, distributed publish subscribe messaging system that's used as a data transmission backbone in many data intensive digital businesses. Couchbase Server is a scalable, flexible document database that's fast, agile, and elastic. Because they both appeal to the same type of customers, Couchbase and Kafka are often used together.
This presentation from a meetup in Mountain View describes Kafka's design and why people use it, Couchbase Server and its uses, and the use cases for both together. Also covered is a description and demo of Couchbase Server writing documents to a Kafka topic and consuming messages from a Kafka topic using the Couchbase Kafka Connector.
Apache Kafka is a high-throughput distributed messaging system that can be used for building real-time data pipelines and streaming apps. It provides a publish-subscribe messaging model and is designed as a distributed commit log. Kafka allows for both push and pull models where producers push data and consumers pull data from topics which are divided into partitions to allow for parallelism.
Distributed & Highly Available server applications in Java and Scala (Max Alexejev)
This document summarizes a presentation about distributed and highly available server applications in Java and Scala. It discusses the Talkbits architecture, which uses lightweight SOA principles with stateless edge services and specialized systems to manage state. The presentation describes using the Finagle library as a distributed RPC framework with Apache Zookeeper for service discovery. It also covers configuration, deployment, monitoring and logging of services using tools like SLF4J, Logback, CodaHale metrics, Jolokia, Fabric, and Datadog.
Apache Kafka is a distributed publish-subscribe messaging system that was originally created by LinkedIn and contributed to the Apache Software Foundation. It is written in Scala and provides a multi-language API to publish and consume streams of records. Kafka is useful for both log aggregation and real-time messaging due to its high performance, scalability, and ability to serve as both a distributed messaging system and log storage system with a single unified architecture. To use Kafka, one runs Zookeeper for coordination, Kafka brokers to form a cluster, and then publishes and consumes messages with a producer API and consumer API.
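In Java, that producer/consumer workflow reduces to a few lines each. A minimal sketch, with broker address, topic and group id as placeholders (and the pre-2.0 poll(long) signature, matching the era of these decks):

import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;

// Producer: push a message to a topic
Properties pp = new Properties();
pp.put("bootstrap.servers", "localhost:9092");
pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
try (Producer<String, String> producer = new KafkaProducer<>(pp)) {
    producer.send(new ProducerRecord<>("my-topic", "key", "hello kafka"));
}

// Consumer: pull messages from the same topic
Properties cp = new Properties();
cp.put("bootstrap.servers", "localhost:9092");
cp.put("group.id", "example-group");
cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
try (Consumer<String, String> consumer = new KafkaConsumer<>(cp)) {
    consumer.subscribe(Collections.singletonList("my-topic"));
    ConsumerRecords<String, String> records = consumer.poll(1000); // timeout in ms
    records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
}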
Apache Kafka 0.8 basic training - Verisign (Michael Noll)
Apache Kafka 0.8 basic training (120 slides) covering:
1. Introducing Kafka: history, Kafka at LinkedIn, Kafka adoption in the industry, why Kafka
2. Kafka core concepts: topics, partitions, replicas, producers, consumers, brokers
3. Operating Kafka: architecture, hardware specs, deploying, monitoring, P&S tuning
4. Developing Kafka apps: writing to Kafka, reading from Kafka, testing, serialization, compression, example apps
5. Playing with Kafka using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/08/18/apache-kafka-training-deck-and-tutorial/
Many thanks to the LinkedIn Engineering team (the creators of Kafka) and the Apache Kafka open source community!
This document provides information on connecting Apache Kafka with Mule ESB. It discusses the key components of Apache Kafka including topics, producers, consumers, partitions, brokers and clusters. It also outlines some common use cases for Apache Kafka like messaging, website activity tracking, metrics, and log aggregation. The document then provides step-by-step instructions on setting up Zookeeper and Apache Kafka on Windows Server. It demonstrates how to create topics, install the Anypoint Kafka connector in Mule, and build Mule flows to integrate Apache Kafka as a producer and consumer. Code examples are provided for the Mule flows.
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra (Joe Stein)
Slides for the solution we developed using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition), all written in Go, for doing real-time log analysis at scale. Many organizations either need or want log analysis in real time, where you can see within a second what is happening within your entire infrastructure. Today, with the hardware available and the software systems we have in place, you can develop, build and offer these solutions as a service.
Apache Kafka is a distributed streaming platform. It provides a high-throughput distributed messaging system with publish-subscribe capabilities. The document discusses Kafka producers and consumers, Kafka clients in different programming languages, and important configuration settings for Kafka brokers and topics. It also demonstrates sending messages to Kafka topics from a Java producer and consuming messages from the console consumer.
This document provides an introduction to Apache Kafka, an open-source distributed event streaming platform. It discusses Kafka's history as a project originally developed by LinkedIn, its use cases like messaging, activity tracking and stream processing. It describes key Kafka concepts like topics, partitions, offsets, replicas, brokers and producers/consumers. It also gives examples of how companies like Netflix, Uber and LinkedIn use Kafka in their applications and provides a comparison to Apache Spark.
The document introduces Apache Kafka's Streams API for stream processing. Some key points covered include:
- The Streams API allows building stream processing applications without needing a separate cluster, providing an elastic, scalable, and fault-tolerant processing engine.
- It integrates with existing Kafka deployments and supports both stateful and stateless computations on data in Kafka topics.
- Applications built with the Streams API are standard Java applications that run on client machines and leverage Kafka for distributed, parallel processing and fault tolerance via state stores in Kafka.
In this session, Neil Avery covers the planning and operation of your KSQL deployment, including under-the-hood architectural details. You will learn about the various deployment models, how to track and monitor your KSQL applications, how to scale in and out and how to think about capacity planning. This is part 3 out of 3 in the Empowering Streams through KSQL series.
1. 1
What’s new in Kafka 0.10.0
Introducing Kafka Streams
Eno Thereska
[email protected]
Kafka Meetup, July 21, 2016
Slide contributions: Michael Noll and Ismael
enothereska
2. 2
What’s new in Kafka 0.10.0
1. Lots of new KIPs in 0.10.0:
1. KIP-4 metadata
2. KIP-31 Relative offsets in compressed message sets
3. KIP-32 Add timestamps to Kafka message
4. KIP-35 Retrieving protocol version
5. KIP-36 Rack aware replica assignment
6. KIP-41 KafkaConsumer Max Records
7. KIP-42: Add Producer and Consumer Interceptors
8. KIP-45 Standardize all client sequence interaction
9. KIP-43 Kafka SASL enhancements
10. KIP-57 - Interoperable LZ4 Framing
11. KIP-51 - List Connectors REST API
12. KIP-52: Connector Control APIs
13. KIP-56: Allow cross origin HTTP requests on all HTTP methods
2. Kafka Streams
3. 3
Kafka Streams
• Powerful yet easy-to-use Java library
• Part of open source Apache Kafka, introduced in v0.10, May 2016
• Source code: https://github.com/apache/kafka/tree/trunk/streams
• Build your own stream processing applications that are
• highly scalable
• fault-tolerant
• distributed
• stateful
• able to handle late-arriving, out-of-order data
5. 5
When to use Kafka Streams (as of Kafka 0.10)
Recommended use cases
• Application Development
• “Fast Data” apps (small or big data)
• Reactive and stateful applications
• Linear streams
• Event-driven systems
• Continuous transformations
• Continuous queries
• Microservices
Questionable use cases
• Data Science / Data Engineering
• “Heavy lifting”
• Data mining
• Non-linear, branching streams (graphs)
• Machine learning, number crunching
• What you’d do in a data warehouse
6. 6
Alright, can you show me some code now?
KStream<Integer, Integer> input = builder.stream("numbers-topic");
// Stateless computation
KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);
// Stateful computation
KTable<Integer, Integer> sumOfOdds = input
    .filter((k, v) -> v % 2 != 0)
    .selectKey((k, v) -> 1)
    .reduceByKey((v1, v2) -> v1 + v2, "sum-of-odds");
• API option 1: Kafka Streams DSL (declarative)
7. 7
Alright, can you show me some code now?
Startup
Process a record
Periodic action
Shutdown
• API option 2: low-level Processor API (imperative)
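As a rough illustration of how those four lifecycle steps map onto the low-level API (as of Kafka 0.10, where punctuate() is the periodic callback), here is a minimal Processor sketch; the upper-casing logic and the schedule interval are invented for the example:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class UpperCaseProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {      // Startup
        this.context = context;
        context.schedule(1000);                       // request punctuate() roughly every second
    }

    @Override
    public void process(String key, String value) {   // Process a record
        context.forward(key, value.toUpperCase());    // emit to downstream processors
    }

    @Override
    public void punctuate(long timestamp) {           // Periodic action
        context.commit();                             // e.g. commit progress periodically
    }

    @Override
    public void close() {                             // Shutdown
        // release any resources held by this processor
    }
}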
8. 8
How do I install Kafka Streams?
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>0.10.0.0</version>
</dependency>
• There is and there should be no “install”.
• It’s a library. Add it to your app like any other library.
9. 9
Do I need to install a CLUSTER to run my apps?
• No, you don’t. Kafka Streams allows you to stay lean and lightweight.
• Unlearn bad habits: “do cool stuff with data != must have cluster”
10. 10
How do I package and deploy my apps? How do I …?
11. 11
How do I package and deploy my apps? How do I …?
• Whatever works for you. Stick to what you/your company think is the best way.
• Why? Because an app that uses Kafka Streams is…a normal Java app.
• Your Ops/SRE/InfoSec teams may finally start to love, not hate, you.
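To make the “normal Java app” point concrete, here is a minimal sketch of bootstrapping a Kafka Streams 0.10 application; the application id, connection strings and topic names are placeholders (note that 0.10.0 still required zookeeper.connect):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class MyStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        // Trivial pass-through topology: input-topic -> output-topic
        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");

        // Just a library: construct, start and stop like any other Java object
        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}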
19. 19
Streams meet Tables
A stream is a changelog of a table
A table is a materialized view of a stream at a point in time
20. 20
Streams meet Tables – in the Kafka Streams DSL
(Diagram: the records (alice, 2), (bob, 10), (alice, 3) arrive over time. Read as a record stream, the aggregate says “Alice clicked 2 times.” and later “Alice clicked 2+3 = 5 times.”; read as a changelog, it says “Alice clicked 2 times.” and then the update “Alice clicked 3 times.”)
KTable
= interprets data as changelog stream
~ is a continuously updated materialized view
KStream
= interprets data as record stream
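A hedged sketch of how the same topic can be read either way in the 0.10 DSL; the topic name and serdes are assumptions, and builder is a KStreamBuilder:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

// Record stream: each message is an independent event (INSERT semantics)
KStream<String, Long> clickStream = builder.stream(Serdes.String(), Serdes.Long(), "user-clicks");

// Changelog stream: a later message for the same key overwrites the earlier one (UPDATE semantics)
KTable<String, Long> clickTable = builder.table(Serdes.String(), Serdes.Long(), "user-clicks");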
21. 21
Streams meet Tables – in the Kafka Streams DSL
• JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
(Slides 22-23 repeat this example while the animation builds up.)
(Diagram: an input KStream of user clicks, e.g. (alice, 13) and (bob, 5), is leftJoin()ed with a KTable mapping each user to a region, yielding (alice, (europe, 13)) and (bob, (europe, 5)); a map() re-keys the records to (europe, 13) and (europe, 5), and reduceByKey(_ + _) folds them into a KTable holding the running total per region: first (europe, 13), then (europe, 18).)
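The same pipeline as 0.10-era DSL code, sketched under assumptions: topic names, the UNKNOWN fallback and the store name are invented, reduceByKey mirrors the two-argument form shown on slide 6 (default serdes), and builder is a KStreamBuilder:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

KStream<String, Long> userClicks = builder.stream(Serdes.String(), Serdes.Long(), "user-clicks");
KTable<String, String> userRegions = builder.table(Serdes.String(), Serdes.String(), "user-regions");

KTable<String, Long> clicksPerRegion = userClicks
    // enrich each click (keyed by user) with that user's region from the table
    .leftJoin(userRegions, (clicks, region) ->
        new KeyValue<>(region == null ? "UNKNOWN" : region, clicks))
    // re-key the stream from user to region
    .map((user, regionWithClicks) -> regionWithClicks)
    // fold into a continuously updated count of clicks per region
    .reduceByKey((c1, c2) -> c1 + c2, "ClicksPerRegion");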
26. 26
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
• Stateful and stateless computations (e.g. joins, aggregations)
• Time model
(Slides 27, 32 and 36 repeat this list as the animation builds it up.)
35. 35
Time
• You configure the desired time semantics through timestamp extractors
• Default extractor yields event-time semantics
• Extracts embedded timestamps of Kafka messages (introduced in v0.10)
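For illustration, a minimal custom extractor against the 0.10 interface (which receives the consumer record and returns a timestamp in milliseconds); Order is a hypothetical payload class:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Event-time semantics taken from a field inside the message payload
public class OrderTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        if (record.value() instanceof Order) {   // Order is hypothetical
            return ((Order) record.value()).getTimestampMs();
        }
        return record.timestamp();               // fall back to the embedded Kafka timestamp
    }
}

// Registered via configuration, e.g.:
// props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, OrderTimestampExtractor.class.getName());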
37. 37
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
• Stateful and stateless computations
• Time model
• Windowing
• Supports late-arriving and out-of-order data
• Millisecond processing latency, no micro-batching
• At-least-once processing guarantees (exactly-once is in the works)
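As a sketch of the windowing support, a tumbling-window count in the 0.10 DSL, where windows still carry an explicit name; the topic, window name and serdes are assumptions, and builder is a KStreamBuilder:

import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.*;

KStream<String, Long> clicks = builder.stream(Serdes.String(), Serdes.Long(), "user-clicks");

// Clicks per user per minute; late-arriving records update the window they belong to
KTable<Windowed<String>, Long> clicksPerMinute =
    clicks.countByKey(TimeWindows.of("clicks-per-minute", TimeUnit.MINUTES.toMillis(1)),
                      Serdes.String());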
38. 38
Where to go from here?
• Kafka Streams is available in Apache Kafka 0.10 and Confluent Platform 3.0
• http://kafka.apache.org/
• http://www.confluent.io/download (free + enterprise versions, tar/zip/deb/rpm)
• Kafka Streams demos at https://github.com/confluentinc/examples
• Java 7, Java 8+ with lambdas, and Scala
• WordCount, Joins, Avro integration, Top-N computation, Windowing, …
• Apache Kafka documentation: http://kafka.apache.org/documentation.html
• Confluent documentation: http://docs.confluent.io/3.0.0/streams/
• Quickstart, Concepts, Architecture, Developer Guide, FAQ
• Join our bi-weekly Ask Me Anything sessions on Kafka Streams
• Contact me at [email protected] for details
39. 39
Some of the things to come
• Exactly-once semantics
• Queryable state – tap into the state of your applications (KIP-67: adopted)
• SQL interface
• Listen to and collaborate with the developer community
• Your feedback counts a lot! Share it via [email protected]
40. 40
Want to contribute to Kafka and open source?
Join the Kafka community: http://kafka.apache.org/
…in a great team with the creators of Kafka?
Confluent is hiring: http://confluent.io/
Questions, comments? Tweet with #bbuzz and /cc to @ConfluentInc
43. 43
KIP-4 Metadata
- Update MetadataRequest and MetadataResponse
- Expose new fields for KIP-4 - not used yet
- Make it possible to ask for cluster information with no topics
- Fix nasty bug where request would be repeatedly sent if producer was started and unused for more than 5 minutes (KAFKA-3602)
44. 44
KIP-31 Relative offsets in compressed message sets
- Message format change (affects FetchRequest, ProduceRequest and on-disk format)
- Avoids recompression to assign offsets
- Improves broker latency
- Should also improve throughput, but can affect producer batch sizes so can reduce throughput in some cases; tune linger.ms and batch.size
45. 45
KIP-32 Add timestamps to Kafka message
- CreateTime or LogAppendTime
- Increases message size by 8 bytes
- Small throughput degradation, particularly for small messages
- Careful not to go over network limit due to this increase
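As a small illustration, the 0.10 producer can set CreateTime explicitly through the extended ProducerRecord constructor; the topic, key and value are placeholders:

import org.apache.kafka.clients.producer.ProducerRecord;

// (topic, partition, timestamp, key, value): the timestamp becomes the record's CreateTime
ProducerRecord<String, String> record =
    new ProducerRecord<>("events", null, System.currentTimeMillis(), "key", "value");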
46. 46
Migration from V1 to V2 format
- Read the upgrade notes
- 0.10 Producer produces in new format
- 0.10 broker can store in old or new format depending on config
- 0.10 consumers can use either format
- 0.9 consumers only support old format
- Broker can do conversion on the fly (with performance impact)
47. 47
KIP-35 Retrieving protocol version
- Request type that returns all the requests and versions supported by the broker
- Aim is for clients to use this to help them support multiple broker versions
- Not used by Java client yet
- Used by librdkafka and kafka-python
48. 48
KIP-36 Rack aware replica assignment
- Kafka can now run with a rack awareness feature that isolates replicas so they are guaranteed to span multiple racks or availability zones. This allows all of Kafka’s durability guarantees to be applied to these larger architectural units, significantly increasing availability
- Old clients must be upgraded to 0.9.0.1 before going to 0.10.0.0
- broker.rack in server.properties
- Can be disabled when launching reassignment tool
49. 49
New consumer enhancements
- KIP-41 KafkaConsumer Max Records
- KIP-42: Add Producer and Consumer Interceptors
- KIP-45 Standardize all client sequence interaction on j.u.Collection.
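To illustrate KIP-42, a minimal producer interceptor against the 0.10 interface; the audit prefix is an invented example:

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Runs inside the producer: may inspect or modify records before they are sent
public class AuditInterceptor implements ProducerInterceptor<String, String> {
    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // tag every value with an audit prefix (illustrative only)
        return new ProducerRecord<>(record.topic(), record.partition(),
            record.timestamp(), record.key(), "audited:" + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // invoked when the send is acknowledged or fails
    }

    @Override public void close() { }
    @Override public void configure(Map<String, ?> configs) { }
}

// Enabled via: props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, AuditInterceptor.class.getName());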
50. 50
KIP-43 Kafka SASL enhancements
- Multiple SASL mechanisms: PLAIN and Kerberos included
- Pluggable
- Added support for protocol evolution
51. 51
KIP-57 - Interoperable LZ4 Framing
The framing was broken; it was fixed in 0.10, taking advantage of the message format bump
52. 52
Connect KIPs
KIP-51 - List Connectors REST API
KIP-52: Connector Control APIs
KIP-56: Allow cross origin HTTP requests on all HTTP methods
53. 53
Lots of bugs fixed
Producer ordering, SocketServer leaks, New Consumer, offset handling in the broker
http://mirrors.muzzy.org.uk/apache/kafka/0.10.0.0/RELEASE_NOTES.html
Editor's Notes
#5: Basic workflow is:
Get the input data into Kafka, e.g. via Kafka Connect (part of Apache Kafka) or via your own applications that write data to Kafka.
Process the data with Kafka Streams, and write the results back to Kafka.
#8: FYI: Some people have begun using the low-level Processor API to port their Apache Storm code to Kafka Streams.
#12: In some sense, Kafka Streams exploits economies of scale. Projects like Mesos or Kubernetes, for example, are totally focused on resource management and scheduling, and they’ll always do a better job here than a tool that’s focused on stream processing. Kafka Streams should rather allow for “composition” with these deployment tools and resources managers (think: Unix philosophy) rather than being strongly opinionated and dictating any such choices upon you.
#17: Stream: ordered, re-playable, fault-tolerant sequence of immutable data records
Data records: key-value pairs (closely related to Kafka’s key-value messages)
#18: Processor topology: computational logic of an app’s data processing
Defined via the Kafka Streams DSL or the low-level Processor API
Stream processor: a node in the topology, represents a processing step
As a user, you will only be exposed to these nuts n’ bolts if you use the Processor API. The Kafka Streams DSL hides this from you.
#19: Stream partitions and stream tasks are the logical units of parallelism
Stream partition: totally ordered sequence of data records; maps to a Kafka topic
A data record in the stream maps to a Kafka message from that topic
The keys of data records determine the partitioning of data in both Kafka and Kafka Streams, i.e. how data is routed to specific partitions
A processor topology is scaled by breaking it into multiple stream tasks, based on number of input stream partitions
Stream tasks work in isolation, i.e. independent from each other
Each Stream task has its own local, fault-tolerant state
#21: KStream ~ records are interpreted as INSERTs (since no record replaces any existing record)
KTable – records are interpreted as UPDATEs (since any existing row with the same key is overwritten)
Note: We ignore Bob in this diagram. Bob is only shown to highlight that there is generally more data than just Alice’s.
#25: To summarize: some operations such as map() retain the shape (e.g. a stream will stay a stream), some operations change the shape (e.g. a stream will become a table).
#32: Mention DSL abstracts stores away, but low level API provides direct access
#34: A proper notion and model of time is crucial for stream processing