Kafka is primarily used to build real-time streaming data pipelines and applications that react to data streams. It combines messaging, storage, and stream processing to allow storage and analysis of both historical and real-time data.
Apache Kafka - Scalable Message-Processing and more! (Guido Schmutz)
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable message broker, built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present the role of Apache Kafka in a modern data/information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka into the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
In this presentation Guido Schmutz talks about Apache Kafka, Kafka Core, Kafka Connect, Kafka Streams, Kafka and "Big Data"/"Fast Data" ecosystems, the Confluent Data Platform and Kafka in architecture.
Being Ready for Apache Kafka - Apache: Big Data Europe 2015 (Michael Noll)
These are the slides of my Kafka talk at Apache: Big Data Europe in Budapest, Hungary. Enjoy! --Michael
Apache Kafka is a high-throughput distributed messaging system that has become a mission-critical infrastructure component for modern data platforms. Kafka is used across a wide range of industries by thousands of companies such as Twitter, Netflix, Cisco, PayPal, and many others.
After a brief introduction to Kafka, this talk will provide an update on the growth and status of the Kafka project community. The rest of the talk will focus on walking the audience through what's required to put Kafka in production. We’ll give an overview of the current ecosystem of Kafka, including: client libraries for creating your own apps; operational tools; and peripheral components required for running Kafka in production and for integration with other systems like Hadoop. We will cover the upcoming project roadmap, which adds key features to make Kafka even more convenient to use and more robust in production.
Building streaming data applications using Kafka*[Connect + Core + Streams] b... (Data Con LA)
Abstract: Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform for building real-time streaming data pipelines and streaming data applications without the need for other tools/clusters for data ingestion, storage and stream processing. In this talk you will learn more about: a quick introduction to Kafka Core, Kafka Connect and Kafka Streams through code examples, key concepts and key features; a reference architecture for building such Kafka-based streaming data applications; and a demo of an end-to-end Kafka-based streaming data application.
Building Streaming Data Applications Using Apache Kafka (Slim Baltagi)
Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform for building real-time streaming data pipelines and streaming data applications without the need for other tools/clusters for data ingestion, storage and stream processing.
In this talk you will learn more about:
1. A quick introduction to Kafka Core, Kafka Connect and Kafka Streams: what they are and why.
2. Code and step-by-step instructions to build an end-to-end streaming data application using Apache Kafka
Kafka's basic terminology, its architecture, its protocol and how it works.
Kafka at scale: its caveats, its guarantees and the use cases it supports.
How we use it @ZaprMediaLabs.
Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform (Kafka Core + Kafka Connect + Kafka Streams) for building streaming data pipelines and streaming data applications.
This talk, which I gave at the Chicago Java Users Group (CJUG) on June 8th 2017, focuses mainly on Kafka Streams, a lightweight open source Java library for building stream processing applications on top of Kafka, using Kafka topics as input/output.
You will learn more about the following:
1. Apache Kafka: a Streaming Data Platform
2. Overview of Kafka Streams: What came before Kafka Streams? What is Kafka Streams? Why Kafka Streams? What are Kafka Streams' key concepts? Kafka Streams APIs and code examples.
3. Writing, deploying and running your first Kafka Streams application
4. Code and Demo of an end-to-end Kafka-based Streaming Data Application
5. Where to go from here?
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features) (Kai Wähner)
High level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture and features.
World of Tanks Experience of Using Kafka (Levon Avakyan)
In this paper I speak about BigWorld technology, the WoT server, Apache Kafka, how we started to use them together, what difficulties we had and how we solved them.
Making Apache Kafka Even Faster And More Scalable (PaulBrebner2)
Introduction to the 6th Community over Code Performance Engineering track and my talk on Apache Kafka performance changes resulting from architectural changes, including KRaft and the introduction of Kafka Tiered Storage.
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story (Joan Viladrosa Riera)
This document provides an overview of Apache Kafka and Apache Spark Streaming and their integration. It discusses what Kafka and Spark Streaming are, how they work, their benefits, and semantics when used together. It also provides examples of code for using the new Kafka integration in Spark 2.0+, including getting metadata, storing offsets in Kafka, and achieving at-most-once, at-least-once, and exactly-once processing semantics. Finally, it shares some insights into how Billy Mobile uses Spark Streaming with Kafka to process large volumes of data.
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at... (Big Data Spain)
This document provides an overview of Apache Kafka and Spark Streaming and their integration. It discusses:
- What Apache Kafka is and how it works as a publish-subscribe messaging system with topics, partitions, producers, and consumers.
- What Apache Spark Streaming is and how it provides streaming data processing using micro-batching and leveraging Spark's APIs and engine.
- The evolution of the integration between Kafka and Spark Streaming, from using receivers to the direct approach without receivers in Spark 1.3+.
- Details on how to use the new direct Kafka integration in Spark 2.0+ including location strategies, consumer strategies, and committing offsets directly to Kafka.
- Considerations around at-least-once processing semantics.
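As a sketch of the direct approach described above (the spark-streaming-kafka-0-10 module), the following mirrors the documented Java pattern with location strategies, consumer strategies and offset commits; the broker address, group id and topic name are placeholders, and jssc is assumed to be an existing JavaStreamingContext:

import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");   // placeholder
kafkaParams.put("key.deserializer", org.apache.kafka.common.serialization.StringDeserializer.class);
kafkaParams.put("value.deserializer", org.apache.kafka.common.serialization.StringDeserializer.class);
kafkaParams.put("group.id", "example-group");             // placeholder
kafkaParams.put("enable.auto.commit", false);             // we commit offsets ourselves

JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        jssc,                                             // existing JavaStreamingContext
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams));

// At-least-once: commit offsets back to Kafka only after each batch is processed
stream.foreachRDD(rdd -> {
    OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    // ... process rdd ...
    ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsets);
});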
Apache Kafka - Scalable Message Processing and more! (Guido Schmutz)
After a quick overview and introduction of Apache Kafka, this session covers two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. Conversely, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out of the box, provided either by the community, by Confluent or by other vendors. You simply configure these connectors and off you go.
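As an illustration of how little configuration a connector needs, here is a file source definition modelled on the connect-file-source.properties example that ships with Kafka; the file path and topic name are placeholders:

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/var/log/app.log
topic=connect-logs

Run it with the standalone worker, e.g.: bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties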
Kafka Streams is a lightweight component which extends Kafka with stream processing functionality. With it, Kafka can not only reliably and scalably transport events and messages through the Kafka broker, but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure, and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams wherever it makes sense: inside a “normal” Java application, inside a web container, or on a more modern containerized (cloud) infrastructure such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
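To give a flavour of that SQL interface, here is an illustrative KSQL session of the kind such talks typically demo; the stream, column and topic names are invented:

-- Declare a stream over an existing Kafka topic
CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- Continuously count views per page over a 30-second tumbling window
SELECT pageid, COUNT(*)
FROM pageviews
WINDOW TUMBLING (SIZE 30 SECONDS)
GROUP BY pageid;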
Real time Messages at Scale with Apache Kafka and Couchbase (Will Gardella)
Kafka is a scalable, distributed publish subscribe messaging system that's used as a data transmission backbone in many data intensive digital businesses. Couchbase Server is a scalable, flexible document database that's fast, agile, and elastic. Because they both appeal to the same type of customers, Couchbase and Kafka are often used together.
This presentation from a meetup in Mountain View describes Kafka's design and why people use it, Couchbase Server and its uses, and the use cases for both together. Also covered is a description and demo of Couchbase Server writing documents to a Kafka topic and consuming messages from a Kafka topic using the Couchbase Kafka Connector.
Apache Kafka is a high-throughput distributed messaging system that can be used for building real-time data pipelines and streaming apps. It provides a publish-subscribe messaging model and is designed as a distributed commit log. Kafka allows for both push and pull models where producers push data and consumers pull data from topics which are divided into partitions to allow for parallelism.
Distributed & Highly Available server applications in Java and Scala (Max Alexejev)
This document summarizes a presentation about distributed and highly available server applications in Java and Scala. It discusses the Talkbits architecture, which uses lightweight SOA principles with stateless edge services and specialized systems to manage state. The presentation describes using the Finagle library as a distributed RPC framework with Apache Zookeeper for service discovery. It also covers configuration, deployment, monitoring and logging of services using tools like SLF4J, Logback, CodaHale metrics, Jolokia, Fabric, and Datadog.
Apache Kafka is a distributed publish-subscribe messaging system that was originally created by LinkedIn and contributed to the Apache Software Foundation. It is written in Scala and provides a multi-language API to publish and consume streams of records. Kafka is useful for both log aggregation and real-time messaging due to its high performance, scalability, and ability to serve as both a distributed messaging system and log storage system with a single unified architecture. To use Kafka, one runs Zookeeper for coordination, Kafka brokers to form a cluster, and then publishes and consumes messages with a producer API and consumer API.
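In Java, that producer/consumer workflow reduces to a few lines each. A minimal sketch, with broker address, topic and group id as placeholders (and the pre-2.0 poll(long) signature, matching the era of these decks):

import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;

// Producer: push a message to a topic
Properties pp = new Properties();
pp.put("bootstrap.servers", "localhost:9092");
pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
try (Producer<String, String> producer = new KafkaProducer<>(pp)) {
    producer.send(new ProducerRecord<>("my-topic", "key", "hello kafka"));
}

// Consumer: pull messages from the same topic
Properties cp = new Properties();
cp.put("bootstrap.servers", "localhost:9092");
cp.put("group.id", "example-group");
cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
try (Consumer<String, String> consumer = new KafkaConsumer<>(cp)) {
    consumer.subscribe(Collections.singletonList("my-topic"));
    ConsumerRecords<String, String> records = consumer.poll(1000); // timeout in ms
    records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
}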
Apache Kafka 0.8 basic training - Verisign (Michael Noll)
Apache Kafka 0.8 basic training (120 slides) covering:
1. Introducing Kafka: history, Kafka at LinkedIn, Kafka adoption in the industry, why Kafka
2. Kafka core concepts: topics, partitions, replicas, producers, consumers, brokers
3. Operating Kafka: architecture, hardware specs, deploying, monitoring, P&S tuning
4. Developing Kafka apps: writing to Kafka, reading from Kafka, testing, serialization, compression, example apps
5. Playing with Kafka using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/08/18/apache-kafka-training-deck-and-tutorial/
Many thanks to the LinkedIn Engineering team (the creators of Kafka) and the Apache Kafka open source community!
This document provides information on connecting Apache Kafka with Mule ESB. It discusses the key components of Apache Kafka including topics, producers, consumers, partitions, brokers and clusters. It also outlines some common use cases for Apache Kafka like messaging, website activity tracking, metrics, and log aggregation. The document then provides step-by-step instructions on setting up Zookeeper and Apache Kafka on Windows Server. It demonstrates how to create topics, install the Anypoint Kafka connector in Mule, and build Mule flows to integrate Apache Kafka as a producer and consumer. Code examples are provided for the Mule flows.
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra (Joe Stein)
Slides for the solution we developed using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition), all written in Go, for doing real-time log analysis at scale. Many organizations either need or want log analysis in real time, where you can see within a second what is happening within your entire infrastructure. Today, with the hardware available and the software systems we have in place, you can develop, build and offer these solutions as a service.
Apache Kafka is a distributed streaming platform. It provides a high-throughput distributed messaging system with publish-subscribe capabilities. The document discusses Kafka producers and consumers, Kafka clients in different programming languages, and important configuration settings for Kafka brokers and topics. It also demonstrates sending messages to Kafka topics from a Java producer and consuming messages from the console consumer.
This document provides an introduction to Apache Kafka, an open-source distributed event streaming platform. It discusses Kafka's history as a project originally developed by LinkedIn, its use cases like messaging, activity tracking and stream processing. It describes key Kafka concepts like topics, partitions, offsets, replicas, brokers and producers/consumers. It also gives examples of how companies like Netflix, Uber and LinkedIn use Kafka in their applications and provides a comparison to Apache Spark.
The document introduces Apache Kafka's Streams API for stream processing. Some key points covered include:
- The Streams API allows building stream processing applications without needing a separate cluster, providing an elastic, scalable, and fault-tolerant processing engine.
- It integrates with existing Kafka deployments and supports both stateful and stateless computations on data in Kafka topics.
- Applications built with the Streams API are standard Java applications that run on client machines and leverage Kafka for distributed, parallel processing and fault tolerance via state stores in Kafka.
In this session, Neil Avery covers the planning and operation of your KSQL deployment, including under-the-hood architectural details. You will learn about the various deployment models, how to track and monitor your KSQL applications, how to scale in and out and how to think about capacity planning. This is part 3 out of 3 in the Empowering Streams through KSQL series.
1. 1
What’s new in Kafka 0.10.0
Introducing Kafka Streams
Eno Thereska
[email protected]
Kafka Meetup, July 21, 2016
Slide contributions: Michael Noll and Ismael
enothereska
2. 2
What’s new in Kafka 0.10.0
1. Lots of new KIPs in 0.10.0:
1. KIP-4 metadata
2. KIP-31 Relative offsets in compressed message sets
3. KIP-32 Add timestamps to Kafka message
4. KIP-35 Retrieving protocol version
5. KIP-36 Rack aware replica assignment
6. KIP-41 KafkaConsumer Max Records
7. KIP-42: Add Producer and Consumer Interceptors
8. KIP-45 Standardize all client sequence interaction
9. KIP-43 Kafka SASL enhancements
10. KIP-57 - Interoperable LZ4 Framing
11. KIP-51 - List Connectors REST API
12. KIP-52: Connector Control APIs
13. KIP-56: Allow cross origin HTTP requests on all HTTP methods
2. Kafka Streams
3. 3
Kafka Streams
• Powerful yet easy-to-use Java library
• Part of open source Apache Kafka, introduced in v0.10, May 2016
• Source code: https://github.com/apache/kafka/tree/trunk/streams
• Build your own stream processing applications that are
• highly scalable
• fault-tolerant
• distributed
• stateful
• able to handle late-arriving, out-of-order data
5. 5
When to use Kafka Streams (as of Kafka 0.10)
Recommended use cases
• Application Development
• “Fast Data” apps (small or big data)
• Reactive and stateful applications
• Linear streams
• Event-driven systems
• Continuous transformations
• Continuous queries
• Microservices
Questionable use cases
• Data Science / Data Engineering
• “Heavy lifting”
• Data mining
• Non-linear, branching streams (graphs)
• Machine learning, number crunching
• What you’d do in a data warehouse
6. 6
Alright, can you show me some code now?
KStream<Integer, Integer> input = builder.stream("numbers-topic");
// Stateless computation
KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);
// Stateful computation
KTable<Integer, Integer> sumOfOdds = input
    .filter((k, v) -> v % 2 != 0)
    .selectKey((k, v) -> 1)
    .reduceByKey((v1, v2) -> v1 + v2, "sum-of-odds");
• API option 1: Kafka Streams DSL (declarative)
7. 7
Alright, can you show me some code now?
Startup
Process a record
Periodic action
Shutdown
• API option 2: low-level Processor API (imperative)
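As a rough illustration of how those four lifecycle steps map onto the low-level API (as of Kafka 0.10, where punctuate() is the periodic callback), here is a minimal Processor sketch; the upper-casing logic and the schedule interval are invented for the example:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class UpperCaseProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {      // Startup
        this.context = context;
        context.schedule(1000);                       // request punctuate() roughly every second
    }

    @Override
    public void process(String key, String value) {   // Process a record
        context.forward(key, value.toUpperCase());    // emit to downstream processors
    }

    @Override
    public void punctuate(long timestamp) {           // Periodic action
        context.commit();                             // e.g. commit progress periodically
    }

    @Override
    public void close() {                             // Shutdown
        // release any resources held by this processor
    }
}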
8. 8
How do I install Kafka Streams?
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>0.10.0.0</version>
</dependency>
• There is and there should be no “install”.
• It’s a library. Add it to your app like any other library.
9. 9
Do I need to install a CLUSTER to run my apps?
• No, you don’t. Kafka Streams allows you to stay lean and lightweight.
• Unlearn bad habits: “do cool stuff with data != must have cluster”
10. 10
How do I package and deploy my apps? How do I …?
11. 11
How do I package and deploy my apps? How do I …?
• Whatever works for you. Stick to what you/your company think is the best way.
• Why? Because an app that uses Kafka Streams is…a normal Java app.
• Your Ops/SRE/InfoSec teams may finally start to love, not hate, you.
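To make the “normal Java app” point concrete, here is a minimal sketch of bootstrapping a Kafka Streams 0.10 application; the application id, connection strings and topic names are placeholders (note that 0.10.0 still required zookeeper.connect):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class MyStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        // Trivial pass-through topology: input-topic -> output-topic
        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");

        // Just a library: construct, start and stop like any other Java object
        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}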
19. 19
Streams meet Tables
A stream is a changelog of a table
A table is a materialized view of a stream at a point in time
20. 20
Streams meet Tables – in the Kafka Streams DSL
(Diagram: the records (alice, 2), (bob, 10), (alice, 3) arrive over time. Read as a record stream, the aggregate says “Alice clicked 2 times.” and later “Alice clicked 2+3 = 5 times.”; read as a changelog, it says “Alice clicked 2 times.” and then the update “Alice clicked 3 times.”)
KTable
= interprets data as changelog stream
~ is a continuously updated materialized view
KStream
= interprets data as record stream
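A hedged sketch of how the same topic can be read either way in the 0.10 DSL; the topic name and serdes are assumptions, and builder is a KStreamBuilder:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

// Record stream: each message is an independent event (INSERT semantics)
KStream<String, Long> clickStream = builder.stream(Serdes.String(), Serdes.Long(), "user-clicks");

// Changelog stream: a later message for the same key overwrites the earlier one (UPDATE semantics)
KTable<String, Long> clickTable = builder.table(Serdes.String(), Serdes.Long(), "user-clicks");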
21. 21
Streams meet Tables – in the Kafka Streams DSL
• JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
(Slides 22-23 repeat this example while the animation builds up.)
(Diagram: an input KStream of user clicks, e.g. (alice, 13) and (bob, 5), is leftJoin()ed with a KTable mapping each user to a region, yielding (alice, (europe, 13)) and (bob, (europe, 5)); a map() re-keys the records to (europe, 13) and (europe, 5), and reduceByKey(_ + _) folds them into a KTable holding the running total per region: first (europe, 13), then (europe, 18).)
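The same pipeline as 0.10-era DSL code, sketched under assumptions: topic names, the UNKNOWN fallback and the store name are invented, reduceByKey mirrors the two-argument form shown on slide 6 (default serdes), and builder is a KStreamBuilder:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

KStream<String, Long> userClicks = builder.stream(Serdes.String(), Serdes.Long(), "user-clicks");
KTable<String, String> userRegions = builder.table(Serdes.String(), Serdes.String(), "user-regions");

KTable<String, Long> clicksPerRegion = userClicks
    // enrich each click (keyed by user) with that user's region from the table
    .leftJoin(userRegions, (clicks, region) ->
        new KeyValue<>(region == null ? "UNKNOWN" : region, clicks))
    // re-key the stream from user to region
    .map((user, regionWithClicks) -> regionWithClicks)
    // fold into a continuously updated count of clicks per region
    .reduceByKey((c1, c2) -> c1 + c2, "ClicksPerRegion");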
26. 26
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
• Stateful and stateless computations (e.g. joins, aggregations)
• Time model
(Slides 27, 32 and 36 repeat this list as the animation builds it up.)
35. 35
Time
• You configure the desired time semantics through timestamp extractors
• Default extractor yields event-time semantics
• Extracts embedded timestamps of Kafka messages (introduced in v0.10)
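For illustration, a minimal custom extractor against the 0.10 interface (which receives the consumer record and returns a timestamp in milliseconds); Order is a hypothetical payload class:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Event-time semantics taken from a field inside the message payload
public class OrderTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        if (record.value() instanceof Order) {   // Order is hypothetical
            return ((Order) record.value()).getTimestampMs();
        }
        return record.timestamp();               // fall back to the embedded Kafka timestamp
    }
}

// Registered via configuration, e.g.:
// props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, OrderTimestampExtractor.class.getName());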
37. 37
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
• Stateful and stateless computations
• Time model
• Windowing
• Supports late-arriving and out-of-order data
• Millisecond processing latency, no micro-batching
• At-least-once processing guarantees (exactly-once is in the works)
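As a sketch of the windowing support, a tumbling-window count in the 0.10 DSL, where windows still carry an explicit name; the topic, window name and serdes are assumptions, and builder is a KStreamBuilder:

import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.*;

KStream<String, Long> clicks = builder.stream(Serdes.String(), Serdes.Long(), "user-clicks");

// Clicks per user per minute; late-arriving records update the window they belong to
KTable<Windowed<String>, Long> clicksPerMinute =
    clicks.countByKey(TimeWindows.of("clicks-per-minute", TimeUnit.MINUTES.toMillis(1)),
                      Serdes.String());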
38. 38
Where to go from here?
• Kafka Streams is available in Apache Kafka 0.10 and Confluent Platform 3.0
• http://kafka.apache.org/
• http://www.confluent.io/download (free + enterprise versions, tar/zip/deb/rpm)
• Kafka Streams demos at https://github.com/confluentinc/examples
• Java 7, Java 8+ with lambdas, and Scala
• WordCount, Joins, Avro integration, Top-N computation, Windowing, …
• Apache Kafka documentation: http://kafka.apache.org/documentation.html
• Confluent documentation: http://docs.confluent.io/3.0.0/streams/
• Quickstart, Concepts, Architecture, Developer Guide, FAQ
• Join our bi-weekly Ask Me Anything sessions on Kafka Streams
• Contact me at [email protected] for details
39. 39
Some of the things to come
• Exactly-once semantics
• Queryable state – tap into the state of your applications (KIP-67: adopted)
• SQL interface
• Listen to and collaborate with the developer community
• Your feedback counts a lot! Share it via [email protected]
40. 40
Want to contribute to Kafka and open source?
Join the Kafka community: http://kafka.apache.org/
…in a great team with the creators of Kafka?
Confluent is hiring: http://confluent.io/
Questions, comments? Tweet with #bbuzz and /cc to @ConfluentInc
43. 43
KIP-4 Metadata
- Update MetadataRequest and MetadataResponse
- Expose new fields for KIP-4 - not used yet
- Make it possible to ask for cluster information with no topics
- Fix nasty bug where request would be repeatedly sent if producer was started and unused for more than 5 minutes (KAFKA-3602)
44. 44
KIP-31 Relative offsets in compressed message sets
- Message format change (affects FetchRequest, ProduceRequest and on-disk format)
- Avoids recompression to assign offsets
- Improves broker latency
- Should also improve throughput, but can affect producer batch sizes so can reduce throughput in some cases; tune linger.ms and batch.size
45. 45
KIP-32 Add timestamps to Kafka message
- CreateTime or LogAppendTime
- Increases message size by 8 bytes
- Small throughput degradation, particularly for small messages
- Careful not to go over network limit due to this increase
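As a small illustration, the 0.10 producer can set CreateTime explicitly through the extended ProducerRecord constructor; the topic, key and value are placeholders:

import org.apache.kafka.clients.producer.ProducerRecord;

// (topic, partition, timestamp, key, value): the timestamp becomes the record's CreateTime
ProducerRecord<String, String> record =
    new ProducerRecord<>("events", null, System.currentTimeMillis(), "key", "value");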
46. 46
Migration from V1 to V2 format
- Read the upgrade notes
- 0.10 Producer produces in new format
- 0.10 broker can store in old or new format depending on config
- 0.10 consumers can use either format
- 0.9 consumers only support old format
- Broker can do conversion on the fly (with performance impact)
47. 47
KIP-35 Retrieving protocol version
- Request type that returns all the requests and versions supported by the broker
- Aim is for clients to use this to help them support multiple broker versions
- Not used by Java client yet
- Used by librdkafka and kafka-python
48. 48
KIP-36 Rack aware replica assignment
- Kafka can now run with a rack awareness feature that isolates replicas so they are guaranteed to span multiple racks or availability zones. This allows all of Kafka’s durability guarantees to be applied to these larger architectural units, significantly increasing availability
- Old clients must be upgraded to 0.9.0.1 before going to 0.10.0.0
- broker.rack in server.properties
- Can be disabled when launching reassignment tool
49. 49
New consumer enhancements
- KIP-41 KafkaConsumer Max Records
- KIP-42: Add Producer and Consumer Interceptors
- KIP-45 Standardize all client sequence interaction on j.u.Collection.
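To illustrate KIP-42, a minimal producer interceptor against the 0.10 interface; the audit prefix is an invented example:

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Runs inside the producer: may inspect or modify records before they are sent
public class AuditInterceptor implements ProducerInterceptor<String, String> {
    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // tag every value with an audit prefix (illustrative only)
        return new ProducerRecord<>(record.topic(), record.partition(),
            record.timestamp(), record.key(), "audited:" + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // invoked when the send is acknowledged or fails
    }

    @Override public void close() { }
    @Override public void configure(Map<String, ?> configs) { }
}

// Enabled via: props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, AuditInterceptor.class.getName());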
50. 50
KIP-43 Kafka SASL enhancements
- Multiple SASL mechanisms: PLAIN and Kerberos included
- Pluggable
- Added support for protocol evolution
51. 51
KIP-57 - Interoperable LZ4 Framing
The framing was broken; it was fixed in 0.10, taking advantage of the message format bump
52. 52
Connect KIPs
KIP-51 - List Connectors REST API
KIP-52: Connector Control APIs
KIP-56: Allow cross origin HTTP requests on all HTTP methods
53. 53
Lots of bugs fixed
Producer ordering, SocketServer leaks, New Consumer, offset handling in the broker
http://mirrors.muzzy.org.uk/apache/kafka/0.10.0.0/RELEASE_NOTES.html
Editor's Notes
#5: Basic workflow is:
Get the input data into Kafka, e.g. via Kafka Connect (part of Apache Kafka) or via your own applications that write data to Kafka.
Process the data with Kafka Streams, and write the results back to Kafka.
#8: FYI: Some people have begun using the low-level Processor API to port their Apache Storm code to Kafka Streams.
#12: In some sense, Kafka Streams exploits economies of scale. Projects like Mesos or Kubernetes, for example, are totally focused on resource management and scheduling, and they’ll always do a better job here than a tool that’s focused on stream processing. Kafka Streams should rather allow for “composition” with these deployment tools and resources managers (think: Unix philosophy) rather than being strongly opinionated and dictating any such choices upon you.
#17: Stream: ordered, re-playable, fault-tolerant sequence of immutable data records
Data records: key-value pairs (closely related to Kafka’s key-value messages)
#18: Processor topology: computational logic of an app’s data processing
Defined via the Kafka Streams DSL or the low-level Processor API
Stream processor: a node in the topology, represents a processing step
As a user, you will only be exposed to these nuts n’ bolts if you use the Processor API. The Kafka Streams DSL hides this from you.
#19: Stream partitions and stream tasks are the logical units of parallelism
Stream partition: totally ordered sequence of data records; maps to a Kafka topic
A data record in the stream maps to a Kafka message from that topic
The keys of data records determine the partitioning of data in both Kafka and Kafka Streams, i.e. how data is routed to specific partitions
A processor topology is scaled by breaking it into multiple stream tasks, based on number of input stream partitions
Stream tasks work in isolation, i.e. independent from each other
Each Stream task has its own local, fault-tolerant state
#21: KStream ~ records are interpreted as INSERTs (since no record replaces any existing record)
KTable – records are interpreted as UPDATEs (since any existing row with the same key is overwritten)
Note: We ignore Bob in this diagram. Bob is only shown to highlight that there is generally more data than just Alice’s.
#25: To summarize: some operations such as map() retain the shape (e.g. a stream will stay a stream), some operations change the shape (e.g. a stream will become a table).
#32: Mention DSL abstracts stores away, but low level API provides direct access
#34: A proper notion and model of time is crucial for stream processing