MANCHESTER LONDON NEW YORK
Martin Zapletal @zapletal_martin
#ScalaDays
Data in Motion: Streaming Static Data Efficiently
in Akka Persistence (and elsewhere)
@cakesolutions
Databases
Batch processing
Data at scale
● Reactive
● Real time, asynchronous and message driven
● Elastic and scalable
● Resilient and fault tolerant
Streams
Streaming static data
● Turning a database into a stream
Pulling data from source
[Diagram: the consumer polls the source repeatedly; every poll re-reads the full result set (0, 5, 10).]
Inserts
[Diagram: a newly inserted value (1) only shows up in the next poll of the full result set.]
Updates
[Diagram: an updated value (5 → 55) likewise only shows up in the next poll.]
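A minimal Akka Streams sketch of the pull approach (illustrative, not the talk's code; queryAll stands in for the actual database query):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source
import scala.concurrent.duration._

object PollingExample extends App {
  implicit val system = ActorSystem("polling")
  implicit val mat = ActorMaterializer()

  // Assumed stand-in for querying the table's current contents.
  def queryAll(): List[Int] = List(0, 5, 10)

  // Poll on a fixed interval; every tick re-reads the whole result set,
  // which is exactly why plain polling is inefficient for mostly static data.
  Source.tick(0.seconds, 1.second, ())
    .mapConcat(_ => queryAll())
    .runForeach(println)
}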
Pushing data from source
● Change log, change data capture
[Diagram: the source pushes each change to the consumer as it happens; after the initial data (0, 5, 10), the insert of 1 is delivered as a single change event.]
Infinite streams from a finite data source
● Consistent snapshot and change log
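A sketch of how the two phases compose with Akka Streams: emit a consistent snapshot first, then continue with the change log (both sources are assumed stand-ins):

import akka.stream.scaladsl.Source

object SnapshotPlusLog {
  // Assumed stand-ins: a finite snapshot and a (conceptually infinite) change log.
  val snapshot  = Source(List(0, 5, 10))
  val changeLog = Source(List(1, 55))

  // The snapshot is drained first, then the stream continues with changes:
  // an infinite stream over a finite data source.
  val events = snapshot.concat(changeLog)
}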
[Diagram: a consistent snapshot (0, 5, 10) followed by the change log; the log records every operation in order: Inserted value 0, Inserted value 5, Inserted value 10, Inserted value 1, Inserted value 55.]
Log data structure
Pulling data from a log
[Diagram: consumers pull from the log at their own pace, each remembering its own position (offset); the log holds persistence_id1, event 1 through event 4.]
Akka Persistence
Akka Persistence Query
● eventsByPersistenceId, allPersistenceIds, eventsByTag
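For example, obtaining the Cassandra read journal and running one of these queries (API as exposed by akka-persistence-cassandra at the time of the talk):

import akka.actor.ActorSystem
import akka.persistence.cassandra.query.scaladsl.CassandraReadJournal
import akka.persistence.query.PersistenceQuery
import akka.stream.ActorMaterializer

object QueryExample extends App {
  implicit val system = ActorSystem("journal")
  implicit val mat = ActorMaterializer()

  val queries = PersistenceQuery(system)
    .readJournalFor[CassandraReadJournal](CassandraReadJournal.Identifier)

  // A live, unbounded stream of one persistent actor's events.
  queries.eventsByPersistenceId("persistence_id1", 0L, Long.MaxValue)
    .runForeach(env => println(env.event))
}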
persistence_id  partition_nr  events
0               0             event 0, event 1, event 2, ...
0               1             event 100, event 101, event 102
1               0             event 0, event 1, event 2
Akka Persistence Query Cassandra
● Purely pull
● Event (log) data
Actor publisher
private[query] abstract class QueryActorPublisher[MessageType, State: ClassTag](refreshInterval: Option[FiniteDuration])
  extends ActorPublisher[MessageType] {

  // Template methods implemented by each concrete query:
  protected def initialState: Future[State]                                        // starting offsets
  protected def initialQuery(initialState: State): Future[Action]                  // first database query
  protected def requestNext(state: State, resultSet: ResultSet): Future[Action]    // next query while live
  protected def requestNextFinished(state: State, resultSet: ResultSet): Future[Action]
  protected def updateState(state: State, row: Row): (Option[MessageType], State)  // fold a row, maybe emit
  protected def completionCondition(state: State): Boolean                         // when to stop

  private[this] def nextBehavior(...): Receive = {
    if (shouldFetchMore(...)) {
      listenableFutureToFuture(resultSet.fetchMoreResults()).map(FetchedResultSet).pipeTo(self)
      awaiting(resultSet, state, finished)
    } else if (shouldIdle(...)) {
      idle(resultSet, state, finished)
    } else if (shouldComplete(...)) {
      onCompleteThenStop()
      Actor.emptyBehavior
    } else if (shouldRequestMore(...)) {
      if (finished) requestNextFinished(state, resultSet).pipeTo(self)
      else requestNext(state, resultSet).pipeTo(self)
      awaiting(resultSet, state, finished)
    } else {
      idle(resultSet, state, finished)
    }
  }
}
State machine of the publisher (simplified):
● Transitions: initialQuery, initialFinished, initialNewResultSet, request, newResultSet, fetchedResultSet, finished, continue, Cancel, SubscriptionTimeout
● Guards: shouldFetchMore, shouldIdle, shouldTerminate, shouldRequestMore
● Red transitions deliver the buffer and update internal state (progress)
● Blue transitions issue an asynchronous database query
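A rough illustration of what the guards decide, assuming the DataStax driver's ResultSet paging API (the predicate bodies are illustrative, not the library's exact conditions):

import com.datastax.driver.core.ResultSet

object Guards {
  // Current page exhausted but more pages exist: prefetch asynchronously.
  def shouldFetchMore(rs: ResultSet): Boolean =
    rs.getAvailableWithoutFetching == 0 && !rs.isFullyFetched

  // Buffered results but no downstream demand: wait.
  def shouldIdle(buffered: Int, demand: Long): Boolean =
    buffered > 0 && demand == 0L

  // Query finished and the stream is not live: complete the publisher.
  def shouldComplete(rs: ResultSet, finished: Boolean, live: Boolean): Boolean =
    finished && rs.isExhausted && !live

  // Rows exhausted but demand remains: issue the next query.
  def shouldRequestMore(rs: ResultSet, demand: Long, finished: Boolean): Boolean =
    rs.isExhausted && demand > 0L && !finished
}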
SELECT * FROM ${tableName} WHERE
persistence_id = ? AND
partition_nr = ? AND
sequence_nr >= ? AND
sequence_nr <= ?
Events by persistence id
[Animation: eventsByPersistenceId reads partition (0, 0) event by event (event 0, event 1, event 2, ...), then moves on to partition (0, 1) for events 100 to 102.]
private[query] class EventsByPersistenceIdPublisher(...)
  extends QueryActorPublisher[PersistentRepr, EventsByPersistenceIdState](...) {

  override protected def initialState: Future[EventsByPersistenceIdState] = {
    ...
    EventsByPersistenceIdState(initialFromSequenceNr, 0, currentPnr)
  }

  override protected def updateState(
    state: EventsByPersistenceIdState,
    row: Row): (Option[PersistentRepr], EventsByPersistenceIdState) = {
    val event = extractEvent(row)
    val partitionNr = row.getLong("partition_nr") + 1
    (Some(event),
      EventsByPersistenceIdState(event.sequenceNr + 1, state.count + 1, partitionNr))
  }
}
All persistence ids
SELECT DISTINCT persistence_id, partition_nr FROM $tableName
[Animation: allPersistenceIds scans the distinct (persistence_id, partition_nr) pairs across the table and emits each persistence id once.]
private[query] class AllPersistenceIdsPublisher(...)
extends QueryActorPublisher[String, AllPersistenceIdsState](...) {
override protected def initialState: Future[AllPersistenceIdsState] =
Future.successful(AllPersistenceIdsState(Set.empty))
override protected def updateState(
state: AllPersistenceIdsState, row: Row): (Option[String], AllPersistenceIdsState) = {
val event = row.getString("persistence_id")
if (state.knownPersistenceIds.contains(event)) {
(None, state)
} else {
(Some(event), state.copy(knownPersistenceIds = state.knownPersistenceIds + event))
}
}
}
Events by tag
[Animation: eventsByTag scans the journal for events carrying tag 1 (event 1, event 2 and event 100 of persistence id 0, and event 2 of persistence id 1), skipping untagged events.]
[Diagram: a separate, time-bucketed view per tag (tag 1 1/1/2016, tag 1 1/2/2016) is read bucket by bucket in timestamp order, emitting (persistence id, event) pairs: Id 0, event 1; Id 0, event 2; Id 0, event 100; Id 1, event 2.]
SELECT * FROM $eventsByTagViewName$tagId WHERE
tag$tagId = ? AND
timebucket = ? AND
timestamp > ? AND
timestamp <= ?
ORDER BY timestamp ASC
LIMIT ?
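The timebucket column partitions the tag view by coarse time windows (days in the diagrams: 1/1/2016, 1/2/2016), so a query walks one bucket at a time in timestamp order. A sketch of deriving a day bucket (illustrative; the journal's own TimeBucket encoding may differ):

import java.time.format.DateTimeFormatter
import java.time.{Instant, ZoneOffset}

object TimeBucket {
  private val fmt = DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC)

  // Bucket a timestamp to its UTC day, e.g. 1451606400000L -> "20160101".
  def of(timestampMillis: Long): String =
    fmt.format(Instant.ofEpochMilli(timestampMillis))
}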
[Animation: delivery from the tag view tracks a sequence number per persistence_id (e.g. persistence_id 0 → seq 1); a gap (expected seq ?) signals a missed or delayed event, which the query must wait for and retry before continuing.]
seqNumbers match {
case None =>
replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
loop(n - 1)
case Some(s) =>
s.isNext(pid, seqNr) match {
case SequenceNumbers.Yes | SequenceNumbers.PossiblyFirst =>
seqNumbers = Some(s.updated(pid, seqNr))
replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
loop(n - 1)
case SequenceNumbers.After =>
replyTo ! ReplayAborted(seqNumbers, pid, s.get(pid) + 1, seqNr)
// end loop
case SequenceNumbers.Before =>
// duplicate, discard
if (!backtracking)
log.debug(s"Discarding duplicate. Got sequence number [$seqNr] for [$pid], " +
s"but current sequence number is [${s.get(pid)}]")
loop(n - 1)
}
}
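For reference, a minimal sketch of a SequenceNumbers-style tracker with the four answers used above (illustrative; the real one in akka-persistence-cassandra is more involved):

object SequenceNumbers {
  sealed trait Answer
  case object Yes extends Answer            // exactly the next expected number
  case object PossiblyFirst extends Answer  // seqNr 1 for an unknown id
  case object After extends Answer          // gap: an earlier event is missing
  case object Before extends Answer         // already seen: a duplicate
}

final case class SequenceNumbers(private val numbers: Map[String, Long]) {
  import SequenceNumbers._

  def get(pid: String): Long = numbers.getOrElse(pid, 0L)

  def isNext(pid: String, seqNr: Long): Answer = numbers.get(pid) match {
    case None if seqNr == 1L       => PossiblyFirst
    case None                      => After
    case Some(n) if seqNr == n + 1 => Yes
    case Some(n) if seqNr > n + 1  => After
    case _                         => Before
  }

  def updated(pid: String, seqNr: Long): SequenceNumbers =
    copy(numbers = numbers.updated(pid, seqNr))
}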
def replay(): Unit = {
val backtracking = isBacktracking
val limit =
if (backtracking) maxBufferSize
else maxBufferSize - buf.size
val toOffs =
if (backtracking && abortDeadline.isEmpty) highestOffset
else UUIDs.endOf(System.currentTimeMillis() - eventualConsistencyDelayMillis)
context.actorOf(EventsByTagFetcher.props(tag, currTimeBucket, currOffset, toOffs, limit, backtracking,
self, session, preparedSelect, seqNumbers, settings))
context.become(replaying(limit))
}
def replaying(limit: Int): Receive = {
case env @ UUIDPersistentRepr(offs, _) => // Deliver buffer
case ReplayDone(count, seqN, highest) => // Request more
case ReplayAborted(seqN, pid, expectedSeqNr, gotSeqNr) =>
// Causality violation, wait and retry. Only applicable if all events for persistence_id are tagged
case ReplayFailed(cause) => // Failure
case _: Request => // Deliver buffer
case Continue => // Do nothing
case Cancel => // Stop
}
Akka Persistence Cassandra Replay
def asyncReplayMessages(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long, max: Long)
(replayCallback: (PersistentRepr) => Unit): Future[Unit] = Future {
new MessageIterator(persistenceId, fromSequenceNr, toSequenceNr, max).foreach(msg => {
replayCallback(msg)
})
}
class MessageIterator(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long, max: Long) extends Iterator[PersistentRepr] {
private val initialFromSequenceNr = math.max(highestDeletedSequenceNumber(persistenceId) + 1, fromSequenceNr)
private val iter = new RowIterator(persistenceId, initialFromSequenceNr, toSequenceNr)
private var mcnt = 0L
private var c: PersistentRepr = null
private var n: PersistentRepr = PersistentRepr(Undefined)
fetch()
def hasNext: Boolean = ...
def next(): PersistentRepr = …
...
}
class RowIterator(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long) extends Iterator[Row] {
  var currentPnr = partitionNr(fromSequenceNr)
  var currentSnr = fromSequenceNr

  var fromSnr = fromSequenceNr
  var toSnr = toSequenceNr

  var iter = newIter()

  def newIter() =
    session.execute(preparedSelectMessages.bind(persistenceId, currentPnr, fromSnr, toSnr)).iterator

  final def hasNext: Boolean = {
    if (iter.hasNext) true   // more rows in the current partition
    else if (!inUse) false   // partition not in use: end of events
    else {                   // move to the next partition and retry
      currentPnr += 1
      fromSnr = currentSnr
      iter = newIter()
      hasNext
    }
  }

  def next(): Row = {
    val row = iter.next()
    currentSnr = row.getLong("sequence_nr")
    row
  }
}
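partitionNr (elided above) maps a sequence number to the Cassandra partition holding it. A sketch, assuming a fixed target partition size (500000 is the journal's default; illustrative, not verbatim):

val targetPartitionSize = 500000L

// Sequence numbers start at 1, so events 1..500000 land in partition 0,
// 500001..1000000 in partition 1, and so on.
def partitionNr(sequenceNr: Long): Long =
  (sequenceNr - 1L) / targetPartitionSize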
Non-blocking asynchronous replay
private[this] val queries: CassandraReadJournal =
new CassandraReadJournal(
extendedActorSystem,
context.system.settings.config.getConfig("cassandra-query-journal"))
override def asyncReplayMessages(
persistenceId: String,
fromSequenceNr: Long,
toSequenceNr: Long,
max: Long)(replayCallback: (PersistentRepr) => Unit): Future[Unit] =
queries
.eventsByPersistenceId(
persistenceId,
fromSequenceNr,
toSequenceNr,
max,
replayMaxResultSize,
None,
"asyncReplayMessages")
.runForeach(replayCallback)
.map(_ => ())
Benchmarks
[Charts: REPLAY STRONG SCALING and WEAK SCALING; time (s) plotted against the number of threads and actors, comparing the blocking and asynchronous replay implementations.]
Alternative architecture
[Diagram: each node appends its events (persistence_id 0..2, event 0..3) to a per-node log, while tag (tag 1), allIds and eventsByPersistenceId views are maintained alongside it.]
// Write the event and all views together in one batch:
val boundStatements = statementGroup(eventsByPersistenceId, eventsByTag, allPersistenceIds)
Future.sequence(boundStatements).flatMap { stmts =>
  val batch = new BatchStatement().setConsistencyLevel(...).setRetryPolicy(...)
  stmts.foreach(batch.add)
  session.underlying().flatMap(_.executeAsync(batch))
}
// Alternative: write eventsByPersistenceId separately from the batched views:
val eventsByPersistenceIdStatement = statementGroup(eventsByPersistenceIdStatement)
val boundStatements = statementGroup(eventsByTagStatement, allPersistenceIdsStatement)
...
session.underlying().flatMap { s =>
  val ebpResult = s.executeAsync(eventsByPersistenceIdStatement)
  val batchResult = s.executeAsync(batch)
  ...
}
Event time processing
● Ingestion time, processing time, event time
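A small sketch of the distinction (field names are illustrative): event time travels with the event, ingestion time is stamped on arrival, processing time is read from the clock when an operator handles the event:

final case class Reading(key: Int, eventTime: Long, value: Int)

object Time {
  // Processing time: whenever the operator happens to run.
  def processingTime(): Long = System.currentTimeMillis()

  // Event-time windowing groups readings by when they occurred at the source,
  // independent of arrival order or processing speed.
  def eventTimeWindow(r: Reading, windowMillis: Long): Long =
    r.eventTime / windowMillis
}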
Ordering
[Diagram: three events with KEY, TIME, VALUE arrive out of order:
KEY  TIME      VALUE
0    12:34:56  0
1    12:34:57  1
2    12:34:58  2]
Distributed causal stream merging
[Animation: events from two nodes are merged into a single causal stream; a per-persistence_id table of the highest delivered sequence number (e.g. persistence_id 0 → 2, 1 → 0, 2 → 0) decides whether an incoming event is deliverable now, a duplicate, or must wait for its predecessor.]
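A toy sketch of the merge step (illustrative, not the talk's implementation): deliver an event only when it is the next expected sequence number for its persistence id, buffer out-of-order events, drop duplicates:

import scala.collection.mutable

final case class Evt(persistenceId: String, seqNr: Long)

final class CausalMerge {
  private val expected = mutable.Map.empty[String, Long].withDefaultValue(1L)
  private val pending  = mutable.Map.empty[String, mutable.Map[Long, Evt]]

  // Returns the events that become deliverable after offering e.
  def offer(e: Evt): List[Evt] = {
    val next = expected(e.persistenceId)
    if (e.seqNr < next) Nil          // duplicate, discard
    else if (e.seqNr > next) {       // gap: buffer until predecessors arrive
      pending.getOrElseUpdate(e.persistenceId, mutable.Map.empty[Long, Evt]) += (e.seqNr -> e)
      Nil
    } else {                         // deliverable; also flush buffered successors
      val buf = pending.getOrElse(e.persistenceId, mutable.Map.empty[Long, Evt])
      var out = List(e)
      var n = next + 1L
      while (buf.contains(n)) { out = buf(n) :: out; buf -= n; n += 1 }
      expected(e.persistenceId) = n
      out.reverse
    }
  }
}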
Replay
[Animation: on replay the merged stream is rebuilt from the per-node logs; alongside the per-persistence_id sequence numbers, a stream_id → seq table records the position reached in each source stream so replay can resume from the right offsets.]
Exactly once delivery
[Diagram: every delivered event is acknowledged (ACK); an unacknowledged event (here Id 0, event 2) is redelivered until it is confirmed.]
[Diagram: checkpointing. Sources record their offsets (Source 1: 6791, Source 2: 7252, Source 3: 5589, Source 4: 6843), stateful operators store pointers into a state backend (State 1: ptr 1, State 1: ptr 2), and sinks acknowledge (Sink 2: ack!), together forming consistent checkpoint data.]
class KafkaSource(private var offsetManagers: Map[TopicAndPartition, KafkaOffsetManager])
extends TimeReplayableSource {
def open(context: TaskContext, startTime: Option[TimeStamp]): Unit = {
fetch.setStartOffset(topicAndPartition, offsetManager.resolveOffset(time))
...
}
def read(batchSize: Int): List[Message]
def close(): Unit
}
class DirectKafkaInputDStream[K, V, U <: Decoder[K]: ClassTag, T <: Decoder[V]: ClassTag, R](
_ssc: StreamingContext,
val kafkaParams: Map[String, String],
val fromOffsets: Map[TopicAndPartition, Long],
messageHandler: MessageAndMetadata[K, V] => R
) extends InputDStream[R](_ssc) with Logging {
override def compute(validTime: Time): Option[KafkaRDD[K, V, U, T, R]] = {
val untilOffsets = latestLeaderOffsets(maxRetries)
...
}
}
Exactly once delivery
● Durable offset
[Diagram: the consumer's position (offset) in the log is stored durably, so processing resumes exactly where it left off after a failure.]
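A minimal sketch of the durable-offset idea (illustrative; OffsetStore is an assumed abstraction whose commit must be atomic):

trait OffsetStore {
  def readOffset(): Long
  def commit(results: Seq[String], nextOffset: Long): Unit // must be atomic
}

// Results and the new offset are committed together, so a restart resumes
// exactly after the last committed position: nothing lost, nothing doubled.
def processBatch(log: IndexedSeq[String], store: OffsetStore, batchSize: Int): Unit = {
  val from    = store.readOffset()
  val batch   = log.slice(from.toInt, from.toInt + batchSize)
  val results = batch.map(_.toUpperCase) // stand-in for real processing
  store.commit(results, from + batch.size)
}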
[Diagram: a query plan of select, map and filter operators spread across many workers fed by several stream sources.]
Optimisation
[Diagram: after optimisation, fused select/where stages run on fewer workers, placed close to the stream sources.]
val partitioner =
partitionerClassName match {
case "org.apache.cassandra.dht.Murmur3Partitioner" => Murmur3TokenFactory
case "org.apache.cassandra.dht.RandomPartitioner" => RandomPartitionerTokenFactory
case _ => throw new IllegalArgumentException(s"Unsupported partitioner: $partitionerClassName")
}
private def splitToCqlClause(range: TokenRange): Iterable[CqlTokenRange] = {
if (range.end == tokenFactory.minToken)
List(CqlTokenRange(s"token($pk) > ?", startToken))
else if (range.start == tokenFactory.minToken)
List(CqlTokenRange(s"token($pk) <= ?", endToken))
else if (!range.isWrapAround)
List(CqlTokenRange(s"token($pk) > ? AND token($pk) <= ?", startToken, endToken))
else
List(
CqlTokenRange(s"token($pk) > ?", startToken),
CqlTokenRange(s"token($pk) <= ?", endToken))
}
override def getPreferredLocations(split: Partition): Seq[String] =
split.asInstanceOf[CassandraPartition].endpoints.flatMap(nodeAddresses.hostNames).toSeq
override def getPartitions: Array[Partition] = {
val partitioner = CassandraRDDPartitioner(connector, tableDef, splitCount, splitSize)
val partitions = partitioner.partitions(where)
partitions
}
override def compute(split: Partition, context: TaskContext): Iterator[R] = {
val session = connector.openSession()
val partition = split.asInstanceOf[CassandraPartition]
val tokenRanges = partition.tokenRanges
val metricsUpdater = InputMetricsUpdater(context, readConf)
val rowIterator = tokenRanges.iterator.flatMap(
fetchTokenRange(session, _, metricsUpdater))
new CountingIterator(rowIterator, limit)
}
object PushPredicateThroughProject extends Rule[LogicalPlan] with PredicateHelper {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case filter @ Filter(condition, project @ Project(fields, grandChild))
if fields.forall(_.deterministic) =>
val aliasMap = AttributeMap(fields.collect {
case a: Alias => (a.toAttribute, a.child)
})
project.copy(child = Filter(replaceAlias(condition, aliasMap), grandChild))
}
}
Table and stream duality
[Diagram: table and stream duality. A stream of updates (Id 0, Event 1; Id 0, Event 2) folds into a table snapshot (State X) for offset N, and a snapshot plus the rest of the stream rebuilds the table; the resulting table (Id 0, Offset 123, State X) can in turn feed a cache / view / index / replica / system / service as a continuous stream applying a transformation function to updates of the source-of-truth data (the original table).]
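The duality in one line: a table is a fold over its change stream. A sketch (names illustrative):

final case class Update(id: Int, event: String)

// Folding the change stream yields the table; keeping the stream around lets
// any view, cache or replica rebuild the same state from an offset.
def toTable(stream: Seq[Update]): Map[Int, List[String]] =
  stream.foldLeft(Map.empty[Int, List[String]]) { (table, u) =>
    table.updated(u.id, u.event :: table.getOrElse(u.id, Nil))
  }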
Infinite streams application
[Diagrams: infinite-stream applications. Sources (internet, services, devices, social) feed Kafka; stream processing apps and stream consumers feed search, apps, services, databases and batch. Cross-cutting concerns: serialisation, distributed systems. A microservice landscape (user, mobile, system) spanning CQRS/ES, relational and NoSQL stores. A distributed model-training topology: clients send updates; model devices process input data and exchange parameter deltas (ΔP) with parameter devices holding the parameters P.]
Challenges
● All the "solved" problems
○ Exactly once delivery
○ Consistency
○ Availability
○ Fault tolerance
○ Cross service invariants and consistency
○ Transactions
○ Automated deployment and configuration management
○ Serialization, versioning, compatibility
○ Automated elasticity
○ No downtime version upgrades
○ Graceful shutdown of nodes
○ Distributed system verification, logging, tracing, monitoring, debugging
○ Split brains
○ ...
Conclusion
● From request, response, synchronous, mutable state
● To streams, asynchronous messaging
● Production-ready distributed systems
MANCHESTER LONDON NEW YORK
Questions
MANCHESTER LONDON NEW YORK
@zapletal_martin @cakesolutions
347 708 1518
enquiries@cakesolutions.net
We are hiring
http://www.cakesolutions.net/careers
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 

Data in Motion: Streaming Static Data Efficiently in Akka Persistence (and elsewhere)

  • 2. Martin Zapletal @zapletal_martin #ScalaDays Data in Motion: Streaming Static Data Efficiently in Akka Persistence (and elsewhere) @cakesolutions
  • 5. Data at scale ● Reactive ● Real time, asynchronous and message driven ● Elastic and scalable ● Resilient and fault tolerant
  • 7. Streaming static data ● Turning a database into a stream
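A minimal sketch (not from the slides) of the pull approach the next slides illustrate: a table becomes an unbounded stream by repeatedly querying for rows past the last seen offset. Record and fetchAfter are hypothetical stand-ins for a real table and database client.

    import akka.NotUsed
    import akka.stream.scaladsl.Source
    import scala.concurrent.{ExecutionContext, Future}

    final case class Record(id: Long, value: String)

    // hypothetical query: all rows with id greater than the given offset
    def fetchAfter(offset: Long)(implicit ec: ExecutionContext): Future[Seq[Record]] = ???

    def tableAsStream(implicit ec: ExecutionContext): Source[Record, NotUsed] =
      Source
        .unfoldAsync(0L) { offset =>
          fetchAfter(offset).map {
            case Nil  => Some((offset, Seq.empty[Record])) // nothing new yet, poll again
            case rows => Some((rows.last.id, rows))        // advance the remembered offset
          }
        }
        .mapConcat(_.toList)                               // flatten batches into elements

A real implementation would delay the empty case instead of polling tightly; the point is only that repeated pulls plus a remembered offset turn a static table into an infinite stream.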
  • 8.–11. Pulling data from source (diagrams: a consumer repeatedly pulls rows 0, 5 and 10 from the source table)
  • 12. Inserts (diagram: a row 1 inserted between pulls is only seen by the next pull)
  • 13. Updates (diagram: a row updated between pulls is only seen by the next pull)
  • 14.–16. Pushing data from source ● Change log, change data capture (diagrams: each insert is pushed to the consumer as a change event instead of being discovered by polling)
  • 17. Infinite streams of finite data source ● Consistent snapshot and change log (diagram: a consistent snapshot of rows 0, 5 and 10 followed by the change event for row 1)
  • 18. Log data structure (diagram: an append-only log; offsets 0–4 hold inserted values 0, 5, 10, 1 and 5)
  • 19.–21. Pulling data from a log (diagrams: a consumer pulls from its current offset; entries appended to the log, e.g. 15, are picked up by subsequent pulls)
  • 22. Akka Persistence (diagram: events 1–4 for persistence_id1 appended to the journal in order)
  • 23. Akka Persistence Query ● eventsByPersistenceId, allPersistenceIds, eventsByTag (diagram: the same journal consumed as a query-side stream)
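As a usage sketch (API names as in the Akka 2.4-era plugin the talk describes; later versions renamed some of these), the three queries are obtained from the read journal and materialised as ordinary streams:

    import akka.actor.ActorSystem
    import akka.persistence.cassandra.query.scaladsl.CassandraReadJournal
    import akka.persistence.query.PersistenceQuery
    import akka.stream.ActorMaterializer

    implicit val system = ActorSystem("example")
    implicit val materializer = ActorMaterializer()

    val readJournal = PersistenceQuery(system)
      .readJournalFor[CassandraReadJournal](CassandraReadJournal.Identifier)

    // live, unbounded streams backed by the journal tables
    readJournal.allPersistenceIds().runForeach(println)
    readJournal.eventsByPersistenceId("persistence_id1", 0L, Long.MaxValue)
      .runForeach(envelope => println(envelope.event))
    readJournal.eventsByTag("tag 1", 0L).runForeach(println)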
  • 24. Akka Persistence Query Cassandra ● Purely pull ● Event (log) data
       persistence_id | partition_nr | events
       0              | 0            | event 0, event 1, event 2
       0              | 1            | event 100, event 101, event 102
       1              | 0            | event 0, event 1, event 2
  • 25.–26. Actor publisher
       private[query] abstract class QueryActorPublisher[MessageType, State: ClassTag](refreshInterval: Option[FiniteDuration])
         extends ActorPublisher[MessageType] {

         protected def initialState: Future[State]
         protected def initialQuery(initialState: State): Future[Action]
         protected def requestNext(state: State, resultSet: ResultSet): Future[Action]
         protected def requestNextFinished(state: State, resultSet: ResultSet): Future[Action]
         protected def updateState(state: State, row: Row): (Option[MessageType], State)
         protected def completionCondition(state: State): Boolean

         private[this] def nextBehavior(...): Receive = {
           if (shouldFetchMore(...)) {
             listenableFutureToFuture(resultSet.fetchMoreResults()).map(FetchedResultSet).pipeTo(self)
             awaiting(resultSet, state, finished)
           } else if (shouldIdle(...)) {
             idle(resultSet, state, finished)
           } else if (shouldComplete(...)) {
             onCompleteThenStop()
             Actor.emptyBehavior
           } else if (shouldRequestMore(...)) {
             if (finished) requestNextFinished(state, resultSet).pipeTo(self)
             else requestNext(state, resultSet).pipeTo(self)
             awaiting(resultSet, state, finished)
           } else {
             idle(resultSet, state, finished)
           }
         }
       }
  • 28. Events by persistence id
       SELECT * FROM ${tableName} WHERE
         persistence_id = ? AND
         partition_nr = ? AND
         sequence_nr >= ? AND
         sequence_nr <= ?
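The table on slide 24 shows why this query binds a partition_nr: one persistence id is spread over fixed-size partitions (events 0–2 in partition 0, events 100 onward in partition 1). A sketch of that mapping, assuming the 0-based numbering and partition size of 100 suggested by the diagrams (the real plugin derives this from its configured target partition size):

    // hypothetical, matching the diagrams rather than any particular plugin version
    def partitionNr(sequenceNr: Long, targetPartitionSize: Long = 100L): Long =
      sequenceNr / targetPartitionSize

    partitionNr(2)    // 0: event 2 lives in partition_nr 0
    partitionNr(101)  // 1: event 101 lives in partition_nr 1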
  • 29.–34. (diagrams: the query walks the rows of partition_nr 0 in sequence order, then continues into partition_nr 1)
  • 35.–36.
       private[query] class EventsByPersistenceIdPublisher(...)
         extends QueryActorPublisher[PersistentRepr, EventsByPersistenceIdState](...) {

         override protected def initialState: Future[EventsByPersistenceIdState] = {
           ...
           EventsByPersistenceIdState(initialFromSequenceNr, 0, currentPnr)
         }

         override protected def updateState(
             state: EventsByPersistenceIdState,
             row: Row): (Option[PersistentRepr], EventsByPersistenceIdState) = {

           val event = extractEvent(row)
           val partitionNr = row.getLong("partition_nr") + 1

           (Some(event),
             EventsByPersistenceIdState(event.sequenceNr + 1, state.count + 1, partitionNr))
         }
       }
  • 37. All persistence ids
       SELECT DISTINCT persistence_id, partition_nr FROM $tableName
  • 38.–40. (diagrams: the distinct query sweeps the partitions of both persistence ids)
  • 41.–42.
       private[query] class AllPersistenceIdsPublisher(...)
         extends QueryActorPublisher[String, AllPersistenceIdsState](...) {

         override protected def initialState: Future[AllPersistenceIdsState] =
           Future.successful(AllPersistenceIdsState(Set.empty))

         override protected def updateState(
             state: AllPersistenceIdsState,
             row: Row): (Option[String], AllPersistenceIdsState) = {

           val event = row.getString("persistence_id")

           if (state.knownPersistenceIds.contains(event)) {
             (None, state)
           } else {
             (Some(event), state.copy(knownPersistenceIds = state.knownPersistenceIds + event))
           }
         }
       }
  • 43.–52. Events by tag (diagrams: events carrying tag 1 are copied out of the per-persistence-id partitions into a separate events-by-tag view, preserving the per-persistence-id order)
  • 53. Events by tag (diagram: the tag view is bucketed by time, e.g. tag 1 / 1/1/2016 and tag 1 / 1/2/2016)
       SELECT * FROM $eventsByTagViewName$tagId WHERE
         tag$tagId = ? AND
         timebucket = ? AND
         timestamp > ? AND timestamp <= ?
       ORDER BY timestamp ASC
       LIMIT ?
  • 54.–61. (diagrams: the query reads the tag view bucket by bucket and tracks the expected sequence number per persistence_id, so a delayed event, e.g. Id 0 event 2, shows up as a gap and is awaited)
  • 62.–63.
       seqNumbers match {
         case None =>
           replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
           loop(n - 1)

         case Some(s) => s.isNext(pid, seqNr) match {
           case SequenceNumbers.Yes | SequenceNumbers.PossiblyFirst =>
             seqNumbers = Some(s.updated(pid, seqNr))
             replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
             loop(n - 1)

           case SequenceNumbers.After =>
             replyTo ! ReplayAborted(seqNumbers, pid, s.get(pid) + 1, seqNr)
             // end loop

           case SequenceNumbers.Before =>
             // duplicate, discard
             if (!backtracking) log.debug(
               s"Discarding duplicate. Got sequence number [$seqNr] for [$pid], " +
               s"but current sequence number is [${s.get(pid)}]")
             loop(n - 1)
         }
       }
  • 64.–65.
       def replay(): Unit = {
         val backtracking = isBacktracking
         val limit =
           if (backtracking) maxBufferSize
           else maxBufferSize - buf.size

         val toOffs =
           if (backtracking && abortDeadline.isEmpty) highestOffset
           else UUIDs.endOf(System.currentTimeMillis() - eventualConsistencyDelayMillis)

         context.actorOf(EventsByTagFetcher.props(tag, currTimeBucket, currOffset, toOffs, limit,
           backtracking, self, session, preparedSelect, seqNumbers, settings))
         context.become(replaying(limit))
       }

       def replaying(limit: Int): Receive = {
         case env @ UUIDPersistentRepr(offs, _) =>                // Deliver buffer
         case ReplayDone(count, seqN, highest) =>                 // Request more
         case ReplayAborted(seqN, pid, expectedSeqNr, gotSeqNr) =>
           // Causality violation, wait and retry. Only applicable if all events
           // for persistence_id are tagged
         case ReplayFailed(cause) =>                              // Failure
         case _: Request =>                                       // Deliver buffer
         case Continue =>                                         // Do nothing
         case Cancel =>                                           // Stop
       }
  • 66.–68. Akka Persistence Cassandra Replay
       def asyncReplayMessages(persistenceId: String, fromSequenceNr: Long,
           toSequenceNr: Long, max: Long)
           (replayCallback: (PersistentRepr) => Unit): Future[Unit] = Future {
         new MessageIterator(persistenceId, fromSequenceNr, toSequenceNr, max)
           .foreach(msg => replayCallback(msg))
       }

       class MessageIterator(persistenceId: String, fromSequenceNr: Long,
           toSequenceNr: Long, max: Long) extends Iterator[PersistentRepr] {

         private val initialFromSequenceNr =
           math.max(highestDeletedSequenceNumber(persistenceId) + 1, fromSequenceNr)
         private val iter = new RowIterator(persistenceId, initialFromSequenceNr, toSequenceNr)
         private var mcnt = 0L
         private var c: PersistentRepr = null
         private var n: PersistentRepr = PersistentRepr(Undefined)
         fetch()

         def hasNext: Boolean = ...
         def next(): PersistentRepr = ...
         ...
       }
  • 69.–71.
       class RowIterator(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long)
         extends Iterator[Row] {

         var currentPnr = partitionNr(fromSequenceNr)
         var currentSnr = fromSequenceNr
         var fromSnr = fromSequenceNr
         var toSnr = toSequenceNr
         var iter = newIter()

         def newIter() =
           session.execute(preparedSelectMessages.bind(persistenceId, currentPnr, fromSnr, toSnr)).iterator

         final def hasNext: Boolean =
           if (iter.hasNext) true
           else if (!inUse) false
           else {
             // current partition exhausted, move on to the next one
             currentPnr += 1
             fromSnr = currentSnr
             iter = newIter()
             hasNext
           }

         def next(): Row = {
           val row = iter.next()
           currentSnr = row.getLong("sequence_nr")
           row
         }
       }
  • 72.–73. Non blocking asynchronous replay
       private[this] val queries: CassandraReadJournal = new CassandraReadJournal(
         extendedActorSystem,
         context.system.settings.config.getConfig("cassandra-query-journal"))

       override def asyncReplayMessages(
           persistenceId: String,
           fromSequenceNr: Long,
           toSequenceNr: Long,
           max: Long)(replayCallback: (PersistentRepr) => Unit): Future[Unit] =
         queries
           .eventsByPersistenceId(
             persistenceId,
             fromSequenceNr,
             toSequenceNr,
             max,
             replayMaxResultSize,
             None,
             "asyncReplayMessages")
           .runForeach(replayCallback)
           .map(_ => ())
  • 74. Benchmarks (charts: replay, strong scaling and weak scaling; time in seconds against the number of actors and threads, comparing blocking and asynchronous replay)
  • 75.–78. Alternative architecture (diagrams: events are written to per-node_id logs, and the events-by-persistence-id, tag 1 and allIds views are derived from those logs)
  • 79.–80.
       val boundStatements = statementGroup(eventsByPersistenceId, eventsByTag, allPersistenceIds)

       Future.sequence(boundStatements).flatMap { stmts =>
         val batch = new BatchStatement().setConsistencyLevel(...).setRetryPolicy(...)
         stmts.foreach(batch.add)
         session.underlying().flatMap(_.executeAsync(batch))
       }
  • 81.–82.
       val eventsByPersistenceIdStatement = statementGroup(eventsByPersistenceIdStatement)
       val boundStatements = statementGroup(eventsByTagStatement, allPersistenceIdsStatement)
       ...
       session.underlying().flatMap { s =>
         val ebpResult = s.executeAsync(eventsByPersistenceIdStatement)
         val batchResult = s.executeAsync(batch)
         ...
       }
  • 83. Event time processing ● Ingestion time, processing time, event time
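A small illustration (not from the slides) of the distinction: the same event can be bucketed by the time it happened or by the time it happens to be processed, and only the former is stable across replays and reordering. Ingestion time sits between the two, assigned once when the event enters the system.

    final case class Event(key: Long, eventTime: Long, value: Long) // eventTime set at the source

    // event time: carried inside the event itself, stable under
    // redelivery, reordering and replay
    def eventTimeBucket(e: Event, bucketMillis: Long): Long =
      e.eventTime / bucketMillis

    // processing time: whatever the clock says when the operator runs;
    // differs between runs, nodes and replays
    def processingTimeBucket(bucketMillis: Long): Long =
      System.currentTimeMillis() / bucketMillis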
  • 85.–86. Ordering (diagrams: events with KEY 0–2, TIME 12:34:56–12:34:58 and VALUE 0–2 arrive out of order and are restored to time order)
  • 87.–98. Distributed causal stream merging (diagrams: per-node event logs for node_id 0 and 1 are merged into one causal stream per persistence_id; a persistence_id → seq table records the highest merged sequence number, replay resumes from the recorded sequence numbers, and a stream_id → seq table tracks the downstream position)
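A minimal sketch of the merging step in the diagrams above (assuming, as the diagrams do, contiguous per-persistence-id sequence numbers starting at 1): an event is released only when it is the next expected one for its id, and successors that arrive early wait in a buffer.

    final case class Evt(persistenceId: String, seqNr: Long)

    final class CausalMerge {
      private var expected = Map.empty[String, Long]           // next seqNr to deliver per id
      private var buffer   = Map.empty[String, Map[Long, Evt]] // out-of-order events per id

      // returns the events that became deliverable, in causal order
      def offer(e: Evt): Vector[Evt] = {
        var rest = buffer.getOrElse(e.persistenceId, Map.empty[Long, Evt]) + (e.seqNr -> e)
        var next = expected.getOrElse(e.persistenceId, 1L)
        var out  = Vector.empty[Evt]
        while (rest.contains(next)) {
          out :+= rest(next)
          rest -= next
          next += 1
        }
        buffer   += e.persistenceId -> rest
        expected += e.persistenceId -> next
        out
      }
    }

    val merge = new CausalMerge
    merge.offer(Evt("0", 2)) // Vector()                        -- waits for seqNr 1
    merge.offer(Evt("0", 1)) // Vector(Evt(0,1), Evt(0,2))      -- gap filled, both released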
  • 100.–104. (diagrams: delivered events are acknowledged; an unacknowledged event, e.g. Id 0 event 2, remains outstanding until its ACK arrives)
  • 105. Checkpoint data (diagram: a state backend stores source offsets (Source 1: 6791, Source 2: 7252, ...), state pointers and sink acknowledgements as one consistent checkpoint)
  • 106.–107.
       class KafkaSource(private var offsetManagers: Map[TopicAndPartition, KafkaOffsetManager])
         extends TimeReplayableSource {

         def open(context: TaskContext, startTime: Option[TimeStamp]): Unit = {
           fetch.setStartOffset(topicAndPartition, offsetManager.resolveOffset(time))
           ...
         }

         def read(batchSize: Int): List[Message]

         def close(): Unit
       }
  • 108.–109.
       class DirectKafkaInputDStream[K, V, U <: Decoder[K]: ClassTag, T <: Decoder[V]: ClassTag, R](
           _ssc: StreamingContext,
           val kafkaParams: Map[String, String],
           val fromOffsets: Map[TopicAndPartition, Long],
           messageHandler: MessageAndMetadata[K, V] => R
         ) extends InputDStream[R](_ssc) with Logging {

         override def compute(validTime: Time): Option[KafkaRDD[K, V, U, T, R]] = {
           val untilOffsets = latestLeaderOffsets(maxRetries)
           ...
         }
       }
  • 110.–112. Exactly once delivery ● Durable offset (diagrams: the consumer's position in the log 0–4 is persisted, so after a restart it resumes from the stored offset instead of reprocessing)
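A sketch of the durable offset idea (Store is hypothetical): the effect of processing an element and the advanced offset are committed together, so after a crash the consumer re-reads the offset, and a replayable log plus an atomic commit yields effectively-exactly-once processing.

    trait Store {
      def readOffset(): Long
      // persist the result of one element and the new offset in a single atomic write
      def commit(result: String, newOffset: Long): Unit
    }

    def run(log: IndexedSeq[String], store: Store): Unit = {
      var offset = store.readOffset()              // resume where the last commit left off
      while (offset < log.length) {
        val result = log(offset.toInt).toUpperCase // stand-in for real processing
        store.commit(result, offset + 1)           // effect and offset move together
        offset += 1
      }
    }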
  • 117.–119.
       val partitioner = partitionerClassName match {
         case "org.apache.cassandra.dht.Murmur3Partitioner" => Murmur3TokenFactory
         case "org.apache.cassandra.dht.RandomPartitioner" => RandomPartitionerTokenFactory
         case _ => throw new IllegalArgumentException(s"Unsupported partitioner: $partitionerClassName")
       }

       private def splitToCqlClause(range: TokenRange): Iterable[CqlTokenRange] = {
         if (range.end == tokenFactory.minToken)
           List(CqlTokenRange(s"token($pk) > ?", startToken))
         else if (range.start == tokenFactory.minToken)
           List(CqlTokenRange(s"token($pk) <= ?", endToken))
         else if (!range.isWrapAround)
           List(CqlTokenRange(s"token($pk) > ? AND token($pk) <= ?", startToken, endToken))
         else
           List(
             CqlTokenRange(s"token($pk) > ?", startToken),
             CqlTokenRange(s"token($pk) <= ?", endToken))
       }
  • 120.–121.
       override def getPreferredLocations(split: Partition): Seq[String] =
         split.asInstanceOf[CassandraPartition].endpoints.flatMap(nodeAddresses.hostNames).toSeq

       override def getPartitions: Array[Partition] = {
         val partitioner = CassandraRDDPartitioner(connector, tableDef, splitCount, splitSize)
         val partitions = partitioner.partitions(where)
         partitions
       }

       override def compute(split: Partition, context: TaskContext): Iterator[R] = {
         val session = connector.openSession()
         val partition = split.asInstanceOf[CassandraPartition]
         val tokenRanges = partition.tokenRanges
         val metricsUpdater = InputMetricsUpdater(context, readConf)

         val rowIterator = tokenRanges.iterator.flatMap(fetchTokenRange(session, _, metricsUpdater))

         new CountingIterator(rowIterator, limit)
       }
  • 122.–123.
       object PushPredicateThroughProject extends Rule[LogicalPlan] with PredicateHelper {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
           case filter @ Filter(condition, project @ Project(fields, grandChild))
               if fields.forall(_.deterministic) =>

             val aliasMap = AttributeMap(fields.collect {
               case a: Alias => (a.toAttribute, a.child)
             })

             project.copy(child = Filter(replaceAlias(condition, aliasMap), grandChild))
         }
       }
  • 124.–128. Table and stream duality (diagrams: a table is the fold of its change stream; applying Id 0 Event 1 and Event 2 rebuilds State X, and a snapshot for offset N, e.g. Id 0 / Offset 123 / State X, lets the fold resume from N)
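The duality in code form (a sketch, not from the slides): the table is a left fold over the change stream, and a snapshot is just that fold memoised up to some offset, from which the remaining events resume.

    final case class Inserted(value: Int)

    // the table is the fold of its change stream
    def table(events: Seq[Inserted]): Vector[Int] =
      events.foldLeft(Vector.empty[Int])((state, e) => state :+ e.value)

    // a snapshot taken at offset n lets the fold resume from n instead of 0
    def resume(snapshot: Vector[Int], rest: Seq[Inserted]): Vector[Int] =
      rest.foldLeft(snapshot)((state, e) => state :+ e.value)

    val all  = List(Inserted(0), Inserted(5), Inserted(10), Inserted(1), Inserted(5))
    val full = table(all)                              // Vector(0, 5, 10, 1, 5)
    val same = resume(table(all.take(3)), all.drop(3)) // identical result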
  • 129. Infinite streams application (diagram: updates to the source-of-truth table flow as a continuous stream applying a transformation function into a cache / view / index / replica / system / service)
  • 132. (diagram: distributed model training; clients feed input data to model devices, which exchange parameter updates ΔP with parameter devices holding the parameters P)
  • 133. Challenges ● All the solved problems ○ Exactly once delivery ○ Consistency ○ Availability ○ Fault tolerance ○ Cross service invariants and consistency ○ Transactions ○ Automated deployment and configuration management ○ Serialization, versioning, compatibility ○ Automated elasticity ○ No downtime version upgrades ○ Graceful shutdown of nodes ○ Distributed system verification, logging, tracing, monitoring, debugging ○ Split brains ○ ...
  • 134. Conclusion ● From request, response, synchronous, mutable state ● To streams, asynchronous messaging ● Production ready distributed systems
  • 135. MANCHESTER LONDON NEW YORK Questions
  • 136. MANCHESTER LONDON NEW YORK @zapletal_martin @cakesolutions 347 708 1518 [email protected] We are hiring http://www.cakesolutions.net/careers