SPL Event-Time Processing in IBM Streams V4.3

IBM Watson – IBM Streams
© 2018 IBM Corporation
IBM Streams V4.3
SPL Event-Time Processing
Victor Dogaru
IBM Streams Development

Please note
▪ IBM’s statements regarding its plans, directions, and intent are subject to change
or withdrawal without notice and at IBM’s sole discretion.
▪ Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.
▪ The information mentioned regarding potential future products is not a commitment, promise,
or legal obligation to deliver any material, code or functionality. Information about potential
future products may not be incorporated into any contract.
▪ The development, release, and timing of any future features or functionality described for our
products remains at our sole discretion.
▪ Performance is based on measurements and projections using standard IBM benchmarks in
a controlled environment. The actual throughput or performance that any user will experience
will vary depending upon many factors, including considerations such as the amount of
multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and
the workload processed. Therefore, no assurance can be given that an individual user will
achieve results similar to those stated here.

Overview
▪ Use Case: How does this help Streams developers and users?
▪ Out-of-order streams and late data
▪ SPL Watermarks
▪ SPL event-time language definitions
– @eventTime annotation
– TimeInterval window
▪ SPL TimeInterval window and window panes
▪ SPL event-time functions
▪ Support for Java and C++ operators

Use Case
Streams application for monitoring device data
▪ The user has to write an application that
– Ingests timestamped events
– Calculates metrics of interest every 20 minutes over one-hour intervals of events that occurred from 10:00 to 11:00, 10:20 to 11:20, 10:40 to 11:40, etc.
– Updates the calculations if late events arrive after the metrics were calculated
– Discards events that arrive more than 6 hours after the metrics were calculated

Use Case
Solution
▪ Designate an SPL attribute for the event timestamp
– Add the attribute to stream schemas from the data ingest point downstream
▪ Insert an Aggregate operator with an event-time window that groups tuples into intervals based on their event timestamp
▪ Designate the operator that generates watermarks (usually at the data ingest point)
– Watermarks provide a time base for event-time streams
– The Streams runtime and operator logic ensure that tuple order relative to watermarks is preserved
– For each operator, if the inputs are not late, then the output should not be late with respect to the watermark value
▪ As watermarks reach the event-time window, they trigger:
– Calculation of aggregate metrics
– Updates in case of late events
– Eviction of data beyond the discardAge horizon

But What About Out-of-Order Streams and Late Data?
▪ Event time is the time that an event happened in the real world
– The event-time timestamp is carried with the tuple
▪ Processing (or system) time is the time measured by the machine that processes the event
– Processing time is the machine time when the tuple is being processed
▪ Events are streamed out of order because of variable delays before data ingestion and within Streams
– Some event producers are not always connected (sensors, mobile devices, etc.)
– Some event producers locally buffer data and emit their events in bursts
– Events and tuples travel over different network paths
– Backpressure and queuing delays in the stream operators

Watermarks
▪ Watermarks provide a measure of event-time progress in a data stream
– For an input stream and a Watermark with value X, all tuples with event time less than X have been received
– For an output stream and a Watermark with value X, all tuples with event time less than X have been submitted
▪ A Watermark is only an estimate of completeness
– Events with timestamps earlier than X may arrive after the Watermark X. These are late data.
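Figure: a stream of tuples with event times between 11 and 13, interleaved with watermarks WM 10, WM 14, and WM 15 (legend: tuple, late tuple, WM = watermark)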
▪ The IBM Streams runtime broadcasts Watermarks downstream
– Ensures tuple order is maintained with respect to Watermarks
– Tuples derived from non-late inputs should not be submitted late
▪ A new “currentWatermark” operator custom metric displays the current watermark value in
milliseconds
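The lateness rule above can be modelled in a few lines. The following is a minimal Python sketch, not an IBM Streams API: the class WatermarkTracker and the function late_tuples are hypothetical names, event times are small integers, and the input sequence is an invented example rather than a transcription of the figure. A tuple is flagged as late when its event time is below the most recent watermark, and watermarks are assumed to only move forward.

```python
# Hypothetical model of watermark semantics; not an IBM Streams API.
# Event times are small integers so the sequence is easy to follow.


class WatermarkTracker:
    """Tracks the latest watermark on an input stream and flags late tuples."""

    def __init__(self):
        self.current = None  # no watermark received yet

    def on_watermark(self, value):
        # Watermarks only move forward; an older value carries no new information.
        if self.current is None or value > self.current:
            self.current = value

    def is_late(self, event_time):
        # Watermark X asserts that all tuples with event time < X were received,
        # so a tuple below the current watermark is late.
        return self.current is not None and event_time < self.current


def late_tuples(stream):
    """Return event times of late tuples in a sequence of ("T", t) / ("WM", x) items."""
    tracker = WatermarkTracker()
    late = []
    for kind, value in stream:
        if kind == "WM":
            tracker.on_watermark(value)
        elif tracker.is_late(value):
            late.append(value)
    return late


if __name__ == "__main__":
    stream = [("T", 11), ("T", 12), ("WM", 10), ("T", 13),
              ("WM", 14), ("T", 12), ("WM", 15)]
    print(late_tuples(stream))  # [12]
```

In this sequence only the second tuple with event time 12 is late, because it arrives after WM 14 has already been seen.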

SPL Event-time Language Definitions
▪ @eventTime annotation
– Attribute name, resolution
– Watermark generation
▪ Event-time stream schemas contain the event-time attribute
▪ TimeInterval window
– Calculates aggregates for defined time intervals
// Event-time source
@eventTime(eventTimeAttribute=et, lag=5.0, minimumGap=0.075)
stream<timestamp et, ...> Events = TCPSource()
{ ... }
. . .
// Event-time graph
stream<..., timestamp et, ...> B = MyOperator(A) {}
. . .
// Aggregate over event-time window
stream<timestamp et, ...> Out = Aggregate(In) {
window In : timeInterval, intervalDuration(3600.0),
creationPeriod(1200.0), discardAge(21600.0),
partitioned;
param partitionBy : a, b, c;
output Out :
timeStart = windowBegin(),
timeEnd = windowEnd(),
...
}

@eventTime Annotation
@eventTime(eventTimeAttribute=et, resolution=Nanoseconds, lag=5.0, minimumGap=0.075)
▪ Indicates that the annotated operator and all the downstream operators which are connected
via event-time streams participate in an event-time graph
– Connectivity extends only downstream
– Event-time ends at a sink or at an operator which does not output the event-time attribute
▪ Annotation elements
– eventTimeAttribute : name of the tuple attribute which represents the event time of the tuple
– Supported types: timestamp, uint64, int64
– resolution : time units of the event-time attribute values -- Milliseconds, Microseconds, Nanoseconds
– lag : duration in seconds between the maximum event-time of submitted tuples and the value of the watermark
– minimumGap : minimum event-time duration in seconds between subsequent watermarks
▪ The operator's watermark is set to WM = max(event-time of processed tuples) - lag
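The watermark rule WM = max(event-time of processed tuples) - lag, combined with minimumGap, can be sketched as follows. This is a minimal Python model under stated assumptions, not the Streams runtime: WatermarkGenerator is a hypothetical name, a watermark is assumed to be considered after every tuple, and it is assumed to be emitted only when it has advanced by at least minimumGap since the previous one. Times are in seconds.

```python
# Hypothetical model of @eventTime(lag=..., minimumGap=...) watermark generation.
# Assumptions (not taken from the Streams documentation): a watermark may be
# emitted after any tuple, and it is emitted only if it has advanced by at least
# minimumGap since the previously emitted watermark. Times are in seconds.


class WatermarkGenerator:
    def __init__(self, lag, minimum_gap):
        self.lag = lag
        self.minimum_gap = minimum_gap
        self.max_event_time = None  # largest event time processed so far
        self.last_emitted = None    # value of the most recent watermark

    def process(self, event_time):
        """Process one tuple; return the new watermark value, or None if none is emitted."""
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        # WM = max(event time of processed tuples) - lag
        candidate = self.max_event_time - self.lag
        if self.last_emitted is None or candidate - self.last_emitted >= self.minimum_gap:
            self.last_emitted = candidate
            return candidate
        return None


if __name__ == "__main__":
    gen = WatermarkGenerator(lag=5.0, minimum_gap=1.0)
    for t in (100.0, 100.5, 101.0, 99.0, 103.0):
        print(t, gen.process(t))
```

A minimumGap of 1.0 is used instead of 0.075 only to keep the numbers readable. The out-of-order tuple at 99.0 does not move the watermark backwards, because the watermark is derived from the maximum event time seen so far.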

TimeInterval Window
window In : timeInterval, intervalDuration(3600.0), creationPeriod(1200.0), intervalOffset(1800.0), discardAge(21600.0)
▪ Window options
– timeInterval : the window kind -- tuples are placed into panes that correspond to equal intervals in the event-time domain
– intervalDuration : duration between the lower and upper interval endpoints
– creationPeriod : duration between the lower endpoints of consecutive intervals
– discardAge : duration between the point in time when a window pane becomes complete and the point in time when the pane closes and no longer accepts late tuples. Panes are discarded after they close.
– intervalOffset : point in time that coincides with an interval start time
▪ Window panes cover the event-time domain with intervals of the form:
[N * creationPeriod + intervalOffset, N * creationPeriod + intervalDuration + intervalOffset), for integer N
– Consecutive intervals overlap when creationPeriod < intervalDuration
▪ A value of 0 represents the Unix epoch: 1970-01-01T00:00:00Z
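The interval formula can be inverted to find every pane that contains a given event time. The following minimal Python sketch uses hypothetical helper names (pane_interval, panes_containing), not SPL functions, and treats times as seconds since the Unix epoch. A pane N contains t when N * creationPeriod + intervalOffset <= t < N * creationPeriod + intervalDuration + intervalOffset, which bounds N from both sides.

```python
import math

# Hypothetical helpers illustrating the TimeInterval pane formula; not an SPL API.
# Times are seconds since the Unix epoch (value 0).


def pane_interval(n, duration, period, offset=0.0):
    """Half-open interval [start, end) covered by pane index n."""
    start = n * period + offset
    return (start, start + duration)


def panes_containing(t, duration, period, offset=0.0):
    """All pane indices n whose interval contains event time t."""
    # start <= t  <=>  n <= (t - offset) / period
    # t < end     <=>  n > (t - offset - duration) / period
    n_max = math.floor((t - offset) / period)
    n_min = math.floor((t - offset - duration) / period) + 1
    return list(range(n_min, n_max + 1))


if __name__ == "__main__":
    # Use-case parameters: intervalDuration(3600.0), creationPeriod(1200.0)
    t = 10 * 3600 + 50 * 60  # 10:50 UTC on 1970-01-01
    for n in panes_containing(t, 3600.0, 1200.0):
        print(n, pane_interval(n, 3600.0, 1200.0))
```

With the use-case parameters, an event at 10:50 falls into panes 30, 31, and 32, which are the 10:00-11:00, 10:20-11:20, and 10:40-11:40 intervals. Because the intervals are half-open, an event at exactly 11:00 belongs to the 11:00-12:00 interval and not to 10:00-11:00.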

TimeInterval Window Panes
▪ TimeInterval Window manages a collection of window panes
▪ Each pane stores tuples for a fixed event-time interval
▪ Panes trigger when the watermark reaches the upper endpoint of the interval
▪ Panes close and are discarded once the watermark reaches the interval's upper endpoint plus discardAge
▪ The system creates new panes as specified by creationPeriod
Example
– When Tuple(13:55) is received: the tuple is assigned to Pane D
– On Watermark(14:00): Pane D is complete and triggers; Pane A closes and is discarded
– When the late Tuple(12:45) is received: the tuple is assigned to Pane C, which is still open, and Pane C triggers an update on the next watermark
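Figure: timeline for timeInterval, intervalDuration(3600.0), discardAge(10800.0), showing panes A [10:00, 11:00), B [11:00, 12:00), C [12:00, 13:00), and D [13:00, 14:00), arriving tuples at 13:55 and 12:45 (late), and a watermark (WM) at 14:00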
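The pane example above (one-hour panes that close three hours after they complete) can be replayed with a small simulation. This is a minimal Python sketch, not the Streams windowing library: PaneWindow and hm are hypothetical names, panes are assumed to be non-overlapping (creationPeriod equal to intervalDuration, intervalOffset 0), and a pane is assumed to trigger only when it is complete and holds tuples not yet covered by an earlier trigger. Times are in seconds.

```python
HOUR = 3600


def hm(hours, minutes):
    """Event time in seconds for a clock time on the epoch day (illustration only)."""
    return hours * HOUR + minutes * 60


class PaneWindow:
    """Hypothetical model of non-overlapping event-time panes with a discard age."""

    def __init__(self, duration, discard_age):
        self.duration = duration
        self.discard_age = discard_age
        self.panes = {}        # pane start -> event times stored in that pane
        self.pending = set()   # pane starts with tuples not yet covered by a trigger
        self.watermark = None

    def _is_closed(self, start):
        # A pane closes once the watermark reaches its end plus discard_age.
        end = start + self.duration
        return self.watermark is not None and self.watermark >= end + self.discard_age

    def insert(self, t):
        """Assign a tuple with event time t to its pane; return the pane start or None if dropped."""
        start = (t // self.duration) * self.duration
        if self._is_closed(start):
            return None
        self.panes.setdefault(start, []).append(t)
        self.pending.add(start)
        return start

    def on_watermark(self, wm):
        """Advance the watermark; return (triggered, closed) lists of pane starts."""
        self.watermark = wm
        # Complete panes (end <= watermark) holding new tuples trigger.
        triggered = sorted(s for s in self.pending if s + self.duration <= wm)
        self.pending.difference_update(triggered)
        # Panes past end + discard_age close and are discarded.
        closed = sorted(s for s in self.panes if self._is_closed(s))
        for s in closed:
            del self.panes[s]
            self.pending.discard(s)
        return triggered, closed


if __name__ == "__main__":
    w = PaneWindow(duration=HOUR, discard_age=3 * HOUR)
    for t in (hm(10, 30), hm(11, 30), hm(12, 30)):  # one tuple each in panes A, B, C
        w.insert(t)
    print(w.on_watermark(hm(13, 0)))    # A, B, C trigger
    w.insert(hm(13, 55))                # assigned to pane D
    print(w.on_watermark(hm(14, 0)))    # D triggers, A closes
    w.insert(hm(12, 45))                # late tuple, pane C is still open
    print(w.on_watermark(hm(14, 5)))    # C triggers an update
```

Tracking "pending" panes separately from stored panes is one way to express that a late tuple causes a complete pane to trigger again, while an already closed pane drops late tuples entirely.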

SPL Event-time Functions
▪ Window intervals
timestamp windowBegin();
timestamp windowEnd();
▪ Event-time transformations
<tuple T> timestamp getEventTime(T t);
timestamp toTimestamp(uint64 ticks, enum {Milliseconds, Microseconds, Nanoseconds} resolution);
timestamp toTimestamp(int64 ticks, enum {Milliseconds, Microseconds, Nanoseconds} resolution);
int64 int64TicksFromTimestamp(timestamp ts, enum {Milliseconds, Microseconds, Nanoseconds} resolution);
uint64 uint64TicksFromTimestamp(timestamp ts, enum {Milliseconds, Microseconds, Nanoseconds} resolution);
▪ Window pane status
public uint64 paneIndex();
Sys.PaneTiming paneTiming();
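The tick conversions relate an integer event-time attribute, interpreted at a given resolution, to a timestamp. The following minimal Python sketch models them under stated assumptions and is not the SPL runtime: to_timestamp and ticks_from_timestamp are hypothetical stand-ins for toTimestamp and int64TicksFromTimestamp / uint64TicksFromTimestamp, a timestamp is modelled as a (seconds, nanoseconds) pair since the Unix epoch, and conversion to a coarser unit is assumed to truncate.

```python
# Hypothetical Python model of the SPL tick conversions; not the SPL runtime.
# An SPL timestamp is modelled here as (seconds, nanoseconds) since the Unix epoch.
# Assumption: converting to a coarser unit truncates (floors) the extra digits.

NANOS_PER_SECOND = 1_000_000_000
TICKS_PER_SECOND = {
    "Milliseconds": 1_000,
    "Microseconds": 1_000_000,
    "Nanoseconds": 1_000_000_000,
}


def to_timestamp(ticks, resolution):
    """Model of toTimestamp(ticks, resolution)."""
    per_second = TICKS_PER_SECOND[resolution]
    seconds, remainder = divmod(ticks, per_second)  # remainder is in [0, per_second)
    return (seconds, remainder * (NANOS_PER_SECOND // per_second))


def ticks_from_timestamp(ts, resolution):
    """Model of int64TicksFromTimestamp / uint64TicksFromTimestamp."""
    seconds, nanos = ts
    per_second = TICKS_PER_SECOND[resolution]
    return seconds * per_second + nanos // (NANOS_PER_SECOND // per_second)


if __name__ == "__main__":
    ts = to_timestamp(1_500, "Milliseconds")
    print(ts, ticks_from_timestamp(ts, "Microseconds"))
```

Round-tripping at the same resolution returns the original tick count, while converting to a coarser resolution loses sub-unit precision under the truncation assumption.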

Support for primitive Java and C++ Operators
▪ New C++ windowing library classes for the TimeInterval window
▪ Java and C++ primitive operators can explicitly set the operator’s Watermark value, or let the
system set it for them
