Trill: A High-Performance Incremental Query Processor For Diverse Analytics
Microsoft Research
Contact: [email protected] Twitter: @badrishc
* Current affiliation: Google. This work was performed at Microsoft Research.
Diverse Scenarios for Analytics
• Real-time
• Monitor app telemetry (e.g., ad clicks) & raise alerts when problems are detected
• Progressive
• Non-temporal analysis (e.g., BI) over a large dataset: stream data, get quick approximate results
Interactive Query Authoring
Three Key Requirements
• Performance
• High throughput: critical for large offline datasets
• Low latency & overhead: important for real-time monitoring
• Fabric & language integration
• Cloud app/service acts as driver, uses the analytics engine
• Need rich data-types, integrate custom logic seamlessly
• Query model
• Need to support real-time and offline data, temporal and relational queries, early results for exploratory offline queries
Scenarios
• monitor telemetry & raise alerts
• correlate real-time with logs
• develop initial monitoring query
• back-test over historical logs
• offline analysis (BI) with early results
Trill: Fast Streaming Analytics Library
• Performance
• 2-4 orders of magnitude faster than traditional SPEs
• For relational queries, comparable to best DBMS
• User-controlled latency specification
• explicit latency vs. throughput tradeoff
• Query model
• Extended LINQ syntax based on tempo-relational query model
• Supports broad & rich analytics scenarios (relational, progressive, time-based)
Trill’s Use Cases
• Azure Stream Analytics cloud service
• With Scope for Bing Ads
• With Orleans for Halo game monitoring & debugging
• …
• Users specify latency constraint (10 secs)
[Figure: operators op 1 and op 2]
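One way to picture the latency vs. throughput tradeoff is a batcher that accumulates events until either a size limit or a latency bound is reached. The sketch below is only an illustration under assumptions: the class `LatencyBoundedBatcher`, its members, and the use of event timestamps as a stand-in for elapsed time are hypothetical and are not Trill's API or internals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not Trill code: batches event timestamps and
// flushes when the batch is full or the oldest pending event has
// waited at least maxLatency (measured in the timestamps' own unit).
public sealed class LatencyBoundedBatcher
{
    private readonly long _maxLatency;
    private readonly int _maxBatchSize;
    private readonly List<long> _pending = new List<long>();
    private long _oldest;

    // Batches that have been released downstream, in order.
    public List<long[]> Flushed { get; } = new List<long[]>();

    public LatencyBoundedBatcher(long maxLatency, int maxBatchSize)
    {
        _maxLatency = maxLatency;
        _maxBatchSize = maxBatchSize;
    }

    public void OnEvent(long timestamp)
    {
        // Release the pending batch first if holding it any longer
        // would violate the latency bound.
        if (_pending.Count > 0 && timestamp - _oldest >= _maxLatency)
            Flush();

        if (_pending.Count == 0)
            _oldest = timestamp;
        _pending.Add(timestamp);

        // A full batch is released immediately.
        if (_pending.Count == _maxBatchSize)
            Flush();
    }

    public void Flush()
    {
        if (_pending.Count == 0) return;
        Flushed.Add(_pending.ToArray());
        _pending.Clear();
    }
}
```

In this sketch, a larger latency bound yields larger batches (less per-event overhead), while a smaller bound releases results sooner.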
• Timestamps as arrays
• Bitvector to indicate row absence
class DataBatch {
    long[] SyncTime;
    ...
    Bitvector BV;
}
• One array per payload field
class UserData_Gen : DataBatch {
    long[] c_ClickTime;
    long[] c_User;
    long[] c_AdId;
}
[Figure: batch layout showing timestamp, payload columns, bitvector]
• Batch classes are generated & compiled on-the-fly (under the hood)
• Enables efficient QP & serialization
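The columnar layout described above (a timestamp array, one array per payload field, and a bitvector marking absent rows) can be sketched by hand as follows. This is a minimal illustration under assumptions, not the code Trill generates: the class `ClickBatch`, its methods, and the `ulong[]` bitvector encoding are hypothetical.

```csharp
using System;

// Hypothetical hand-written columnar batch; names are illustrative only.
public sealed class ClickBatch
{
    public int Count;          // number of rows stored in the batch
    public long[] SyncTime;    // one timestamp per row
    public long[] ClickTime;   // payload column
    public long[] User;        // payload column
    public long[] AdId;        // payload column
    public ulong[] Absent;     // bit i set => row i is logically absent

    public ClickBatch(int capacity)
    {
        SyncTime = new long[capacity];
        ClickTime = new long[capacity];
        User = new long[capacity];
        AdId = new long[capacity];
        Absent = new ulong[(capacity + 63) / 64];
    }

    public void Add(long syncTime, long clickTime, long user, long adId)
    {
        SyncTime[Count] = syncTime;
        ClickTime[Count] = clickTime;
        User[Count] = user;
        AdId[Count] = adId;
        Count++;
    }

    public bool IsAbsent(int i) => (Absent[i >> 6] & (1UL << (i & 63))) != 0;

    // Filter on a single column: rows that do not match are marked
    // absent in the bitvector instead of being copied or removed.
    public void WhereAdId(long adId)
    {
        for (int i = 0; i < Count; i++)
            if (AdId[i] != adId)
                Absent[i >> 6] |= 1UL << (i & 63);
    }

    public int CountPresent()
    {
        int n = 0;
        for (int i = 0; i < Count; i++)
            if (!IsAbsent(i)) n++;
        return n;
    }
}
```

The filter reads only the `AdId` array and writes only the bitvector, so the other columns are never touched; this is the kind of access pattern a columnar batch makes cheap.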
+ Fabric & Language Integration