The Feature Store and The Semantic Layer
[Figure: Business value across BI (descriptive and diagnostic analytics) and AI (predictive and prescriptive analytics). Labels from lowest to highest business value: Semantic Layer & Metrics Store; Analytical ML; Operational ML with historical data; Operational ML with real-time data.]
Gap between the Modern Data Stack (MDS, SQL) and Data Science (Python) Worlds
[Figure: From operational data to enterprise data and enterprise AI. Sources: event-based data (Kafka, Kinesis) and OLTP DB. SQL side, owned by data engineers: Extract, Transform & Load (Fivetran, DBT, Matillion, …), Data Warehouse (Databricks, Snowflake, BQ..), Metrics & Semantic Layer (AtScale), feeding the Analytical ML System. Python side: the data scientist's prototype code and the ML engineer's real-time features, feeding the Operational ML System.]
MDS not suitable today for Online Use Cases (Real-time ML)
[Figure: A web app calls Model Serving to predict; Model Serving reads features from the Online Feature Store. Streaming Feature Pipelines (Flink) consume real-time logs and write features to the Online Feature Store.]
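The write and read paths on this slide can be illustrated with a small in-memory sketch. It is not the Hopsworks API: `OnlineFeatureStore`, `streaming_pipeline`, `predict`, the `click_count` feature, and the threshold rule standing in for a model are all hypothetical names chosen for the example.

```python
from collections import defaultdict


class OnlineFeatureStore:
    """Hypothetical in-memory stand-in for an online (key-value) feature store."""

    def __init__(self):
        self._rows = {}

    def write(self, entity_id, features):
        # Upsert: keep only the latest feature values per entity.
        self._rows.setdefault(entity_id, {}).update(features)

    def read(self, entity_id, names):
        # Missing entities or features default to 0 in this sketch.
        row = self._rows.get(entity_id, {})
        return [row.get(name, 0) for name in names]


def streaming_pipeline(events, store):
    """Stand-in for a streaming feature pipeline: counts clicks per user."""
    counts = defaultdict(int)
    for event in events:
        if event["type"] == "click":
            counts[event["user"]] += 1
            store.write(event["user"], {"click_count": counts[event["user"]]})


def predict(user, store):
    """Stand-in for model serving: reads features online, applies a toy rule."""
    (click_count,) = store.read(user, ["click_count"])
    return "engaged" if click_count >= 2 else "casual"


store = OnlineFeatureStore()
logs = [
    {"user": "u1", "type": "click"},
    {"user": "u2", "type": "view"},
    {"user": "u1", "type": "click"},
]
streaming_pipeline(logs, store)
print(predict("u1", store), predict("u2", store))
```

The point of the split is that the serving path only performs a key lookup; all computation over the raw logs happens in the pipeline before the request arrives.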
Do I have to define Features in the Feature Store?
Can Features be Metrics in the Semantic Layer?
Feature Engineering in the Semantic Layer / SQL
[Figure: Feature Store Connectors bring data into HOPSWORKS from Redshift (Amazon), Snowflake, ADLS (Azure), and JDBC (MySQL, Postgres, MongoDB). Done in SQL: aggregations, data validation. Done in the Feature Store feature pipeline: normalization, one-hot encoding, dimensionality reduction. Dimensionality reduction in SQL is marked "??".]
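As a rough illustration of the SQL side of this slide, the sketch below computes aggregation features and runs a simple validation check in SQL. It uses Python's built-in `sqlite3` module as a stand-in for a warehouse; the `transactions` table, its columns, and the validation rule are assumptions made up for the example.

```python
import sqlite3

# In-memory database standing in for a data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("c1", 10.0), ("c1", 30.0), ("c2", 5.0)],
)

# SQL aggregation: one row of features per customer.
feature_rows = conn.execute(
    """
    SELECT customer_id,
           COUNT(*)    AS txn_count,
           AVG(amount) AS avg_amount
    FROM transactions
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

# SQL data validation: count rows that break a simple expectation.
invalid_rows = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE amount IS NULL OR amount < 0"
).fetchone()[0]

print(feature_rows, invalid_rows)
```

Aggregations and row-level checks like these map naturally onto SQL. Transformations that depend on statistics of a particular training dataset, such as normalization, are the ones the slide places in the feature pipeline instead.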
Feature Reuse
Big wins by reusing Features in many models
Reuse Features from Different Feature Groups with Feature Views
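A minimal sketch of the idea, in plain Python rather than the Hopsworks API: features are stored once in feature groups, and each model selects the subset it needs through its own view. The feature group contents and the `feature_view` helper are hypothetical.

```python
# Two hypothetical feature groups, both keyed by customer id.
profile_fg = {
    "c1": {"age": 34, "country": "SE"},
    "c2": {"age": 51, "country": "DE"},
}
activity_fg = {
    "c1": {"txn_count_7d": 12, "avg_amount_7d": 40.5},
    "c2": {"txn_count_7d": 3, "avg_amount_7d": 310.0},
}
feature_groups = {"profile": profile_fg, "activity": activity_fg}


def feature_view(selection):
    """Return a reader that joins the selected (group, feature) pairs on the key."""

    def read(entity_id):
        return {
            feature: feature_groups[group][entity_id][feature]
            for group, feature in selection
        }

    return read


# Two models reuse the same stored features through different views.
churn_view = feature_view([("profile", "age"), ("activity", "txn_count_7d")])
fraud_view = feature_view(
    [
        ("profile", "country"),
        ("activity", "txn_count_7d"),
        ("activity", "avg_amount_7d"),
    ]
)

print(churn_view("c1"))
print(fraud_view("c2"))
```

Here `txn_count_7d` is computed and stored once but serves both models, which is where the reuse win comes from.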
Feature Store: Feature Pipeline Transformations (Normalization, One-hot encoding)
The feature store should help ensure there is no training/inference skew when applying transformations
Point-in-time correct SQL is hard to write, debug, and understand
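To make "point-in-time correct" concrete: for each label event, the training row may only use the latest feature value observed at or before that event's timestamp, never a later one. The sketch below shows that rule in plain Python; the `balance_history` data, the integer timestamps, and the `value_as_of` helper are assumptions for illustration, not part of any feature store API.

```python
from bisect import bisect_right

# Hypothetical feature history per customer: (event_time, value), sorted by time.
balance_history = {
    "c1": [(1, 100.0), (5, 250.0), (9, 80.0)],
}

# Hypothetical label events: (customer_id, event_time, label).
labels = [("c1", 4, 0), ("c1", 5, 1), ("c1", 20, 0), ("c1", 0, 1)]


def value_as_of(history, ts):
    """Latest feature value with event_time <= ts, or None if none exists yet."""
    times = [t for t, _ in history]
    idx = bisect_right(times, ts)
    return history[idx - 1][1] if idx > 0 else None


# Each training row sees only feature values known at its own event time.
training_rows = [
    (cust, ts, value_as_of(balance_history[cust], ts), label)
    for cust, ts, label in labels
]
print(training_rows)
```

The label at time 4 gets the value written at time 1, not the one written at time 5. Joining on the latest value instead would leak future information into the training data. Expressing the same "as of" rule in SQL across several feature tables is what makes such queries hard to write by hand.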
[Figure: A Feature View reads from Feature Group 1 and applies Transformation Functions to produce Train Data and Batch Inference Data. Other labels: Model, Development, Applications, Services, Online & Batch.]
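One way to avoid training/inference skew with transformation functions is to compute their statistics once, on the training data, and route every row through the same code path afterwards. The sketch below assumes min-max normalization and one-hot encoding; the function names and sample rows are hypothetical and do not reflect a specific library's API.

```python
def fit_min_max(values):
    """Compute normalization statistics on the training data only."""
    return {"min": min(values), "max": max(values)}


def min_max(value, stats):
    span = stats["max"] - stats["min"]
    return (value - stats["min"]) / span if span else 0.0


def fit_one_hot(values):
    """Fix the category order from the training data."""
    return sorted(set(values))


def one_hot(value, categories):
    # Categories unseen in training encode as all zeros.
    return [1 if value == c else 0 for c in categories]


train = [{"amount": 10.0, "country": "SE"}, {"amount": 30.0, "country": "DE"}]
stats = fit_min_max([r["amount"] for r in train])
categories = fit_one_hot([r["country"] for r in train])


def transform(row):
    """Single code path used for both training and inference rows."""
    return [min_max(row["amount"], stats)] + one_hot(row["country"], categories)


train_vectors = [transform(r) for r in train]
inference_vector = transform({"amount": 20.0, "country": "SE"})
print(train_vectors, inference_vector)
```

Skew typically appears when the inference service reimplements `transform` separately or recomputes `stats` on its own data. Keeping the fitted statistics and the function together with the feature view removes both failure modes.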
Feature Stores & AI-Enabled Apps
Efficiency at scale
Open & modular
www.hopsworks.ai