
END-TO-END AI SYSTEM ARCHITECTURE

By:
Shivanshu Gupta
Sumanth Sonnathi
Table of Contents

Design Choices and Tradeoff Analysis for AI Architecture

1. Data Formats
JSON
Text vs Binary
Row Major vs Column Major

2. Data Storage and Processing
Transactional and Analytical Processing
ETL (Extract, Transform and Load)

3. Flow of Data
Through Databases
Through Services
Through Real-Time Transport

4. Model Development
Evaluate Model
Experiment Tracking
Versioning
Batch vs Online Prediction

5. Model Deployment
Model Infrastructure
Development Environment
Containers
Resource Management
ML Platform

Building End-to-End AI Architecture


Design Choices for End-to-End AI
System Architecture:
1. Data Formats
Managing and storing data from diverse sources can be a complex and expensive task. It is
crucial to consider your future data usage requirements and design a format that aligns with
your needs. By proactively planning the data format, you can optimize storage efficiency and
ensure data accessibility and usability.

There are two questions you want to answer while thinking of data storage:

1) How do I store my data such that it is cheap and fast to access?


2) How do I store data that may come in different formats like images and text?

There are different formats to choose from, such as JSON, CSV, Parquet, Avro, and Pickle. When selecting a data format, you should consider characteristics such as human readability and whether the format is text or binary, which affects file size. You should also consider its access patterns.

Access patterns in the context of data formats refer to how data is typically accessed or queried
by users or applications. It involves understanding the specific ways in which the data will be
read, processed, and analyzed. Access patterns can be sequential, random, or index-based.

Format  | Binary/Text | Human-readable | Example use case
--------|-------------|----------------|-------------------------
JSON    | Text        | Yes            | Everywhere
CSV     | Text        | Yes            | Everywhere
Pickle  | Binary      | No             | Python
Parquet | Binary      | No             | Hadoop, Amazon Redshift

Let’s look at some of these formats starting with JSON.

JSON

Pro: It is human readable and can handle data with different levels of structure. Most programming languages can parse JSON.

Con: Once you commit to a schema, it is very painful to change. Because JSON is a text format, it takes up a lot of space.

Text vs Binary Format:

Text files are readable, but binary files are more compact. For example, say you want to store the number 1234567. Stored in a text file, it requires 7 bytes because it has seven characters and each character is 1 byte. Stored in a binary format as an int32, it takes only 32 bits, or 4 bytes.
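As a quick illustration, the size difference can be checked with Python's built-in struct module (a minimal sketch; the number is the one from the example above):

```python
import struct

number = 1234567

# As text: one byte per character (in ASCII/UTF-8).
as_text = str(number).encode("utf-8")
print(len(as_text))    # 7 bytes

# As binary: a fixed-width 32-bit signed integer.
as_binary = struct.pack("<i", number)
print(len(as_binary))  # 4 bytes
```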

AWS recommends using the parquet format because it is up to 2X faster to unload and
consumes 6X less storage in Amazon S3 as compared to text format.

Row Major or Column Major:
The way data is laid out in memory differs between formats. For example, in a CSV file, consecutive elements of a row are stored next to each other, whereas in a Parquet file, consecutive elements of a column are stored next to each other.

So, if you mainly access whole examples (rows), CSV is the better option, but if you mainly access a large number of features (columns), Parquet is the better option.

For example, if your data has thousands of features but you only need 5, Parquet lets you read just those columns, whereas a CSV file has to read every full row first and then filter out the five required columns.

And if you keep adding new examples to your data, row-major formats give faster writes.

This is also why reading data row by row from a CSV-backed DataFrame is typically much faster than from a Parquet file.
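As a rough sketch of the difference (assuming pandas with a Parquet engine such as pyarrow is installed; the file and column names here are hypothetical):

```python
import pandas as pd

# Column-major: Parquet can read only the columns you need.
subset = pd.read_parquet("rides.parquet", columns=["price", "city"])

# Row-major: CSV still scans every row; usecols only filters columns after parsing each line.
subset_csv = pd.read_csv("rides.csv", usecols=["price", "city"])

# Appending new examples (rows) is a cheap, append-only write for a row-major text format
# (assuming rides.csv has exactly these two columns).
new_rows = pd.DataFrame({"price": [120.0], "city": ["Mohali"]})
new_rows.to_csv("rides.csv", mode="a", header=False, index=False)
```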

2. Data Storage and Processing:
Data storage engines, often referred to as databases, determine how data is stored and accessed within a computing system. Databases are typically optimized for either transactional processing or analytical processing.

Transactional and Analytical Processing:

Business and e-commerce applications process many concurrent transactions, such as data insertion, retrieval, modification, and deletion, that need to be handled quickly and reliably. This type of processing is known as online transaction processing (OLTP).

Transactional databases have low-latency and high-availability requirements because they often serve users directly, which is why requests need to be processed fast. Transactional databases are often row-major. However, OLTP systems may not be the most efficient choice for certain types of queries, such as aggregating data or performing complex analytics. If you want to calculate the average price of all rides in Mohali for the month of September, an OLTP database may not provide optimal performance for that query.

For these kinds of questions, we have to rely on analytical databases. Such queries aggregate column values across many rows; this type of processing is called online analytical processing (OLAP).
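As a sketch of such an analytical query, DuckDB (mentioned below) can aggregate a column directly over a file; the rides.parquet file and its columns are hypothetical, and duckdb.sql assumes a reasonably recent DuckDB version:

```python
import duckdb

# Aggregate one column (price) across many rows: a classic OLAP-style query.
result = duckdb.sql("""
    SELECT AVG(price) AS avg_price
    FROM 'rides.parquet'
    WHERE city = 'Mohali'
      AND ride_date BETWEEN DATE '2023-09-01' AND DATE '2023-09-30'
""").fetchall()
print(result)
```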

Both terms, OLTP and OLAP, are gradually becoming outdated as the line between the two types of databases blurs.

Figure 1: Google Trends analysis of OLTP and OLAP

The separation between transactional and analytical databases has narrowed thanks to newer technology. There are now transactional databases that can handle analytical queries and analytical databases that can handle transactional queries.

CockroachDB: a transactional database that can handle analytical queries.

Apache Iceberg and DuckDB: analytical databases that can handle transactional queries.

ETL: Extract, Transform, Load

Some companies find it difficult to keep their data structured, so they store all of the raw data in a data lake and process it from there later, which means they don't have to deal with schema changes up front. This process of loading data into storage first and transforming it later is sometimes called ELT (extract, load, transform), in contrast to classic ETL, where data is transformed before it is loaded.

Figure 2: Overview of ETL process

However, as the data keeps growing, this approach eventually becomes infeasible. Vendors now offer hybrid solutions that combine the flexibility of data lakes with the data management features of data warehouses; for example, Databricks and Snowflake both provide data lakehouse solutions.
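As a minimal pandas-based sketch of the extract, transform, and load steps (file paths and column names are hypothetical):

```python
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    # Extract: pull raw records from a source system (here, a CSV export).
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: clean and reshape the data into the schema the warehouse expects.
    df = df.dropna(subset=["order_id"])
    df["order_date"] = pd.to_datetime(df["order_date"])
    return df[["order_id", "order_date", "amount"]]

def load(df: pd.DataFrame, path: str) -> None:
    # Load: write the curated data to analytical storage (Parquet here).
    df.to_parquet(path, index=False)

load(transform(extract("raw_orders.csv")), "curated_orders.parquet")
```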

3. Flow of Data:
When data flows from one process to another, we call it dataflow. There are three ways through
which data can flow:
1) Through Databases.
2) Through services, using requests such as REST API calls (POST/GET requests).
3) Through real-time transport such as Apache Kafka and Amazon Kinesis.
1) Through Databases:

This is the easiest way to pass data, but it does not always work. Both processes have to read from and write to the same database, which can be too slow for applications with strict latency requirements, such as user-facing applications.

For example, to pass data from process A to process B, process A can write that data into a
database, and process B simply reads from that database.
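A minimal sketch of this pattern using Python's built-in sqlite3 module (the shared database file and table are hypothetical):

```python
import sqlite3

conn = sqlite3.connect("shared.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")

# Process A writes the data it wants to pass along.
conn.execute("INSERT INTO events (payload) VALUES (?)", ("ride_requested",))
conn.commit()

# Process B (possibly another machine, at a later time) reads it back from the same database.
rows = conn.execute("SELECT id, payload FROM events").fetchall()
print(rows)
```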

2) Through Services

The second method is request-driven dataflow: to pass data from process A to process B, process A sends the data directly to B over the network.
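A minimal sketch of request-driven data passing with the requests library (the endpoint URL and payload are hypothetical):

```python
import requests

# Process A pushes data to process B's REST endpoint.
response = requests.post(
    "https://price-service.internal/api/v1/estimate",
    json={"pickup": "Sector 70", "dropoff": "Airport"},
    timeout=2,  # synchronous calls should fail fast if the downstream service is slow
)
response.raise_for_status()
print(response.json())
```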

For example, in the Uber application, each service sends requests to two other services, as shown in Figure 3.

Figure 3: Example of request-based architecture of Uber services

When services directly request data from each other, interservice data passing can become complex and lead to bottlenecks, especially in large-scale systems. Synchronous requests can also cause delays and failures if services are unavailable or unresponsive.

To address these challenges, an event-driven architecture with a broker, such as a real-time transport or event bus, can be used.

3) Data Passing Through Real-Time Transport

Services communicate with the broker instead of directly requesting data from other services.
Each service can publish events to the broker, and other interested services can subscribe to
receive those events. This approach reduces the complexity of interservice data passing and
improves scalability and fault tolerance.

Using a real-time transport as the broker enables fast and efficient data passing among
services. Instead of relying on databases, which can introduce latency, in-memory storage is
used for event broadcasting and retrieval. This event-driven architecture is well-suited for data-
heavy systems that require real-time data updates and low latency.

Figure 4: Event-driven architecture of Uber services

Real-time transports encompass pub/sub and message queue models. Pub/sub allows services to publish events to various topics, which subscribed services can consume. Data retention policies and storage options like Amazon S3 are common in pub/sub solutions.

In contrast, message queues ensure targeted delivery of messages to specific consumers. Apache Kafka and Amazon Kinesis are prominent pub/sub solutions, while Apache RocketMQ and RabbitMQ are popular message queues.
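A minimal sketch of the pub/sub pattern using the kafka-python client (assumes a broker running at localhost:9092; the topic name and payload are hypothetical):

```python
from kafka import KafkaProducer, KafkaConsumer

# Publisher: a service emits an event to a topic instead of calling other services directly.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("ride-events", b'{"ride_id": 42, "status": "requested"}')
producer.flush()

# Subscriber: any interested service consumes events from the same topic at its own pace.
consumer = KafkaConsumer(
    "ride-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)
    break
```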

Figure 5: Screenshot from StackShare (https://stackshare.io/kafka)

4. Model Development
Here we will discuss topics that will help you develop your model, including how to evaluate it, experiment tracking, and versioning.

Evaluate ML model:

If you had unlimited computation power and time, you could try every model and see which works best for you. But that is rarely the case, so the best option is to be strategic about which models you select.

Deep learning has gained a lot of attention lately, with new AI tools being released every day, but it will not replace classical ML algorithms anytime soon; most recommendation systems, for example, still rely on collaborative filtering and matrix factorization.

Steps for model selection:

1) Avoid Trends

Many individuals are inclined to adopt the latest state-of-the-art models, assuming that they
offer the optimal solution for their problem. The reasoning behind this is that newer models are
expected to outperform older ones.

However, it's crucial to recognize that the latest cutting-edge algorithm may not necessarily be
practical in terms of speed or cost for implementation. Additionally, its performance on specific
data sets is not guaranteed.

Therefore, prioritizing a solution that is cost-effective and straightforward becomes more significant than relying solely on state-of-the-art models.

2) Start simple.

A simple model has three key benefits: first, it is easier to deploy; second, it is easier to understand the math behind it; and third, it can serve as a baseline against which you can compare other models.

Opting for a simpler model does not necessarily imply less effort. Similarly, leveraging a pre-
trained model can be a beneficial approach, although it may present challenges when
attempting further development.

For instance, starting with a pre-trained BERT model, which is highly complex, allows for a quick
start with minimal effort. However, enhancing or refining such a model can prove to be quite
challenging.

3) Avoid Bias

The most important part of evaluating a model is to experiment with different features and different sets of hyperparameters to find the best model for your data. Be careful of your own bias here: if a data scientist is more excited about one approach, they are likely to spend more time experimenting with it, which might make that approach look better than it really is.

The performance of a model depends on the context in which it is evaluated: the training data, test data, hyperparameters, and so on. A model might perform better in one context, but it is unlikely to perform better in all contexts.

4) Performance: Now Vs Later

A tree-based model might work best now because you don't have much data yet, but two months down the line you might be able to collect a lot more data, and at that point a neural network might perform better.

The best way to estimate the future performance of your model is to plot learning curves: training loss, training accuracy, and validation accuracy against the number of training samples.

While the learning curve cannot provide an exact estimation of the performance improvement
that can be achieved with additional training data, it can provide an indication of whether any
performance gain can be expected from increasing the training data.
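A small sketch of how such a learning curve can be computed with scikit-learn (the dataset and model here are stand-ins):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# Score the model on increasingly large slices of the training data.
train_sizes, train_scores, val_scores = learning_curve(
    GaussianNB(), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5)
)
print(train_sizes)
print(train_scores.mean(axis=1))  # training score per training-set size
print(val_scores.mean(axis=1))    # cross-validation score per training-set size
```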

Figure 6: The learning curves of a naive Bayes model and an SVM model. Source: scikit-learn library (https://rb.gy/mdqpw)

In the graph above, the training score and test score converge as the amount of training data increases.

5) Measure Tradeoffs:

A typical trade-off scenario involves managing the balance between false positives and false
negatives. For instance, in applications like fingerprint unlocking, where preventing unauthorized
access is crucial, a model that minimizes false positives would be preferred.

Conversely, in tasks where false negatives are more problematic than false positives, such as
cancer screening (where patients with cancer should not be misclassified as cancer-free), a
model that reduces false negatives would be preferable.
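One common way to manage this trade-off is to adjust the decision threshold on the model's predicted probabilities. A small illustrative sketch (synthetic data and arbitrary thresholds):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
probabilities = model.predict_proba(X)[:, 1]

# Raising the threshold reduces false positives but increases false negatives, and vice versa.
for threshold in (0.3, 0.5, 0.7):
    predictions = (probabilities >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, predictions).ravel()
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```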

6) Understand Assumptions:

Every model comes with its own assumptions. It is important to understand what those assumptions are and to work around them.

The assumptions below are not exhaustive; they are just for demonstration. Here are some common ones:

a) Prediction assumption: Every model that seeks to predict an output Y based on an input
X assumes that such prediction is feasible.

b) Independent and Identically Distributed (IID): Neural networks assume that the examples
used for training are drawn independently from the same underlying distribution.

c) Smoothness: Supervised machine learning methods assume that there exists a set of
functions that can transform inputs into outputs in a way that similar inputs lead to similar
outputs. If an input X yields an output Y, a similar input would yield a proportionally
similar output.

d) Tractability: Generative models operate under the assumption that it is computationally viable to calculate the probability P(Z|X) of the latent representation Z given the input X.

e) Boundaries: Linear classifiers make the assumption that decision boundaries can be
represented as linear functions.

f) Conditional independence: Naive Bayes classifiers assume that, given the class,
attribute values are independent of each other.

g) Normal distribution: Many statistical methods make the assumption that data follows a
normal distribution.

Experiment tracking:
The process of recording the progress and results of an experiment is called experiment tracking. Experiment tracking is a form of babysitting: you have to watch over the learning process of the model.

It is crucial to closely monitor the learning process to address potential issues and assess the
model's progress. Various problems can occur, such as stagnant loss, overfitting, underfitting,
fluctuating weights, dead neurons, and memory constraints. Tracking the training process helps
in identifying and resolving these problems and evaluating the model's effectiveness.

Initially, tracking only focused on loss and speed, but over time, ML practitioners started
monitoring numerous aspects, leading to intricate and comprehensive experiment tracking
boards. Some key elements to consider tracking during each experiment include:

● Loss curves for training and evaluation datasets.
● Performance metrics (e.g., accuracy, F1, perplexity) on non-test datasets.
● Logs of samples, predictions, and ground truth labels for ad hoc analysis and sanity checks.
● Model speed in terms of steps or tokens processed per second.
● System performance metrics like memory usage and CPU/GPU utilization.
● Values of relevant parameters and hyperparameters that affect model performance, such as the learning rate, gradient norms, and weight norms.

While tracking everything is ideal, it can be overwhelming in practice. Balancing important
metrics and avoiding distractions is crucial. Experiment tracking facilitates comparisons between
different experiments, helping to understand the impact of specific changes in components.
Simple approaches involve making copies of code files and logging outputs, while third-party
tools provide enhanced dashboards and sharing capabilities with colleagues.
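As one possible sketch of what such tracking looks like in code, using MLflow's tracking API (tools like Weights & Biases or Comet.ml expose similar calls; the hyperparameters and the toy loop below are hypothetical):

```python
import mlflow

with mlflow.start_run(run_name="baseline-logreg"):
    # Record the configuration that produced this experiment.
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    for step in range(3):  # stand-in for a real training loop
        train_loss = 1.0 / (step + 1)
        mlflow.log_metric("train_loss", train_loss, step=step)

    mlflow.log_metric("val_accuracy", 0.87)
```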

Versioning:
The practice of recording comprehensive details of an experiment to potentially replicate it in the
future or compare it with other experiments is known as experiment versioning. It is essential for
avoiding issues caused by undocumented changes in code and data when replicating ML
experiments. While code versioning has become a common practice, data versioning is
frequently overlooked due to the challenges it presents.

Data is larger than code, making traditional line-by-line comparison impractical. Data versioning
tools like DVC register a diff based on checksum changes or file additions/removals. Aggressive
experiment tracking helps reproducibility, but nondeterminism introduced by frameworks and
hardware can hinder replication. As the field progresses, a deeper understanding of models
may reduce the need for extensive experimentation, leading to more efficient model
development.
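As a sketch of how versioned data can be retrieved later, DVC exposes a small Python API (this assumes a Git+DVC-tracked repository; the file path and revision name are hypothetical):

```python
import dvc.api

# Read the exact version of a dataset that a past experiment used,
# identified by a Git revision (tag, branch, or commit).
with dvc.api.open("data/train.csv", repo=".", rev="experiment-42") as f:
    print(f.readline())  # e.g., the CSV header of that data version
```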

Batch Vs Online Prediction:


When building a system, one important decision that impacts both end users and developers is
how predictions are generated and served: online or batch. Online prediction, also known as on-
demand prediction, involves generating and returning predictions immediately upon request.
This is commonly done through RESTful APIs. On the other hand, batch prediction involves
periodically generating predictions and storing them for retrieval as needed. Batch prediction is
often asynchronous, and predictions are computed from stored data.
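A minimal sketch of an online prediction endpoint exposed as a RESTful API using FastAPI (the route, request fields, and the toy "model" are hypothetical):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Order(BaseModel):
    restaurant_id: int
    n_items: int

@app.post("/predict")
def predict(order: Order) -> dict:
    # Stand-in for a real model: features in, prediction returned immediately.
    estimated_minutes = 15 + 2 * order.n_items
    return {"restaurant_id": order.restaurant_id, "eta_minutes": estimated_minutes}

# Run with: uvicorn app:app --reload  (assuming this file is saved as app.py)
```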

Figure 7: Architecture for batch prediction

In online prediction, both batch features (computed from historical data) and streaming features
(computed from real-time data) can be utilized. For example, when estimating delivery time for
an order on DoorDash, batch features like the mean preparation time of the restaurant in the
past can be combined with streaming features such as the number of recent orders and
available delivery personnel.

Figure 8: Architecture of online prediction

It's important to note that "streaming features" and "online features" are not interchangeable
terms. Online features encompass any features used for online prediction, including batch
features stored in memory. Streaming features specifically refer to features computed from
streaming data.

Figure 9: Architecture of streaming prediction

A common type of batch feature used for online prediction is item embeddings, which are
precomputed in batch and fetched when needed. These embeddings can be considered online
features but not streaming features, as streaming features exclusively pertain to features
computed from streaming data.

Batch prediction is useful for processing accumulated data when you don’t need immediate
results (such as recommender systems).

Online prediction is useful when predictions are needed as soon as a data sample is generated
(such as fraud detection).

Online prediction and batch prediction are commonly used together in various applications. For
instance, food ordering apps like DoorDash and UberEats employ batch prediction to generate
restaurant recommendations due to the large number of restaurants. However, when users click
on a specific restaurant, online prediction is used to generate food item recommendations.

5. Model Deployment
When an application receives a prediction request from a user, it can be sent to an endpoint that
is exposed by the system. This endpoint then returns the prediction to the user. Setting up a
basic deployment for this workflow can be accomplished relatively quickly if you are familiar with
the necessary tools.

However, there are several challenges involved in deploying and maintaining a production-
ready system. These challenges include ensuring that the model is accessible to a large
number of users with minimal latency and high availability, setting up a robust infrastructure for
immediate error notifications, troubleshooting and identifying issues when they occur, and
deploying updates seamlessly to address any problems.

These tasks require careful planning, expertise in infrastructure setup, and effective monitoring
and maintenance strategies to ensure optimal performance and user satisfaction.

Myth 1: Only one or two ML models are deployed at a time

Contrary to this myth, companies actually employ numerous ML models for various applications. In reality, applications often
require multiple models to address different features and tasks. For example, a ride-sharing app
like Uber relies on models for ride demand, driver availability, estimated arrival time, dynamic
pricing, fraud detection, customer churn, and more. Furthermore, if the app operates in multiple
countries, each country may require its own set of models. Companies like Uber, Google, and
Booking.com have thousands of models in production, highlighting the extensive use of ML
models across different industries.

Myth 2: Model performance remains the same over a long period of time

Contrary to the misconception that model performance remains consistent over time, ML
systems can experience degradation referred to as "software rot" or "bit rot." Furthermore,
changes in data distribution can also affect performance, as the model encounters different data
in production compared to its training data, resulting in a decline in performance over time.

Myth 3: My model doesn't need updates if it works

Instead of asking how often models should be updated, the focus should be on how quickly they
can be updated. Model performance deteriorates over time, making fast updates crucial.
Drawing from DevOps practices, organizations like Etsy, Netflix, and AWS have embraced
frequent updates to their systems. While many companies still update models monthly or
quarterly, Weibo, Alibaba, and ByteDance have achieved remarkable iteration cycles as short
as 10 minutes. The goal is to bring new models into production as swiftly as possible, as
emphasized by industry experts.

Model Infrastructure:
Layers of ML infrastructure:

Figure 10: Layers of ML infrastructure

Storage and Compute


The storage layer can be hosted either in an on-premises private data center or in the cloud.
While some companies used to manage their own storage infrastructure, the trend over the past
decade has been to commoditize storage and migrate it to the cloud. With the decreasing cost
of data storage, many companies now store all their data without much concern for expenses.

On the other hand, the compute layer refers to the available computing resources and the
mechanisms to utilize them. The scalability of workloads depends on the amount of compute
resources accessible. Think of the compute layer as the engine that executes various tasks. It
can range from a simple setup with a single CPU or GPU core to more advanced managed compute offerings from cloud providers, such as AWS Elastic Compute Cloud (EC2) or GCP Compute Engine.

Development Environment:
An essential aspect of the development environment is its standardization. It should be
configured to include all the necessary tools that facilitate engineers' tasks. This includes
incorporating tools for effective versioning. Currently, companies adopt various tools to version
their ML workflows, such as Git for code version control, DVC for data versioning, Weights &
Biases or Comet.ml for experiment tracking during development, and MLflow for tracking model
artifacts during deployment.

Containers:
During development, a fixed number of machines or instances are typically used. In production,
where workloads can fluctuate and be unpredictable, autoscaling is commonly employed.
However, setting up new instances with the required tools and packages remains a concern.

Docker addresses this issue by providing a Dockerfile, which contains instructions for recreating
an environment. Running these instructions generates a Docker image, and running the image
creates a Docker container.

Docker images can be built from scratch or based on existing images, such as those provided
by NVIDIA for TensorFlow. Container registries like Docker Hub and AWS ECR are used to
share and access Docker images.
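As a sketch of the image-and-container workflow driven from Python with the Docker SDK (assumes the Docker daemon is running, the docker package is installed, and a Dockerfile exists in the current directory; the image tag is hypothetical):

```python
import docker

client = docker.from_env()

# Build an image from the Dockerfile in the current directory.
image, build_logs = client.images.build(path=".", tag="ml-service:latest")

# Running the image creates a container with the exact environment the Dockerfile describes.
container = client.containers.run("ml-service:latest", detach=True)
print(container.id, container.status)
```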

Resource Management:
Resource management in the pre-cloud era involved maximizing limited storage and compute
resources, often at the expense of other applications. However, in the cloud, the focus shifted to
using resources cost-effectively rather than maximizing utilization. Cloud environments allow for
elastic resources without impacting other applications, simplifying resource allocation.

Companies prioritize adding resources if it enhances engineers' productivity and justifies the
added cost. Investing in workload automation, even if it reduces resource efficiency, allows
engineers to focus on higher-return tasks. This section explores resource management for ML
workflows, primarily in cloud-based environments but with relevance to private data centers as
well.

Cron, Schedulers and Orchestrators:


ML workflows have two key characteristics: repetitiveness and dependencies. ML workloads are
often repetitive, such as regularly training models or generating predictions. Cron is commonly
used to schedule repetitive jobs at fixed times. However, cron lacks the ability to handle
complex dependencies between jobs. ML workflows involve steps that depend on the success
of previous steps.

For example, a workflow might pull data, extract features, train models, and then deploy the best model based on a comparison. Conditional dependencies exist where actions depend on the outcomes of preceding steps. Managing resources effectively requires considering both repetitiveness and dependencies in ML workflows.

Schedulers handle dependencies and schedule jobs based on predefined criteria, such as time
or event triggers. They allocate resources and optimize resource utilization for running jobs.
Schedulers like Slurm enable job scheduling with specifications for job name, execution time,
memory, and CPU allocation.

Orchestrators, on the other hand, manage resources and deal with lower-level abstractions like
machines, instances, and clusters. Kubernetes is a popular orchestrator used for container
orchestration, often provided as a managed service by cloud providers. While schedulers and
orchestrators can have overlapping functionality, they can also be used independently.
Examples of orchestrators include HashiCorp Nomad, Airflow, Argo, Prefect, and Dagster.

Workflow management:
Workflow management tools such as Airflow, Argo, Prefect, Kubeflow, and Metaflow play a role
in managing workflows by allowing users to define their workflows as directed acyclic graphs
(DAGs). These tools incorporate schedulers that focus on the entire workflow rather than
individual jobs. Workflows can be defined using code (Python) or configuration files (YAML),
and each step in the workflow is considered a task.

These tools typically work in conjunction with an orchestrator to allocate resources for executing
the defined workflows. In essence, they combine the functionalities of schedulers and
orchestrators to manage and execute complex workflows.
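As a hedged sketch of what such a DAG definition might look like in Airflow 2.x (the task names, schedule, and placeholder task bodies are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def pull_data():
    print("pulling raw data")            # placeholder for the real step

def extract_features():
    print("extracting features")

def train_model():
    print("training and comparing models")

with DAG(dag_id="daily_training", start_date=datetime(2023, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    pull = PythonOperator(task_id="pull_data", python_callable=pull_data)
    features = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)

    # Dependencies: each step runs only after the previous one succeeds.
    pull >> features >> train
```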

Kubeflow and Metaflow are workflow management tools that simplify running workflows in both
development (dev) and production (prod) environments by abstracting away infrastructure
boilerplate code.

They allow data scientists to leverage the full computing power of the prod environment from
local notebooks, enabling the use of the same code in both environments. Kubeflow integrates
with Kubernetes (K8s) and utilizes KubeFlow Pipelines built on top of Argo, while Metaflow can
be used with AWS Batch or K8s.

Both tools are dynamic and fully parameterized, but Metaflow offers a superior user experience.
In Metaflow, requirements for each step can be specified using a Python decorator, eliminating
the need for Dockerfiles or YAML files. It seamlessly supports working with different
environments within the same workflow, enabling efficient experimentation and scaling with
large datasets in the cloud.
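A hedged sketch of the decorator-based approach in Metaflow (step names, resource values, and the toy computation are hypothetical; exact decorator options may vary by version):

```python
from metaflow import FlowSpec, resources, step

class TrainingFlow(FlowSpec):

    @step
    def start(self):
        self.data = list(range(10))      # stand-in for loading real data
        self.next(self.train)

    @resources(memory=16000, cpu=4)      # per-step requirements, no Dockerfile or YAML needed
    @step
    def train(self):
        self.model = sum(self.data)      # stand-in for real training
        self.next(self.end)

    @step
    def end(self):
        print("trained:", self.model)

if __name__ == "__main__":
    TrainingFlow()
```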

ML Platform:
The ML platform includes components such as model development, model store, and feature
store. ML platforms vary between companies, but considerations when evaluating tools include
compatibility with cloud providers or on-premises data centers and whether the tool is open
source or a managed service.

Open-source solutions offer more control but require maintenance, while managed services
may have data privacy implications and compliance considerations.

Model Deployment:
After training a model, the next step is to make its predictions accessible to users. Deployment
services aid in pushing models and dependencies to production and exposing them as
endpoints. Major cloud providers like AWS, GCP, Azure, and Alibaba offer deployment tools
such as SageMaker, Vertex AI, Azure ML, and Machine Learning Studio. Startups like MLflow
Models, Seldon, Cortex, and Ray Serve also provide deployment solutions.

It's crucial to consider the ease of performing both online and batch prediction with the tool.
Ensuring model quality before deployment is an open problem, and deployment services should
facilitate testing techniques like shadow deployment, A/B testing etc.

Model Store:
To effectively address issues with models, it's essential to track and store comprehensive
information associated with them. Storing just the model itself is insufficient. Key artifacts to
store include the model definition, model parameters, features and predict functions,
dependencies, data used for training, model generation code, experiment artifacts, and tags for
model discovery and filtering.

These artifacts provide crucial insights for debugging and maintenance. Storing them becomes
even more critical when the data scientist who created the model is unavailable or unable to
access the code. However, many companies store only a subset of these artifacts, and their
storage locations may be scattered across different platforms.

Model stores, like MLflow, are popular solutions, but challenges remain in efficiently storing and
accessing artifacts. Future advancements are needed to provide comprehensive and efficient
model store solutions, ensuring easy tracking and maintenance of models in production.
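A sketch of storing a model together with some of these artifacts using MLflow (the dataset, parameters, and tags are illustrative):

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=500).fit(X, y)

with mlflow.start_run(run_name="iris-logreg"):
    # Store the context needed to debug or reproduce the model later, not just its weights.
    mlflow.log_param("max_iter", 500)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.set_tags({"dataset": "iris-v1", "owner": "ml-team"})  # tags for discovery and filtering
    mlflow.sklearn.log_model(model, artifact_path="model")
```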

Feature Store:
A feature store is a solution that helps address three main problems in machine learning:
feature management, feature transformation, and feature consistency. It enables teams to share
and discover features, manage access roles, and store feature computation results.

Feature stores unify the logic for both batch and streaming features, ensuring consistency
between training and inference. They can act as a feature catalog, data warehouse, and
perform feature validation.

Popular solutions in this space include data catalogs such as Amundsen and DataHub, feature stores such as Feast (for batch features) and Tecton (for batch and streaming features), and offerings built into platforms like SageMaker and Databricks.
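A hedged sketch of how a feature store is queried at serving time, using Feast's Python API (assumes a Feast repository has already been configured; the feature view, feature names, and entity are hypothetical). The same store also supports point-in-time historical retrieval for building training sets without leakage:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Online store: low-latency lookup of the latest feature values for a given entity.
online_features = store.get_online_features(
    features=[
        "restaurant_stats:mean_prep_time",
        "restaurant_stats:orders_last_hour",
    ],
    entity_rows=[{"restaurant_id": 1001}],
).to_dict()
print(online_features)
```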

Building an End-to-End AI System
Architecture
Compute Layer:
The compute layer plays a crucial role in converting raw data into meaningful features. This layer can be classified into two categories based on the frequency of updates: stream computing for continuous, real-time updates and batch computing for regular intervals.

The input data for the compute layer is sourced from event streaming systems like Apache Kafka or Amazon Kinesis, and from OLAP stores like Apache Hive or Snowflake.

Store Layer
The store layer is where feature definitions are registered and deployed into the feature store. It
also handles backfilling, which involves rebuilding features using historical data when a new
feature is defined.

Kafka employs a backup mechanism to store events beyond its retention period, using storage
solutions like S3 or Hive tables.

An intermediate layer, consisting of Hive and Kafka, acts as a buffer between the computing
and store layers. This decoupling provides benefits such as increased robustness in case of
failures, scalability of individual components, reduced energy requirements, and the ability to
experiment with new technologies without disrupting existing infrastructure.

Centralized DB:
The centralized DB layer serves as the interface for data scientists to present feature-ready data
to the online and offline feature stores. The online feature store enables real-time record lookup
with low latency and high availability, while the offline feature store acts as a secure and
scalable repository of all feature data, allowing for the creation of training, validation, and batch-
scoring datasets. The feature stores synchronize with each other periodically to avoid training-
serving skew.

Model Training
The model training layer involves extracting training data from the offline feature store, ensuring
no data leakage through point-in-time queries. It also incorporates a model-retraining feedback
loop to address concept drift and maintain model accuracy.

Model Deployment

In the model deployment phase, a cloud-based scoring service is used for real-time data
serving, integrating with the feature store.

[Architecture diagram: Data Sources, Feature Engineering Workflow, Feature Definition, Feature Monitoring, Model Training]