Architect a Heterogeneous Data Platform Across Clusters,
Regions, and Clouds
Bin Fan (binfan@alluxio.com)
Founding Engineer, VP of Open Source @ Alluxio
About Me
Bin Fan (https://www.linkedin.com/in/bin-fan/)
● Founding Engineer, VP of Open Source @ Alluxio
● Alluxio PMC Co-Chair, Presto TSC/committer
● Email: binfan@alluxio.com
● PhD in CS @ Carnegie Mellon University
Alluxio Overview
● Originally a research project (Tachyon) at UC Berkeley AMPLab, led by then-PhD student Haoyuan Li (Alluxio founder and CEO)
● Backed by top VCs (e.g., Andreessen Horowitz) with $70M raised in total; Series C ($50M) announced in 2021
● Deployed in production at large scale at Facebook, Uber, Microsoft, Tencent, TikTok, and others
● More than 1,200 contributors on GitHub. In 2021, more than 40% of commits on GitHub were contributed by community users
● The 9th most critical Java-based open-source project on GitHub, as ranked by Google/OpenSSF [1]
[1] Google Comes Up With A Metric For Gauging Critical Open-Source Projects
Alluxio (Tachyon) back in 2015
[Screenshot of the Tachyon talk at AMPLab in 2015; panels: What is Tachyon, Stack, Release, Growth]
Alluxio (Tachyon) in 2015
[Diagram: Spark Task 1 and Spark Task 2 share an RDD through Tachyon (in-memory), holding blocks 1-4; under storage is HDFS / Amazon S3 (HDFS on disk)]
What's Different Today
Topology
● On-prem Hadoop → Cloud-native, multi- or hybrid-cloud, multi-datacenter
Computation
● MR/Spark → Spark, Presto, Hive, TensorFlow, PyTorch, ...
● More mature frameworks (e.g., less frequent OOM)
Data access pattern
● Sequential reads (e.g., scanning) on unstructured files → Ad-hoc reads into structured/columnar data
● Hundreds to thousands of big files → millions of small files
The Evolution from Hadoop to Cloud-native Era
Data Storage
● On-prem & colocated HDFS → S3 and other object stores (possibly across regions like us-east & us-west), with legacy on-prem HDFS still in service
Resource/Job Orchestration
● YARN → K8s
○ Lost focus on data locality
Unprecedented Complexity of Data Platforms
Data Trend → Complex Platform
● New compute and storage tech created every 3-8 years → On-premises, cloud, hybrid, and multi-cloud environments all have different properties
● More data generated every day and stored in data silos → Data copies, synchronization costs
● More people and teams need to access and leverage this data → Multiple APIs necessitate integration and application rewrites
Inefficient Manual Copy Across Data Centers, Regions, Clouds
[Diagram: private data centers (Datacenter 1, Datacenter 2; Hive) and cloud Regions A and B (Amazon EMR, Cloud Dataproc, Kubernetes Engine, Compute Engine), linked by error-prone and network-intensive data copies]
Strong Market Demand For Simplification
● Efficient access & data management: acceleration & auto-tiering of remote data sources
● Environment agnosticity: agility across regions for private, hybrid, or multi-cloud
● Unification of data lakes: serve analytics & AI from multiple data locations
Analytics & AI in the Hybrid & Multi-Cloud Era
SOLUTION
● No-copy data access across silos, agnostic to compute engine
● Foundation of a heterogeneous data platform across geos
Multi-Cloud Ready Analytics & AI Platform
[Diagram labels: Datacenter 1, Datacenter 2, HMS; cloud Region A and Region B, GKE]
Open Source Started From UC Berkeley AMPLab in 2014
● Join the conversation on Slack: alluxio.io/slack
● 1,200+ contributors & growing
● 9,000+ Slack community members
● Top 10 most critical Java-based open source project
● GitHub’s top 100 most valuable repositories out of 96 million
COMPANIES USING ALLUXIO
[Logo wall by category: Internet, Public Cloud Providers, General, E-commerce, Technology, Financial Services, Telco & Media, Others]
Case Studies
Examples to eliminate data copies
Top Online Travel Platform: Unify Data Lake Across Multiple Geo Regions in AWS
Problems Encountered
● Data silos caused by different brands/teams ingesting data dispersed across multiple regions in AWS
● Central analytics queries across data silos suffered from poor user experience and long time to insight
● Manual replication resulted in inefficiencies, operational overhead, and expensive S3 egress costs
Alluxio’s Solution
● Unify data silos without the need to copy or move data
Results Achieved
● Enhanced user experience with consistent & high-performance analytics, reducing time to insight
● 50% reduced cost per query
Federate Data Lakes without Replication & Serve Various Compute Engines
[Diagram labels: Team A, Team B, Team C; regions us-west-1, us-east-1, us-east-2, us-west-2; data mounted into the main region for central analytics (Hive)]
Large Scale Analytics within a Single Cloud
PUBLIC CLOUD
Problems Encountered
● Newly introduced chatbot to better manage communication with gamers globally
● Presto engine performs huge amounts of queries to support instantaneous responses
● Urgently looking for a new solution to slash costs without losing performance
Alluxio’s Solution
● Real-time responses & analysis, while saving costs on S3 storage
Results Achieved
● 2-4x average performance improvement
● 7x key query speed-up
● At least 50% cost saving
Example Journeys
A Typical Customer Journey
1. Unified namespace: mount HDFS and object storage into a common Alluxio cluster
2. Object store analytics: caching layer to speed up Presto and Spark jobs
3. Hybrid-cloud: burst compute to a single public cloud first; run analytics on K8s and cloud-native AI for model training
4. Multi-cloud: replicate the setup on AWS to Google Cloud; choose the right tool for the job, regardless of the cloud provider
EXAMPLE JOURNEY 01
On-premises HDFS to Object Storage to Hybrid Cloud
EXAMPLE JOURNEY 01
On-premises object storage as the source of truth
[Diagram: private datacenter (ingestion, ETL, Hive) as the source of truth; multiple instances in cloud Regions A and B (Amazon EMR, Cloud Dataproc, Kubernetes Engine, Compute Engine)]
Architecture Overview
Enable a Hybrid Data Lake
ARCHITECTURE
[Diagram: Alluxio Clients, an Alluxio Master with Standby Masters (consensus), Alluxio Workers (RAM / SSD / HDD), and Under Store 1 and Under Store 2 reached over the WAN; arrows distinguish the control path from the data path]
DATA LOCALITY WITH SCALE-OUT WORKERS
Local performance for remote data with intelligent multi-tiering
• Tiers: Hot (RAM), Warm (SSD), Cold (HDD)
• Read & write buffering, transparent to the application
• Policies for pinning, promotion/demotion, TTL
[Diagram labels: on-premises and public cloud; model training, big data ETL, big data query]
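The tiering behavior listed above can be illustrated with a toy model. The sketch below is not Alluxio's implementation; the class name `TieredCache`, its methods, and the block-count capacities are assumptions made for illustration. It promotes a block to the fastest tier on access, demotes the least recently used unpinned block one tier down when a tier overflows, and drops blocks that fall off the slowest tier (the under store would still hold them).

```python
from collections import OrderedDict


class TieredCache:
    """Toy multi-tier block cache (hypothetical, for illustration only)."""

    def __init__(self, capacities):
        # capacities: tier name -> max number of blocks, fastest tier first,
        # e.g. {"RAM": 2, "SSD": 2, "HDD": 4}.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities.items()]
        self.pinned = set()

    def pin(self, block_id):
        """Mark a block so it is never chosen as a demotion victim."""
        self.pinned.add(block_id)

    def locate(self, block_id):
        """Return the name of the tier holding the block, or None."""
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None

    def put(self, block_id, data):
        """Write a block into the fastest tier."""
        for _, _, blocks in self.tiers:
            blocks.pop(block_id, None)
        self._insert(0, block_id, data)

    def get(self, block_id):
        """Read a block and promote it to the fastest tier."""
        for _, _, blocks in self.tiers:
            if block_id in blocks:
                data = blocks.pop(block_id)
                self._insert(0, block_id, data)
                return data
        return None  # cache miss; a real system would read the under store

    def _insert(self, level, block_id, data):
        if level >= len(self.tiers):
            return  # fell off the slowest tier: evicted from the cache
        _, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        while len(blocks) > cap:
            # Oldest unpinned block in this tier is the demotion victim.
            victim = next((b for b in blocks if b not in self.pinned), None)
            if victim is None:
                break  # every block is pinned; tier stays over capacity
            self._insert(level + 1, victim, blocks.pop(victim))
```

A TTL policy could be layered on top by recording an insertion time per block and dropping expired blocks in `get`; that part is left out to keep the sketch short.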
METADATA LOCALITY WITH SCALABLE MASTERS
Synchronization of changes across clusters
• Metadata synchronization: when a mutation replaces the old file at path /file1 with a new file at the same path, the change is synchronized to the Alluxio Master
[Diagram labels: Alluxio Master (RocksDB; RAM, SSD); on-premises and public cloud; model training, big data ETL, big data query]
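One way to picture metadata synchronization is as a comparison of per-file fingerprints between the cached view and the under store. The sketch below is an assumption-laden illustration, not the algorithm the Alluxio master uses: the `FileMeta` fields and the `sync_metadata` function are hypothetical names, and both catalogs are plain dictionaries keyed by path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical fingerprint of a file in the under store."""
    length: int
    mtime: int
    etag: str


def sync_metadata(cached, under_store):
    """Compare cached metadata with the under store.

    Returns (new_cached_view, changes), where changes maps each path to
    "added", "updated", or "removed".
    """
    changes = {}
    for path, meta in under_store.items():
        if path not in cached:
            changes[path] = "added"
        elif cached[path] != meta:
            # Same path, different fingerprint: the file was replaced.
            changes[path] = "updated"
    for path in cached:
        if path not in under_store:
            changes[path] = "removed"
    return dict(under_store), changes
```

In this model, the /file1 case from the slide shows up as an "updated" entry, which is the signal to invalidate any cached data for that path.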
SEAMLESS CATALOG DEFINITIONS
No table redefinitions required using “Transparent URI”
Example Scenario
I. Initial state
A. Data in HDFS
B. Hive Metastore table definitions pointing to HDFS
II. Compute cluster with Alluxio
A. Catalog points to Hive Metastore
B. Alluxio intercepts Presto calls to HDFS
III. Query execution
A. Accesses to HDFS are served by Alluxio
B. No manual data copies or application re-writes
[Diagram: (I) on-premises HDFS and Hive Metastore, with the table location hdfs://ns/table; (II) in the public cloud, the Presto catalog (Hive connector) points to the Hive Metastore; (III) Presto reads through Alluxio]
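The "Transparent URI" idea can be sketched as a lookup that keeps the original table location unchanged and maps it, at access time, to a path in the caching layer. This is only an illustration: `TRANSPARENT_MOUNTS`, `to_cache_path`, and the `/mnt/hdfs-ns` mount path are hypothetical names, not Alluxio configuration or API.

```python
from urllib.parse import urlsplit

# Hypothetical table: (scheme, authority) of the original storage ->
# path where that storage is mounted in the caching layer.
TRANSPARENT_MOUNTS = {
    ("hdfs", "ns"): "/mnt/hdfs-ns",
}


def to_cache_path(uri):
    """Map an unchanged catalog URI to a cache path, or None if not intercepted."""
    parts = urlsplit(uri)
    mount = TRANSPARENT_MOUNTS.get((parts.scheme, parts.netloc))
    if mount is None:
        return None  # fall through to the original file system
    return mount + parts.path


# The Hive Metastore still stores hdfs://ns/table; only the read path changes.
print(to_cache_path("hdfs://ns/table/part-0.parquet"))
```

Because the mapping is applied below the catalog, table definitions and queries stay exactly as they were.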
DEPLOYMENT APPROACHES
• Co-locate Alluxio workers with compute for optimal I/O performance
  [Diagram: Spark and Alluxio in the same cluster; storage in a remote cluster]
• Deploy Alluxio as a standalone cluster between compute and storage
  [Diagram labels: Spark, Presto, Alluxio, Storage; remote cluster; same data center / region]
[Labels: long-running instances, ephemeral, elastic]
UNIFIED NAMESPACE
With Replication & Live Data Migration Capabilities
hdfs://host:port/directory/
Reports Sales
• Single Alluxio path backed by multiple storage systems
• Example policy: Migrate data older than 7 days from HDFS to S3
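A unified namespace of this kind can be modeled as a mount table resolved by longest-prefix match. The sketch below is a simplified illustration under stated assumptions: `MountTable` is a hypothetical class, and the `/data/...` paths and the `s3://example-bucket` URI are made-up examples (only `hdfs://host:port/directory/` comes from the slide).

```python
class MountTable:
    """Toy mount table: logical path prefix -> under-storage URI."""

    def __init__(self):
        self._mounts = {}

    def mount(self, logical_path, ufs_uri):
        key = logical_path.rstrip("/") or "/"
        self._mounts[key] = ufs_uri.rstrip("/")

    def resolve(self, path):
        """Translate a logical path to the URI of the store that backs it."""
        best = None
        for mount_point in self._mounts:
            prefix = mount_point if mount_point.endswith("/") else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                # Prefer the most specific (longest) matching mount point.
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        suffix = path[len(best.rstrip("/")):]
        return self._mounts[best] + suffix


table = MountTable()
table.mount("/data", "hdfs://host:port/directory")
table.mount("/data/Sales", "s3://example-bucket/directory/Sales")
print(table.resolve("/data/Reports/q1.csv"))
print(table.resolve("/data/Sales/2021/a.parquet"))
```

Applications only ever see `/data/...`; which store serves a given subtree is decided by the table, so a subtree can be re-pointed without changing application paths.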
POLICY DRIVEN DATA MANAGEMENT
Decouple logical file system namespace from physical storage
• Alluxio Master with the Alluxio Policy Engine
• Example policy: move files older than 90 days from HDFS to S3
• Applications access the same path regardless of where the actual data is stored
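The example policy can be sketched as a rule evaluated over file metadata. This is an illustrative model only, not the Alluxio Policy Engine: `FileEntry` and `apply_age_policy` are hypothetical names, and the "move" is reduced to updating which store holds the file, where a real system would copy the bytes first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    path: str           # logical path seen by applications (never changes)
    store: str          # physical store currently holding the data
    modified: datetime  # last modification time


def apply_age_policy(files, now, max_age=timedelta(days=90),
                     source="hdfs", target="s3"):
    """Move files older than max_age from source to target.

    Returns the logical paths of the files that were moved.
    """
    moved = []
    for entry in files:
        if entry.store == source and now - entry.modified > max_age:
            entry.store = target  # placeholder for copy-then-switch
            moved.append(entry.path)
    return moved
```

Only `store` changes; `path` stays the same, which is what lets applications keep using the same path regardless of where the data physically lives.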
ML/DL
Training & Data Pre-processing
I/O Challenges in ML/DL
What's Different
• Training data often consists of a massive number of small files (billions of 100 KB photos)
• The size of training data keeps growing and can exceed individual server capacity
• Training jobs are highly concurrent and require high I/O throughput to keep GPUs utilized
Using Alluxio for DL
Distributed Caching
- Only fetch data on cache miss
- No need to copy data before use
[Diagram: training instances access Alluxio servers through POSIX]
Consistent Performance
- Direct access to data
- Low latency and high throughput
- High GPU utilization rate
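The "only fetch data on cache miss" behavior is the read-through caching pattern. Below is a minimal sketch of that pattern, assuming a caller-supplied `fetch` function that stands in for a read from remote storage; `ReadThroughCache` is a hypothetical name, not an Alluxio class.

```python
class ReadThroughCache:
    """Toy read-through cache: remote storage is touched only on a miss."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: key -> bytes, e.g. a remote read
        self._blocks = {}
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._blocks:
            self.hits += 1
        else:
            self.misses += 1
            self._blocks[key] = self._fetch(key)
        return self._blocks[key]
```

For a training job that reads the same files every epoch, the first epoch pays the remote reads and later epochs are served from the cache, with no up-front copy step.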
Large Scale Deep Learning
TOPOLOGY: ON-PREMISES
Alluxio’s Solution
MOMO (NASDAQ: MOMO) runs thousands of Alluxio nodes across multiple Alluxio clusters, managing 100+ TB of data for search and training:
● Support multiple storage & compute frameworks
● Accelerate compute & training tasks
● Reduce the metadata and data overhead
Model Training using PyTorch + Alluxio + Ceph
● 2 billion small files
● Reduce metadata & data interactions with Ceph to improve performance
https://www.alluxio.io/resources/videos/ml-and-query-acceleration-at-momo-with-alluxio-chinese/
Q&A
Social Media: Twitter.com/alluxio, Linkedin.com/alluxio
Website: www.alluxio.io
Slack: https://alluxio.io/slack
Hybrid Cloud Storage Gateway with Compute On-prem
USE CASE: HYBRID CLOUD
Multiple Analytics Clusters On-premises
USE CASE: ALL IN DATACENTER
• 6+ Alluxio clusters in production
• Largest Alluxio cluster = 1000 nodes*
• 2.5x performance improvement for I/O-intensive queries & 1.2x on average
• 30% reduction in query failures due to timeouts
* Largest single Alluxio cluster across any customer is 3000 nodes
[Diagram labels: Spark SQL, Impala, Alluxio]
Cross Datacenter Access without changing Ingest Pipeline
USE CASE: MULTI DATACENTER
• Ad-hoc SQL workloads in a secondary DC as analyst headcount reached 1800 people
• Leverage a 220+ node Alluxio cluster for compute resources outside primary DC
[Diagram labels: Trino, Alluxio, Hive; Datacenter 1, Datacenter 2; remote data, results]
Large Scale Deep Learning
USE CASE: ALL IN CLOUD
• 40%+ reduction in AI training time & cost
• Data prefetching using asynchronous loading
• 200 GPU instances with 4x NVIDIA V100
• Alluxio uses CPU cores and NVMe
Compute in GCP with Data On-premises
USE CASE: HYBRID CLOUD
• 2x performance improvement for range queries
• Improved concurrency and pinning
• Elastic compute for up to 2x cost savings
• Auto-scaling of Alluxio workers
Burst Compute to AWS with Storage On-prem
USE CASE: HYBRID CLOUD
• 4+ Alluxio clusters on-prem with synchronization requirements
• Alluxio as the only way to read or write to on-prem data lake
• Compute stack cloned in second CSP
[Diagram labels: private datacenter (ingestion, ETL, Hive); Amazon EMR, Cloud Dataproc, Kubernetes Engine, Compute Engine]
Large Scale Deep Learning
USE CASE: ALL IN CLOUD
Shared Previously
• 40%+ reduction in training stage time & cost over direct access to cloud storage
What's New in 2.7
• Optimal resource utilization with NVIDIA Data Loading Library (DALI) + Alluxio
• 8-12x performance improvement in data loading and preprocessing stages
• I/O and training can now execute in parallel, eliminating serialization delays caused by the copy-to-local approach
[Diagram: distributed deep learning]
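Overlapping I/O with training, as opposed to copying everything to local disk first, can be sketched with a background loader thread and a bounded queue. This is a generic illustration using only the Python standard library; it does not show DALI or Alluxio APIs, and `prefetching_loader`, `load`, and `depth` are hypothetical names.

```python
import queue
import threading


def prefetching_loader(paths, load, depth=4):
    """Yield load(path) for each path, loading up to `depth` items ahead.

    A background thread performs the I/O while the consumer (for example,
    a training loop) processes earlier items.
    """
    buf = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for path in paths:
                buf.put(load(path))  # blocks when the buffer is full
        finally:
            # Always unblock the consumer; in this sketch a load error
            # simply ends the stream early.
            buf.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item
```

A training loop would iterate with `for batch in prefetching_loader(paths, load): ...`, so the next reads are already in flight while the current batch is being processed.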
WeRide uses Alluxio as a Hybrid Cloud Storage Gateway
USER STORY: HYBRID CLOUD
• Network egress cost savings with cross-region access over data copy-based solutions
• Multiple locations with GPU clusters access a centralized data lake in AWS for training autonomous driving models
• Terabytes of data generated daily from simulations & test drives shared across regions
[Diagram labels: on-premises GPU training, Alluxio, public cloud]
GPU Accelerated Analytics
Alluxio and RAPIDS Accelerator for Apache Spark
A Collaboration between Alluxio and NVIDIA
Integration of RAPIDS on GPUs for compute acceleration and Alluxio for data acceleration
• 70% better ROI for GPU-based processing compared to CPUs
• 1.9x better performance for a decision support workload
[Diagram: GPU-enabled Spark cluster on Cloud Dataproc]
BENCHMARKING RESULTS
90 NVIDIA DECISION SUPPORT QUERIES
CPU Config: Master: n1-standard-4, Worker: 4 x n1-standard-32 (128 cores, 480GB RAM), Cloud Costs: $7.82/hr
GPU Config: Master: n1-standard-4, Worker: 4 x n1-standard-32 (128 cores, 480GB RAM + 16 x T4), Cloud Costs: $13.41/hr
Alluxio, Inc.
 
AI/ML Infra Meetup | Scaling Experimentation Platform in Digital Marketplaces...
AI/ML Infra Meetup | Scaling Experimentation Platform in Digital Marketplaces...AI/ML Infra Meetup | Scaling Experimentation Platform in Digital Marketplaces...
AI/ML Infra Meetup | Scaling Experimentation Platform in Digital Marketplaces...
Alluxio, Inc.
 
AI/ML Infra Meetup | Scaling Vector Databases for E-Commerce Visual Search: A...
AI/ML Infra Meetup | Scaling Vector Databases for E-Commerce Visual Search: A...AI/ML Infra Meetup | Scaling Vector Databases for E-Commerce Visual Search: A...
AI/ML Infra Meetup | Scaling Vector Databases for E-Commerce Visual Search: A...
Alluxio, Inc.
 
Alluxio Webinar | Optimize, Don't Overspend: Data Caching Strategy for AI Wor...
Alluxio Webinar | Optimize, Don't Overspend: Data Caching Strategy for AI Wor...Alluxio Webinar | Optimize, Don't Overspend: Data Caching Strategy for AI Wor...
Alluxio Webinar | Optimize, Don't Overspend: Data Caching Strategy for AI Wor...
Alluxio, Inc.
 
AI/ML Infra Meetup | Preference Tuning and Fine Tuning LLMs
AI/ML Infra Meetup | Preference Tuning and Fine Tuning LLMsAI/ML Infra Meetup | Preference Tuning and Fine Tuning LLMs
AI/ML Infra Meetup | Preference Tuning and Fine Tuning LLMs
Alluxio, Inc.
 
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
Alluxio, Inc.
 
Ad

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

  • 1. Architect a Heterogeneous Data Platform Across Clusters, Regions, and Clouds. Bin Fan (binfan@alluxio.com), Founding Engineer, VP of Open Source @ Alluxio
  • 2. About Me: Bin Fan (https://www.linkedin.com/in/bin-fan/)
      ● Founding Engineer, VP Open Source @ Alluxio
      ● Alluxio PMC Co-Chair, Presto TSC/committer
      ● Email: binfan@alluxio.com
      ● PhD in CS @ Carnegie Mellon University
  • 3. Alluxio Overview
      ● Originally a research project (Tachyon) at UC Berkeley AMPLab, led by then-PhD student Haoyuan Li (Alluxio founder and CEO)
      ● Backed by top VCs (e.g., Andreessen Horowitz) with $70M raised in total; Series C ($50M) announced in 2021
      ● Deployed in production at large scale at Facebook, Uber, Microsoft, Tencent, TikTok, and others
      ● More than 1,200 contributors on GitHub. In 2021, more than 40% of commits on GitHub were contributed by community users
      ● Ranked the 9th most critical Java-based open-source project on GitHub by Google/OpenSSF [1]
      [1] Google Comes Up With A Metric For Gauging Critical Open-Source Projects
  • 4. Alluxio (Tachyon) back in 2015. [Figure: screenshot of the Tachyon talk at AMPLab in 2015; visible text: "What is Tachyon Stack Release Growth"]
  • 5. Alluxio (Tachyon) in 2015. [Figure: screenshot of the Tachyon talk at the AMPLab event. Diagram labels: Spark Task 1, Spark Task 2; Tachyon in-memory; RDD; block 1, block 2, block 3, block 4; HDFS disk; HDFS / Amazon S3]
  • 6. What's Different Today
      Topology
      ● On-prem Hadoop → cloud-native, multi- or hybrid-cloud, multi-datacenter
      Computation
      ● MR/Spark → Spark, Presto, Hive, TensorFlow, PyTorch, ...
      ● More mature frameworks (less frequent OOM, etc.)
      Data access pattern
      ● Sequential read (e.g., scanning) on unstructured files → ad-hoc reads into structured/columnar data
      ● Hundreds to thousands of big files → millions of small files
  • 7. The Evolution from Hadoop to Cloud-native Era
      Data Storage
      ● On-prem & colocated HDFS → S3 and other object stores (possibly across regions such as us-east & us-west), with legacy on-prem HDFS still in service
      Resource/Job Orchestration
      ● YARN → K8s
        ○ Lost focus on data locality
  • 8. Unprecedented Complexity of Data Platforms (data trend → complex platform)
      ● New compute and storage tech created every 3-8 years → on-premise, cloud, hybrid, and multi-cloud environments all have different environment properties
      ● More data generated every day, and stored in data silos → data copies, synchronization costs
      ● More people and teams need to access and leverage these data → multiple APIs necessitate integration and application rewrites
  • 9. Inefficient Manual Copy Across Data Centers, Regions, Clouds: error-prone and network-intensive data copies. [Diagram labels: Region A, Region B; Amazon EMR, Cloud Dataproc, Kubernetes Engine, Compute Engine; Hive; private data centers (Datacenter 1, Datacenter 2)]
  • 10. Strong Market Demand For Simplification
      ● Efficient access & data management: acceleration & auto-tiering of remote data sources
      ● Environment agnosticity: agility across regions for private, hybrid or multi-cloud
      ● Unification of data lakes: serve analytics & AI from multiple data locations
  • 11. Analytics & AI in the Hybrid & Multi-Cloud Era. Available:
  • 12. Solution: Multi-Cloud Ready Analytics & AI Platform
      ● No-copy data access across silos, agnostic to compute engine
      ● Foundation of a heterogeneous data platform across geos
      [Diagram labels: Region A, Region B; GKE; Datacenter 1, Datacenter 2; HMS]
  • 13. Open Source: Started From UC Berkeley AMPLab in 2014
      ● 1,200+ contributors & growing
      ● 9,000+ Slack community members
      ● Top 10 most critical Java-based open-source project
      ● GitHub's top 100 most valuable repositories out of 96 million
      ● Join the conversation on Slack: alluxio.io/slack
  • 14. Companies Using Alluxio. Categories: Internet, Public Cloud Providers, General, E-Commerce, Others, Technology, Financial Services, Telco & Media
  • 15. Case Studies: examples to eliminate data copies
  • 16. Top Online Travel Platform: Unify Data Lake Across Multiple Geo Regions in AWS
      Problems Encountered
      ● Data silos caused by different brands/teams ingesting data dispersed across multiple regions in AWS
      ● Central analytics queries across data silos suffered from poor user experience and long time to insight
      ● Manual replication resulted in inefficiencies, operational overhead, and expensive S3 egress costs
      Alluxio's Solution
      ● Federate data lakes without replication & serve various compute engines
      Results Achieved
      ● Enhanced user experience with consistent & high-performance analytics, reducing time to insight
      ● 50% reduced cost per query
      ● Unify data silos without the need to copy or move data
      [Diagram labels: Team A, Team B, Team C; main region: central analytics; us-west-1, us-east-1, us-east-2, us-west-2; Hive; Mounted]
  • 17. Large Scale Analytics within a Single Cloud: real-time responses & analysis, while saving costs on S3 storage (topology: public cloud)
      Problems Encountered
      ● Newly introduced chatbot to better manage communication with gamers globally
      ● Presto engine performs huge amounts of queries to support instantaneous responses
      ● Urgently looking for a new solution to slash costs without losing performance
      Results Achieved
      ● 2-4x average performance improvement
      ● 7x key query speed-up
      ● At least 50% cost saving
  • 18. Example Journeys: a typical customer journey
  • 19. Example Journey 01: On-premises HDFS to Object Storage to Hybrid Cloud
      1. Unified namespace: mount HDFS and object storage into a common Alluxio cluster
      2. Object store analytics: caching layer to speed up Presto and Spark jobs
      3. Hybrid-cloud: burst compute to a single public cloud first; run analytics on K8s and cloud-native AI for model training
      4. Multi-cloud: replicate the setup on AWS to Google Cloud; choose the right tool for the job, regardless of the cloud provider
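The "unified namespace" step in slide 19 can be pictured as a mount table that maps one logical path tree onto several storage systems. The following is a minimal Python sketch under stated assumptions: `MountTable`, the host `namenode:8020`, and the bucket `example-bucket` are hypothetical names chosen for illustration, and the longest-prefix lookup is a generic technique, not Alluxio's actual implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Toy mount table: logical path prefixes mapped to under-store URIs (hypothetical)."""

    mounts: dict = field(default_factory=dict)

    def mount(self, logical_prefix: str, under_store_uri: str) -> None:
        # Normalize trailing slashes so "/data/" and "/data" name the same mount point.
        self.mounts[logical_prefix.rstrip("/") or "/"] = under_store_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Longest-prefix match, so a nested mount takes precedence over its parent.
        best = None
        for prefix in self.mounts:
            covers = logical_path == prefix or logical_path.startswith(
                prefix.rstrip("/") + "/"
            )
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount covers {logical_path}")
        suffix = logical_path if best == "/" else logical_path[len(best):]
        return self.mounts[best] + suffix
```

With `/warehouse` mounted to HDFS and `/warehouse/archive` mounted to an object store, an application keeps reading paths under `/warehouse` while the table decides which storage system actually serves each one.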
  • 20. Example Journey 01: On-premises object storage as the source of truth. [Diagram labels: Region A, Region B (multiple instances); Amazon EMR, Cloud Dataproc, Kubernetes Engine, Compute Engine; private data centers: datacenter with ingestion, ETL, Hive]
  • 21. Architecture Overview: Enable a Hybrid Data Lake
  • 22. Architecture. [Diagram labels: Alluxio Clients; Alluxio Master and Standby Master (consensus); Alluxio Workers (RAM / SSD / HDD); WAN; Under Store 1, Under Store 2; control path, data path]
  • 23. Data Locality with Scale-out Workers: local performance for remote data with intelligent multi-tiering
      ● Tiers: hot (RAM), warm (SSD), cold (HDD)
      ● Read & write buffering, transparent to the app
      ● Policies for pinning, promotion/demotion, TTL
      [Diagram labels: On-premises, Public Cloud; Model Training, Big Data ETL, Big Data Query]
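Promotion and demotion across the hot/warm/cold tiers in slide 23 can be illustrated with a small in-memory model. This is only a sketch under assumptions: `TieredStore` is a hypothetical class, capacities are counted in blocks rather than bytes, and least-recently-used demotion is assumed here because the slide does not say which policy is used. Pinning and TTL are left out.

```python
from collections import OrderedDict

_MISSING = object()  # sentinel so that None can be stored as a value


class TieredStore:
    """Toy multi-tier block store: reads promote, overflow demotes (hypothetical)."""

    def __init__(self, tiers):
        # tiers: list of (name, max_blocks), fastest tier first.
        self.names = [name for name, _ in tiers]
        self.caps = [cap for _, cap in tiers]
        self.levels = [OrderedDict() for _ in tiers]

    def _pop(self, key):
        for level in self.levels:
            if key in level:
                return level.pop(key)
        return _MISSING

    def _place(self, index, key, value):
        if index >= len(self.levels):
            return  # demoted past the slowest tier: the block is evicted
        level = self.levels[index]
        level[key] = value  # a new key lands at the most-recently-used end
        if len(level) > self.caps[index]:
            old_key, old_value = level.popitem(last=False)  # least recently used
            self._place(index + 1, old_key, old_value)  # demote one tier down

    def put(self, key, value):
        self._pop(key)
        self._place(0, key, value)

    def get(self, key):
        value = self._pop(key)
        if value is _MISSING:
            return None
        self._place(0, key, value)  # promote to the fastest tier on access
        return value

    def tier_of(self, key):
        for name, level in zip(self.names, self.levels):
            if key in level:
                return name
        return None
```

A read of a cold block moves it back to the top tier and pushes the least recently used blocks down one level each, which is the behavior the application never has to see.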
  • 24. Metadata Locality with Scalable Masters: synchronization of changes across clusters
      ● Mutation: old file at path /file1 -> new file at path /file1
      ● Metadata synchronization through the Alluxio Master
      ● Policies for pinning, promotion/demotion, TTL
      [Diagram labels: On-premises, Public Cloud; Model Training, Big Data ETL, Big Data Query; RAM, SSD; RocksDB]
  • 25. Seamless Catalog Definitions: no table redefinitions required using "Transparent URI"
      Example Scenario
      I. Initial state
         A. Data in HDFS
         B. Hive Metastore table definitions pointing to HDFS
      II. Compute cluster with Alluxio
         A. Catalog points to Hive Metastore
         B. Alluxio intercepts Presto calls to HDFS
      III. Query execution
         A. Accesses to HDFS are served by Alluxio
         B. No manual data copies or application rewrites
      [Diagram labels: Public Cloud: Presto, Presto Catalog, Hive Connector, Alluxio; On-premises: Hive Metastore, HDFS; table location hdfs://ns/table; steps I, II, III]
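The scenario in slide 25 hinges on one idea: the table location stored in the catalog stays `hdfs://ns/table`, and only the component that serves the `hdfs` scheme is swapped. A minimal sketch of that dispatch follows, assuming a hypothetical `SchemeRegistry` and stand-in reader functions; it illustrates the interception pattern in general and is not the mechanism Presto, Hive, or Alluxio use internally.

```python
from urllib.parse import urlparse


class SchemeRegistry:
    """Maps a URI scheme (e.g. "hdfs") to the function that serves reads for it."""

    def __init__(self):
        self._handlers = {}

    def register(self, scheme, handler):
        self._handlers[scheme] = handler

    def read(self, uri):
        return self._handlers[urlparse(uri).scheme](uri)


def direct_hdfs_read(uri):
    # Stand-in for a read that travels to the remote HDFS cluster.
    return ("hdfs", f"bytes of {uri}")


def intercepting_reader(backend):
    """Wrap a backend reader; repeated reads are answered from a local cache."""
    cache = {}

    def read(uri):
        if uri in cache:
            return ("cache", cache[uri])
        source, data = backend(uri)  # first access still goes to the backend
        cache[uri] = data
        return (source, data)

    return read


TABLE_LOCATION = "hdfs://ns/table"  # as stored in the catalog; never edited

registry = SchemeRegistry()
registry.register("hdfs", direct_hdfs_read)  # I. initial state
before = registry.read(TABLE_LOCATION)

registry.register("hdfs", intercepting_reader(direct_hdfs_read))  # II. interception
first = registry.read(TABLE_LOCATION)   # III. first query populates the cache
second = registry.read(TABLE_LOCATION)  # later queries are served locally
```

The catalog entry and the query text are identical in all three reads; only the registered handler changed, which is why no table redefinition is needed.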
  • 26. Deployment Approaches
      ● Co-locate Alluxio workers with compute for optimal I/O performance
      ● Deploy Alluxio as a standalone cluster between compute and storage
      [Diagram labels: Spark, Presto, Alluxio, Storage; same cluster, same data center / region, remote cluster; Long-running Instances, Ephemeral, Elastic]
  • 27. Unified Namespace: with replication & live data migration capabilities
      ● Single Alluxio path backed by multiple storage systems
      ● Example policy: migrate data older than 7 days from HDFS to S3
      [Diagram labels: hdfs://host:port/directory/; Reports, Sales]
  • 28. Policy Driven Data Management: decouple the logical file system namespace from physical storage
      ● Example policy: move files older than 90 days from HDFS to S3
      ● Applications access the same path regardless of where the actual data is stored
      [Diagram labels: Application; Alluxio Master; Alluxio Policy Engine]
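The example policies in slides 27 and 28 (move data older than N days from HDFS to S3 while the path stays the same) reduce to an age check over file metadata. A minimal sketch, assuming a hypothetical `FileEntry` record and `apply_age_policy` function; a real policy engine would copy the data before switching the mapping, which is represented here by a single field update.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    logical_path: str   # the path applications use; never changes
    store: str          # which physical store currently holds the data
    modified: datetime  # last modification time


def apply_age_policy(entries, now, max_age_days=90, source="hdfs", target="s3"):
    """Re-home entries older than max_age_days from source to target.

    Returns the logical paths that were moved. Only the physical store
    changes; logical_path is left untouched.
    """
    cutoff = now - timedelta(days=max_age_days)
    moved = []
    for entry in entries:
        if entry.store == source and entry.modified < cutoff:
            entry.store = target  # stand-in for "copy data, then switch mapping"
            moved.append(entry.logical_path)
    return moved
```

Calling `apply_age_policy(entries, now, max_age_days=7)` corresponds to the 7-day policy on slide 27, and the default corresponds to the 90-day policy on slide 28.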
  • 29. ML/DL: Training & Data Pre-processing
  • 30. I/O Challenges in ML/DL: What's Different
      ● Training data often consists of a massive number of small files (billions of 100 KB photos)
      ● The size of training data keeps growing and can exceed individual server capacity
      ● Training jobs are highly concurrent and require high I/O to keep GPUs utilized
  • 31. Using Alluxio for DL: Distributed Caching
      ● Only fetch data on a cache miss
      ● No need to copy data before use
      [Diagram labels: Training Instances accessing Alluxio Servers through POSIX]
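"Only fetch data on a cache miss" in slide 31 is the read-through caching pattern, spread over several cache servers. A minimal sketch under assumptions: `CacheWorker` and `DistributedCache` are hypothetical names, and files are assigned to workers by a simple hash-modulo rule chosen for brevity, which is not necessarily how Alluxio places data.

```python
import hashlib


class CacheWorker:
    """One cache server: fetches from the under store only on a miss."""

    def __init__(self, fetch_from_under_store):
        self._fetch = fetch_from_under_store
        self._blocks = {}
        self.misses = 0

    def read(self, path):
        if path not in self._blocks:
            self.misses += 1
            self._blocks[path] = self._fetch(path)  # populate on first access
        return self._blocks[path]


class DistributedCache:
    """Routes each path to one worker so that a file is cached once."""

    def __init__(self, num_workers, fetch_from_under_store):
        self.workers = [CacheWorker(fetch_from_under_store) for _ in range(num_workers)]

    def _worker_for(self, path):
        # Stable hash so every client maps the same path to the same worker.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        return self.workers[int.from_bytes(digest[:8], "big") % len(self.workers)]

    def read(self, path):
        return self._worker_for(path).read(path)
```

A training job that reads the same file in every epoch reaches the under store once; there is no separate copy step before training starts.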
  • 32. Using Alluxio for DL: Distributed Caching
      ● Consistent performance
      ● Direct access to data
      ● Low latency and high throughput
      ● High GPU utilization rate
  • 33. Large Scale Deep Learning (topology: on-premises)
      MOMO (NASDAQ: MOMO) runs thousands of Alluxio nodes across multiple Alluxio clusters, managing 100+ TB of data for search and training:
      ● Support multiple storage & compute frameworks
      ● Accelerate compute & training tasks
      ● Reduce the metadata and data overhead
      Alluxio's Solution: model training using PyTorch + Alluxio + Ceph
      ● 2 billion small files
      ● Reduce metadata & data interactions with Ceph to improve performance
      https://www.alluxio.io/resources/videos/ml-and-query-acceleration-at-momo-with-alluxio-chinese/
  • 35. Use Case: Hybrid Cloud. Hybrid Cloud Storage Gateway with Compute On-prem
  • 36. Use Case: All in Datacenter. Multiple Analytics Clusters On-premises
      ● 6+ Alluxio clusters in production
      ● Largest Alluxio cluster = 1000 nodes*
      ● 2.5x performance improvement for I/O-intensive queries & 1.2x on average
      ● 30% reduction in query failures due to timeouts
      * Largest single Alluxio cluster across any customer is 3000 nodes
      [Diagram labels: Spark SQL, Impala, Alluxio]
  • 37. Use Case: Multi Datacenter. Cross Datacenter Access without changing Ingest Pipeline
      Results
      ● Ad-hoc SQL workloads in a secondary DC as analyst headcount reached 1800 people
      ● Leverage a 220+ node Alluxio cluster for compute resources outside the primary DC
      [Diagram labels: Datacenter 1, Datacenter 2; Trino, Alluxio; Hive; remote data]
  • 38. Use Case: All in Cloud. Large Scale Deep Learning
      ● 40%+ reduction in AI training time & cost
      ● Data prefetching using asynchronous loading
      ● 200 GPU instances with 4x NVIDIA V100
      ● Alluxio uses CPU cores and NVMe
  • 39. Use Case: Hybrid Cloud. Compute in GCP with Data On-premises
      ● 2x performance improvement for range queries
      ● Improved concurrency and pinning
      ● Elastic compute for up to 2x cost savings
      ● Auto-scaling of Alluxio workers
  • 40. Use Case: Hybrid Cloud. Burst Compute to AWS with Storage On-prem
      ● 4+ Alluxio clusters on-prem with synchronization requirements
      ● Alluxio as the only way to read or write to the on-prem data lake
      ● Compute stack cloned in a second CSP
      [Diagram labels: Amazon EMR, Cloud Dataproc, Kubernetes Engine, Compute Engine; private datacenter with ingestion, ETL, Hive]
  • 41. Use Case: All in Cloud. Large Scale Deep Learning (Distributed Deep Learning)
      Shared Previously
      ● 40%+ reduction in training stage time & cost over direct access to cloud storage
      What's New in 2.7
      ● Optimal resource utilization with NVIDIA Data Loading Library (DALI) + Alluxio
      ● 8-12x performance improvement in data loading and preprocessing stages
      ● I/O and training can now execute in parallel, eliminating serialization delays caused by the copy-to-local approach
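The point in slide 41 that I/O and training can run in parallel, instead of waiting for a full copy-to-local step, can be illustrated with a background loader that stays a few items ahead of the consumer. This is a generic producer/consumer sketch using only the Python standard library; `prefetching_loader`, its `load` callback, and the `depth` parameter are hypothetical, and the sketch does not use DALI or any Alluxio API.

```python
import queue
import threading


def prefetching_loader(paths, load, depth=2):
    """Yield load(path) for each path, loading up to `depth` items ahead.

    A background thread performs the I/O while the caller consumes
    (for example, trains on) the previously loaded item.
    """
    buffer = queue.Queue(maxsize=depth)  # bounded, so the loader cannot run away
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for path in paths:
                buffer.put(load(path))  # blocks while the buffer is full
        finally:
            buffer.put(done)  # always unblock the consumer, even on error

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is done:
            break
        yield item
```

A training loop would iterate `for batch in prefetching_loader(paths, load)` so that the next batch is being read while the current one is on the GPU.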
  • 42. User Story: Hybrid Cloud. WeRide uses Alluxio as a Hybrid Cloud Storage Gateway
      ● Network egress cost savings with cross-region access over data copy-based solutions
      ● Multiple locations with GPU clusters access a centralized data lake in AWS for training autonomous driving
      ● Terabytes of data generated daily from simulations & test drives, shared across regions
      [Diagram labels: On premise: GPU training, Alluxio; Public cloud]
  • 45. Alluxio and RAPIDS Accelerator for Apache Spark: a collaboration between Alluxio and NVIDIA
      ● Integration of RAPIDS on GPUs for compute acceleration and Alluxio for data acceleration
      ● 70% better ROI for GPU-based processing compared to CPUs
      ● 1.9x better performance for a decision support workload
      [Diagram labels: Cloud Dataproc; Spark; GPU-enabled cluster]
  • 46. Benchmarking Results: 90 NVIDIA decision support queries
      ● CPU config: master n1-standard-4; workers 4 x n1-standard-32 (128 cores, 480 GB RAM); cloud cost $7.82/hr
      ● GPU config: master n1-standard-4; workers 4 x n1-standard-32 (128 cores, 480 GB RAM + 16 x T4); cloud cost $13.41/hr