alluxio storage big data data orchestration open source cloud machine learning distributed computing presto hybrid cloud spark model training cloud computing deep learning file system caching data management infrastructure artificial intelligence analytics summit gpu data platform memory llm data architecture tachyon project aws compute cloud storage ai multi cloud alluxio day performance hadoop hdfs separation of compute and storage hive s3 data data analytics apache spark distributed systems data loading gpu utilization kubernetes data engineering sql aws s3 distributed data caching distributed storage pytorch data lake alluxio engineering meetup model inference emr data locality ai infra object store tachyon uber architecture object stores tech talk release rocksdb tensorflow cache computer fuse fine tuning posix local cache intel google dataproc query engine data science nlp cloud bursting unified namespace metadata parquet data lakes orchestration facebook query trino ray cloud migration apache iceberg ml software use case apache hudi hybrid cloud bursting etl software development overview office hour compute storage separation demo python raft community software engineering ceph nvidia apache ozone database gpu analytics object storage memory centric olap scale tencent deep learning applications data ecosystem on-prem product release datasapiens under file system zero copy bursting analytics zoo scalable datalakes amazon emr nfs conference structured data management rakuten data stores confluent kyligence poshmark ecommerce presto caching deployment data stack grpc microsoft zookeeper memory-centric jd amazon web services data warehouse kafka baidu fluid product school alibaba infra preference tuning decoupling compute and storage amplab ctrip qunar mesosphere global namespace walmartlabs business intelligence data unification distributed query sogou pingo virtualization zero-copy burst qiniu ryte talking data security amazon elastic mapreduce tachyon nexus storage system tutorial bigdata unified developers strata developer in-memory storage datawarehouse 2.0 preview distributed system momo financial services multi-tiering search queries high performance computing framework computing generative ai cv api model traiing devops transparent uri analytics and ai cloud architecture twitter virtual file system apache ranger hybrid big data netapp bilibili data tagging open data platform metadata management shadow cache tiktok cache layer prometheus metrics grafana optane persistent memory raptorx disaggregated storage tableflow table rag apache pinot agentic ai engineering distributed cache pipeline model distribution deepleanring modeltraining machinelearning filesystem deepseek inferencing optimization cpu anyscale hpc nvme zoom user experience a/b testing experimentation testing statistics visual search milvus vector database rapids accelerator storagequery s3 api analytic workloads public cloud high performance high-performance metadata services structured data services catalog service spark workloads remote data software testing unified data zero copy hybrid bursting mapr cloud workloads dc/os object store analytics on-premise compute e-commerce datasets pipeline api usability concurrency iceberg netflix alibaba cloud gene computing data lake analytics dask aspect analytics webinar terraform eks t3go walkme unisound atlas starburst robinhood data catalog paypal gimel sql workloads jd.com distributed applications ing tech dataproc google cloud hybrid data lake helixa comcast china unicom aunalytics hub hybrid shannondb structured data
See more