Scylla @ Disney+ Hotstar

More Related Content

Apache Spark Core
Azure data bricks by Eugene Polonichko
End-to-end Data Pipeline with Apache Spark
Microsegmentation from strategy to execution
Kafka Streams for Java enthusiasts
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Extending Complex Event Processing to Graph-structured Information
Vertical vs Horizontal Scaling

What's hot (20)

Introduction to Kafka and Zookeeper
Scaling Apache Spark at Facebook
ElephantDB
DDD - 1 - A gentle introduction to Domain Driven Design.pdf
Hello, kafka! (an introduction to apache kafka)
Hybrid MongoDB and RDBMS Applications
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
IPT Chapter 2 Web Services and Middleware - Dr. J. VijiPriya
Introduction to PySpark
Apache kafka
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Data Quality With or Without Apache Spark and Its Ecosystem
mobile infrastructure management
IBM DataPower Gateway - Common Use Cases
Programming in Spark using PySpark
Overview - ESBs and IBM Integration Bus
Spark SQL Join Improvement at Facebook
Windows Azure Virtual Machines
Configuring Aerospike - Part 2

Similar to Scylla @ Disney+ Hotstar (20)

Ensuring Quality in Data Lakes (D&D Meetup Feb 22)
Data Science Across Data Sources with Apache Arrow
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
Data relay introduction to big data clusters
Amazed by AWS Series #4
Introduction to apache kafka, confluent and why they matter
Redis+Spark Structured Streaming: Roshan Kumar
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Auto scaling with Ruby, AWS, Jenkins and Redis
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
Cisco’s E-Commerce Transformation Using Kafka
Cloud-based Data Lake for Analytics and AI
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Alternator webinar september 2019
Episode 3: Kubernetes and Big Data Services
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Using ScyllaDB for Extreme Scale Workloads
SysDB — The system management and inventory collection service
Automate Your Kafka Cluster with Kubernetes Custom Resources
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API

More from ScyllaDB (20)

Understanding The True Cost of DynamoDB Webinar
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
Database Benchmarking for Performance Masterclass: Session 1 - Benchmarking F...
New Ways to Reduce Database Costs with ScyllaDB
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Powering a Billion Dreams: Scaling Meesho’s E-commerce Revolution with Scylla...
Leading a High-Stakes Database Migration
Achieving Extreme Scale with ScyllaDB: Tips & Tradeoffs
Securely Serving Millions of Boot Artifacts a Day by João Pedro Lima & Matt ...
How Agoda Scaled 50x Throughput with ScyllaDB by Worakarn Isaratham
How Yieldmo Cut Database Costs and Cloud Dependencies Fast by Todd Coleman
ScyllaDB: 10 Years and Beyond by Dor Laor
Reduce Your Cloud Spend with ScyllaDB by Tzach Livyatan
Migrating 50TB Data From a Home-Grown Database to ScyllaDB, Fast by Terence Liu
Vector Search with ScyllaDB by Szymon Wasik
Workload Prioritization: How to Balance Multiple Workloads in a Cluster by Fe...
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...
Object Storage in ScyllaDB by Ran Regev, ScyllaDB
Lessons Learned from Building a Serverless Notifications System by Srushith R...

Editor's Notes

  • #4: Note: add a graphic explaining the numbers. Continue-Watching accounts for a large percentage of watch-time on Hotstar. On average, our users watch 1B minutes of video every day. Every day we process roughly 100-200GB of data to keep the Continue-Watching state accurate for over 300 million users. Our use case mainly called for a database that can handle heavy writes, given the volatile nature of user watching behaviour. We also needed a database that can scale during high-traffic periods, when request volume grows 10-20x within a minute.
  • #5: Note: add a graphic explaining the numbers. As an OTT platform, Disney+ Hotstar requires a strong data store for Continue-Watching data. Continue-Watching accounts for a large percentage of watch-time on Hotstar. On average, our users watch 1B minutes of video every day. Every day we process roughly 100-200GB of data to keep the Continue-Watching state accurate for over 300 million users. Our use case mainly called for a database that can handle heavy writes, given the volatile nature of user watching behaviour. We also needed a database that can scale during high-traffic periods, when request volume grows 10-20x within a minute.
  • #6: Cross-platform
  • #7: Next episode New episode
  • #8: Next episode New episode
  • #9: Redis: Redis gave good latencies, but the growing data size meant we had to scale the cluster horizontally, which increased our cost every 3-4 months. Elasticsearch: Elasticsearch latencies were high, around 200ms on average. The cost of the database was very high relative to the returns, and we often had node-maintenance issues that required manual effort to resolve.
  • #10: ES document → Redis → What is the problem with ES and Redis? Explain graphically.
  • #11: Multiple data stores: Redis, Elasticsearch and Scylla open-source; different data models; huge data, on the order of TBs; cost of migration.
  • #12: We chose a NoSQL key-value data store. We wanted to simplify the data model to just two tables. User table: used to retrieve the entire tray for the user at once; a newly added movie is appended to the list under the same user_id key. User-Content table: used to modify the data for a specific content_id. Example: when the user resumes a video and pauses it later, the updated timestamp is stored. When the video is fully watched, the entry can be queried directly and deleted.
  • #17: We used Redis snapshots instead of exporting data to CSV from the live instance, since we didn't want to put load on the Redis machine. Exported the data from Redis to an RDB file. Converted the RDB file to a CSV file. Loaded it with COPY <table> FROM '<file>.csv' WITH DELIMITER=',' AND CHUNKSIZE=1. Running with 7 threads, this completed 1M records in 15 minutes. We then increased the number of threads and the number of boxes to speed up the process. We followed a similar approach for Elasticsearch.
  • #18: How did we move our prod APIs to point to the new cluster / new flow and terminate the older ones? (Add diagrams.) How did we master the migration?
  • #19: Before moving to Scylla Cloud, we initially moved our data to Scylla open-source. After exploring the advantages of Scylla Cloud and its richer support, we decided to move our data to Scylla Cloud.
  • #20: Link the snapshot folder into the Scylla data folder: ln -s ../snapshot<i> <keyspace>/<table>
  • #21: Cons: We were able to migrate the table with a single primary key to Scylla Cloud. We ran in batches of 3 nodes at a time to avoid affecting our production open-source table. SSTable migration slowed down when a table had a secondary / composite key. To speed up the process, we tried Scylla-Spark-Migrator. Notes: we ran in batches to keep the load on the active cluster low.
  • #22: The Unirestore tool helps automate this. Duplicated the data with replication factor = 1. Mention the price; it requires a big node.
  • #31: Append-only writes: writes are fast on Cassandra / Scylla (happy blind writes). Aggregate on reads. Reduce the number of rows: remove duplicated contents; use partial aggregation for less online computation. Expiration: aggregated into monthly buckets.
  • #33: Timeline: We started seeing high latency on the Scylla cluster. We ran nodetool repair -pr on two of the old nodes sequentially. The second one got stuck and was terminated (https://docs.scylladb.com/kb/stop-local-repair/) at ~8pm IST. We scaled up the cluster by adding 3 new nodes. We tried to repair one of the newly added tables, but the repair got stuck and we terminated it using https://docs.scylladb.com/kb/stop-local-repair/.
  • #35: Tombstones will bite you if you do lots of deletes! OMG! We are the anti-pattern: https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
  • #36: Tombstones will bite you if you do lots of deletes! OMG! We are the anti-pattern: https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
  • #37: We observed that compactions were causing the latency spikes. To avoid them, we stopped auto-compactions during the morning hours and ran a major compaction daily, early in the morning. The goal is to have predictable latencies.
  • #38: Operations & Compaction. Operation: the cluster recovered after compaction removed the tombstones.
  • #39: Operations & Compaction
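The figures in notes #4/#5 can be turned into rough per-user averages. Below is a small worked calculation. It assumes 1 GB = 10**9 bytes and exactly 300 million users. Because the note says "over 300 million", the results are upper bounds on the true averages.

```python
# Rough per-user averages from the numbers in notes #4/#5.
# Assumptions: 1 GB = 10**9 bytes; exactly 300 million users.
USERS = 300_000_000
WATCH_MINUTES_PER_DAY = 1_000_000_000
DATA_PER_DAY_BYTES = (100 * 10**9, 200 * 10**9)  # the 100-200GB daily range


def per_user(total: float, users: int = USERS) -> float:
    """Average daily amount per user."""
    return total / users


minutes_per_user = per_user(WATCH_MINUTES_PER_DAY)
bytes_per_user = tuple(per_user(b) for b in DATA_PER_DAY_BYTES)

if __name__ == "__main__":
    print(f"~{minutes_per_user:.2f} watch-minutes per user per day")
    print(f"~{bytes_per_user[0]:.0f}-{bytes_per_user[1]:.0f} bytes per user per day")
```

That works out to roughly 3.3 watch-minutes and a few hundred bytes of processed data per user per day. This suggests the per-user state is small, and the harder requirements are the write volume and the 10-20x surges the note describes.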
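The two-table model in note #12 can be illustrated with an in-memory sketch. It is not Hotstar's schema. The class, field names and example IDs are hypothetical, and plain dicts stand in for the User table (one key returns the whole tray) and the User-Content table (one row per user and title).

```python
from dataclasses import dataclass


@dataclass
class Progress:
    content_id: str
    resume_at_seconds: int  # playback position saved on pause
    updated_at: int         # epoch seconds of the latest pause/resume


class ContinueWatchingStore:
    """Hypothetical in-memory stand-in for the two tables in note #12."""

    def __init__(self) -> None:
        # User table: user_id -> list of content_ids, read as one tray.
        self.user_tray: dict[str, list[str]] = {}
        # User-Content table: (user_id, content_id) -> progress for one title.
        self.user_content: dict[tuple[str, str], Progress] = {}

    def record_pause(self, user_id: str, content_id: str, position: int, ts: int) -> None:
        tray = self.user_tray.setdefault(user_id, [])
        if content_id not in tray:
            tray.append(content_id)  # new title appended under the same user_id key
        # A resume/pause only overwrites the single (user_id, content_id) row.
        self.user_content[(user_id, content_id)] = Progress(content_id, position, ts)

    def mark_finished(self, user_id: str, content_id: str) -> None:
        # Fully watched: look the entry up by its key and delete it.
        self.user_content.pop((user_id, content_id), None)
        tray = self.user_tray.get(user_id, [])
        if content_id in tray:
            tray.remove(content_id)

    def get_tray(self, user_id: str) -> list[str]:
        # The whole tray comes back from one key lookup.
        return list(self.user_tray.get(user_id, []))


if __name__ == "__main__":
    store = ContinueWatchingStore()
    store.record_pause("u1", "movie-42", 600, 1000)
    store.record_pause("u1", "show-7", 120, 1005)
    store.mark_finished("u1", "movie-42")
    print(store.get_tray("u1"))
```

In Scylla, each `mark_finished`-style delete leaves a tombstone. That is the pattern notes #35-#38 later run into.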
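Note #17 describes the CSV step and the load rate. The sketch below covers both. It assumes the RDB snapshot has already been decoded into Python dicts, since RDB parsing is not shown, and the column names are hypothetical. The output has no header row and uses a comma delimiter, matching the `COPY ... WITH DELIMITER=','` line in the note. The second function works through the note's own numbers.

```python
import csv
import io


def rows_to_csv(rows: list[dict], columns: list[str]) -> str:
    """Serialize rows as comma-delimited CSV text, one line per row, no header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


def throughput(records: int, minutes: float, threads: int) -> tuple[float, float]:
    """Return (records per second overall, records per second per thread)."""
    overall = records / (minutes * 60)
    return overall, overall / threads


if __name__ == "__main__":
    sample = [{"user_id": "u1", "content_id": "movie-42", "resume_at": 600}]
    print(rows_to_csv(sample, ["user_id", "content_id", "resume_at"]), end="")
    total, per_thread = throughput(1_000_000, 15, 7)
    print(f"{total:.0f} records/s overall, {per_thread:.0f} per thread")
```

1M records in 15 minutes is about 1,111 records/s overall, or roughly 159 records/s per thread across 7 threads. If throughput scaled linearly, which is an assumption the note does not confirm, adding threads and boxes would cut load time proportionally. Values containing commas are quoted by `csv.writer`, which `COPY ... FROM` reads with its default quote character.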
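The `ln -s ../snapshot<i> <keyspace>/<table>` step in note #20 can be reproduced from Python. This is useful if a script drives the migration across many snapshots. The sketch assumes the snapshot directories sit directly inside the data directory, which is what the relative `../snapshot<i>` target implies. All paths here are placeholders. Real Scylla table directories may carry an ID suffix, so check the actual directory name first.

```python
import os
import tempfile


def link_snapshot(data_dir: str, snapshot_name: str, keyspace: str, table: str) -> str:
    """Equivalent of running `ln -s ../<snapshot_name> <keyspace>/<table>` inside data_dir."""
    ks_dir = os.path.join(data_dir, keyspace)
    os.makedirs(ks_dir, exist_ok=True)
    link_path = os.path.join(ks_dir, table)
    # The target is relative to the link's own directory, so "../<snapshot>"
    # resolves to <data_dir>/<snapshot>, just as with `ln -s`.
    os.symlink(os.path.join(os.pardir, snapshot_name), link_path)
    return link_path


if __name__ == "__main__":
    # Demo in a throwaway directory; names are hypothetical.
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "snapshot1"))
        link = link_snapshot(d, "snapshot1", "my_keyspace", "my_table")
        print(link, "->", os.readlink(link))
```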
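Note #31 lists several ideas: append-only blind writes, aggregation on read, duplicate removal, partial aggregation and monthly buckets. The in-memory model below puts them together as a minimal sketch, not the production design. The class and method names are hypothetical, and buckets are computed in UTC by assumption. `expire_bucket` stands in for expiring a whole month at once (for example via TTL in Scylla), which is an assumption about how expiration was done.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_bucket(ts: int) -> str:
    """'YYYY-MM' bucket for an epoch-seconds timestamp, in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")


def latest_per_content(entries):
    """Keep only the newest (ts, content_id, position) entry per content_id."""
    latest = {}
    for ts, cid, pos in entries:
        if cid not in latest or ts > latest[cid][0]:
            latest[cid] = (ts, cid, pos)
    return list(latest.values())


class AppendOnlyProgressLog:
    """Hypothetical model of note #31: blind appends, aggregation on read."""

    def __init__(self) -> None:
        # (user_id, month bucket) -> appended (ts, content_id, position) rows
        self.rows: dict[tuple[str, str], list] = defaultdict(list)

    def write(self, user_id: str, content_id: str, position: int, ts: int) -> None:
        # Blind write: no read-before-write, no in-place update, no delete.
        self.rows[(user_id, month_bucket(ts))].append((ts, content_id, position))

    def read(self, user_id: str, buckets: list[str]) -> list[tuple[str, int]]:
        # Aggregate on read: merge buckets, drop duplicates, newest first.
        merged = []
        for b in buckets:
            merged.extend(self.rows.get((user_id, b), []))
        newest = sorted(latest_per_content(merged), reverse=True)
        return [(cid, pos) for _, cid, pos in newest]

    def compact_bucket(self, user_id: str, bucket: str) -> None:
        # Partial aggregation: collapse a bucket so later reads scan fewer rows.
        key = (user_id, bucket)
        if key in self.rows:
            self.rows[key] = latest_per_content(self.rows[key])

    def expire_bucket(self, user_id: str, bucket: str) -> None:
        # Stand-in for expiring a whole month in one step.
        self.rows.pop((user_id, bucket), None)
```

The trade-off is that reads do more work (merging and deduplicating), while writes never read or delete. That is the shape note #31 favours for a write-heavy workload.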
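Note #37 describes turning auto-compaction off during the morning hours and running a daily major compaction early in the morning. The sketch below captures that as a small scheduling function. The note gives no exact hours, so every time below is a hypothetical placeholder. The function only builds `nodetool` argument lists for a cron-style runner and does not execute them. Check the subcommand names against the nodetool documentation for your Scylla version before relying on them.

```python
from datetime import time

# Hypothetical schedule; the note does not state the actual hours.
QUIET_START = time(6, 0)          # auto-compaction off from here...
QUIET_END = time(12, 0)           # ...until here
MAJOR_COMPACTION_AT = time(4, 0)  # daily major compaction, early morning


def autocompaction_wanted(now: time) -> bool:
    """Auto-compaction stays off inside the quiet window."""
    return not (QUIET_START <= now < QUIET_END)


def plan(now: time, currently_enabled: bool, keyspace: str, table: str) -> list[list[str]]:
    """Return the nodetool argv lists a scheduler would run at `now`."""
    cmds = []
    if now == MAJOR_COMPACTION_AT:
        cmds.append(["nodetool", "compact", keyspace, table])
    wanted = autocompaction_wanted(now)
    if wanted != currently_enabled:
        action = "enableautocompaction" if wanted else "disableautocompaction"
        cmds.append(["nodetool", action, keyspace, table])
    return cmds


if __name__ == "__main__":
    for t, enabled in [(time(4, 0), True), (time(6, 0), True), (time(12, 0), False)]:
        print(t, plan(t, enabled, "my_keyspace", "my_table"))
```

Keeping the decision logic separate from execution makes it easy to test the schedule on its own. That fits the note's stated goal of predictable latencies.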