Movile is the industry leader in developing mobile content and commerce platforms in Latin America, with products for mobile phones, smartphones, and tablets: games, online education, entertainment apps for adults and kids, and many options for buying with confidence and comfort. All of that comes to you through Movile.
For companies, Movile delivers complete products that integrate M-Payments transactions and content distribution, quickly and with quality.
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bumping up Startup Business with Apache
Subscription and Billing Platform, a.k.a. SBS
• It is a service API
• responsible for managing users' subscriptions
• charges users through the carriers
• renews subscriptions
• “cannot” stop, no matter what
• should be as performant as possible
Some platform numbers
Renewal Engine: ~52.1 million billing attempts a day
• about 603 requests/s
• 1.5 billion billing attempts per month
50 million subscriptions
~50 requests/s
Operations:
★ subscribe
★ cancel
★ profile
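The renewal figures above can be cross-checked with a little arithmetic. The sketch below assumes the load is spread evenly over a 24-hour day and uses a 30-day month; both are simplifying assumptions, not statements from the slides.

```python
# Cross-check of the renewal-engine figures quoted on the slide.
# Assumptions (not from the slides): uniform load over 24 h, 30-day month.
BILLING_ATTEMPTS_PER_DAY = 52_100_000  # "~52.1 million" per day, from the slide

SECONDS_PER_DAY = 24 * 60 * 60

attempts_per_second = BILLING_ATTEMPTS_PER_DAY / SECONDS_PER_DAY
attempts_per_month = BILLING_ATTEMPTS_PER_DAY * 30

print(f"{attempts_per_second:.0f} attempts/s")
print(f"{attempts_per_month / 1e9:.2f} billion attempts/month")
```

Under these assumptions the rate comes out at roughly 603 per second and about 1.56 billion per month, in line with the rounded figures on the slide.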
Data Distribution
Subscriptions by Country
• Brazil: 68%
• Mexico: 25%
• Argentina: 4%
• Colombia: 2%
• Others: 1%
Platform Architecture
“There isn’t just one way to state a system’s architecture; rather, there are multiple architectures in a system, and the view of what is architecturally significant is one that can change over a system’s lifetime.”
- Martin Fowler, Patterns of Enterprise Application Architecture
Basic Architecture Design
• scalable
• high availability 
• high performance
Very High Usage
• very, very slow system response
• overall throughput decreased
• low availability, single point of failure
• working only some of the time is even worse than stopping
Improved Distributed Design
A Cassandra-Based Solution
• the operations are distributed across the nodes
• we achieved linear scalability
• the performance issues were solved
• the availability has improved
• there is no longer a single point of failure
C* Data Modeling
• Denormalization: writes are cheap and reads are expensive, so insert the data in every arrangement that you need to read
• Don't be afraid of denormalization
• There are different ways to model your solution; there is no single right or wrong way
• Plan your queries and how you need to get the information before modeling, and use them as the driver for modeling decisions
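As an illustration of "insert the data in every arrangement that you need to read", here is a minimal sketch that uses plain Python dicts as stand-ins for two Cassandra tables. The function name `save_subscription` and the dict names are hypothetical; no Cassandra driver is involved.

```python
# In-memory stand-ins for two Cassandra tables (illustration only).
subscription_by_id = {}         # read pattern 1: look up a subscription by its ID
subscription_ids_by_phone = {}  # read pattern 2: list a user's subscription IDs


def save_subscription(subscription_id, phone_number, config_id, enabled):
    """Write the same fact once per read pattern (denormalization)."""
    subscription_by_id[subscription_id] = {
        "phone_number": phone_number,
        "config_id": config_id,
        "enabled": enabled,
    }
    subscription_ids_by_phone.setdefault(phone_number, set()).add(subscription_id)


save_subscription("subs-002202", "551911114567", 567, False)
save_subscription("subs-002203", "551911114567", 678, True)

print(sorted(subscription_ids_by_phone["551911114567"]))
print(subscription_by_id["subs-002203"]["enabled"])
```

Each write costs two inserts, and in exchange each read pattern is served by a single keyed lookup.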
Data Model V1
CREATE TABLE subscription (
    subscription_id text PRIMARY KEY,
    phone_number text,
    config_id int,
    …
    enabled boolean,
    creation_date timestamp
);
CREATE TABLE user_subscriptions (
    phone_number text,
    subscription_id text,
    PRIMARY KEY (phone_number, subscription_id)
);
user_subscriptions
    phone_number   subscription_id
    551900001212   subs-093123
    551911114567   subs-002202
    551911114567   subs-002203
    551911114567   subs-002204
subscription
    subscription_id  phone_number   config_id  . . .  enabled  creation_date
    subs-093123      551900001212   342        . . .  true     2013-08-01
    subs-002202      551911114567   567        . . .  false    2014-06-27
    subs-002203      551911114567   678        . . .  true     2014-07-05
    subs-002204      551911114567   654        . . .  true     2014-08-07
Data Model V1 – Querying Profile
1st step
• check the index table to get the IDs of the subscriptions for a given user
#cql> SELECT * FROM user_subscriptions WHERE phone_number = '551911114567';
    phone_number   subscription_id
    551911114567   subs-002202
    551911114567   subs-002203
    551911114567   subs-002204
2nd step
• query each of the user’s subscriptions by its ID
#cql> SELECT * FROM subscription WHERE subscription_id = 'subs-002204';
#cql> SELECT * FROM subscription WHERE subscription_id = 'subs-002203';
#cql> SELECT * FROM subscription WHERE subscription_id = 'subs-002202';
    subscription_id  phone_number   config_id  . . .  enabled  creation_date
    subs-002202      551911114567   567        . . .  false    2014-06-27
    subs-002203      551911114567   678        . . .  true     2014-07-05
    subs-002204      551911114567   654        . . .  true     2014-08-07
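The two-step profile read in V1 can be sketched as follows. The dicts stand in for the two tables and hold the rows shown on the slides; `load_profile_v1` is a hypothetical helper, and the counter only models round trips, not real latency.

```python
# Dict-based stand-ins for the two V1 tables, filled with the slide's rows.
user_subscriptions = {
    "551900001212": ["subs-093123"],
    "551911114567": ["subs-002202", "subs-002203", "subs-002204"],
}
subscription = {
    "subs-093123": {"phone_number": "551900001212", "config_id": 342, "enabled": True},
    "subs-002202": {"phone_number": "551911114567", "config_id": 567, "enabled": False},
    "subs-002203": {"phone_number": "551911114567", "config_id": 678, "enabled": True},
    "subs-002204": {"phone_number": "551911114567", "config_id": 654, "enabled": True},
}


def load_profile_v1(phone_number):
    """Return (subscriptions, number_of_lookups) for one user."""
    lookups = 1  # 1st step: one query to the index table
    ids = user_subscriptions.get(phone_number, [])
    rows = []
    for subscription_id in ids:
        lookups += 1  # 2nd step: one query per subscription ID
        rows.append(subscription[subscription_id])
    return rows, lookups


rows, lookups = load_profile_v1("551911114567")
print(len(rows), "subscriptions fetched with", lookups, "lookups")
```

For a user with N subscriptions the read costs 1 + N lookups, so the cost grows with the size of the profile.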
Data Model V2
CREATE TABLE subscription (
    phone_number text,
    subscription_id text,
    serialized blob,
    PRIMARY KEY (phone_number, subscription_id)
);
subscription
    phone_number   subscription_id  serialized
    551900001212   subs-093123      array [1,1,0,1,1,1,1,0,0,0,0,10,1,1,0,1,1,0,1,1,1,0,0,0,1,0]
    551911114567   subs-002202      array [0,1,0,1,1,0,1,1,0,0,0,10,1,1,0,1,1,0,1,1,1,0,0,0,1,0]
    551911114567   subs-002203      array [0,1,0,0,1,1,1,0,0,1,0,10,1,1,0,1,1,0,1,1,1,1,1,1,1,0]
    551911114567   subs-002204      array [1,0,0,1,1,1,1,0,1,0,0,10,1,1,0,1,1,0,1,1,1,0,0,0,1,1]
    542154231121   subs-320012      array [1,1,1,1,1,0,1,0,1,0,0,10,1,0,0,1,1,0,1,1,1,0,0,1,0,1]
Data Model V2 – Querying Profile
#cql> SELECT * FROM subscription WHERE phone_number = '551911114567';
    phone_number   subscription_id  serialized
    551911114567   subs-002202      array [0,1,0,1,1,0,1,1,0,0,0,10,1,1,0,1,1,0,1,1,1,0,0,0,1,0]
    551911114567   subs-002203      array [0,1,0,0,1,1,1,0,0,1,0,10,1,1,0,1,1,0,1,1,1,1,1,1,1,0]
    551911114567   subs-002204      array [1,0,0,1,1,1,1,0,1,0,0,10,1,1,0,1,1,0,1,1,1,0,0,0,1,1]
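The V2 read path can be sketched in the same style: all of a user's subscriptions live in one partition, so a single lookup returns them, and the application decodes each blob. The slides do not show the blob layout, so the `RECORD` format below (an unsigned integer plus a boolean) is purely hypothetical, as is the helper `load_profile_v2`; the field values are borrowed from the V1 slides.

```python
import struct

# Hypothetical fixed binary layout for the blob: config_id (uint32) + enabled (bool).
RECORD = struct.Struct("<I?")

# Stand-in for the V2 table: partition key -> {clustering key -> blob}.
subscription = {
    "551900001212": {"subs-093123": RECORD.pack(342, True)},
    "551911114567": {
        "subs-002202": RECORD.pack(567, False),
        "subs-002203": RECORD.pack(678, True),
        "subs-002204": RECORD.pack(654, True),
    },
}


def load_profile_v2(phone_number):
    """One partition read returns every subscription of the user."""
    partition = subscription.get(phone_number, {})  # the single lookup
    return {
        subscription_id: RECORD.unpack(blob)
        for subscription_id, blob in partition.items()
    }


profile = load_profile_v2("551911114567")
print(len(profile), "subscriptions fetched with 1 lookup")
print(profile["subs-002203"])
```

The trade-off is that the database can no longer see individual fields; every reader has to know how to decode the blob.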
Storage Strategy
• we tried various ways to store the information
• a compact representation reduces network traffic as well
[Bar chart "Object Representation": size of one object in bytes (axis 0 to 2500) by type of object representation: XML, Java Byte Array, JSON, protobuf]
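To get a feel for why the representation matters, the sketch below serializes one sample record three ways with the Python standard library. It is only an illustration: `struct` is used as a stand-in for a compact binary format (it is not protobuf and not Java serialization), the field set is abbreviated, and the sizes it prints do not reproduce the chart.

```python
import json
import struct
import xml.etree.ElementTree as ET

# Sample record taken from the V1 slide; the field set is abbreviated.
record = {
    "subscription_id": "subs-002203",
    "phone_number": "551911114567",
    "config_id": 678,
    "enabled": True,
}


def to_xml(rec):
    root = ET.Element("subscription")
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root)


def to_json(rec):
    return json.dumps(rec).encode("utf-8")


def to_binary(rec):
    # Hypothetical compact layout: 11-byte ID, 12-byte phone, uint32, bool.
    return struct.pack(
        "<11s12sI?",
        rec["subscription_id"].encode("ascii"),
        rec["phone_number"].encode("ascii"),
        rec["config_id"],
        rec["enabled"],
    )


sizes = {
    "XML": len(to_xml(record)),
    "JSON": len(to_json(record)),
    "binary": len(to_binary(record)),
}
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {size} bytes")
```

Text formats repeat the field names in every record, while a binary layout stores only the values, which is why it comes out smallest here.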
Storage Strategy
Database volume
• the database size decreases considerably
• less data to handle means better performance
[Bar chart "Subscription Data Volume": data size in GB (axis 0.00 to 100.00) for XML, Java Byte Array, JSON, and protobuf]
C* New Data Model
• performance increased significantly
• reduced complexity: from 2 tables to 1; simpler and lighter
• reduced number of remote calls
    • V1
        • 1 query to the index table
        • X queries (one per subscription ID returned)
    • V2
        • 1 query brings all the data
• data volume reduced
Cassandra Cluster Configuration 
• Geographically distributed 
• 2 data centers in São Paulo, Brazil
Ring Topology
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns (effective)  Host ID                               Rack
UN  200.xxx.xxx.73  29.58 GB  256     76,7%             b9f890b6-6137-4359-90c2-74f87ce1676d  RAC1
UN  200.xxx.xxx.72  29.8 GB   256     74,5%             ec7fa873-edd9-4cb9-938d-60f1c9b8f742  RAC1
UN  200.xxx.xxx.71  30.76 GB  256     76,1%             1091799e-0617-42dd-a396-363f10c03295  RAC1
UN  200.xxx.xxx.74  26.68 GB  256     72,7%             984b848b-0ecb-4db3-a1fe-c9b088c295f6  RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns (effective)  Host ID                               Rack
UN  200.xxx.xxx.72  28.99 GB  256     100,0%            f9b820d6-111f-4a3a-af6c-39d0e8e88084  RAC1
UN  200.xxx.xxx.71  30.36 GB  256     100,0%            120939bd-a6b4-4d88-b2cf-dbf79d93181c  RAC1
UN  200.xxx.xxx.74  27.93 GB  256     100,0%            c821b8f7-2224-4512-8a0e-0371460d900e  RAC1
Hardware Infrastructure v1.0
4 Servers
• CentOS 5.9
• 2x Intel(R) Xeon(R) CPU E5606 @ 2.13GHz (4 cores)
• 24 GB / 32 GB RAM
• 1x SATA 500 GB (OS)
• 1x SSD CSSD-F120GB2 (data and commit logs)
• Apache Cassandra v1.0.6
Hardware Infrastructure v2.0
6 Servers
• 2 Intel(R) Xeon(R) CPUs @ 3.1GHz
• 128 GB of RAM per server
6 Virtual Machines (one per physical server)
• running CentOS 6.5
• 32 GB of RAM per VM
• 1 Intel(R) Xeon(R) CPU @ 3.1GHz
• 2 SSDs, model CSSD-F120GBGT
• configured as RAID0
• Apache Cassandra 1.2.13
Keyspace
cassandra-cli : describe
Keyspace: SBSPlatform:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
    Options: [DC2:3, DC1:3]
  Column Families:
    ColumnFamily: subscription
    ColumnFamily: delivery_ticket
    ColumnFamily: hard_limit_control
    ColumnFamily: hard_limit_rules
    ColumnFamily: idx_config_subsc
    ColumnFamily: user_directives
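The replication options tie in with the "Owns (effective)" column of the nodetool output: with 3 replicas per data center, the effective ownership in each data center should add up to about 300%. A small check, using the values from the Ring Topology slide:

```python
# Effective ownership reported by `nodetool status` on the Ring Topology slide.
owns_effective = {
    "DC1": [76.7, 74.5, 76.1, 72.7],
    "DC2": [100.0, 100.0, 100.0],
}
# Replicas per data center, from the keyspace options [DC2:3, DC1:3].
replication_factor = {"DC1": 3, "DC2": 3}

for dc, shares in owns_effective.items():
    expected_mean = 100.0 * replication_factor[dc] / len(shares)
    print(f"{dc}: total {sum(shares):.1f}%, expected mean per node {expected_mean:.1f}%")
```

In DC1 four nodes share three replicas, so each holds about 75% of the data; in DC2 three nodes hold three replicas, so every node holds everything.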
Column Family Status
./nodetool cfstats SBSPlatform
Column Family: subscription
    Space used (total): 13499012297
    Number of Keys (estimate): 46.369.536
    Read Count: 5599788263 / Read Latency: 0,497 ms.
    Write Count: 5212858995 / Write Latency: 0,017 ms.
    Compacted row mean size: 576
Column Family: hard_limit_control
    Space used (total): 7812531598
    Number of Keys (estimate): 44.785.024
    Read Count: 3987345295 / Read Latency: 0,509 ms.
    Write Count: 11646786043 / Write Latency: 0,021 ms.
    Compacted row mean size: 188
Overall cluster response time
DATACENTER 1
Node 1 - : 200.xxx.xxx.71   load_avg: 0.39   write_latency(us): 900.8    read_latency(us): 553.6
Node 2 - : 200.xxx.xxx.72   load_avg: 0.51   write_latency(us): 874.1    read_latency(us): 620.5
Node 3 - : 200.xxx.xxx.74   load_avg: 0.35   write_latency(us): 834.87   read_latency(us): 515.6
Node 4 - : 200.xxx.xxx.73   load_avg: 0.35   write_latency(us): 900.87   read_latency(us): 700.6
DATACENTER 2
Node 1 - : 200.xxx.xxx.71   load_avg: 0.63   write_latency(us): 806.3    read_latency(us): 882.3
Node 2 - : 200.xxx.xxx.72   load_avg: 0.37   write_latency(us): 802.8    read_latency(us): 969.0
Node 3 - : 200.xxx.xxx.74   load_avg: 0.62   write_latency(us): 965.7    read_latency(us): 887.43
Now: 2014-08-30 14:49:15
Total Reads/second: 13262
Total Writes/second: 9529
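The per-node figures are easier to compare once averaged per data center. The sketch below does that with the values copied from the slide. Treating the four-node block as data center 1 and the three-node block as data center 2 is an assumption based on the node counts in the nodetool output.

```python
from statistics import mean

# (write_latency_us, read_latency_us) per node, copied from the slide.
# Assumption: the four-node block is DC1 and the three-node block is DC2.
latencies_us = {
    "DC1": [(900.8, 553.6), (874.1, 620.5), (834.87, 515.6), (900.87, 700.6)],
    "DC2": [(806.3, 882.3), (802.8, 969.0), (965.7, 887.43)],
}

summary = {
    dc: (mean(w for w, _ in nodes), mean(r for _, r in nodes))
    for dc, nodes in latencies_us.items()
}
for dc, (write_us, read_us) in summary.items():
    print(f"{dc}: mean write {write_us:.1f} us, mean read {read_us:.1f} us")
```

Every node in this snapshot stays below one millisecond for both reads and writes.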
O.S. and Software Customizations
Recommended settings for production, according to the Cassandra docs:
• Java 1.7 + JNA
• Disable swap
• NTP on all servers
O.S. - limits.conf
# number of open files
root soft nofile 100000
root hard nofile 100000
* soft nofile 100000
* hard nofile 100000
# allocated memory
root soft memlock unlimited
root hard memlock unlimited
* soft memlock unlimited
* hard memlock unlimited
# addressing (virtual memory)
root soft as unlimited
root hard as unlimited
* soft as unlimited
* hard as unlimited
# number of open processes
root soft nproc unlimited
root hard nproc unlimited
* soft nproc unlimited
* hard nproc unlimited
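One way to confirm that such limits are in effect for a running process is to read them back. The sketch below uses Python's standard `resource` module and assumes a Linux host; the helper names `describe` and `current_limits` are made up for this example. It reports the limits of the Python process itself, not of the Cassandra JVM.

```python
import resource

# Map limits.conf item names to the matching RLIMIT_* constants (Linux).
LIMITS = {
    "nofile": resource.RLIMIT_NOFILE,
    "memlock": resource.RLIMIT_MEMLOCK,
    "as": resource.RLIMIT_AS,
    "nproc": resource.RLIMIT_NPROC,
}


def describe(value):
    """Render a raw limit value the way limits.conf spells it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {item: (soft, hard)} for the current process."""
    return {
        name: tuple(describe(v) for v in resource.getrlimit(rlimit))
        for name, rlimit in LIMITS.items()
    }


for name, (soft, hard) in current_limits().items():
    print(f"{name:8s} soft={soft} hard={hard}")
```

Run as the user that starts Cassandra, the output should match the values configured above once a new login session has picked them up.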
Daily Cluster Operations
Total Reads / Total Writes: > 1 billion
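The daily volume can be related to the per-second snapshot on the "Overall cluster response time" slide. The sketch below extrapolates that snapshot to a full day; assuming the rate holds for 24 hours is a simplification, since real traffic varies during the day.

```python
# Snapshot rates from the "Overall cluster response time" slide.
READS_PER_SECOND = 13262
WRITES_PER_SECOND = 9529
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: the snapshot rate holds for the whole day.
reads_per_day = READS_PER_SECOND * SECONDS_PER_DAY
writes_per_day = WRITES_PER_SECOND * SECONDS_PER_DAY

print(f"reads/day:  {reads_per_day:,}")
print(f"writes/day: {writes_per_day:,}")
print(f"total/day:  {reads_per_day + writes_per_day:,}")
```

At the snapshot rate this gives about 1.15 billion reads and 0.82 billion writes a day, so the combined total is well above one billion.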
Conclusion: Why Cassandra?
• Good performance for reads
• Excellent performance for writes
• Read and write throughput scales linearly
• Supports geographically distributed data
• Fault tolerant
• Tunable consistency per client
• FOSS (Free and Open Source Software) + Support
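"Tunable consistency" comes down to how many replicas must answer each request. The slides do not say which consistency levels SBS uses, so the sketch below only works through the standard quorum arithmetic for the replication options shown earlier (3 replicas in each of 2 data centers); the helper names are made up for this example.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of `replicas`."""
    return replicas // 2 + 1


# Replication options from the keyspace slide: 3 replicas in each data center.
replication = {"DC1": 3, "DC2": 3}

local_quorum = quorum(replication["DC1"])           # replicas needed in one DC
global_quorum = quorum(sum(replication.values()))   # replicas needed cluster-wide


def overlaps(read_replicas: int, write_replicas: int, total_replicas: int) -> bool:
    """True when every read set must intersect every write set."""
    return read_replicas + write_replicas > total_replicas


print("local quorum:", local_quorum, "global quorum:", global_quorum)
print("local quorum reads see local quorum writes:",
      overlaps(local_quorum, local_quorum, replication["DC1"]))
```

A client that reads and writes at a local quorum touches 2 of the 3 local replicas each time, so within one data center a read always overlaps the latest acknowledged write, while a client that prefers speed can ask for a single replica.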
Thank You! 
Questions? 
eiti.kimura@movile.com 
eitikimura 
facebook.com/eiti.kimura