WARNING: while the old presentation is still available after this slide,
there is a more recent and updated version of this talk on SlideShare.
It's called:
"FROM OBVIOUS TO INGENIOUS:
INCREMENTALLY SCALING WEB APPLICATIONS ON POSTGRESQL"
I think you should see the new one instead! Please click on the link:
http://bit.ly/1MXxlBL
Proprietary and Confidential
Konstantin Gredeskoul
CTO, Wanelo.com
12-Step Program for Scaling Web
Applications on PostgreSQL
@kig
@kigster
What does it mean to scale on top of
PostgreSQL?
And why should you care?
Scaling means supporting more
workload concurrently, where "work"
is often interchangeable with "users"
But why PostgreSQL?
Because NoNoSQL is hawt! (again)
Relational databases are great at
supporting constant change in software
They are not as great at "pure scaling"
as Riak or Cassandra
So the choice critically depends on
what you are trying to build
The large majority of web applications
are represented extremely well by
the relational model
So if I need to build a new product
or service, my default choice would be
PostgreSQL for critical data, plus
whatever else as needed
This presentation is a walk-through filled
with practical solutions
It's based on the story of scaling
wanelo.com to sustain tens of thousands of
concurrent users and 3K req/sec
Not Twitter/Facebook scale, but still…
So let's explore the application to learn a
bit about Wanelo for our scalability journey
Founded in 2010, Wanelo ("wah-nee-loh," from
Want, Need, Love) is a community and a social
network for all of the world's shopping.
Wanelo is home to 12M products, millions of
users, and 200K+ stores; products on Wanelo
have been saved into collections
over 2B times
Early on we wanted to:
• move fast with product development
• scale as needed, stay ahead of the curve
• keep overall costs low
• but spend where it matters
• automate everything
• avoid reinventing the wheel
• learn as we go
• remain in control of our infrastructure
Heroku or Not?
Assuming we want full control of our
application layer, places like Heroku aren't a
great fit
But Heroku can be a great place to start. It
all depends on the size and complexity of the
app we are building.
Ours would have been cost-prohibitive.
Foundations of web apps
• app server (we use Unicorn)
• scalable web server in front (we use nginx)
• database (we use PostgreSQL)
• hosting environment (we use Joyent Cloud)
• deployment tools (Capistrano)
• server configuration tools (we use Chef)
• programming language + framework (Ruby on Rails)
• many others, such as monitoring and alerting
Let's review… the Basic Web App
[Diagram: incoming HTTP → nginx (serving static files from /home/user/app/current/public) → N × Unicorn/Passenger Ruby VMs → PostgreSQL server (/var/pgsql/data)]
• no redundancy, no caching (yet)
• can only process N concurrent requests
• nginx will serve static assets and deal with slow clients
• web sessions probably in the DB or a cookie
First optimizations:
cheap early on, well worth it
• Personalization via AJAX, so controller actions
can be cached entirely using caches_action
• The page is returned unpersonalized; an additional
AJAX request loads the personalization
A few more basic performance
tweaks that go a long way
• Install 2+ memcached servers for caching, and
use the Dalli gem to connect to them for redundancy
• Switch to memcached-based web sessions. Use
sessions sparingly; assume their transient nature
• Set up a CDN for asset_host and any user-generated
content. We use fastly.com
• Redis is also an option, but I prefer memcached
for redundancy
[Diagram: browser → CDN (caching images and JS) and nginx → N × Unicorn/Passenger Ruby VMs → PostgreSQL server and memcached]
Caching goes a long way…
• geo-distribute and cache your UGC and CSS/JS assets
• cache HTML and serialized objects in memcached
• can increase TTLs to alleviate load if traffic spikes
Adding basic redundancy
• Multiple app servers require haproxy
between nginx and Unicorn
• Long-running tasks (such as
posting to Facebook or Twitter) require a
background job processing framework
• Multiple load balancers require DNS
round-robin and a short TTL (dyn.com)
[Diagram: incoming HTTP with DNS round-robin (or a failover/HA solution) → load balancers (nginx, with a CDN caching images/JS) → haproxy → app servers (N × Unicorn/Passenger Ruby VMs) and Sidekiq/Resque background workers; data stores: a single PostgreSQL DB, memcached, and redis (transient to permanent), plus an object store for user-generated content]
this architecture can horizontally scale as
far as the database at its center
every other component can be scaled by
adding more of it, to handle more traffic
As long as we can scale the data
store on the backend, we can scale
the app!
Mostly :)
At some point we may hit a limit on TCP/IP
network throughput or the number of connections,
but that is at a whole other scale level
The traffic keeps climbing…
Performance limits are near
• First signs of performance problems start creeping up
• Symptoms of read scalability problems:
• pages load slowly or time out
• users are getting 503 Service Unavailable
• database is slammed (very high CPU or read IO)
• Symptoms of write scalability problems:
• database write IO is maxed out, but CPU is not
• update operations are waiting on each other, piling up
• the application "locks up", timeouts
• replicas are not catching up
• some pages load (cached?), some don't
Both situations may easily result in
downtime
Even though we
achieved 99.99% uptime
in 2013, in 2014 we had
a couple of short
downtimes, caused by an
overloaded replica, that
lasted around 5 minutes.
But users quickly
notice…
Perhaps not :)
Common patterns for scaling high-traffic web
applications, based on wanelo.com

12-Step Program
for curing your dependency on slow application latency
What's a good latency?
• For small / fast HTTP services, 10-12ms or lower
• If your app is high-traffic (100K+ RPM), I
recommend 80ms or lower
CPU burn vs waiting on IO?
• Ruby VM (30ms) + garbage collection (6ms) is CPU
burn, easy to scale by adding more app servers
• Web services + Solr (25ms), memcached (15ms), and
the database (6ms) are all waiting on IO
Step 1:
Add More Cache!
Moar Cache!!!
• Anything that can be cached, should be
• Cache hit = many database hits avoided
• Even a hit rate of 17% still saves DB hits
• We can cache many types of things…
• Cache is cheap and fast (memcached)
Cache many types of things
• caches_action in controllers is very effective
• fragment caches of reusable widgets
• we use the Compositor gem for our JSON API. We cache
serialized object fragments, grab them from
memcached using a multi-get, and merge them
• Shopify open-sourced IdentityCache, which
caches AR models, so you can Product.fetch(id)
https://github.com/wanelo/compositorโ€จ
https://github.com/Shopify/identity_cache
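The fragment-merge approach can be sketched in plain Ruby. A Hash stands in for memcached, and the method names (`multi_get`, `product_json`) are illustrative; the real app uses the Compositor gem with Dalli's multi-get:

```ruby
# Sketch: serialized JSON fragments are cached per object under
# separate keys, fetched in one multi-get, and merged into the
# final response.
require 'json'

CACHE = {
  "product/42/core"   => { id: 42, name: "lamp" }.to_json,
  "product/42/counts" => { saves_count: 17 }.to_json,
}

def multi_get(keys)
  CACHE.values_at(*keys) # a single round-trip in real memcached
end

def product_json(id)
  fragments = multi_get(["product/#{id}/core", "product/#{id}/counts"])
  fragments.map { |f| JSON.parse(f) }.reduce(:merge).to_json
end

puts product_json(42) # => {"id":42,"name":"lamp","saves_count":17}
```

Each fragment can be expired independently, so a counter update invalidates only the small "counts" fragment, not the whole serialized object.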
But caching has its issues
• Expiring caches is not easy
• CacheSweepers in Rails help
• We found ourselves doing 4,000 memcached
deletes in a single request!
• Could defer expiring caches to background jobs,
or use TTLs where possible
• But we can cache even outside of our app:
we cache JSON API responses using a CDN (fastly.com)
Step 2:
Optimize SQL
SQL Optimization
• Find slow SQL (>100ms) and either remove it, cache
the hell out of it, or fix/rewrite the query
• Enable the slow query log in postgresql.conf:

log_min_duration_statement = 80
log_temp_files = 0

• pg_stat_statements is an invaluable contrib module
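The slow query log flags individual statements; pg_stat_statements aggregates them across calls. A typical setup and query, with column names as in PostgreSQL 9.x (`total_time` was renamed `total_exec_time` in later releases):

```sql
-- postgresql.conf: shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION pg_stat_statements;

-- Top 10 queries by total time spent
SELECT query, calls, total_time,
       total_time / calls AS avg_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```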
Fixing Slow Query
Run an explain plan to understand how the DB executes the query
Are there adequate indexes for the query? Is the database using the
appropriate index? Has the table been recently analyzed?
Can a complex join be simplified into a subselect?
Can this query use an index-only scan?
Can the "order by" column be added to the index?
Check pg_stat_user_indexes and pg_stat_user_tables for seq scans,
unused indexes, and cache info
SQL Optimization, ctd
• Instrumentation software such as New Relic shows slow queries, with
explain plans, and time-consuming transactions
SQL Optimization: Example
One day, I noticed lots of temp files
reported in postgres.log
Let's run this query…
This join takes a whole second to return :(
• The follows table…
• The stories table…
So our index is partial: it only covers state = 'active'
So this query is a full table scan…
But state isn't used in the query. A bug?
Let's add state = 'active'
It was meant to be there anyway
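The situation can be reproduced with a small example (table and column names are illustrative): a partial index only serves queries whose WHERE clause implies the index predicate.

```sql
-- Partial index: only covers rows with state = 'active'
CREATE INDEX index_stories_on_user_id
  ON stories (user_id)
  WHERE state = 'active';

-- Misses the index: the planner cannot prove state = 'active',
-- so this becomes a sequential scan
EXPLAIN SELECT * FROM stories WHERE user_id = 123;

-- Matches the index predicate: index scan
EXPLAIN SELECT * FROM stories
 WHERE user_id = 123 AND state = 'active';
```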
After adding the extra condition, IO drops significantly:
Step 3:
Upgrade Hardware and RAM
Hardware + RAM
• Sounds obvious, but better or faster hardware is an
easy win when scaling out
• Large RAM will be used as file system cache
• On Joyent's SmartOS, the ARC FS cache is very effective
• shared_buffers should be set to 25% of RAM or 12GB,
whichever is smaller
• A fast SSD disk array can make a huge difference
• Joyent's native 16-disk RAID managed by ZFS instead
of a controller provides excellent performance
Hardware in the cloud
• SSD offerings from Joyent and AWS
• Joyent's "max" SSD node: $12.9/hr
• AWS's "max" SSD node: $6.8/hr
So who's better?
• Joyent: 16 SSD drives (RAID10 + 2), SSD make: Intel DCS3700, CPU: E5-2690 @ 2.9GHz
• AWS: 8 SSD drives, SSD make: ?, CPU: E5-2670 @ 2.6GHz
Perhaps you get what you pay for after all…
Step 4:
Scale Reads by Replication
Scale Reads by Replication
• postgresql.conf (both master & replica)
• These settings have been tuned for SmartOS and our
application requirements (thanks, PGExperts!)
How to distribute reads?
• Some people have success using this setup for reads:
app → haproxy → pgBouncer → replicas
• I'd like to try this method eventually, but we chose to
deal with distributing read traffic at the application level
• We tried many Ruby-based solutions that claimed to do
this well, but many weren't production-ready
• Makara is a Ruby gem from
TaskRabbit that we ported
from MySQL to PostgreSQL
for sending reads to replicas
• It was the simplest library to
understand, and to port to PG
• It worked in the multi-threaded
environment of Sidekiq
background workers
• automatically retries if a replica
goes down
• load-balances with weights
• was already running in production
Special considerations
• The application must be tuned to support eventual
consistency. Data may not yet be on the replica!
• Must explicitly force a fetch from the master DB when
it's critical (i.e. right after a user account's creation)
• We often use the pattern below of first trying the fetch,
and if nothing is found, retrying on the master DB
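The pattern reads like this in plain Ruby. Hashes stand in for the two database connections, and `fetch_with_fallback` is an illustrative name, not the actual Wanelo code:

```ruby
# Sketch of the "try the replica, fall back to master" read
# pattern: a freshly written row may not have replicated yet.
def fetch_with_fallback(id, replica:, master:)
  replica.fetch(id) { master.fetch(id, nil) } # nil if truly absent
end

master  = { 1 => "alice", 2 => "bob" }
replica = { 1 => "alice" } # row 2 has not replicated yet

puts fetch_with_fallback(1, replica: replica, master: master) # => alice
puts fetch_with_fallback(2, replica: replica, master: master) # => bob
```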
Replicas can specialize
• Background workers can use a dedicated replica not
shared with the app servers, to optimize the hit rate for
the file system cache (ARC) on both replicas
[Diagram: app servers (Unicorn/Passenger Ruby VMs) read from Replicas 1 and 2, whose ARC caches stay warm with web-traffic queries; Sidekiq/Resque background workers read from Replica 3, whose ARC cache stays warm with background job queries; all replicate from the PostgreSQL master]
Big heavy reads go there
• Long, heavy queries should be run by background jobs
against a dedicated replica, to isolate their effect on
web traffic
[Diagram: Sidekiq/Resque background workers query a dedicated PostgreSQL replica of the master]
• Each type of load will produce a unique set of data
cached by the file system
Step 5:
Use more appropriate tools
Leveraging other tools
Not every type of data is well suited to storing in a relational
DB, even though initially it may be convenient
• Redis is a great data store for transient or semi-
persistent data with list, hash, or set semantics
• We use it for the ActivityFeed by precomputing each feed at write
time. We can regenerate it if the data is lost from Redis
• We use twemproxy in front of Redis, which provides automatic
horizontal sharding and connection pooling
• We run clusters of 256 Redis shards across many virtual zones;
sharded Redis instances use many cores, instead of one
• Solr is great for full-text search, and for deep-paginated
sorted lists, such as trending or related products
But we still have a single master DB taking
all the writes…
True story: applying WAL logs on
replicas creates significant disk write load
Replicas are unable to both serve live traffic and
catch up on replication. They fall behind.
Back to PostgreSQL
When replicas fall behind, the application generates
errors, unable to find the data it expects
Step 6:
Move write-heavy tables out:
replace them with non-DB solutions
Move event log out
• We discovered from pg_stat_user_tables that the top table by
write volume was user_events
• We were appending all user events to this table
• We were generating millions of rows per day!
• We solved it by replacing the user event recording system with
rsyslog, appending to ASCII files
• It's cheap, reliable, and scalable
• We now use Joyent's Manta to analyze this data in
parallel. Manta is an object store with native compute on objects
For more information about how we migrated
user events to a file-based append-only log, and
how we analyze it with Manta, please read
http://wanelo.ly/event-collection
Step 7:
Tune PostgreSQL and your
Filesystem
Tuning ZFS
• Problem: zones (virtual hosts) with "write
problems" appeared to be writing 16 times
more data to disk than what the virtual file
system reports
• vfsstat says an 8MB/sec write volume
• iostat says 128MB/sec is actually written to disk
• So what's going on?
Tuning the Filesystem
• It turns out the default ZFS block size is 128KB,
and the PostgreSQL page size is 8KB
• Every small write that touched a page had to
write a 128KB ZFS block to disk
• This may be good for huge sequential writes,
but not for random access and lots of tiny writes
Tuning ZFS & PgSQL
• Solution: Joyent changed the ZFS block size for our zone,
and the iostat write volume dropped to 8MB/sec
• We also added commit_delay
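The two fixes can be expressed as configuration. The values below are illustrative, not a recommendation, and a changed recordsize only applies to blocks written after the change:

```conf
# ZFS: match the dataset record size to PostgreSQL's 8KB page
#   zfs set recordsize=8k zones/pgdata

# postgresql.conf: batch nearby commits into a single WAL flush
commit_delay    = 1000   # microseconds
commit_siblings = 5      # only delay when >= 5 transactions are active
```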
Installing and Configuring PG
• Many such settings are pre-defined in our open-source
Chef cookbook for installing PostgreSQL from source:
https://github.com/wanelo-chef/postgres
• It installs PG in, e.g., /opt/local/postgresql-9.3.2
• It configures its data in /var/pgsql/data93
• It allows seamless and safe upgrades of minor or major
versions of PostgreSQL, never overwriting binaries
Additional resources online
• Josh Berkus's "Five Steps to PostgreSQL
Performance" on SlideShare is fantastic
• The PostgreSQL wiki pages on performance tuning
are excellent
• Run pgbench to determine and compare the
performance of systems
http://www.slideshare.net/PGExperts/five-steps-perform2013
http://wiki.postgresql.org/wiki/Performance_Optimization
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Step 8:
Buffer and serialize frequent
updates
Counters, counters…
• Problem: products.saves_count is
incremented (by 1) every time someone saves a
product
• At 100s of inserts/sec, that's a lot of updates
• Worse: 100s of concurrent requests trying to
obtain a row-level lock on the same popular
product
How can we reduce the number of writes and
lock contention?
Buffering and serializing
• The Sidekiq background job framework has two
inter-related features:
• scheduling a job in the future (say, 10 minutes ahead)
• the UniqueJob extension
• We increment a counter in Redis, and enqueue a job
that says "update this product in 10 minutes"
• Once every 10 minutes, popular products are updated by
adding the value stored in Redis to the database value, and
resetting the Redis value to 0
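The whole scheme can be sketched in plain Ruby. A Hash stands in for Redis and for Sidekiq's unique-job dedup; in production the increment is a Redis INCR and the flush runs as a delayed Sidekiq job:

```ruby
# Sketch of write buffering: increments accumulate in a counter
# store, and a single delayed job later flushes the accumulated
# delta into the database row.
class CounterBuffer
  def initialize
    @pending  = Hash.new(0) # stand-in for Redis INCR counters
    @enqueued = {}          # stand-in for Sidekiq's UniqueJob dedup
  end

  # Called on every "save product" request: cheap, no DB write,
  # no row-level lock taken.
  def increment(product_id)
    @pending[product_id] += 1
    @enqueued[product_id] ||= true # only one flush job per window
  end

  # Called by the delayed job ~10 minutes later: one UPDATE no
  # matter how many increments happened in the window.
  def flush(product_id, db)
    delta = @pending[product_id]
    @pending[product_id] = 0 # read & reset the counter
    @enqueued.delete(product_id)
    db[product_id] = db.fetch(product_id, 0) + delta
  end
end

db     = { 42 => 100 } # stand-in for products.saves_count
buffer = CounterBuffer.new
1000.times { buffer.increment(42) } # 1000 requests, zero UPDATEs
buffer.flush(42, db)                # a single UPDATE
puts db[42] # => 1100
```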
Buffering explained
[Diagram: each "Save Product" request (1) enqueues a delayed update request for the product, skipped if an update request is already on the queue, and (2) increments a counter in the Redis cache; the worker later (3) processes the job, (4) reads the counter and resets it to 0, and (5) updates the product row in PostgreSQL]
Buffering conclusions
• If we show objects from the database, they
might sometimes be behind on the counter. That
might be OK…
• If not, to achieve read consistency, we can
display the count as the database value + the Redis
value at read time
Step 9:
Optimize the DB schema
MVCC does copy on write
• Problem: PostgreSQL rewrites the row for most updates (some
exceptions exist, i.e. a non-indexed column, a counter, a timestamp)
• But we often index these columns so we can sort by them
• Rails' and Hibernate's partial updates are not helping
• Are we updating User on each request?
• So updates can become expensive on wide tables
Schema tricks
• Solution: split wide tables into several 1-1
tables to reduce the update impact
• Much less vacuuming is required when smaller
tables are frequently updated
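As DDL, the Users split shown on the following slides looks roughly like this (columns abridged, types assumed, names taken from the slides):

```sql
-- Narrow 1-1 tables so hot updates rewrite small rows
CREATE TABLE user_logins (
  user_id            integer PRIMARY KEY REFERENCES users (id),
  sign_in_count      integer NOT NULL DEFAULT 0,
  current_sign_in_at timestamptz,
  last_sign_in_ip    inet
);

CREATE TABLE user_counts (
  user_id         integer PRIMARY KEY REFERENCES users (id),
  followers_count integer NOT NULL DEFAULT 0,
  saves_count     integer NOT NULL DEFAULT 0
);
```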
Don't update anything on each request :)
Users (before refactoring): id, email, encrypted_password, reset_password_token, reset_password_sent_at, remember_created_at, sign_in_count, current_sign_in_at, last_sign_in_at, current_sign_in_ip, last_sign_in_ip, confirmation_token, confirmed_at, confirmation_sent_at, unconfirmed_email, failed_attempts, unlock_token, locked_at, authentication_token, created_at, updated_at, username, avatar, state, followers_count, saves_count, collections_count, stores_count, following_count, stories_count

After refactoring:

Users: id, email, created_at, username, avatar, state

UserLogins: user_id, encrypted_password, reset_password_token, reset_password_sent_at, remember_created_at, sign_in_count, current_sign_in_at, last_sign_in_at, current_sign_in_ip, last_sign_in_ip, confirmation_token, confirmed_at, confirmation_sent_at, unconfirmed_email, failed_attempts, unlock_token, locked_at, authentication_token, updated_at

UserCounts: user_id, followers_count, saves_count, collections_count, stores_count, following_count, stories_count
Step 10:
Shard Busy Tables Vertically
Vertical sharding
• Heavy tables with too many writes can be
moved into their own separate database
• For us it was saves: now at 2B+ rows
• At hundreds of inserts per second, across 4 indexes,
we were feeling the pain
• It turns out moving a single table out (in Rails) is
not a huge effort: it took our team 3 days
Vertical sharding - how to
• Update the code to point to the new database
• Implement any dynamic Rails association
methods as real methods with 2 fetches
• i.e. save.products becomes a method on the Save
model, looking up Products by IDs
• Update the development and test setup with two
primary databases, and fix all the tests
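The two-fetch association replacement can be sketched like this. Model names and the in-memory store are illustrative; the real code uses ActiveRecord against two databases:

```ruby
# Saves live in one database, products in another, so the old
# cross-database association becomes ID lookup + batch fetch.
Save    = Struct.new(:user_id, :product_ids)
Product = Struct.new(:id, :name)

PRODUCTS_DB = {            # stand-in for the main database
  1 => Product.new(1, "lamp"),
  2 => Product.new(2, "rug"),
}

class Save
  # Formerly a has_many association; now two queries: one against
  # the saves DB (product_ids already loaded here), one batched
  # lookup against the products DB.
  def products
    PRODUCTS_DB.values_at(*product_ids).compact
  end
end

save = Save.new(99, [2, 1])
puts save.products.map(&:name).inspect # => ["rug", "lamp"]
```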
Vertically Sharded Database
[Diagram: the web app connects to the main master DB and its replica (main schema), plus a single dedicated master DB for the busy table we moved out]
Vertical sharding, deploying
Drop in write IO on the main DB after splitting off
the high-IO table onto a dedicated compute node
For a complete and more detailed account of
our vertical sharding effort, please read our
blog post:
http://wanelo.ly/vertical-sharding
Step 11:
Wrap busy tables with services
Splitting off services
• Vertical sharding is a great precursor to a
micro-services architecture
• We already have saves in another database;
let's migrate it to a lightweight HTTP service
• New service: Sinatra, client and server libs,
updated tests & development, CI, deployment,
all without changing the DB schema
• 2-3 weeks of effort for a pair of engineers
Adapter pattern to the rescue
[Diagram: the main app (Unicorn w/ Rails) reaches the saves data either over HTTP via the HTTP client adapter to the service app (Unicorn w/ Sinatra), or directly via the native client adapter to PostgreSQL]
• We used the Adapter pattern to write two client
adapters, native and HTTP, so we could use the lib
but not yet switch to HTTP
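A minimal sketch of the adapter pair. Class names are illustrative, not the actual Wanelo code; the real service speaks HTTP to a Sinatra app:

```ruby
# Callers depend on one interface; swapping the adapter swaps the
# transport, so the HTTP cutover becomes a config change.
class NativeSavesAdapter
  def initialize(store = {})
    @store = store # stand-in for direct DB access
  end

  def create(user_id, product_id)
    (@store[user_id] ||= []) << product_id
    true
  end

  def for_user(user_id)
    @store.fetch(user_id, [])
  end
end

class HttpSavesAdapter
  # Same interface, but each call would become an HTTP request to
  # the service, e.g. POST /saves and GET /users/:id/saves
  # (illustrative routes, and no service is running here).
  def create(user_id, product_id)
    raise NotImplementedError, "POST /saves"
  end

  def for_user(user_id)
    raise NotImplementedError, "GET /users/#{user_id}/saves"
  end
end

client = NativeSavesAdapter.new # swap for HttpSavesAdapter later
client.create(1, 42)
puts client.for_user(1).inspect # => [42]
```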
Services conclusions
• Now we can independently scale the service
backend, in particular reads, by using replicas
• This prepares us for the next inevitable step:
horizontal sharding
• At the cost of added request latency, lots of extra
code, extra runtime infrastructure, and 2
weeks of work
• Do this only if you absolutely have to
Step 12:
Shard the Services Backend
Horizontally
Horizontal sharding in Ruby
• We wanted to stick with PostgreSQL for critical
data such as saves
• We really liked Instagram's approach with schemas
• We built our own schema-based sharding in Ruby,
on top of the Sequel gem, and open-sourced it
• It supports mapping physical to logical shards,
and connection pooling
https://github.com/wanelo/sequel-schema-sharding
Schema design for sharding
https://github.com/wanelo/sequel-schema-sharding

UserSaves, sharded by user_id: columns user_id, product_id, collection_id, created_at; index__on_user_id_and_collection_id

ProductSaves, sharded by product_id: columns product_id, user_id, updated_at; index__on_product_id_and_user_id, index__on_product_id_and_updated_at

We needed two lookups, by user_id and
by product_id, hence two tables,
independently sharded
Since saves is a join table between
user, product, and collection, we did not
need a unique generated ID
Composite base62-encoded ID:
fpua-1BrV-1kKEt
Spreading your shards
• We split saves into 8192 logical shards,
distributed across 8 PostgreSQL databases
• Running on 8 virtual zones
spanning 2 physical SSD
servers, 4 per compute node
• Each database has 1024
schemas (times two, because we
sharded saves into two tables)
https://github.com/wanelo/sequel-schema-sharding
[Hardware: 2 × 32-core, 256GB RAM, 16-drive SSD RAID10+2, PostgreSQL 9.3]
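A sketch of how logical shards, physical databases, schema names, and the composite base62 ID fit together. The hash function and naming are assumptions for illustration; the real implementation is the sequel-schema-sharding gem:

```ruby
require 'zlib'

# 8192 logical shards spread over 8 physical databases, plus the
# composite base62 ID used instead of a generated primary key.
LOGICAL_SHARDS = 8192
DATABASES      = 8 # each physical DB hosts 8192 / 8 = 1024 schemas

def logical_shard(user_id)
  Zlib.crc32(user_id.to_s) % LOGICAL_SHARDS # stable hash -> 0..8191
end

def database_for(shard)
  shard % DATABASES # logical -> physical mapping
end

def schema_name(shard)
  format("shard_%04d", shard) # e.g. PostgreSQL schema "shard_0042"
end

ALPHABET = [*'0'..'9', *'a'..'z', *'A'..'Z'].join # 62 digits

def base62(n)
  return ALPHABET[0] if n.zero?
  s = ""
  while n > 0
    s = ALPHABET[n % 62] + s
    n /= 62
  end
  s
end

# A save is uniquely identified by its (user, product, collection)
# triple, so the ID is just the three numbers base62-encoded.
def save_id(user_id, product_id, collection_id)
  [user_id, product_id, collection_id].map { |n| base62(n) }.join("-")
end

shard = logical_shard(12_345)
puts "db #{database_for(shard)}, schema #{schema_name(shard)}"
puts save_id(123, 456, 789) # => "1Z-7m-cJ"
```

Because the mapping from logical to physical shards is indirection, shards can later be rebalanced onto more databases without rehashing the keys.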
[Sample configuration of the shard mapping to physical
nodes with read replicas, as supported by the library]
How can we migrate the data from the old
non-sharded backend to the new sharded backend
without a long downtime?
New records go to both
[Diagram: a "Create Save" request hits the HTTP service, which reads/writes the old non-sharded backend and enqueues a Sidekiq job; a background worker picks it up and writes the save to the new sharded backend]
Migrate old rows
[Diagram: the same dual-write setup, plus a migration script copying existing rows from the old non-sharded backend into the new sharded backend]
We migrated several times before we got this right…
Swap old and new backends
[Diagram: the HTTP service now reads/writes the new sharded backend, while background jobs keep the old backend updated until it is retired]
Horizontal sharding conclusions
• This is the final destination of any scalable
architecture: just add more boxes
• We're pretty sure we can now scale to 1,000 or 10,000
inserts/second by scaling out
• It took 2 engineers 2 months, including the migration,
but with zero downtime. It's an advanced-level effort, and
our engineers really nailed it.
https://github.com/wanelo/sequel-schema-sharding
Putting it all together
• This infrastructure complexity is not free
• It requires new automation, monitoring,
graphing, maintenance, and upgrades, and brings
with it a new source of bugs
• In addition, micro-services can be "owned"
by small teams in the future, achieving
organizational autonomy
• But the advantages are clear when scaling is
one of the requirements
Systems Diagram

[Diagram: incoming HTTP requests from iPhone, Android, and desktop clients flow through the Fastly CDN (caching images/JS) and load balancers (8-core 8GB zones running haproxy + nginx) to:
• App servers (32-core 32GB high-CPU instances): Unicorn running the main web/API app on Ruby 2.0, the Saves service, haproxy, and pgbouncer; Makara distributes DB load across 3 replicas and 1 master
• Primary database schema: PostgreSQL 9.2 master on a 32-core 256GB 16-drive SSD RAID10+2 Supermicro "Richmond" (Intel DCS3700 SSDs, Intel E5-2690 2.9GHz CPUs), with non-SSD read replicas, an SSD read replica, and async PostgreSQL replicas
• User and product saves: horizontally sharded and replicated PostgreSQL 9.3 (32-core 256GB RAM, 16-drive SSD RAID10+2)
• MemCached cluster (4-core 16GB zones), accessed via the fault-tolerant Dalli library; one or more nodes can go down
• Redis clusters for various custom user feeds, such as the product feed (16GB high-mem 4-core zones, 32 Redis instances per server, redis-001 through redis-256), behind a twemproxy Redis proxy cluster (1-core 1GB zones)
• Apache Solr clusters: a Solr master (8GB high-CPU zone) for updates, Solr replicas for reads
• Background worker nodes (32-core 32GB high-CPU instances): Sidekiq background workers, the Saves service, haproxy, and pgbouncer; Redis serves as the Sidekiq jobs queue/bus
• Amazon S3 stores product images and user profile pictures
Systems status dashboard: monitoring & graphing with Circonus, NewRelic, statsd, nagios]
Backend Stack & Key Vendors
■ MRI Ruby, JRuby, Sinatra, Ruby on Rails
■ PostgreSQL, Solr, redis, twemproxy,
memcached, nginx, haproxy, pgbouncer
■ Joyent Cloud, SmartOS, Manta Object Store,
ZFS, ARC Cache, superb IO, SMF, Zones, dTrace, humans
■ DNSMadeEasy, MessageBus, Chef, SiftScience
■ LeanPlum, MixPanel, Graphite analytics, A/B testing
■ AWS S3 + Fastly CDN for user / product images
■ Circonus, NewRelic, statsd, Boundary,
PagerDuty, nagios: trending / monitoring / alerting
We are hiring!
DevOps, Ruby, Scalability, iOS & Android
Talk to me after the presentation if you are interested in working
on real scalability problems, and on a product used and loved by millions :)
http://wanelo.com/about/play
Or email play@wanelo.com
Thanks!

github.com/wanelo
github.com/wanelo-chef

wanelo technical blog (srsly awsm)
building.wanelo.com
@kig
@kigster
GPGPU Accelerates PostgreSQL (English)
Kohei KaiGai
ย 
Lessons PostgreSQL learned from commercial databases, and didnโ€™t
Lessons PostgreSQL learned from commercial databases, and didnโ€™tLessons PostgreSQL learned from commercial databases, and didnโ€™t
Lessons PostgreSQL learned from commercial databases, and didnโ€™t
PGConf APAC
ย 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
Karwin Software Solutions LLC
ย 
Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
Vladislav Bauer
ย 
Scaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six MonthsScaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six Months
Konstantin Gredeskoul
ย 
ๆทบๅ…ฅๆทบๅ‡บ MySQL & PostgreSQL
ๆทบๅ…ฅๆทบๅ‡บ MySQL & PostgreSQLๆทบๅ…ฅๆทบๅ‡บ MySQL & PostgreSQL
ๆทบๅ…ฅๆทบๅ‡บ MySQL & PostgreSQL
Yi-Feng Tzeng
ย 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
EDB
ย 
Enterprise Architectures with Ruby (and Rails)
Enterprise Architectures with Ruby (and Rails)Enterprise Architectures with Ruby (and Rails)
Enterprise Architectures with Ruby (and Rails)
Konstantin Gredeskoul
ย 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went Right
Ross Snyder
ย 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practices
Sean Chittenden
ย 
PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?
Ohyama Masanori
ย 
(JP) GPGPUใŒPostgreSQLใ‚’ๅŠ ้€Ÿใ™ใ‚‹
(JP) GPGPUใŒPostgreSQLใ‚’ๅŠ ้€Ÿใ™ใ‚‹(JP) GPGPUใŒPostgreSQLใ‚’ๅŠ ้€Ÿใ™ใ‚‹
(JP) GPGPUใŒPostgreSQLใ‚’ๅŠ ้€Ÿใ™ใ‚‹
Kohei KaiGai
ย 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
Konstantin Gredeskoul
ย 
็ฌฌ51ๅ›žNDS PostgreSQLใฎใƒ†ใ‚™ใƒผใ‚ฟๅž‹ #nds51
็ฌฌ51ๅ›žNDS PostgreSQLใฎใƒ†ใ‚™ใƒผใ‚ฟๅž‹ #nds51็ฌฌ51ๅ›žNDS PostgreSQLใฎใƒ†ใ‚™ใƒผใ‚ฟๅž‹ #nds51
็ฌฌ51ๅ›žNDS PostgreSQLใฎใƒ†ใ‚™ใƒผใ‚ฟๅž‹ #nds51
civicpg
ย 
Caching, Memcached And Rails
Caching, Memcached And RailsCaching, Memcached And Rails
Caching, Memcached And Rails
guestac752c
ย 
CTO School Meetup - Jan 2013 Becoming Better Technical Leader
CTO School Meetup - Jan 2013   Becoming Better Technical LeaderCTO School Meetup - Jan 2013   Becoming Better Technical Leader
CTO School Meetup - Jan 2013 Becoming Better Technical Leader
Jean Barmash
ย 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013
Andrew Dunstan
ย 
System integration through queues
System integration through queuesSystem integration through queues
System integration through queues
Gianluca Padovani
ย 
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQLFrom Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
Konstantin Gredeskoul
ย 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
Command Prompt., Inc
ย 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
Kohei KaiGai
ย 
Lessons PostgreSQL learned from commercial databases, and didnโ€™t
Lessons PostgreSQL learned from commercial databases, and didnโ€™tLessons PostgreSQL learned from commercial databases, and didnโ€™t
Lessons PostgreSQL learned from commercial databases, and didnโ€™t
PGConf APAC
ย 
Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
Vladislav Bauer
ย 
Scaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six MonthsScaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six Months
Konstantin Gredeskoul
ย 
ๆทบๅ…ฅๆทบๅ‡บ MySQL & PostgreSQL
ๆทบๅ…ฅๆทบๅ‡บ MySQL & PostgreSQLๆทบๅ…ฅๆทบๅ‡บ MySQL & PostgreSQL
ๆทบๅ…ฅๆทบๅ‡บ MySQL & PostgreSQL
Yi-Feng Tzeng
ย 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
EDB
ย 
Enterprise Architectures with Ruby (and Rails)
Enterprise Architectures with Ruby (and Rails)Enterprise Architectures with Ruby (and Rails)
Enterprise Architectures with Ruby (and Rails)
Konstantin Gredeskoul
ย 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went Right
Ross Snyder
ย 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practices
Sean Chittenden
ย 
PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?
Ohyama Masanori
ย 
(JP) GPGPUใŒPostgreSQLใ‚’ๅŠ ้€Ÿใ™ใ‚‹
(JP) GPGPUใŒPostgreSQLใ‚’ๅŠ ้€Ÿใ™ใ‚‹(JP) GPGPUใŒPostgreSQLใ‚’ๅŠ ้€Ÿใ™ใ‚‹
(JP) GPGPUใŒPostgreSQLใ‚’ๅŠ ้€Ÿใ™ใ‚‹
Kohei KaiGai
ย 
็ฌฌ51ๅ›žNDS PostgreSQLใฎใƒ†ใ‚™ใƒผใ‚ฟๅž‹ #nds51
็ฌฌ51ๅ›žNDS PostgreSQLใฎใƒ†ใ‚™ใƒผใ‚ฟๅž‹ #nds51็ฌฌ51ๅ›žNDS PostgreSQLใฎใƒ†ใ‚™ใƒผใ‚ฟๅž‹ #nds51
็ฌฌ51ๅ›žNDS PostgreSQLใฎใƒ†ใ‚™ใƒผใ‚ฟๅž‹ #nds51
civicpg
ย 
Caching, Memcached And Rails
Caching, Memcached And RailsCaching, Memcached And Rails
Caching, Memcached And Rails
guestac752c
ย 
CTO School Meetup - Jan 2013 Becoming Better Technical Leader
CTO School Meetup - Jan 2013   Becoming Better Technical LeaderCTO School Meetup - Jan 2013   Becoming Better Technical Leader
CTO School Meetup - Jan 2013 Becoming Better Technical Leader
Jean Barmash
ย 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013
Andrew Dunstan
ย 
System integration through queues
System integration through queuesSystem integration through queues
System integration through queues
Gianluca Padovani
ย 
Ad

Similar to 12-Step Program for Scaling Web Applications on PostgreSQL (20)

Optimization of modern web applications
Optimization of modern web applicationsOptimization of modern web applications
Optimization of modern web applications
Eugene Lazutkin
ย 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
Chris Purrington
ย 
PaaS with Java
PaaS with JavaPaaS with Java
PaaS with Java
Eberhard Wolff
ย 
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
AD113  Speed Up Your Applications w/ Nginx and PageSpeedAD113  Speed Up Your Applications w/ Nginx and PageSpeed
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
edm00se
ย 
Background processing with hangfire
Background processing with hangfireBackground processing with hangfire
Background processing with hangfire
Aleksandar Bozinovski
ย 
Scaling PHP apps
Scaling PHP appsScaling PHP apps
Scaling PHP apps
Matteo Moretti
ย 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance Drupal
Chapter Three
ย 
Profiling and Tuning a Web Application - The Dirty Details
Profiling and Tuning a Web Application - The Dirty DetailsProfiling and Tuning a Web Application - The Dirty Details
Profiling and Tuning a Web Application - The Dirty Details
Achievers Tech
ย 
Cloud Platforms for Java
Cloud Platforms for JavaCloud Platforms for Java
Cloud Platforms for Java
3Pillar Global
ย 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Alluxio, Inc.
ย 
Modern websites in 2020 and Joomla
Modern websites in 2020 and JoomlaModern websites in 2020 and Joomla
Modern websites in 2020 and Joomla
George Wilson
ย 
Performance stack
Performance stackPerformance stack
Performance stack
Shayne Bartlett
ย 
Orlando DNN Usergroup Pres 12/06/11
Orlando DNN Usergroup Pres 12/06/11Orlando DNN Usergroup Pres 12/06/11
Orlando DNN Usergroup Pres 12/06/11
Jess Coburn
ย 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalability
cherryhillco
ย 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 Distilled
Grig Gheorghiu
ย 
Navigating SAPโ€™s Integration Options (Mastering SAP Technologies 2013)
Navigating SAPโ€™s Integration Options (Mastering SAP Technologies 2013)Navigating SAPโ€™s Integration Options (Mastering SAP Technologies 2013)
Navigating SAPโ€™s Integration Options (Mastering SAP Technologies 2013)
Sascha Wenninger
ย 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
ย 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
ย 
Understanding cloud with Google Cloud Platform
Understanding cloud with Google Cloud PlatformUnderstanding cloud with Google Cloud Platform
Understanding cloud with Google Cloud Platform
Dr. Ketan Parmar
ย 
Getting Started on Google Cloud Platform
Getting Started on Google Cloud PlatformGetting Started on Google Cloud Platform
Getting Started on Google Cloud Platform
Aaron Taylor
ย 
Optimization of modern web applications
Optimization of modern web applicationsOptimization of modern web applications
Optimization of modern web applications
Eugene Lazutkin
ย 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
Chris Purrington
ย 
PaaS with Java
PaaS with JavaPaaS with Java
PaaS with Java
Eberhard Wolff
ย 
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
AD113  Speed Up Your Applications w/ Nginx and PageSpeedAD113  Speed Up Your Applications w/ Nginx and PageSpeed
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
edm00se
ย 
Background processing with hangfire
Background processing with hangfireBackground processing with hangfire
Background processing with hangfire
Aleksandar Bozinovski
ย 
Scaling PHP apps
Scaling PHP appsScaling PHP apps
Scaling PHP apps
Matteo Moretti
ย 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance Drupal
Chapter Three
ย 
Profiling and Tuning a Web Application - The Dirty Details
Profiling and Tuning a Web Application - The Dirty DetailsProfiling and Tuning a Web Application - The Dirty Details
Profiling and Tuning a Web Application - The Dirty Details
Achievers Tech
ย 
Cloud Platforms for Java
Cloud Platforms for JavaCloud Platforms for Java
Cloud Platforms for Java
3Pillar Global
ย 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Alluxio, Inc.
ย 
Modern websites in 2020 and Joomla
Modern websites in 2020 and JoomlaModern websites in 2020 and Joomla
Modern websites in 2020 and Joomla
George Wilson
ย 
Performance stack
Performance stackPerformance stack
Performance stack
Shayne Bartlett
ย 
Orlando DNN Usergroup Pres 12/06/11
Orlando DNN Usergroup Pres 12/06/11Orlando DNN Usergroup Pres 12/06/11
Orlando DNN Usergroup Pres 12/06/11
Jess Coburn
ย 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalability
cherryhillco
ย 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 Distilled
Grig Gheorghiu
ย 
Navigating SAPโ€™s Integration Options (Mastering SAP Technologies 2013)
Navigating SAPโ€™s Integration Options (Mastering SAP Technologies 2013)Navigating SAPโ€™s Integration Options (Mastering SAP Technologies 2013)
Navigating SAPโ€™s Integration Options (Mastering SAP Technologies 2013)
Sascha Wenninger
ย 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
ย 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
ย 
Understanding cloud with Google Cloud Platform
Understanding cloud with Google Cloud PlatformUnderstanding cloud with Google Cloud Platform
Understanding cloud with Google Cloud Platform
Dr. Ketan Parmar
ย 
Getting Started on Google Cloud Platform
Getting Started on Google Cloud PlatformGetting Started on Google Cloud Platform
Getting Started on Google Cloud Platform
Aaron Taylor
ย 
Ad

Recently uploaded (20)

AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
ย 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
ย 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
ย 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
ย 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
ย 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
ย 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
ย 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
ย 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
ย 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
ย 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
ย 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
ย 
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5..."Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
Fwdays
ย 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
ย 
Buckeye Dreamin' 2023: De-fogging Debug Logs
Buckeye Dreamin' 2023: De-fogging Debug LogsBuckeye Dreamin' 2023: De-fogging Debug Logs
Buckeye Dreamin' 2023: De-fogging Debug Logs
Lynda Kane
ย 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
ย 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
ย 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
ย 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
ย 
Hands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordDataHands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordData
Lynda Kane
ย 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
ย 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
ย 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
ย 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
ย 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
ย 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
ย 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
ย 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
ย 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
ย 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
ย 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
ย 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
ย 
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5..."Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership โ€” the Path to Exponential Growth for Companies Sized 50-5...
Fwdays
ย 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
ย 
Buckeye Dreamin' 2023: De-fogging Debug Logs
Buckeye Dreamin' 2023: De-fogging Debug LogsBuckeye Dreamin' 2023: De-fogging Debug Logs
Buckeye Dreamin' 2023: De-fogging Debug Logs
Lynda Kane
ย 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
ย 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
ย 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
ย 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
ย 
Hands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordDataHands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordData
Lynda Kane
ย 

12-Step Program for Scaling Web Applications on PostgreSQL

  • 1. Proprietary and WHILE THE OLD PRESENTATION IS STILL AVAILABLE AFTER THIS SLIDE, I THINK YOU SHOULD SEE THE NEW ONE INSTEAD! PLEASE CLICK ON THE LINK: http://bit.ly/1MXxlBL WARNING: There is a more recent and updated version of this talk available on SlideShare, it's called: "FROM OBVIOUS TO INGENIOUS: INCREMENTALLY SCALING WEB APPLICATIONS ON POSTGRESQL" presented at
  • 2. Proprietary and Confidential Konstantin Gredeskoul, CTO, Wanelo.com 12-Step Program for Scaling Web Applications on PostgreSQL @kig @kigster
  • 3. Proprietary and What does it mean to scale on top of PostgreSQL? And why should you care?
  • 4. Proprietary and Scaling means supporting more workload concurrently, where "work" is often interchangeable with users. But why on PostgreSQL? Because NoNoSQL is hawt! (again)
  • 5. Proprietary and Relational databases are great at supporting constant change in software. They are not as great at "pure scaling" as Riak or Cassandra. So the choice critically depends on what you are trying to build.
  • 6. Proprietary and A large majority of web applications are represented extremely well by the relational model. So if I need to build a new product or a service, my default choice would be PostgreSQL for critical data, plus whatever else as needed.
  • 7. Proprietary and This presentation is a walk-through filled with practical solutions. It's based on the story of scaling wanelo.com to sustain tens of thousands of concurrent users at 3K req/sec. Not Twitter/Facebook scale, but still… So let's explore the application to learn a bit about Wanelo for our scalability journey.
  • 8. Proprietary and Founded in 2010, Wanelo ("wah-nee-loh," from Want, Need, Love) is a community and a social network for all of the world's shopping. Wanelo is home to 12M products, millions of users, 200K+ stores, and products on Wanelo have been saved into collections over 2B times.
  • 9. Early on we wanted to: • move fast with product development • scale as needed, stay ahead of the curve • keep overall costs low • but spend where it matters • automate everything • avoid reinventing the wheel • learn as we go • remain in control of our infrastructure
  • 10. Heroku or Not? Proprietary and Assuming we want full control of our application layer, places like Heroku aren't a great fit. But Heroku can be a great place to start. It all depends on the size and complexity of the app we are building. Ours would have been cost-prohibitive.
  • 11. Foundations of web apps Proprietary and • app server (we use Unicorn) • scalable web server in front (we use nginx) • database (we use PostgreSQL) • hosting environment (we use Joyent Cloud) • deployment tools (Capistrano) • server configuration tools (we use Chef) • programming language + framework (RoR) • many others, such as monitoring, alerting
  • 12. Let's review… Basic Web App Proprietary and /var/pgsql/data incoming http PostgreSQL Server /home/user/app/current/public nginx Unicorn / Passenger Ruby VM N x Unicorns Ruby VM • no redundancy, no caching (yet) • can only process N concurrent requests • nginx will serve static assets, deal with slow clients • web sessions probably in the DB or a cookie
  • 13. First optimizations: cheap early on, well worth it Proprietary and • Personalization via AJAX, so controller actions can be cached entirely using caches_action • Page is returned unpersonalized; an additional AJAX request loads personalization
  • 14. A few more basic performance tweaks that go a long way Proprietary and • Install 2+ memcached servers for caching and use the Dalli gem to connect to them for redundancy • Switch to memcached-based web sessions. Use sessions sparingly; assume they are transient • Set up a CDN for asset_host and any user-generated content. We use fastly.com • Redis is also an option, but I prefer memcached for redundancy
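In a Rails app, the memcached setup above might look roughly like this — a configuration sketch only (not runnable without memcached), where the hostnames, timeouts, and session key are illustrative assumptions, not Wanelo's actual settings:

```ruby
# config/environments/production.rb -- hypothetical hosts; Dalli distributes
# keys across both servers, so losing one invalidates only part of the cache.
config.cache_store = :mem_cache_store,
                     "cache1.internal:11211",
                     "cache2.internal:11211",
                     { failover: true, socket_timeout: 0.5 }

# config/initializers/session_store.rb -- memcached-backed sessions are
# transient by design: if one is evicted, the user simply signs in again.
Rails.application.config.session_store :cache_store,
                                       key: "_app_session",
                                       expire_after: 2.weeks
```

Pointing the session store at the same redundant cache tier keeps session reads off the database, which matters once traffic grows.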
  • 15. Proprietary and browser PostgreSQL Server /home/user/app/current/public nginx Unicorn / Passenger Ruby VM N x Unicorns Ruby VM memcached CDN (cache images, JS) Caching goes a long way… • geo-distribute and cache your UGC and CSS/JS assets • cache HTML and serialized objects in memcached • can increase TTL to alleviate load if traffic spikes
  • 16. Proprietary and Adding basic redundancy • Multiple app servers require haproxy between nginx and Unicorn • Multiple long-running tasks (such as posting to Facebook or Twitter) require a background job processing framework • Multiple load balancers require DNS round-robin and a short TTL (dyn.com)
  • 17. Proprietary and PostgreSQL Unicorn / Passenger Ruby VM (times N) haproxy incoming http DNS round-robin or failover / HA solution nginx memcached redis CDN (cache images, JS) Load Balancers App Servers single DB Object Store (User Generated Content) Sidekiq / Resque Background Workers Data stores (Transient to Permanent) This architecture can horizontally scale up as far as the database at its center. Every other component can be scaled by adding more of it, to handle more traffic.
  • 18. Proprietary and As long as we can scale the data store on the backend, we can scale the app! Mostly :) At some point we may hit a limit on TCP/IP network throughput or the number of connections, but that is at a whole other level of scale.
  • 19. The traffic keeps climbing…
  • 20. Performance limits are near Proprietary and • First signs of performance problems start creeping up • Symptoms of read scalability problems • Pages load slowly or time out • Users are getting 503 Service Unavailable • Database is slammed (very high CPU or read IO) • Symptoms of write scalability problems • Database write IO is maxed out, CPU is not • Update operations are waiting on each other, piling up • Application "locks up", timeouts • Replicas are not catching up • Some pages load (cached?), some don't
  • 21. Proprietary and Both situations may easily result in downtime
  • 22. Proprietary and Even though we achieved 99.99% uptime in 2013, in 2014 we had a couple short downtimes caused by overloaded replica that lasted around 5 minutes. But users quickly noticeโ€ฆ
  • 25. Proprietary and Common patterns for scaling high tra๏ฌƒc web applications, based on wanelo.com โ€จ 12-Step Program for curing your dependency on slow application latency
• 26. What's a good latency? • For small / fast HTTP services, 10-12ms or lower • If your app is high-traffic (100K+ RPM), I recommend 80ms or lower
• 27. CPU burn vs. waiting on IO? • Ruby VM (30ms) + garbage collection (6ms) is CPU burn, easy to scale by adding more app servers • Web services + Solr (25ms), memcached (15ms), and the database (6ms) are all waiting on IO
• 28. Step 1: Add More Cache!
• 29. Moar Cache!!! • Anything that can be cached, should be • Cache hit = many database hits avoided • A hit rate of 17% still saves DB hits • We can cache many types of things… • Cache is cheap and fast (memcached)
• 30. Cache many types of things • caches_action in controllers is very effective • fragment caches of reusable widgets • we use the Compositor gem for our JSON API: we cache serialized object fragments, grab them from memcached using multi_get, and merge them • Shopify open-sourced IdentityCache, which caches AR models so you can Product.fetch(id) https://github.com/wanelo/compositor https://github.com/Shopify/identity_cache
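The "cache serialized fragments, multi-get, then merge" idea can be sketched in plain Ruby. A Hash stands in for memcached here; in the real app this would be a Dalli client plus the Compositor gem, and the key names are illustrative:

```ruby
# Sketch: cache per-object JSON fragments, fetch them in one round trip,
# and merge them into a single response document.
class FragmentCache
  def initialize
    @store = {} # stand-in for memcached
  end

  def write(key, fragment)
    @store[key] = fragment
  end

  # One round trip for many keys, like Dalli's get_multi
  def multi_get(keys)
    @store.slice(*keys)
  end

  # Merge cached fragments into one document
  def compose(keys)
    multi_get(keys).values.reduce({}) { |doc, fragment| doc.merge(fragment) }
  end
end

cache = FragmentCache.new
cache.write("product:1:core",  { "id" => 1, "name" => "Lamp" })
cache.write("product:1:stats", { "saves_count" => 42 })
cache.compose(["product:1:core", "product:1:stats"])
```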
• 31. But caching has its issues • Expiring cache is not easy • CacheSweepers in Rails help • We found ourselves doing 4000 memcached deletes in a single request! • We could defer expiring caches to background jobs, or use TTLs where possible • But we can cache even outside of our app: we cache JSON API responses using a CDN (fastly.com)
• 32. Step 2: Optimize SQL
• 33. SQL Optimization • Find slow SQL (>100ms) and either remove it, cache the hell out of it, or fix/rewrite the query • Enable the slow query log in postgresql.conf: log_min_duration_statement = 80 log_temp_files = 0 • pg_stat_statements is an invaluable contrib module
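A sketch of how pg_stat_statements is typically enabled and queried (column names are per the PostgreSQL 9.x era this deck describes; later versions renamed total_time to total_exec_time):

```sql
-- postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'
-- then, in the database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by total time spent
SELECT query, calls, total_time, rows
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 10;
```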
• 34. Fixing a slow query • Run an explain plan to understand how the DB executes the query • Are there adequate indexes for the query? Is the database using the appropriate index? • Has the table been analyzed recently? • Can a complex join be simplified into a subselect? • Can this query use an index-only scan? Can the "order by" column be added to the index? • Check pg_stat_user_indexes and pg_stat_user_tables for seq scans, unused indexes, and cache info
• 35. SQL optimization, ctd. • Instrumentation software such as NewRelic shows slow queries, with explain plans and time-consuming transactions
• 37. One day, I noticed lots of temp files being created in the postgres.log
• 38. Let's run this query… This join takes a whole second to return :(
• 41. So our index is partial, only on state = 'active', and this query is a full table scan… But state isn't used in the query. A bug? Let's add state = 'active'; it was meant to be there anyway
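An illustrative version of this fix (the table and column names here are assumptions; the slide does not show the actual schema). A partial index only covers rows matching its WHERE clause, so the query must repeat that predicate for the planner to use it:

```sql
-- Partial index: only rows with state = 'active' are indexed
CREATE INDEX idx_products_active_store
    ON products (store_id)
 WHERE state = 'active';

-- Full table scan: the predicate doesn't match the index condition
SELECT * FROM products WHERE store_id = 123;

-- Index scan: adding state = 'active' lets the planner use the index
SELECT * FROM products WHERE store_id = 123 AND state = 'active';
```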
• 42. After adding the extra condition, IO drops significantly:
• 43. Step 3: Upgrade Hardware and RAM
• 44. Hardware + RAM • Sounds obvious, but better or faster hardware is an obvious choice when scaling out • Large RAM will be used as file system cache • On Joyent's SmartOS, the ARC FS cache is very effective • shared_buffers should be set to 25% of RAM or 12GB, whichever is smaller • Using a fast SSD disk array can make a huge difference • Joyent's native 16-disk RAID managed by ZFS instead of a controller provides excellent performance
• 45. Hardware in the cloud • SSD offerings from Joyent and AWS • Joyent's "max" SSD node: $12.9/hr • AWS "max" SSD node: $6.8/hr
• 46. So who's better? • JOYENT • 16 SSD drives: RAID10 + 2 • SSD make: DCS3700 • CPU: E5-2690, 2.9GHz • AWS • 8 SSD drives • SSD make: ? • CPU: E5-2670, 2.6GHz Perhaps you get what you pay for after all…
• 47. Step 4: Scale Reads by Replication
• 48. Scale reads by replication • postgresql.conf (both master & replica) • These settings have been tuned for SmartOS and our application requirements (thanks, PGExperts!)
• 49. How to distribute reads? • Some people have success using this setup for reads: app → haproxy → pgBouncer → replica (one pgBouncer per replica) • I'd like to try this method eventually, but we chose to deal with distributing read traffic at the application level • We tried many Ruby-based solutions that claimed to do this well, but many weren't production-ready
• 50. • Makara is a Ruby gem from TaskRabbit that we ported from MySQL to PostgreSQL for sending reads to replicas • It was the simplest library to understand, and to port to PG • It worked in the multi-threaded environment of Sidekiq background workers • It automatically retries if a replica goes down • It load-balances with weights • It was already running in production
• 51. Special considerations • The application must be tuned to support eventual consistency. Data may not yet be on the replica! • You must explicitly force a fetch from the master DB when it's critical (e.g. after a user account's creation) • We often use the pattern below: first try the fetch, and if nothing is found, retry on the master DB
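The retry-on-master pattern can be sketched in plain Ruby. The lambdas stand in for queries routed to replica and master connections; this is an illustration of the idea, not Makara's actual API:

```ruby
# Try the replica first; if the row hasn't replicated yet, fall back
# to the master. Only use this where a miss plausibly means "not
# replicated yet" rather than "doesn't exist".
def fetch_with_fallback(read_from_replica, read_from_master)
  record = read_from_replica.call
  record || read_from_master.call # replica may not have the row yet
end

# Simulated eventual consistency: the row exists on the master
# but has not reached the replica.
master  = { 1 => "alice" }
replica = {}

fetch_with_fallback(-> { replica[1] }, -> { master[1] })
```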
• 52. Replicas can specialize • Background workers can use a dedicated replica, not shared with the app servers, to optimize the hit rate of the file system cache (ARC) on both replicas: PostgreSQL Master → Replicas 1-2 (ARC cache warm with queries from web traffic, serving the Unicorn / Passenger app servers) and Replica 3 (ARC cache warm with background job queries, serving Sidekiq / Resque background workers)
• 53. Big heavy reads go there • Long, heavy queries should be run by the background jobs against a dedicated replica, to isolate their effect on web traffic • Each type of load will produce a unique set of data cached by the file system
• 54. Step 5: Use more appropriate tools
• 55. Leveraging other tools: not every type of data is well suited to storage in a relational DB, even though it may initially be convenient • Redis is a great data store for transient or semi-persistent data with list, hash, or set semantics • We use it for the ActivityFeed by precomputing each feed at write time; we can regenerate a feed if the data is lost from Redis • We use twemproxy in front of Redis, which provides automatic horizontal sharding and connection pooling • We run clusters of 256 Redis shards across many virtual zones; sharded Redis instances use many cores instead of one • Solr is great for full-text search, and for deeply paginated sorted lists, such as trending or related products
• 56. Back to PostgreSQL: we still have a single master DB taking all the writes… True story: applying WAL logs on replicas creates significant disk write load. Replicas are unable to both serve live traffic and catch up on replication. They fall behind.
• 57. When replicas fall behind, the application generates errors, unable to find the data it expects
• 58. Step 6: Move write-heavy tables out: replace with non-DB solutions
• 59. Move the event log out • We discovered from pg_stat_user_tables that the top table by write volume was user_events • We were appending all user events to this table • We were generating millions of rows per day! • We solved it by replacing the user event recording system with rsyslog, appending to ASCII files • It's cheap, reliable, and scalable • We now use Joyent's Manta to analyze this data in parallel. Manta is an object store + native compute
• 60. For more information about how we migrated user events to a file-based append-only log, and how we analyze it with Manta, please read http://wanelo.ly/event-collection
• 61. Step 7: Tune PostgreSQL and your Filesystem
• 62. Tuning ZFS • Problem: zones (virtual hosts) with "write problems" appeared to be writing 16 times more data to disk than what the virtual file system reported • vfsstat says 8MB/sec write volume • iostat says 128MB/sec is actually written to disk • So what's going on?
• 63. Tuning the filesystem • It turns out the default ZFS block size is 128KB, while the PostgreSQL page size is 8KB • Every small write that touched a page had to write a full 128KB ZFS block to disk • This may be good for huge sequential writes, but not for random access and lots of tiny writes
• 64. Tuning ZFS & PgSQL • Solution: Joyent changed the ZFS block size for our zone, and the iostat write volume dropped to 8MB/sec • We also added commit_delay
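A sketch of what those two changes look like (the dataset name and the commit_delay value are assumptions; recordsize only affects newly written blocks):

```shell
# ZFS: match the dataset's record size to PostgreSQL's 8KB page size
# for the filesystem holding the data directory (dataset name illustrative)
zfs set recordsize=8K zones/var/pgsql/data

# postgresql.conf: batch WAL flushes from concurrent commits
#   commit_delay = 10000    # microseconds; tune for your workload
```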
• 65. Installing and configuring PG • Many such settings are pre-defined in our open-source Chef cookbook for installing PostgreSQL from source • https://github.com/wanelo-chef/postgres • It installs PG in e.g. /opt/local/postgresql-9.3.2 • It configures its data in /var/pgsql/data93 • It allows seamless and safe upgrades of minor or major versions of PostgreSQL, never overwriting binaries
• 66. Additional resources online • Josh Berkus's "Five Steps to PostgreSQL Performance" on SlideShare is fantastic • The PostgreSQL wiki pages on performance tuning are excellent • Run pgBench to determine and compare the performance of systems http://www.slideshare.net/PGExperts/five-steps-perform2013 http://wiki.postgresql.org/wiki/Performance_Optimization http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
• 67. Step 8: Buffer and serialize frequent updates
• 68. Counters, counters… • Problem: products.saves_count is incremented by 1 every time someone saves a product • At 100s of inserts/sec, that's a lot of updates • Worse: 100s of concurrent requests trying to obtain a row-level lock on the same popular product • How can we reduce the number of writes and the lock contention?
• 69. Buffering and serializing • The Sidekiq background job framework has two inter-related features: scheduling in the future (say, 10 minutes ahead), and the UniqueJob extension • We increment a counter in Redis, and enqueue a job that says "update this product in 10 minutes" • Once every 10 minutes, popular products are updated by adding the value stored in Redis to the database value, and resetting the Redis value to 0
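A minimal plain-Ruby sketch of this buffering scheme. Hashes stand in for Redis and PostgreSQL; the real system uses redis-rb plus a de-duplicated Sidekiq job scheduled ~10 minutes out:

```ruby
# Increments accumulate in "Redis"; a delayed, unique job later folds
# the buffered delta into the DB row with a single UPDATE.
REDIS = Hash.new(0)
DB    = { products: { 42 => { saves_count: 100 } } }

def record_save(product_id)
  REDIS["product:#{product_id}:saves"] += 1
  # real system: enqueue a unique "flush" job ~10 minutes out
end

def flush_counter(product_id)
  key   = "product:#{product_id}:saves"
  delta = REDIS[key]
  REDIS[key] = 0
  DB[:products][product_id][:saves_count] += delta # one UPDATE, not N
end

# Read-time consistency trick from the "conclusions" slide:
# display DB value + buffered Redis delta
def current_count(product_id)
  DB[:products][product_id][:saves_count] +
    REDIS["product:#{product_id}:saves"]
end

3.times { record_save(42) }
current_count(42) # consistent even before the flush
flush_counter(42)
```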
• 70. Buffering explained: Save Product (×3) → 1. enqueue an update request for the product, with a delay (a duplicate update request already on the queue is dropped) → 2. increment the counter in the Redis cache → 3. process the job → 4. read the counter & reset it to 0 → 5. update the product in PostgreSQL
• 71. Buffering conclusions • If we show objects from the database, they might sometimes be behind on the counter. That might be OK… • If not, to achieve read consistency, we can display the count as database value + Redis value at read time
• 72. Step 9: Optimize DB schema
• 73. MVCC does copy-on-write • Problem: PostgreSQL rewrites the row for most updates (some exceptions exist, e.g. a non-indexed column, a counter, a timestamp) • But we often index these columns so we can sort by them • Rails' and Hibernate's partial updates are not helping • Are we updating User on each request? • So updates can become expensive on wide tables
• 74. Schema tricks • Solution: split wide tables into several 1-1 tables to reduce update impact • Much less vacuuming is required when smaller tables are frequently updated
• 75. Don't update anything on each request :) Refactor: the wide Users table (id, email, username, avatar, state, created_at, updated_at, authentication columns such as encrypted_password, sign_in_count, confirmation_token, locked_at, authentication_token, and counter columns such as followers_count, saves_count, collections_count) splits into three tables: Users (id, email, created_at, username, avatar, state), UserLogins (user_id plus all the authentication and sign-in columns, updated_at), and UserCounts (user_id, followers_count, saves_count, collections_count, stores_count, following_count, stories_count)
• 76. Step 10: Shard Busy Tables Vertically
• 77. Vertical sharding • Heavy tables with too many writes can be moved into their own separate database • For us it was saves: now at 2B+ rows • At hundreds of inserts per second, with 4 indexes, we were feeling the pain • It turns out moving a single table out (in Rails) is not a huge effort: it took our team 3 days
• 78. Vertical sharding: how to • Update code to point to the new database • Implement any dynamic Rails association methods as real methods with 2 fetches • e.g. save.products becomes a method on the Save model, looking up Products by IDs • Update the development and test setup with two primary databases, and fix all the tests
• 79. Vertically sharded database: the Web App connects to the PostgreSQL Master (main schema) and its PostgreSQL Replica, plus a separate PostgreSQL Master for the split table. Here the application connects to the main master DB + replicas, and to a single dedicated DB for the busy table we moved
• 80. Vertical sharding, deploying: drop in write IO on the main DB after splitting off the high-IO table onto a dedicated compute node
• 81. For a complete and more detailed account of our vertical sharding effort, please read our blog post: http://wanelo.ly/vertical-sharding
• 82. Step 11: Wrap busy tables with services
• 83. Splitting off services • Vertical sharding is a great precursor to a micro-services architecture • We already have saves in another database; let's migrate it to a lightweight HTTP service • New service: Sinatra, client and server libs, updated tests & development, CI, and deployment, all without changing the DB schema • Level of effort: 2-3 weeks for a pair of engineers
• 84. Adapter pattern to the rescue • We used the Adapter pattern to write two client adapters, native and HTTP, so we could use the lib but not yet switch to HTTP: Main App (Unicorn w/ Rails) → Native Client Adapter → PostgreSQL, or → HTTP Client Adapter → Service App (Unicorn w/ Sinatra)
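The adapter idea in a minimal Ruby sketch: one client interface, two interchangeable backends. The interface and class names are assumptions; the real HTTP adapter would wrap an HTTP client against the Sinatra service:

```ruby
# Native adapter: talks to the store directly (a Hash stands in
# for the saves database here).
class NativeSavesAdapter
  def initialize(store = {})
    @store = store
  end

  def create(user_id, product_id)
    (@store[user_id] ||= []) << product_id
  end

  def products_for(user_id)
    @store.fetch(user_id, [])
  end
end

# HTTP adapter: same interface, would issue POST /saves and
# GET /saves?user_id=... against the service instead.
class HttpSavesAdapter
  def create(user_id, product_id)
    raise NotImplementedError, "POST to the saves service"
  end

  def products_for(user_id)
    raise NotImplementedError, "GET from the saves service"
  end
end

# Call sites depend only on the interface, so the backend can be
# swapped by configuration without touching application code.
saves = NativeSavesAdapter.new
saves.create(7, 101)
saves.products_for(7)
```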
• 85. Services conclusions • Now we can independently scale the service backend, in particular reads, by using replicas • This prepares us for the next inevitable step: horizontal sharding • At the cost of added request latency, lots of extra code, extra runtime infrastructure, and 2 weeks of work • Do this only if you absolutely have to
• 86. Step 12: Shard Services Backend Horizontally
• 87. Horizontal sharding in Ruby • We wanted to stick with PostgreSQL for critical data such as saves • We really liked Instagram's approach with schemas • We built our own schema-based sharding in Ruby, on top of the Sequel gem, and open-sourced it • It supports mapping of physical to logical shards, and connection pooling https://github.com/wanelo/sequel-schema-sharding
• 88. Schema design for sharding https://github.com/wanelo/sequel-schema-sharding • UserSaves, sharded by user_id: user_id, product_id, collection_id, created_at; index on (user_id, collection_id) • ProductSaves, sharded by product_id: product_id, user_id, updated_at; indexes on (product_id, user_id) and (product_id, updated_at) • We needed two lookups, by user_id and by product_id, hence two tables, independently sharded • Since saves is a join table between user, product, and collection, we did not need a unique generated ID • Composite base62-encoded ID: fpua-1BrV-1kKEt
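An illustrative base62 encoding for such composite IDs. The exact alphabet and encoding Wanelo used are not shown on the slide; this only demonstrates how (user_id, product_id, collection_id) can become a compact "xxxx-xxxx-xxxx" key with no ID-generation service:

```ruby
# Digits, then lowercase, then uppercase: 62 symbols (ordering is an
# assumption; any fixed alphabet works as long as it's consistent).
ALPHABET = [*'0'..'9', *'a'..'z', *'A'..'Z'].join.freeze

def base62(n)
  return ALPHABET[0] if n.zero?
  out = ''
  while n.positive?
    out.prepend(ALPHABET[n % 62])
    n /= 62
  end
  out
end

# The composite key encodes each component and joins them, so the ID
# is derivable from the row itself -- no sequence or UUID needed.
def composite_save_id(user_id, product_id, collection_id)
  [user_id, product_id, collection_id].map { |n| base62(n) }.join('-')
end

composite_save_id(1_000_000, 12_345, 7) # => "4c92-3d7-7"
```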
• 89. Spreading your shards • We split saves into 8192 logical shards, distributed across 8 PostgreSQL databases • Running on 8 virtual zones spanning 2 physical SSD servers (2 × 32-core, 256GB RAM, 16-drive SSD RAID10+2, PostgreSQL 9.3), 4 zones per compute node • Each database has 1024 schemas (times two, because we sharded saves into two tables) https://github.com/wanelo/sequel-schema-sharding
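The logical-to-physical mapping can be sketched as follows. The hash function and schema naming here are illustrative, not sequel-schema-sharding's actual scheme; the point is that 8192 schemas spread evenly over 8 databases, and resharding later means moving whole schemas, not rows:

```ruby
require 'zlib'

LOGICAL_SHARDS = 8192
PHYSICAL_DBS   = 8
SHARDS_PER_DB  = LOGICAL_SHARDS / PHYSICAL_DBS # 1024 schemas per database

# Stable hash of the sharding key picks one of 8192 logical shards
def logical_shard(user_id)
  Zlib.crc32(user_id.to_s) % LOGICAL_SHARDS
end

# Contiguous ranges of logical shards map to physical databases
def physical_db(user_id)
  logical_shard(user_id) / SHARDS_PER_DB
end

# Each logical shard is a PostgreSQL schema, e.g. "user_saves_0421"
def schema_name(user_id)
  format('user_saves_%04d', logical_shard(user_id))
end

physical_db(123) # which of the 8 databases holds user 123's saves
```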
• 90. Sample configuration of shard mapping to physical nodes, with read replicas, supported by the library
• 91. How can we migrate the data from the old non-sharded backend to the new sharded backend without a long downtime?
• 92. New records go to both: Create Save → HTTP Service → read/write against the old non-sharded backend, while also enqueueing to the Sidekiq queue, where a background worker writes to the new sharded backend (shards 1-4)
• 93. We migrated several times before we got this right… Same dual-write setup as above, plus a migration script that migrates the old rows into the new sharded backend
• 94. Swap old and new backends: the HTTP Service now reads/writes against the new sharded backend (shards 1-4), while the background worker keeps the old non-sharded backend up to date
• 95. Horizontal sharding conclusions • This is the final destination of any scalable architecture: just add more boxes • We're pretty sure we can now scale to 1,000 or 10,000 inserts/second by scaling out • It took 2 engineers 2 months, including the migration, with zero downtime. It's an advanced-level effort, and our engineers really nailed it. https://github.com/wanelo/sequel-schema-sharding
• 96. Putting it all together • This infrastructure complexity is not free • It requires new automation, monitoring, graphing, maintenance, and upgrades, and brings with it a new source of bugs • In addition, micro-services can be "owned" by small teams in the future, achieving organizational autonomy • But the advantages are clear when scaling is one of the requirements
• 97. Systems Diagram: incoming http requests → Load Balancers (8-core 8GB zones, haproxy, nginx); Fastly CDN caches images, JS; Amazon S3 holds product images and user profile pictures; primary database schema on PostgreSQL 9.2 Master (32-core 256GB, 16-drive SSD RAID10+2, Supermicro "Richmond", SSD make: Intel DCS3700, CPU: Intel E5-2690, 2.9GHz) with async replicas: 2 read replicas (non-SSD) and 1 read replica (SSD); Makara distributes DB load across 3 replicas and 1 master; user and product saves horizontally sharded and replicated across 4 nodes (32-core 256GB RAM, 16-drive SSD RAID10+2, PostgreSQL 9.3); App Servers + Admin Servers (32-core 32GB high-CPU instances: Unicorn main Web/API app on Ruby 2.0, Unicorn saves service, haproxy, pgbouncer) serving iPhone, Android, and desktop clients; memcached cluster (4-core 16GB zones) accessed via the fault-tolerant Dalli library, so one or more nodes can go down; Redis clusters for various custom user feeds, such as the product feed (twemproxy Redis proxy cluster on 1-core 1GB zones; 16GB high-mem 4-core zones with 32 Redis instances per server: redis-001 … redis-256); Apache Solr clusters (8GB high-CPU zones: Solr Master for updates, Solr Replicas for reads); background worker nodes (32-core 32GB high-CPU instances: Sidekiq background worker, Unicorn saves service, haproxy, pgbouncer to DBs); Redis Sidekiq jobs queue / bus
  • 98. Systems Status: Dashboard Monitoring & Graphing with Circonus, NewRelic, statsd, nagios
• 99. Backend stack & key vendors ■ MRI Ruby, JRuby, Sinatra, Ruby on Rails ■ PostgreSQL, Solr, redis, twemproxy, memcached, nginx, haproxy, pgbouncer ■ Joyent Cloud, SmartOS, Manta object store, ZFS, ARC cache, superb IO, SMF, zones, dTrace, humans ■ DNSMadeEasy, MessageBus, Chef, SiftScience ■ LeanPlum, MixPanel, Graphite: analytics, A/B testing ■ AWS S3 + Fastly CDN for user / product images ■ Circonus, NewRelic, statsd, Boundary, PagerDuty, nagios: trending / monitoring / alerting
• 100. We are hiring! DevOps, Ruby, Scalability, iOS & Android. Talk to me after the presentation if you are interested in working on real scalability problems, and on a product used and loved by millions :) http://wanelo.com/about/play Or email [email protected]
• 101. Thanks! github.com/wanelo github.com/wanelo-chef wanelo technical blog (srsly awsm) building.wanelo.com @kig @kig @kigster