NoSQL at Twitter
Kevin Weil -- @kevinweil
Analytics Lead, Twitter

April 21, 2010




Introduction
‣   How We Arrived at NoSQL: A Crash Course
‣     Collecting Data (Scribe)
‣     Storing and Analyzing Data (Hadoop)
‣     Rapid Learning over Big Data (Pig)
‣   And More: Cassandra, HBase, FlockDB
My Background
‣   Studied Mathematics and Physics at Harvard, Physics at
    Stanford
‣   Tropos Networks (city-wide wireless): mesh routing algorithms,
    GBs of data
‣   Cooliris (web media): Hadoop and Pig for analytics, TBs of data
‣   Twitter: Hadoop, Pig, HBase, Cassandra, machine learning,
    visualization, social graph analysis, soon to be PBs of data
Introduction
‣   How We Arrived at NoSQL: A Crash Course
‣     Collecting Data (Scribe)
‣     Storing and Analyzing Data (Hadoop)
‣     Rapid Learning over Big Data (Pig)
‣   And More: Cassandra, HBase, FlockDB
Data, Data Everywhere
‣   Twitter users generate a lot of data
‣   Anybody want to guess?
‣     7 TB/day (2+ PB/yr)
‣     10,000 CDs/day
‣     5 million floppy disks
‣     300 GB while I give this talk
‣   And doubling multiple times per year
Syslog?
‣   Started with syslog-ng
‣   As our volume grew, it didn’t scale
‣   Resources overwhelmed
‣   Lost data
Scribe
‣   Surprise! FB had same problem, built and open-sourced Scribe
‣   Log collection framework over Thrift
‣   You write log lines, with categories
‣   It does the rest
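
From the application side, logging to Scribe is a single Thrift call. A minimal Java sketch, assuming classes generated from the standard scribe.thrift IDL (the package name and category below are illustrative):

    import java.util.Arrays;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    // Generated from scribe.thrift; this package name is an assumption.
    import scribe.thrift.LogEntry;
    import scribe.thrift.scribe;

    public class ScribeLogExample {
      public static void main(String[] args) throws Exception {
        // Talk to the scribe agent on the local box (1463 is Scribe's default port).
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 1463));
        transport.open();
        scribe.Client client = new scribe.Client(new TBinaryProtocol(transport, false, false));

        // One log line, tagged with a category; Scribe handles routing from there.
        client.Log(Arrays.asList(new LogEntry("web_events", "user=123\taction=tweet")));
        transport.close();
      }
    }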
Scribe
‣   Runs locally; reliable in network outage
‣   Nodes only know downstream writer; hierarchical, scalable
‣   Pluggable outputs

    [Diagram: FE nodes → aggregator (Agg) nodes → outputs: File, HDFS]
Scribe at Twitter
‣   Solved our problem, opened new vistas
‣   Currently 30 different categories logged from multiple sources
‣     FE: JavaScript, Ruby on Rails
‣     Middle tier: Ruby on Rails, Scala
‣     Backend: Scala, Java, C++
Scribe at Twitter
‣   We’ve contributed to it as we’ve used it
‣       Improved logging, monitoring, writing to HDFS, compression
‣       Continuing to work with FB on patches
‣   GSoC project! Help make it more awesome.




‣   http://github.com/traviscrawford/scribe
‣   http://wiki.developers.facebook.com/index.php/User:GSoC
Introduction
‣   How We Arrived at NoSQL: A Crash Course
‣     Collecting Data (Scribe)
‣     Storing and Analyzing Data (Hadoop)
‣     Rapid Learning over Big Data (Pig)
‣   And More: Cassandra, HBase, FlockDB
How do you store 7TB/day?
‣   Single machine?
‣   What’s HD write speed?
‣     ~80 MB/s
‣   24.3 hours to write 7 TB
‣   Uh oh.
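
Sanity-checking that figure: 7 TB at 80 MB/s is (7 × 10^12 B) / (8 × 10^7 B/s) = 87,500 s ≈ 24.3 hours — slightly more than a full day just to write one day's data.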
Where do I put 7TB/day?
‣   Need a cluster of machines


‣   ... which adds new layers of complexity
Hadoop
‣   Distributed file system
‣     Automatic replication, fault tolerance
‣   MapReduce-based parallel computation
‣     Key-value based computation interface allows for wide
    applicability
Hadoop
‣   Open source: top-level Apache project
‣   Scalable: Y! has a 4000 node cluster
‣   Powerful: sorted 1TB of random integers in 62 seconds


‣   Easy packaging: free Cloudera RPMs
MapReduce Workflow

    [Diagram: Inputs → Map tasks → Shuffle/Sort → Reduce tasks → Outputs]

‣   Challenge: how many tweets per user, given the tweets table?
‣   Input: key=row, value=tweet info
‣   Map: output key=user_id, value=1
‣   Shuffle: sort by user_id
‣   Reduce: for each user_id, sum
‣   Output: user_id, tweet count
‣   With 2x machines, runs 2x faster
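
For concreteness, a minimal sketch of that job in vanilla Hadoop MapReduce (0.20-era API), assuming one tweet per input line with user_id as the first tab-separated field; class and path names are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TweetsPerUser {
      // Map: emit (user_id, 1) for each tweet record.
      public static class TweetMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text userId = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          // Assumption: user_id is the first tab-separated field of the tweet record.
          userId.set(value.toString().split("\t", 2)[0]);
          ctx.write(userId, ONE);
        }
      }

      // Reduce: after the shuffle groups by user_id, sum the ones.
      public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text userId, Iterable<LongWritable> counts, Context ctx)
            throws IOException, InterruptedException {
          long sum = 0;
          for (LongWritable c : counts) sum += c.get();
          ctx.write(userId, new LongWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "tweets-per-user");
        job.setJarByClass(TweetsPerUser.class);
        job.setMapperClass(TweetMapper.class);
        job.setCombinerClass(SumReducer.class);  // partial sums on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }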
Two Analysis Challenges
‣   1. Compute friendships in Twitter’s social graph
‣     grep, awk? No way.
‣     Data is in MySQL... self join on an n-billion row table?
‣        n,000,000,000 x n,000,000,000 = ?
‣        I don’t know either.
Two Analysis Challenges
‣   2. Large-scale grouping and counting?
‣    select count(*) from users? Maybe...
‣    select count(*) from tweets? Uh...
‣    Imagine joining them...
‣    ... and grouping...
‣    ... and sorting...
Back to Hadoop
‣   Didn’t we have a cluster of machines?
‣   Hadoop makes it easy to distribute the
    calculation
‣   Purpose-built for parallel computation
‣   Just a slight mindset adjustment
‣   But a fun and valuable one!
Analysis at scale
‣   Now we’re rolling
‣   Count all tweets: 12 billion, 5 minutes
‣   Hit FlockDB in parallel to assemble social graph aggregates
‣   Run PageRank across users to calculate reputations
But...
‣   Analysis typically in Java
‣     “I need less Java in my life, not more.”
‣   Single-input, two-stage data flow is rigid
‣   Projections, filters: custom code
‣   Joins are lengthy, error-prone
‣   n-stage jobs hard to manage
‣   Exploration requires compilation!
Introduction
‣   How We Arrived at NoSQL: A Crash Course
‣     Collecting Data (Scribe)
‣     Storing and Analyzing Data (Hadoop)
‣     Rapid Learning over Big Data (Pig)
‣   And More: Cassandra, HBase, FlockDB
Pig
‣   High-level language
‣   Transformations on sets of records
‣   Process data one step at a time
‣   Easier than SQL?
Why Pig?
‣   Because I bet you can read the following script.
A Real Pig Script
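
An illustrative Pig Latin sketch in this spirit, computing the tweets-per-user count from the earlier MapReduce example (paths and schema are assumptions):

    -- Load tweets; a tab-delimited layout with user_id first is assumed.
    tweets = LOAD '/tables/tweets' AS (user_id: long, created_at: chararray, body: chararray);

    -- Group by user and count.
    by_user = GROUP tweets BY user_id;
    counts = FOREACH by_user GENERATE group AS user_id, COUNT(tweets) AS num_tweets;

    STORE counts INTO '/reports/tweets_per_user';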
‣   Now, just for fun... the same calculation in vanilla Hadoop MapReduce.
No, seriously.
Pig Democratizes Large-scale Data Analysis
‣   The Pig version is:
‣     5% of the code
‣     5% of the time
‣     Within 25% of the execution time
One Thing I’ve Learned
‣   It’s easy to answer questions
‣   It’s hard to ask the right questions


‣   Value the system that promotes innovation, iteration
‣   More minds contributing = more value from your data
The Hadoop Ecosystem at Twitter
‣   Running Cloudera’s free distro, CDH2 and Hadoop 0.20.1
‣   Heavily modified Scribe writing LZO-compressed to HDFS
‣     LZO: fast, splittable compression, ideal for HDFS*
‣   Data either as flat files (logs) or in protocol buffer format (newer
    logs, structured data, etc)
‣     Libs for reading/writing/more open-sourced as elephant-bird**
‣   Some Java-based MapReduce, some HBase, Hadoop streaming
‣   Most analysis, and the most interesting analyses, done in Pig
‣   * http://www.github.com/kevinweil/hadoop-lzo
‣   ** http://www.github.com/kevinweil/elephant-bird
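
As a small illustration, pointing a job's output at LZO might look like this, assuming the hadoop-lzo codec from the link above is installed on the cluster:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import com.hadoop.compression.lzo.LzopCodec;  // from hadoop-lzo (assumed on the classpath)

    public class LzoOutputConfig {
      /** Make a job write .lzo-compressed output files. */
      public static void enableLzoOutput(Job job) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, LzopCodec.class);
      }
    }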
Data?
‣   Semi-structured: apache logs
    (search, .com, mobile), search query
    logs, RoR logs, mysql query logs, A/B
    testing logs, signup flow logging, and
    on...
‣   Structured: tweets, users, blocks,
    phones, favorites, saved searches,
    retweets, geo, authentications, sms,
    3rd party clients, followings
‣   Entangled: the social graph
So what do we do with it?
Counting Big Data
‣   standard counts, min, max, std dev
‣   How many requests do we serve in a day?
‣   What is the average latency? The 95th-percentile latency?
‣   Group by response code. What is the hourly distribution?
‣   How many searches happen each day on Twitter?
‣   How many unique queries, how many unique users?
‣   What is their geographic distribution?
Counting Big Data
‣   Where are users querying from? The API, the front page, their
    profile page, etc?
Correlating Big Data
‣   probabilities, covariance, influence
‣   How does usage differ for mobile users?
‣   How about for users with 3rd party desktop clients?
‣   Cohort analyses
‣   Site problems: what goes wrong at the same time?
‣   Which features get users hooked?
‣   Which features do successful users use often?
‣   Search corrections, search suggestions
‣   A/B testing
Correlating Big Data
‣   What is the correlation between users with registered phones
    and users that tweet?
Research on Big Data
‣   prediction, graph analysis, natural language
‣   What can we tell about a user from their tweets?
‣     From the tweets of those they follow?
‣     From the tweets of their followers?
‣     From the ratio of followers/following?
‣   What graph structures lead to successful networks?
‣   User reputation
Research on Big Data
‣   prediction, graph analysis, natural language
‣   Sentiment analysis
‣   What features get a tweet retweeted?
‣     How deep is the corresponding retweet tree?
‣   Long-term duplicate detection
‣   Machine learning
‣   Language detection
‣   ... the list goes on.
Research on Big Data
‣   How well can we detect bots and other non-human tweeters?
Introduction
‣   How We Arrived at NoSQL: A Crash Course
‣     Collecting Data (Scribe)
‣     Storing and Analyzing Data (Hadoop)
‣     Rapid Learning over Big Data (Pig)
‣   And More: Cassandra, HBase, FlockDB
HBase
‣   BigTable clone on top of HDFS
‣   Distributed, column-oriented, no datatypes
‣   Unlike the rest of the HDFS stack, designed for low latency
‣   Importantly, data is mutable
HBase at Twitter
‣   We began building real products based on Hadoop
‣   People search
‣     Old version: offline process on a single node
‣     New version: complex user calculations,
    hit extra services in real time, custom indexing
‣     Underlying data is mutable
‣     Mutable layer on top of HDFS --> HBase
People Search
‣   Import user data into HBase
‣   Periodic MapReduce job reading from HBase
‣     Hits FlockDB, multiple other internal services in mapper
‣     Custom partitioning
‣     Data sucked across to sharded, replicated, horizontally
    scalable, in-memory, low-latency Scala service
‣        Build a trie, do case folding/normalization, suggestions, etc
‣   See http://www.slideshare.net/al3x/building-distributed-systems-in-scala for more
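
A minimal sketch of such a job using the HBase TableMapper API, with the table name, column family, and downstream wiring as illustrative stand-ins for the real schema:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PeopleSearchIndexJob {
      // Each map() call receives one HBase row (one user); "info:screen_name"
      // is a hypothetical column, a stand-in for the real schema.
      static class UserMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable row, Result result, Context ctx)
            throws IOException, InterruptedException {
          byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("screen_name"));
          if (name != null) {
            // The real job would also hit FlockDB and other services here.
            ctx.write(new Text(row.get()), new Text(Bytes.toString(name)));
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "people-search-index");
        job.setJarByClass(PeopleSearchIndexJob.class);
        Scan scan = new Scan();  // full scan over the (hypothetical) users table
        TableMapReduceUtil.initTableMapperJob("users", scan, UserMapper.class,
            Text.class, Text.class, job);
        job.setNumReduceTasks(0);  // map-only; output feeds the indexing service
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }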
HBase
‣   More products now being built on top of it
‣   Flexible, easy to connect to MapReduce/Pig
HBase vs Cassandra
‣   “Their origins reveal their strengths and weaknesses”
‣   HBase built on top of batch-oriented system, not low latency
‣   Cassandra built from ground up for low latency
‣   HBase easy to connect to batch jobs as input and output
‣   Cassandra not so much (but we’re working on it)
‣   HBase has a SPOF in the NameNode
HBase vs Cassandra
‣   Your mileage may vary
‣     At Twitter: HBase for analytics, analysis, dataset generation
‣     Cassandra for online systems




‣   As with all NoSQL systems: strengths in different situations
FlockDB
‣   Realtime, distributed social graph store
‣   NOT optimized for data mining
‣   Who follows who (nearly 8 orders of magnitude!)
‣   Intersection/set operations
‣   Cardinality
‣   Temporal index

‣   Note: the following slides largely come from @nk’s more
    complete talk at http://www.slideshare.net/nkallen/q-con-3770885
Set operations?
‣   This tweet needs to be delivered to people who follow both
    @aplusk (4.7M followers) and @foursquare (53K followers)
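
One way to compute that delivery set is a merge-style intersection of the two follower lists; a generic sketch (not FlockDB's actual implementation), assuming each list arrives as a sorted array of user ids:

    import java.util.ArrayList;
    import java.util.List;

    public class FollowerIntersection {
      /** Merge-style intersection of two sorted id arrays: O(n + m). */
      static List<Long> intersect(long[] a, long[] b) {
        List<Long> out = new ArrayList<Long>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
          if (a[i] == b[j]) { out.add(a[i]); i++; j++; }
          else if (a[i] < b[j]) i++;   // advance the smaller cursor
          else j++;
        }
        return out;
      }

      public static void main(String[] args) {
        long[] aplusk = {3, 8, 15, 42, 99};       // toy follower ids
        long[] foursquare = {8, 10, 42, 77};
        System.out.println(intersect(aplusk, foursquare));  // [8, 42]
      }
    }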
Original solution
‣   MySQL table
‣   Indices on source_id and destination_id
‣   Couldn’t handle write throughput
‣   Indices too large for RAM

    source_id   destination_id
       20             12
       29             12
       34             16
Next Try
‣   MySQL still
‣   Denormalized
‣   Byte-packed
‣   Chunked
‣   Still temporally ordered
Next Try
‣   Problems
‣     O(n) deletes
‣     Data consistency challenges
‣     Inefficient intersections
‣   All of these manifested strongly
    for huge users like @aplusk
    or @lancearmstrong
FlockDB
‣   MySQL underneath still (like PNUTS from Y!)
‣   Partitioned by user_id; Gizzard handles sharding/partitioning
‣   Edges stored in both directions, indexed by (src, dest)
‣   Denormalized counts stored

    Forward:                                        Backward:
    source_id  destination_id  updated_at  x        destination_id  source_id  updated_at  x
    20         12              20:50:14    x        12              20         20:50:14    x
    20         13              20:51:32             12              32         20:51:32
    20         16                                   12              16
FlockDB Timings
‣   Counts: 1ms
‣   Temporal Query: 2ms
‣   Writes: 1ms for journal, 16ms for durability
‣   Full walks: 100 edges/ms
FlockDB is Open Source
‣   We will maintain a community at
‣     http://www.github.com/twitter/flockdb
‣     http://www.github.com/twitter/gizzard


‣   See Nick Kallen’s QCon talk for more
‣     http://www.slideshare.net/nkallen/q-con-3770885
Cassandra
‣   Why Cassandra, for Twitter?
‣     Old/current: vertically, horizontally partitioned MySQL
‣     All kinds of caching layers, all application managed
‣     Alter table impossible, leads to bitfields, piggyback tables
‣     Hardware intensive, error prone, etc
‣     Not to mention, we hit MySQL write limits sometimes


Cassandra
‣   Why Cassandra, for Twitter?
‣     Decentralized, fault-tolerant
‣     Flexible schema
‣     Elastic
‣     High write throughput


‣   First goal: move all tweets to Cassandra
Eventually Consistent?
‣   Twitter is already eventually consistent
‣   Your system may be even worse
‣     Ryan’s new term: “potential consistency”
‣     Do you have write-through caching?
‣     Do you ever have MySQL replication failures?
‣   There is no automatic consistency repair there, unlike Cassandra


‣   http://www.slideshare.net/ryansking/scaling-twitter-with-cassandra
Rolling out Cassandra
‣   1. Integrate Cassandra alongside MySQL
‣      100% reads/writes to MySQL
‣      Dynamic switches for % dark reads/writes to Cassandra
‣   2. Turn up traffic to Cassandra
‣   3. Find something that’s broken, set switch to 0%
‣   4. Fix it
‣   5. GOTO 2
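
A hedged sketch of what such a dark-traffic switch might look like (illustrative, not Twitter's actual code): MySQL stays the system of record, and a runtime-adjustable percentage of writes is mirrored to Cassandra with the results kept invisible to users.

    import java.util.Random;
    import java.util.concurrent.atomic.AtomicInteger;

    public class DarkSwitch {
      /** Minimal store interface; the MySQL and Cassandra clients are stand-ins. */
      interface TweetStore { void write(long tweetId, String body); }

      private final TweetStore mysql;      // system of record: always written
      private final TweetStore cassandra;  // dark candidate
      private final AtomicInteger darkWritePercent = new AtomicInteger(0);
      private final Random rng = new Random();

      DarkSwitch(TweetStore mysql, TweetStore cassandra) {
        this.mysql = mysql;
        this.cassandra = cassandra;
      }

      /** Steps 2 and 3 of the rollout: turn the dial up, or back to 0 when something breaks. */
      void setDarkWritePercent(int percent) { darkWritePercent.set(percent); }

      void writeTweet(long tweetId, String body) {
        mysql.write(tweetId, body);  // 100% of writes always hit MySQL
        if (rng.nextInt(100) < darkWritePercent.get()) {
          try {
            cassandra.write(tweetId, body);  // dark write: observed, never user-visible
          } catch (RuntimeException e) {
            // Swallow: a dark-traffic failure must never affect the production write.
          }
        }
      }
    }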
Cassandra for Realtime Analytics
‣   Starting a project around realtime analytics
‣   Cassandra as the backing store
‣     Using, developing, testing Digg’s atomic incr patches
‣   More soon.
That was a lot of slides
‣   Thanks for sticking with me.
Questions?
‣   Follow me at twitter.com/kevinweil
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Ad

NoSQL at Twitter (NoSQL EU 2010)

  • 19. Introduction ‣ How We Arrived at NoSQL: A Crash Course ‣ Collecting Data (Scribe) ‣ Storing and Analyzing Data (Hadoop) ‣ Rapid Learning over Big Data (Pig) ‣ And More: Cassandra, HBase, FlockDB
  • 20. How do you store 7TB/day? ‣ Single machine? ‣ What’s HD write speed?
  • 21. How do you store 7TB/day? ‣ Single machine? ‣ What’s HD write speed? ‣ ~80 MB/s
  • 22. How do you store 7TB/day? ‣ Single machine? ‣ What’s HD write speed? ‣ ~80 MB/s ‣ 24.3 hours to write 7 TB
  • 23. How do you store 7TB/day? ‣ Single machine? ‣ What’s HD write speed? ‣ ~80 MB/s ‣ 24.3 hours to write 7 TB ‣ Uh oh.
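A back-of-envelope check on that number, using the ~80 MB/s sequential write speed quoted above:

    7 TB/day ≈ 7,000,000 MB
    7,000,000 MB ÷ 80 MB/s = 87,500 s ≈ 24.3 hours

So a single disk cannot even absorb one day's data in a day, before counting any reads or reprocessing.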
  • 24. Where do I put 7TB/day? ‣ Need a cluster of machines
  • 25. Where do I put 7TB/day? ‣ Need a cluster of machines ‣ ... which adds new layers of complexity
  • 26. Hadoop ‣ Distributed file system ‣ Automatic replication, fault tolerance
  • 27. Hadoop ‣ Distributed file system ‣ Automatic replication, fault tolerance ‣ MapReduce-based parallel computation ‣ Key-value based computation interface allows for wide applicability
  • 28. Hadoop ‣ Open source: top-level Apache project ‣ Scalable: Y! has a 4000 node cluster ‣ Powerful: sorted 1TB of random integers in 62 seconds ‣ Easy packaging: free Cloudera RPMs
  • 29. MapReduce Workflow ‣ Challenge: how many tweets per user, given tweets table? ‣ Input: key=row, value=tweet info ‣ Map: output key=user_id, value=1 ‣ Shuffle: sort by user_id ‣ Reduce: for each user_id, sum ‣ Output: user_id, tweet count ‣ With 2x machines, runs 2x faster (diagram: Inputs → Map → Shuffle/Sort → Reduce → Outputs)
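That pipeline translates almost line for line into the Hadoop 0.20 Java API. Below is a minimal, hypothetical sketch, not Twitter's actual job; it assumes tab-separated input rows whose first field is user_id, and all class and path names are ours.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TweetsPerUser {
      // Map: for each tweet row, emit (user_id, 1)
      public static class TweetMapper
          extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        protected void map(LongWritable key, Text row, Context ctx)
            throws IOException, InterruptedException {
          String userId = row.toString().split("\t")[0];  // assumed row layout
          ctx.write(new Text(userId), ONE);
        }
      }

      // Reduce: after the shuffle sorts by user_id, sum the 1s
      public static class SumReducer
          extends Reducer<Text, LongWritable, Text, LongWritable> {
        protected void reduce(Text userId, Iterable<LongWritable> ones, Context ctx)
            throws IOException, InterruptedException {
          long count = 0;
          for (LongWritable one : ones) count += one.get();
          ctx.write(userId, new LongWritable(count));  // (user_id, tweet count)
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "tweets-per-user");
        job.setJarByClass(TweetsPerUser.class);
        job.setMapperClass(TweetMapper.class);
        job.setCombinerClass(SumReducer.class);  // pre-sum on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Because summing is associative, the reducer doubles as a combiner, which is what makes the "2x machines, 2x faster" scaling hold up in practice.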
  • 36. Two Analysis Challenges ‣ 1. Compute friendships in Twitter’s social graph ‣ grep, awk? No way. ‣ Data is in MySQL... self join on an n-billion row table? ‣ n,000,000,000 x n,000,000,000 = ?
  • 37. Two Analysis Challenges ‣ 1. Compute friendships in Twitter’s social graph ‣ grep, awk? No way. ‣ Data is in MySQL... self join on an n-billion row table? ‣ n,000,000,000 x n,000,000,000 = ? ‣ I don’t know either.
  • 38. Two Analysis Challenges ‣ 2. Large-scale grouping and counting? ‣ select count(*) from users? Maybe... ‣ select count(*) from tweets? Uh... ‣ Imagine joining them... ‣ ... and grouping... ‣ ... and sorting...
  • 39. Back to Hadoop ‣ Didn’t we have a cluster of machines?
  • 41. Back to Hadoop ‣ Didn’t we have a cluster of machines? ‣ Hadoop makes it easy to distribute the calculation ‣ Purpose-built for parallel computation ‣ Just a slight mindset adjustment
  • 42. Back to Hadoop ‣ Didn’t we have a cluster of machines? ‣ Hadoop makes it easy to distribute the calculation ‣ Purpose-built for parallel computation ‣ Just a slight mindset adjustment ‣ But a fun and valuable one!
  • 43. Analysis at scale ‣ Now we’re rolling ‣ Count all tweets: 12 billion, 5 minutes ‣ Hit FlockDB in parallel to assemble social graph aggregates ‣ Run PageRank across users to calculate reputations
  • 44. But... ‣ Analysis typically in Java ‣ “I need less Java in my life, not more.”
  • 45. But... ‣ Analysis typically in Java ‣ “I need less Java in my life, not more.” ‣ Single-input, two-stage data flow is rigid
  • 46. But... ‣ Analysis typically in Java ‣ “I need less Java in my life, not more.” ‣ Single-input, two-stage data flow is rigid ‣ Projections, filters: custom code
  • 47. But... ‣ Analysis typically in Java ‣ “I need less Java in my life, not more.” ‣ Single-input, two-stage data flow is rigid ‣ Projections, filters: custom code ‣ Joins are lengthy, error-prone
  • 48. But... ‣ Analysis typically in Java ‣ “I need less Java in my life, not more.” ‣ Single-input, two-stage data flow is rigid ‣ Projections, filters: custom code ‣ Joins are lengthy, error-prone ‣ n-stage jobs hard to manage
  • 49. But... ‣ Analysis typically in Java ‣ “I need less Java in my life, not more.” ‣ Single-input, two-stage data flow is rigid ‣ Projections, filters: custom code ‣ Joins are lengthy, error-prone ‣ n-stage jobs hard to manage ‣ Exploration requires compilation!
  • 50. Introduction ‣ How We Arrived at NoSQL: A Crash Course ‣ Collecting Data (Scribe) ‣ Storing and Analyzing Data (Hadoop) ‣ Rapid Learning over Big Data (Pig) ‣ And More: Cassandra, HBase, FlockDB
  • 51. Pig ‣ High-level language ‣ Transformations on sets of records ‣ Process data one step at a time ‣ Easier than SQL?
  • 52. Why Pig? ‣ Because I bet you can read the following script.
  • 53. A Real Pig Script
  • 54. A Real Pig Script ‣ Now, just for fun... the same calculation in vanilla Hadoop MapReduce.
  • 56. Pig Democratizes Large-scale Data Analysis ‣ The Pig version is: ‣ 5% of the code
  • 57. Pig Democratizes Large-scale Data Analysis ‣ The Pig version is: ‣ 5% of the code ‣ 5% of the time
  • 58. Pig Democratizes Large-scale Data Analysis ‣ The Pig version is: ‣ 5% of the code ‣ 5% of the time ‣ Within 25% of the execution time
  • 59. One Thing I’ve Learned ‣ It’s easy to answer questions ‣ It’s hard to ask the right questions
  • 60. One Thing I’ve Learned ‣ It’s easy to answer questions ‣ It’s hard to ask the right questions ‣ Value the system that promotes innovation, iteration
  • 61. One Thing I’ve Learned ‣ It’s easy to answer questions ‣ It’s hard to ask the right questions ‣ Value the system that promotes innovation, iteration ‣ More minds contributing = more value from your data
  • 62. The Hadoop Ecosystem at Twitter ‣ Running Cloudera’s free distro, CDH2 and Hadoop 0.20.1
  • 63. The Hadoop Ecosystem at Twitter ‣ Running Cloudera’s free distro, CDH2 and Hadoop 0.20.1 ‣ Heavily modified Scribe writing LZO-compressed to HDFS ‣ LZO: fast, splittable compression, ideal for HDFS* ‣ * http://www.github.com/kevinweil/hadoop-lzo
  • 64. The Hadoop Ecosystem at Twitter ‣ Running Cloudera’s free distro, CDH2 and Hadoop 0.20.1 ‣ Heavily modified Scribe writing LZO-compressed to HDFS ‣ LZO: fast, splittable compression, ideal for HDFS* ‣ Data either as flat files (logs) or in protocol buffer format (newer logs, structured data, etc) ‣ Libs for reading/writing/more open-sourced as elephant-bird** ‣ * http://www.github.com/kevinweil/hadoop-lzo ‣ ** http://www.github.com/kevinweil/elephant-bird
  • 65. The Hadoop Ecosystem at Twitter ‣ Running Cloudera’s free distro, CDH2 and Hadoop 0.20.1 ‣ Heavily modified Scribe writing LZO-compressed to HDFS ‣ LZO: fast, splittable compression, ideal for HDFS* ‣ Data either as flat files (logs) or in protocol buffer format (newer logs, structured data, etc) ‣ Libs for reading/writing/more open-sourced as elephant-bird** ‣ Some Java-based MapReduce, a little Hadoop streaming ‣ * http://www.github.com/kevinweil/hadoop-lzo ‣ ** http://www.github.com/kevinweil/elephant-bird
  • 66. The Hadoop Ecosystem at Twitter ‣ Running Cloudera’s free distro, CDH2 and Hadoop 0.20.1 ‣ Heavily modified Scribe writing LZO-compressed to HDFS ‣ LZO: fast, splittable compression, ideal for HDFS* ‣ Data either as flat files (logs) or in protocol buffer format (newer logs, structured data, etc) ‣ Libs for reading/writing/more open-sourced as elephant-bird** ‣ Some Java-based MapReduce, some HBase, Hadoop streaming ‣ Most analysis, and most interesting analyses, done in Pig ‣ * http://www.github.com/kevinweil/hadoop-lzo ‣ ** http://www.github.com/kevinweil/elephant-bird
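For reference, a sketch of wiring the hadoop-lzo codecs into a job configuration. The property names below come from the hadoop-lzo project, but treat the exact values as an assumption to verify against its README for your version.

    import org.apache.hadoop.conf.Configuration;

    public class LzoConfExample {
      public static Configuration withLzo() {
        Configuration conf = new Configuration();
        // Register the LZO codecs alongside Hadoop's defaults
        conf.set("io.compression.codecs",
            "org.apache.hadoop.io.compress.DefaultCodec,"
            + "org.apache.hadoop.io.compress.GzipCodec,"
            + "com.hadoop.compression.lzo.LzoCodec,"
            + "com.hadoop.compression.lzo.LzopCodec");
        conf.set("io.compression.codec.lzo.class",
            "com.hadoop.compression.lzo.LzoCodec");
        return conf;
      }
    }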
  • 67. Data? ‣ Semi-structured: Apache logs (search, .com, mobile), search query logs, RoR logs, MySQL query logs, A/B testing logs, signup flow logging, and on...
  • 68. Data? ‣ Semi-structured: Apache logs (search, .com, mobile), search query logs, RoR logs, MySQL query logs, A/B testing logs, signup flow logging, and on... ‣ Structured: tweets, users, blocks, phones, favorites, saved searches, retweets, geo, authentications, sms, 3rd party clients, followings
  • 69. Data? ‣ Semi-structured: Apache logs (search, .com, mobile), search query logs, RoR logs, MySQL query logs, A/B testing logs, signup flow logging, and on... ‣ Structured: tweets, users, blocks, phones, favorites, saved searches, retweets, geo, authentications, sms, 3rd party clients, followings ‣ Entangled: the social graph
  • 70. So what do we do with it?
  • 71. Counting Big Data ‣ standard counts, min, max, std dev ‣ How many requests do we serve in a day?
  • 72. Counting Big Data ‣ standard counts, min, max, std dev ‣ How many requests do we serve in a day? ‣ What is the average latency? 95% latency?
  • 73. Counting Big Data ‣ standard counts, min, max, std dev ‣ How many requests do we serve in a day? ‣ What is the average latency? 95% latency? ‣ Group by response code. What is the hourly distribution?
  • 74. Counting Big Data ‣ standard counts, min, max, std dev ‣ How many requests do we serve in a day? ‣ What is the average latency? 95% latency? ‣ Group by response code. What is the hourly distribution? ‣ How many searches happen each day on Twitter?
  • 75. Counting Big Data ‣ standard counts, min, max, std dev ‣ How many requests do we serve in a day? ‣ What is the average latency? 95% latency? ‣ Group by response code. What is the hourly distribution? ‣ How many searches happen each day on Twitter? ‣ How many unique queries, how many unique users?
  • 76. Counting Big Data ‣ standard counts, min, max, std dev ‣ How many requests do we serve in a day? ‣ What is the average latency? 95% latency? ‣ Group by response code. What is the hourly distribution? ‣ How many searches happen each day on Twitter? ‣ How many unique queries, how many unique users? ‣ What is their geographic distribution?
  • 77. Counting Big Data ‣ Where are users querying from? The API, the front page, their profile page, etc?
  • 78. Correlating Big Data ‣ probabilities, covariance, influence ‣ How does usage differ for mobile users?
  • 79. Correlating Big Data ‣ probabilities, covariance, influence ‣ How does usage differ for mobile users? ‣ How about for users with 3rd party desktop clients?
  • 80. Correlating Big Data ‣ probabilities, covariance, influence ‣ How does usage differ for mobile users? ‣ How about for users with 3rd party desktop clients? ‣ Cohort analyses
  • 81. Correlating Big Data ‣ probabilities, covariance, influence ‣ How does usage differ for mobile users? ‣ How about for users with 3rd party desktop clients? ‣ Cohort analyses ‣ Site problems: what goes wrong at the same time?
  • 82. Correlating Big Data ‣ probabilities, covariance, influence ‣ How does usage differ for mobile users? ‣ How about for users with 3rd party desktop clients? ‣ Cohort analyses ‣ Site problems: what goes wrong at the same time? ‣ Which features get users hooked?
  • 83. Correlating Big Data ‣ probabilities, covariance, influence ‣ How does usage differ for mobile users? ‣ How about for users with 3rd party desktop clients? ‣ Cohort analyses ‣ Site problems: what goes wrong at the same time? ‣ Which features get users hooked? ‣ Which features do successful users use often?
  • 84. Correlating Big Data ‣ probabilities, covariance, influence ‣ How does usage differ for mobile users? ‣ How about for users with 3rd party desktop clients? ‣ Cohort analyses ‣ Site problems: what goes wrong at the same time? ‣ Which features get users hooked? ‣ Which features do successful users use often? ‣ Search corrections, search suggestions
  • 85. Correlating Big Data ‣ probabilities, covariance, influence ‣ How does usage differ for mobile users? ‣ How about for users with 3rd party desktop clients? ‣ Cohort analyses ‣ Site problems: what goes wrong at the same time? ‣ Which features get users hooked? ‣ Which features do successful users use often? ‣ Search corrections, search suggestions ‣ A/B testing
  • 86. Correlating Big Data ‣ What is the correlation between users with registered phones and users that tweet?
  • 87. Research on Big Data ‣ prediction, graph analysis, natural language ‣ What can we tell about a user from their tweets?
  • 88. Research on Big Data ‣ prediction, graph analysis, natural language ‣ What can we tell about a user from their tweets? ‣ From the tweets of those they follow?
  • 89. Research on Big Data ‣ prediction, graph analysis, natural language ‣ What can we tell about a user from their tweets? ‣ From the tweets of those they follow? ‣ From the tweets of their followers?
  • 90. Research on Big Data ‣ prediction, graph analysis, natural language ‣ What can we tell about a user from their tweets? ‣ From the tweets of those they follow? ‣ From the tweets of their followers? ‣ From the ratio of followers/following?
  • 91. Research on Big Data ‣ prediction, graph analysis, natural language ‣ What can we tell about a user from their tweets? ‣ From the tweets of those they follow? ‣ From the tweets of their followers? ‣ From the ratio of followers/following? ‣ What graph structures lead to successful networks?
  • 92. Research on Big Data ‣ prediction, graph analysis, natural language ‣ What can we tell about a user from their tweets? ‣ From the tweets of those they follow? ‣ From the tweets of their followers? ‣ From the ratio of followers/following? ‣ What graph structures lead to successful networks? ‣ User reputation
  • 93. Research on Big Data ‣ prediction, graph analysis, natural language ‣ Sentiment analysis
  • 94. Research on Big Data ‣ prediction, graph analysis, natural language ‣ Sentiment analysis ‣ What features get a tweet retweeted?
  • 95. Research on Big Data ‣ prediction, graph analysis, natural language ‣ Sentiment analysis ‣ What features get a tweet retweeted? ‣ How deep is the corresponding retweet tree?
  • 96. Research on Big Data ‣ prediction, graph analysis, natural language ‣ Sentiment analysis ‣ What features get a tweet retweeted? ‣ How deep is the corresponding retweet tree? ‣ Long-term duplicate detection
  • 97. Research on Big Data ‣ prediction, graph analysis, natural language ‣ Sentiment analysis ‣ What features get a tweet retweeted? ‣ How deep is the corresponding retweet tree? ‣ Long-term duplicate detection ‣ Machine learning
  • 98. Research on Big Data ‣ prediction, graph analysis, natural language ‣ Sentiment analysis ‣ What features get a tweet retweeted? ‣ How deep is the corresponding retweet tree? ‣ Long-term duplicate detection ‣ Machine learning ‣ Language detection
  • 99. Research on Big Data ‣ prediction, graph analysis, natural language ‣ Sentiment analysis ‣ What features get a tweet retweeted? ‣ How deep is the corresponding retweet tree? ‣ Long-term duplicate detection ‣ Machine learning ‣ Language detection ‣ ... the list goes on.
  • 100. Research on Big Data ‣ How well can we detect bots and other non-human tweeters?
  • 101. Introduction ‣ How We Arrived at NoSQL: A Crash Course ‣ Collecting Data (Scribe) ‣ Storing and Analyzing Data (Hadoop) ‣ Rapid Learning over Big Data (Pig) ‣ And More: Cassandra, HBase, FlockDB
  • 102. HBase ‣ BigTable clone on top of HDFS ‣ Distributed, column-oriented, no datatypes ‣ Unlike the rest of HDFS, designed for low-latency ‣ Importantly, data is mutable
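Mutability is the point the rest of HDFS can't offer: with the HBase client, writing the same row key and column again overwrites the cell in place. A minimal sketch using the classic HBase client API of roughly that era (table and column names are invented; check the exact calls against your HBase version):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MutableRow {
      public static void main(String[] args) throws Exception {
        HTable users = new HTable(HBaseConfiguration.create(), "users");

        // Write (or overwrite) a cell: same row key + column = in-place update
        Put put = new Put(Bytes.toBytes(12345L));  // row key: user_id (assumed)
        put.add(Bytes.toBytes("profile"), Bytes.toBytes("name"),
                Bytes.toBytes("Kevin"));
        users.put(put);

        // Read it back; "no datatypes" means everything is raw bytes
        Result row = users.get(new Get(Bytes.toBytes(12345L)));
        byte[] name = row.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"));
        System.out.println(Bytes.toString(name));
        users.close();
      }
    }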
  • 103. HBase at Twitter ‣ We began building real products based on Hadoop ‣ People search
  • 104. HBase at Twitter ‣ We began building real products based on Hadoop ‣ People search ‣ Old version: offline process on a single node
  • 105. HBase at Twitter ‣ We began building real products based on Hadoop ‣ People search ‣ Old version: offline process on a single node ‣ New version: complex user calculations, hit extra services in real time, custom indexing
  • 106. HBase at Twitter ‣ We began building real products based on Hadoop ‣ People search ‣ Old version: offline process on a single node ‣ New version: complex user calculations, hit extra services in real time, custom indexing ‣ Underlying data is mutable ‣ Mutable layer on top of HDFS --> HBase
  • 107. People Search ‣ Import user data into HBase
  • 108. People Search ‣ Import user data into HBase ‣ Periodic MapReduce job reading from HBase ‣ Hits FlockDB, multiple other internal services in mapper ‣ Custom partitioning
  • 109. People Search ‣ Import user data into HBase ‣ Periodic MapReduce job reading from HBase ‣ Hits FlockDB, multiple other internal services in mapper ‣ Custom partitioning ‣ Data sucked across to sharded, replicated, horizontally scalable, in-memory, low-latency Scala service ‣ Build a trie, do case folding/normalization, suggestions, etc
  • 110. People Search ‣ Import user data into HBase ‣ Periodic MapReduce job reading from HBase ‣ Hits FlockDB, multiple other internal services in mapper ‣ Custom partitioning ‣ Data sucked across to sharded, replicated, horizontally scalable, in-memory, low-latency Scala service ‣ Build a trie, do case folding/normalization, suggestions, etc ‣ See http://www.slideshare.net/al3x/building-distributed-systems-in-scala for more
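The "periodic MapReduce job reading from HBase" step has direct API support via TableMapper. A hedged sketch of the job wiring only; the mapper body, service calls, and custom partitioning are elided, and the class names are ours:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class PeopleSearchIndexer {
      // Each map call sees one HBase row; this is where the real job would
      // hit FlockDB and other internal services before emitting index records.
      static class UserMapper extends TableMapper<Text, Text> {
        protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx) {
          // ... enrich the user, emit (shard key, index record) ...
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "people-search-index");
        job.setJarByClass(PeopleSearchIndexer.class);
        Scan scan = new Scan();  // full scan of the users table
        TableMapReduceUtil.initTableMapperJob(
            "users", scan, UserMapper.class, Text.class, Text.class, job);
        job.setNumReduceTasks(0);  // map-only sketch; partitioning elided
        job.waitForCompletion(true);
      }
    }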
  • 111. HBase ‣ More products now being built on top of it ‣ Flexible, easy to connect to MapReduce/Pig
  • 112. HBase vs Cassandra ‣ “Their origins reveal their strengths and weaknesses”
  • 113. HBase vs Cassandra ‣ “Their origins reveal their strengths and weaknesses” ‣ HBase built on top of batch-oriented system, not low latency
  • 114. HBase vs Cassandra ‣ “Their origins reveal their strengths and weaknesses” ‣ HBase built on top of batch-oriented system, not low latency ‣ Cassandra built from ground up for low latency
  • 115. HBase vs Cassandra ‣ “Their origins reveal their strengths and weaknesses” ‣ HBase built on top of batch-oriented system, not low latency ‣ Cassandra built from ground up for low latency ‣ HBase easy to connect to batch jobs as input and output
  • 116. HBase vs Cassandra ‣ “Their origins reveal their strengths and weaknesses” ‣ HBase built on top of batch-oriented system, not low latency ‣ Cassandra built from ground up for low latency ‣ HBase easy to connect to batch jobs as input and output ‣ Cassandra not so much (but we’re working on it)
  • 117. HBase vs Cassandra ‣ “Their origins reveal their strengths and weaknesses” ‣ HBase built on top of batch-oriented system, not low latency ‣ Cassandra built from ground up for low latency ‣ HBase easy to connect to batch jobs as input and output ‣ Cassandra not so much (but we’re working on it) ‣ HBase has a SPOF in the NameNode
  • 118. HBase vs Cassandra ‣ Your mileage may vary ‣ At Twitter: HBase for analytics, analysis, dataset generation ‣ Cassandra for online systems
  • 119. HBase vs Cassandra ‣ Your mileage may vary ‣ At Twitter: HBase for analytics, analysis, dataset generation ‣ Cassandra for online systems ‣ As with all NoSQL systems: strengths in different situations
  • 120. FlockDB ‣ Realtime, distributed social graph store ‣ NOT optimized for data mining ‣ Note: the following slides largely come from @nk’s more complete talk at http://www.slideshare.net/nkallen/q-con-3770885
  • 121. FlockDB ‣ Realtime, distributed social graph store ‣ NOT optimized for data mining ‣ Who follows who (nearly 8 orders of magnitude!) ‣ Intersection/set operations ‣ Cardinality ‣ Temporal index (diagram callouts: Intersection, Temporal, Counts)
  • 122. Set operations? ‣ This tweet needs to be delivered to people who follow both @aplusk (4.7M followers) and @foursquare (53K followers)
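Conceptually the delivery query is just a set intersection over the two follower lists. A toy illustration of the idea (FlockDB does this against its own indexes, not in application memory); note that iterating the smaller set is what makes the 4.7M × 53K case cheap:

    import java.util.HashSet;
    import java.util.Set;

    public class FollowerIntersection {
      // Followers of both accounts = users who must receive the tweet
      static Set<Long> mustDeliver(Set<Long> followersA, Set<Long> followersB) {
        // Iterate the smaller set: O(min(|A|, |B|)) with hash lookups
        Set<Long> small = followersA.size() <= followersB.size() ? followersA : followersB;
        Set<Long> large = (small == followersA) ? followersB : followersA;
        Set<Long> out = new HashSet<Long>();
        for (Long user : small) {
          if (large.contains(user)) out.add(user);
        }
        return out;
      }
    }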
  • 123. Original solution ‣ MySQL table (source_id, destination_id) ‣ Indices on source_id and destination_id ‣ Couldn’t handle write throughput ‣ Indices too large for RAM (example rows: 20→12, 29→12, 34→16)
  • 124. Next Try ‣ MySQL still ‣ Denormalized ‣ Byte-packed ‣ Chunked ‣ Still temporally ordered
  • 125. Next Try ‣ Problems ‣ O(n) deletes ‣ Data consistency challenges ‣ Inefficient intersections ‣ All of these manifested strongly for huge users like @aplusk or @lancearmstrong
  • 126. FlockDB ‣ MySQL underneath still (like PNUTS from Y!) ‣ Partitioned by user_id, gizzard handles sharding/partitioning ‣ Edges stored in both directions, indexed by (src, dest) ‣ Denormalized counts stored ‣ Forward index (source_id, destination_id, updated_at, x): (20, 12, 20:50:14, x), (20, 13, 20:51:32), (20, 16) ‣ Backward index (destination_id, source_id, updated_at, x): (12, 20, 20:50:14, x), (12, 32, 20:51:32), (12, 16)
  • 127. FlockDB Timings ‣ Counts: 1ms
  • 128. FlockDB Timings ‣ Counts: 1ms ‣ Temporal Query: 2ms
  • 129. FlockDB Timings ‣ Counts: 1ms ‣ Temporal Query: 2ms ‣ Writes: 1ms for journal, 16ms for durability
  • 130. FlockDB Timings ‣ Counts: 1ms ‣ Temporal Query: 2ms ‣ Writes: 1ms for journal, 16ms for durability ‣ Full walks: 100 edges/ms
  • 131. FlockDB is Open Source ‣ We will maintain a community at ‣ http://www.github.com/twitter/flockdb ‣ http://www.github.com/twitter/gizzard ‣ See Nick Kallen’s QCon talk for more ‣ http://www.slideshare.net/nkallen/q-con-3770885
  • 132. Cassandra ‣ Why Cassandra, for Twitter?
  • 133. Cassandra ‣ Why Cassandra, for Twitter? ‣ Old/current: vertically, horizontally partitioned MySQL
  • 134. Cassandra ‣ Why Cassandra, for Twitter? ‣ Old/current: vertically, horizontally partitioned MySQL ‣ All kinds of caching layers, all application managed
  • 135. Cassandra ‣ Why Cassandra, for Twitter? ‣ Old/current: vertically, horizontally partitioned MySQL ‣ All kinds of caching layers, all application managed ‣ Alter table impossible, leads to bitfields, piggyback tables
  • 136. Cassandra ‣ Why Cassandra, for Twitter? ‣ Old/current: vertically, horizontally partitioned MySQL ‣ All kinds of caching layers, all application managed ‣ Alter table impossible, leads to bitfields, piggyback tables ‣ Hardware intensive, error prone, etc
  • 137. Cassandra ‣ Why Cassandra, for Twitter? ‣ Old/current: vertically, horizontally partitioned MySQL ‣ All kinds of caching layers, all application managed ‣ Alter table impossible, leads to bitfields, piggyback tables ‣ Hardware intensive, error prone, etc ‣ Not to mention, we hit MySQL write limits sometimes
  • 138. Cassandra ‣ Why Cassandra, for Twitter? ‣ Old/current: vertically, horizontally partitioned MySQL ‣ All kinds of caching layers, all application managed ‣ Alter table impossible, leads to bitfields, piggyback tables ‣ Hardware intensive, error prone, etc ‣ Not to mention, we hit MySQL write limits sometimes ‣ First goal: move all tweets to Cassandra
  • 139. Cassandra ‣ Why Cassandra, for Twitter? ‣ Decentralized, fault-tolerant ‣ All kinds of caching layers, all application managed ‣ Alter table impossible, leads to bitfields, piggyback tables ‣ Hardware intensive, error prone, etc ‣ Not to mention, we hit MySQL write limits sometimes ‣ First goal: move all tweets to Cassandra
  • 141. Cassandra ‣ Why Cassandra, for Twitter? ‣ Decentralized, fault-tolerant ‣ All kinds of caching layers, all application managed ‣ Flexible schema ‣ Hardware intensive, error prone, etc ‣ Not to mention, we hit MySQL write limits sometimes ‣ First goal: move all tweets to Cassandra
  • 142. Cassandra ‣ Why Cassandra, for Twitter? ‣ Decentralized, fault-tolerant ‣ All kinds of caching layers, all application managed ‣ Flexible schema ‣ Elastic ‣ Not to mention, we hit MySQL write limits sometimes ‣ First goal: move all tweets to Cassandra
  • 143. Cassandra ‣ Why Cassandra, for Twitter? ‣ Decentralized, fault-tolerant ‣ All kinds of caching layers, all application managed ‣ Flexible schema ‣ Elastic ‣ High write throughput ‣ First goal: move all tweets to Cassandra
  • 144. Eventually Consistent? ‣ Twitter is already eventually consistent
  • 145. Eventually Consistent? ‣ Twitter is already eventually consistent ‣ Your system may be even worse
  • 146. Eventually Consistent? ‣ Twitter is already eventually consistent ‣ Your system may be even worse ‣ Ryan’s new term: “potential consistency” ‣ Do you have write-through caching? ‣ Do you ever have MySQL replication failures?
  • 147. Eventually Consistent? ‣ Twitter is already eventually consistent ‣ Your system may be even worse ‣ Ryan’s new term: “potential consistency” ‣ Do you have write-through caching? ‣ Do you ever have MySQL replication failures? ‣ There is no automatic consistency repair there, unlike Cassandra
  • 148. Eventually Consistent? ‣ Twitter is already eventually consistent ‣ Your system may be even worse ‣ Ryan’s new term: “potential consistency” ‣ Do you have write-through caching? ‣ Do you ever have MySQL replication failures? ‣ There is no automatic consistency repair there, unlike Cassandra ‣ http://www.slideshare.net/ryansking/scaling-twitter-with-cassandra
  • 149. Rolling out Cassandra ‣ 1. Integrate Cassandra alongside MySQL ‣ 100% reads/writes to MySQL ‣ Dynamic switches for % dark reads/writes to Cassandra
  • 150. Rolling out Cassandra ‣ 1. Integrate Cassandra alongside MySQL ‣ 100% reads/writes to MySQL ‣ Dynamic switches for % dark reads/writes to Cassandra ‣ 2. Turn up traffic to Cassandra
  • 151. Rolling out Cassandra ‣ 1. Integrate Cassandra alongside MySQL ‣ 100% reads/writes to MySQL ‣ Dynamic switches for % dark reads/writes to Cassandra ‣ 2. Turn up traffic to Cassandra ‣ 3. Find something that’s broken, set switch to 0%
  • 152. Rolling out Cassandra ‣ 1. Integrate Cassandra alongside MySQL ‣ 100% reads/writes to MySQL ‣ Dynamic switches for % dark reads/writes to Cassandra ‣ 2. Turn up traffic to Cassandra ‣ 3. Find something that’s broken, set switch to 0% ‣ 4. Fix it
  • 153. Rolling out Cassandra ‣ 1. Integrate Cassandra alongside MySQL ‣ 100% reads/writes to MySQL ‣ Dynamic switches for % dark reads/writes to Cassandra ‣ 2. Turn up traffic to Cassandra ‣ 3. Find something that’s broken, set switch to 0% ‣ 4. Fix it ‣ 5. GOTO 2
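A hedged sketch of what the "dynamic switch for % dark reads" in steps 1–5 can look like in application code. Every name here is invented for illustration; it serves 100% of traffic from MySQL while mirroring a configurable fraction of reads to Cassandra and discarding the result:

    import java.util.Random;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class DarkReadSwitch {
      interface TweetStore { String read(long tweetId); }  // hypothetical

      private final TweetStore mysql;      // source of truth (step 1)
      private final TweetStore cassandra;  // candidate store under test
      private final Random rng = new Random();
      private final ExecutorService pool = Executors.newFixedThreadPool(8);
      private volatile double darkReadFraction = 0.0;  // the dynamic switch

      DarkReadSwitch(TweetStore mysql, TweetStore cassandra) {
        this.mysql = mysql;
        this.cassandra = cassandra;
      }

      // Step 2: turn traffic up by raising the fraction; step 3: set it to 0
      void setDarkReadFraction(double f) { darkReadFraction = f; }

      String read(final long tweetId) {
        if (rng.nextDouble() < darkReadFraction) {
          pool.submit(new Runnable() {
            public void run() {
              cassandra.read(tweetId);  // mirrored dark read; result discarded
            }
          });
        }
        return mysql.read(tweetId);  // users always see MySQL's answer
      }
    }

The dark read runs off the request path, so a broken Cassandra deploy costs monitoring noise, never user-visible errors, which is what makes the find-break-fix loop in steps 3–5 safe.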
  • 154. Cassandra for Realtime Analytics ‣ Starting a project around realtime analytics ‣ Cassandra as the backing store ‣ Using, developing, testing Digg’s atomic incr patches ‣ More soon.
  • 155. That was a lot of slides ‣ Thanks for sticking with me.
  • 156. Questions? Follow me at twitter.com/kevinweil TM