Why Wordnik went Non-Relational

            Tony Tam
            @fehguy
What this Talk is About

• 5 key reasons why Wordnik migrated to
  a non-relational database
• Process for selection, migration
• Optimizations and tips from living
  survivors of the battlefield
Why Should You Care?

• MongoDB user for 2 years
• Lessons learned, analysis, benefits from
  process
• We migrated from MySQL to MongoDB
  with no downtime
• We have interesting/challenging data
  needs, likely relevant to you
More on Wordnik

• World’s fastest updating English dictionary
 •   Based on input of text up to 8k words/second
 •   Word Graph as basis to our analysis
     •   Synchronous & asynchronous processing
• Tens of billions of documents in NR
  storage
• 20M daily REST API calls, billions served
 •   Powered by Swagger OSS API framework

     swagger.wordnik.com
Architectural History

• 2008: Wordnik was born as a LAMP AWS
  EC2 stack
• 2009: Introduced public REST
  API, powered wordnik.com, partner APIs
• 2009: Drank the NoSQL Kool-Aid
• 2010: Scala
• 2011: Micro SOA
Non-relational by Necessity

• Moved to NR because of "4S"
 •   Speed
 •   Stability
 •   Scaling
 •   Simplicity
• But…
 •   MySQL can go a LONG way
     •   Takes right team, right reasons (+ patience)
 •   NR offerings simply too compelling to focus on
     scaling MySQL
Wordnik’s 5 Whys for NoSQL
Why #1: Speed bumps with MySQL

• Inserting data fast (50k recs/second)
  caused MySQL mayhem
 •   Maintaining indexes largely to blame
 •   Operations for consistency unnecessary but
     "cannot be turned off"
• Devised twisted schemes to avoid client
  blocking
 •   Aka the "master/slave tango"
Why #2: Retrieval Complexity

• Objects typically mapped to tables
  •   Object Hierarchy always => inner + outer joins
• Lots of static data, so why join?
  •   “Noun” is not getting renamed in my code’s
      lifetime!
  •   Logic like this is probably in application logic
• Since storage is cheap
  •   I’ll choose speed
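The "why join?" point can be sketched in Scala: static reference data such as parts of speech lives in application code, and the stored document carries only its label. This is a minimal sketch under assumptions, not Wordnik's actual model; `PartOfSpeech`, `Definition`, and the field names `word`, `pos`, and `text` are hypothetical.

```scala
object StaticDataSketch {
  // Static reference data kept in code instead of a lookup table (hypothetical names).
  sealed abstract class PartOfSpeech(val label: String)
  case object Noun extends PartOfSpeech("noun")
  case object Verb extends PartOfSpeech("verb")
  case object Adjective extends PartOfSpeech("adjective")

  private val byLabel: Map[String, PartOfSpeech] =
    List[PartOfSpeech](Noun, Verb, Adjective).map(p => p.label -> p).toMap

  final case class Definition(word: String, partOfSpeech: PartOfSpeech, text: String)

  // The document stores the label inline, so reading a definition needs no join.
  def toDocument(d: Definition): Map[String, Any] =
    Map("word" -> d.word, "pos" -> d.partOfSpeech.label, "text" -> d.text)

  // Rebuild the object; returns None if a field is missing or the label is unknown.
  def fromDocument(doc: Map[String, Any]): Option[Definition] =
    for {
      word <- doc.get("word").collect { case s: String => s }
      pos  <- doc.get("pos").collect { case s: String => s }.flatMap(byLabel.get)
      text <- doc.get("text").collect { case s: String => s }
    } yield Definition(word, pos, text)
}
```

The trade-off is the one the slide names: the label is repeated in every document, which costs storage but removes a lookup from every read.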
Why #2: Retrieval Complexity




• One definition = 10+ joins
• 50 requests per second!
Why #2: Retrieval Complexity
• Embed objects in rows "sort of works"
 •   Filtering gets really nasty
 •   Native XML in MySQL?
     •   If a full table-scan is OK…


• OK, then cache it!
 •   Layers of caching introduced layers of complexity
     •   Stale data/corruption
     •   Object versionitis
     •   Cache stampedes
Why #3: Object Modeling

• Object models being compromised for
  sake of persistence
  •   This is backwards!
  •   Extra abstraction for the wrong reason
• OK, then performance suffers
  •   In-application joins across objects
  •   "Who ran the fetch all query against production?!"
      (any sysadmin)

• "My zillionth ORM layer that only I
  understand" (and can maintain)
Why #4: Scaling

• Needed "cloud friendly storage"
 •   Easy up, easy down!
     •   Startup: Sync your data, and announce to
         clients when ready for business
     •   Shutdown: Announce your departure and leave
• Adding MySQL instances was a dance
 •   Snapshot + bin files
 mysql> change master to
 MASTER_HOST='db1', MASTER_USER='xxx', MASTER_PASSWORD='xxx',
 MASTER_LOG_FILE='master-relay.000431', MASTER_LOG_POS=1035435402;
Why #4: Scaling

• What about those VMs?
 •   So convenient! But… they kind of suck
 •   Can the database succeed on a VM?
• VM Performance:
 •   Memory, CPU or I/O—Pick only one
 •   Can your database really reduce CPU or disk I/O
     with lots of RAM?
Why #5: Big Picture
• BI tools use relational constraints for discovery
  •   Is this the right reason for them?
  •   Can we work around this?
  •   Let’s have a BI tool revolution, too!
• True service architecture makes relational
  constraints impractical/impossible
• Distributed sharding makes relational
  constraints impractical/impossible
Why #5: Big Picture

• Is your app smarter than your database?
  •   The logic line is probably blurry!
• What does count(*) really mean when you
  add 5k records/sec?
  •   Maybe eventual consistency is not so bad…
• 2PC? Do some reading and decide!
http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
Ok, I’m in!

• I thought deciding was easy!?
 •   Many quickly maturing products
 •   Divergent features tackle different needs
• Wordnik spent 8 weeks researching and
  testing NoSQL solutions
 •   This is a long time! (for a startup)
 •   Wrote ODM classes and migrated our data
• Surprise! There were surprises
 •   Be prepared to compromise
Choice Made, Now What?
• We went with MongoDB ***
 •   Fastest to implement
 •   Most reliable
 •   Best community
• Why?
 •   Why #1: Fast loading/retrieval
 •   Why #2: Fast ODM (50 tps => 1000 tps!)
 •   Why #3: Document Models === Object models
 •   Why #4: MMF (memory-mapped files) => kernel-managed memory + RS (replica sets)
 •   Why #5: It’s 2011, is there no progress?
More on Why MongoDB

• Testing, testing, testing
  •   Used our migration tools to load test
      •   Read from MySQL, write to MongoDB
  •   We loaded 5+ billion documents, many times over
• In the end, one server could…
  •   Insert 100k records/sec sustained
  •   Read 250k records/sec sustained
  •   Support concurrent loading/reading
Migration & Testing

• Iterated ODM mapping multiple times
 •   Some issues
     •   Type Safety
 cur.next.get("iWasAnIntOnce").asInstanceOf[Long]

     •   Dates as Strings
 obj.put("a_date", "2011-12-31") !=
 obj.put("a_date", new Date("2011-12-31"))

     •   Storage Size
 obj.put("very_long_field_name", true) >>
 obj.put("vsfn", true)
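A minimal Scala sketch of guarding against the first two issues, assuming documents are read back as plain `Map[String, Any]` values (a real driver returns its own object type). `readLong` and `toDate` are hypothetical helper names, not part of any MongoDB driver.

```scala
import java.time.{LocalDate, ZoneOffset}
import java.util.Date

object FieldReadSketch {
  // Stand-in for a driver's document type (assumption for this sketch).
  type Doc = Map[String, Any]

  // A value stored as an Int comes back boxed as java.lang.Integer, so a direct
  // asInstanceOf[Long] cast fails at runtime. Matching on java.lang.Number
  // accepts Int, Long, and other numeric types alike.
  def readLong(doc: Doc, field: String): Option[Long] =
    doc.get(field) match {
      case Some(n: java.lang.Number) => Some(n.longValue)
      case _                         => None
    }

  // Convert an ISO day such as "2011-12-31" to a Date (midnight UTC) before
  // storing it, so the field holds a date type rather than a string.
  def toDate(isoDay: String): Date =
    Date.from(LocalDate.parse(isoDay).atStartOfDay(ZoneOffset.UTC).toInstant)
}
```

Storing a real date type matters for range queries and sorting; the choice of UTC midnight here is an assumption and should match however the source data was recorded.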
Migration & Testing

• Expect data model iterations
 •   Wordnik migrated table to Mongo collection "as-is"
     •   Easier to migrate, test
     •   _id field used same MySQL PK
 •   Auto Increment?
     •   Used MySQL to "check-out" sequences
         •   One row per mongo collection
         •   Run out of sequences => get more
     •   Need exclusive locks here!
Migration & Testing

• Sequence generator in-process
 SequenceGenerator.checkout("doc_metadata,100")

• Sequence generator as web service
 •   Centralized UID management
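The check-out scheme can be sketched in Scala as follows. This is an illustration under assumptions, not Wordnik's implementation: `CentralStore` is a hypothetical in-memory stand-in for the MySQL row (one counter per collection), and its `synchronized` block plays the role of the exclusive lock.

```scala
import scala.collection.mutable

object SequenceSketch {
  // Stand-in for the central counter store, one counter per collection.
  final class CentralStore {
    private val next = mutable.Map.empty[String, Long]

    // Reserve `size` consecutive ids and return the first one.
    // The whole read-and-advance step runs under a lock.
    def checkout(collection: String, size: Int): Long = synchronized {
      val first = next.getOrElse(collection, 1L)
      next.update(collection, first + size)
      first
    }
  }

  // Hands out ids from a locally held block; fetches a new block when it runs out.
  final class SequenceGenerator(store: CentralStore, collection: String, blockSize: Int) {
    private var current = 0L
    private var limit = 0L // exclusive upper bound of the current block

    def nextId(): Long = synchronized {
      if (current >= limit) {
        current = store.checkout(collection, blockSize)
        limit = current + blockSize
      }
      val id = current
      current += 1
      id
    }
  }
}
```

Checking out blocks means the central store is contacted once per `blockSize` ids rather than once per insert. Ids from different generators interleave, and a process that stops early leaves a gap, so this gives uniqueness but not a dense sequence.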
Migration & Testing

• Expect data access pattern iterations
 •   So much more flexibility!
     •   Reach into objects
     > db.dictionary_entry.find({"hdr.sr":"cmu"})

 •   Access to a whole object tree at query time
 •   Overwrite a whole object at once… when desired
     •   Not always! This clobbers the whole record
     > db.foo.save({_id:18727353,foo:"bar"})

     •   Update a single field:
     > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
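The difference between the two shell calls above can be modeled in a few lines of Scala. This sketch only imitates the semantics on immutable maps; `save` and `set` here are hypothetical functions, not driver calls, and a collection is assumed to be a map keyed by `_id`.

```scala
object UpdateSemanticsSketch {
  type Doc = Map[String, Any]
  type Collection = Map[Long, Doc] // keyed by _id

  // Like save(): the stored document is replaced wholesale,
  // so any field not present in `doc` is lost.
  def save(coll: Collection, id: Long, doc: Doc): Collection =
    coll.updated(id, doc.updated("_id", id))

  // Like update with $set: only the named fields change, the rest are kept.
  // A missing _id leaves the collection untouched (no upsert in this sketch).
  def set(coll: Collection, id: Long, fields: Doc): Collection =
    coll.get(id) match {
      case Some(existing) => coll.updated(id, existing ++ fields)
      case None           => coll
    }
}
```

Starting from a document with `foo` and a second field, `save` leaves only `_id` and `foo`, while `set` keeps the second field intact.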
Flip the Switch

• Migrate production with zero downtime
  •   We temporarily halted loading data
  •   Added a switch to flip between MySQL/MongoDB
  •   Instrument, monitor, flip it, analyze, flip back
• Profiling your code is key
  •   What is slow?
  •   Build this in your app from day 1
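One lightweight way to build profiling in from day 1 is a timing wrapper around each storage call. The Scala sketch below is an assumption about how that could look, not the instrumentation Wordnik used; `timed`, `callCount`, and `totalNanos` are hypothetical names.

```scala
import scala.collection.mutable

object TimingSketch {
  // Per-label call count and total elapsed nanoseconds.
  private val stats = mutable.Map.empty[String, (Long, Long)]

  // Run `body`, record how long it took under `label`, and return its result.
  // The timing is recorded even if `body` throws.
  def timed[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    try body
    finally {
      val elapsed = System.nanoTime() - start
      stats.synchronized {
        val (count, total) = stats.getOrElse(label, (0L, 0L))
        stats.update(label, (count + 1, total + elapsed))
      }
    }
  }

  def callCount(label: String): Long =
    stats.synchronized(stats.get(label).map(_._1).getOrElse(0L))

  def totalNanos(label: String): Long =
    stats.synchronized(stats.get(label).map(_._2).getOrElse(0L))
}
```

Wrapping both backends with distinct labels, for example `timed("mysql.find") { ... }` and `timed("mongo.find") { ... }`, makes the "flip it, analyze, flip back" step a comparison of two counters.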
Flip the Switch

• Storage selected at runtime
 val h = shouldUseMongoDb match {
   case true => new MongoDbSentenceDAO
   case _ => new MySQLDbSentenceDAO
 }
 h.find(...)

• Hot-swappable storage via configuration
 •   It worked!
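A self-contained version of the runtime switch might look like the Scala sketch below. The DAO class names follow the slide; the `SentenceDAO` trait, the `backend` field, and the stubbed `find` bodies are assumptions added so the example compiles on its own.

```scala
object StorageSwitchSketch {
  // Common interface both storage backends implement (hypothetical shape).
  trait SentenceDAO {
    def backend: String
    def find(id: Long): Option[String]
  }

  // Stand-ins: each would wrap a real MySQL or MongoDB client.
  final class MySQLDbSentenceDAO extends SentenceDAO {
    val backend = "mysql"
    def find(id: Long): Option[String] = None // stub
  }
  final class MongoDbSentenceDAO extends SentenceDAO {
    val backend = "mongodb"
    def find(id: Long): Option[String] = None // stub
  }

  private val mysql = new MySQLDbSentenceDAO
  private val mongo = new MongoDbSentenceDAO

  // Read on every call, so flipping it redirects traffic without a restart.
  @volatile var shouldUseMongoDb: Boolean = false

  def sentenceDao: SentenceDAO = if (shouldUseMongoDb) mongo else mysql
}
```

Because callers only see `SentenceDAO`, flipping `shouldUseMongoDb` back to `false` is the rollback path; in practice the flag would come from configuration rather than a mutable field.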
Then What?

• Watch our deployment, many iterations to
  mapping layer
 •   Settled on in-house, type-safe mapper
 https://github.com/fehguy/mongodb-benchmark-tools

• Some gotchas (of course)
 •   Locking issues on long-running updates (more in a
     minute)
• We want more of this!
 •   Migrated shared files to Mongo GridFS
 •   Easy-IT
Performance + Optimization

• Loading data is fast!
  •   Fixed collection padding, similarly-sized records
  •   Tail of collection is always in memory
  •   Append faster than MySQL in every case tested
• But... random access started getting slow
  •   Indexes in RAM? Yes
  •   Data in RAM? No, > 2TB per server
  •   Limited by disk I/O /seek performance
  •   EC2 + EBS for storage?
Performance + Optimization

• Moved to physical data center
 •   DAS & 72GB RAM => great uncached
     performance
• Good move? Depends on use case
 •   If "access anything anytime", not many options
 •   You want to support this?
Performance + Optimization

• Inserts are fast, how about updates?
 •   Well… update => find object, update it, save
 •   Lock acquired at "find", released after "save"
     •   If hitting disk, lock time could be large
• Easy answer, pre-fetch on update
 •   Oh, and NEVER do "update all records" against a
     large collection
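The pre-fetch idea can be sketched in Scala: read the document first, so that any disk access happens during the read, and only then issue the write. `Store` is a hypothetical interface and `InMemoryStore` exists only to show the order of operations; neither is a real driver API, and the sketch does not model locking itself.

```scala
import scala.collection.mutable

object PrefetchUpdateSketch {
  type Doc = Map[String, Any]

  // Hypothetical storage interface; a real driver's API will differ.
  trait Store {
    def findById(id: Long): Option[Doc]
    def setField(id: Long, field: String, value: Any): Unit
  }

  // Read first so the document is likely in memory, then write.
  // Returns false when no document with that id exists.
  def prefetchThenSet(store: Store, id: Long, field: String, value: Any): Boolean =
    store.findById(id) match {
      case Some(_) =>
        store.setField(id, field, value)
        true
      case None => false
    }

  // In-memory store that logs each operation, for demonstration only.
  final class InMemoryStore(initial: Map[Long, Doc]) extends Store {
    private val docs = mutable.Map.empty[Long, Doc]
    docs ++= initial
    val log = mutable.ListBuffer.empty[String]

    def findById(id: Long): Option[Doc] = {
      log += s"find $id"
      docs.get(id)
    }

    def setField(id: Long, field: String, value: Any): Unit = {
      log += s"set $id"
      docs.get(id).foreach(d => docs.update(id, d.updated(field, value)))
    }

    def snapshot: Map[Long, Doc] = docs.toMap
  }
}
```

The extra read costs a round trip, which is the price for keeping the time spent inside the write short when the document is not already in memory.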
Performance + Optimization

• Indexes
 •   Can't always keep index in RAM. MMF "does its
     thing"
 •   Right-balanced b-tree keeps necessary index hot
 •   Indexes hit disk => mute your pager
More Mongo, Please!

    • We modeled our word graph in mongo




• 50M Nodes
• 80M Edges
• 80 µs edge fetch
More Mongo, Please!

• Analytics rolled-up from aggregation jobs
 •   Send to Hadoop, load to mongo for fast access
What’s next

• Liberate our models
 •   stop worrying about how to store them (for the
     most part)
• New features almost always NR
• Some MySQL left
 •   Less on each release
Questions?
•   See more about Wordnik APIs
    http://developer.wordnik.com

•   Migrating from MySQL to MongoDB
    http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik

•   Maintaining your MongoDB Installation
    http://www.slideshare.net/fehguy/mongo-sv-tony-tam

•   Swagger API Framework
    http://swagger.wordnik.com

•   Mapping Benchmark
    https://github.com/fehguy/mongodb-benchmark-tools

•   Wordnik OSS Tools
    https://github.com/wordnik/wordnik-oss
VICTOR MAESTRE RAMIREZ
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 

A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?

  • 1. Why Wordnik went Non-Relational (Tony Tam, @fehguy)
  • 2. What this Talk is About
      • 5 key reasons why Wordnik migrated to a non-relational database
      • Process for selection, migration
      • Optimizations and tips from living survivors of the battlefield
  • 3. Why Should You Care?
      • MongoDB user for 2 years
      • Lessons learned, analysis, benefits from process
      • We migrated from MySQL to MongoDB with no downtime
      • We have interesting/challenging data needs, likely relevant to you
  • 4. More on Wordnik
      • World’s fastest updating English dictionary
      • Based on input of text up to 8k words/second
      • Word Graph as basis to our analysis
      • Synchronous & asynchronous processing
      • Tens of billions of documents in NR storage
      • 20M daily REST API calls, billions served
      • Powered by Swagger OSS API framework (swagger.wordnik.com)
  • 5. Architectural History
      • 2008: Wordnik was born as a LAMP stack on AWS EC2
      • 2009: Introduced public REST API, powered wordnik.com, partner APIs
      • 2009: Drank the NoSQL Kool-Aid
      • 2010: Scala
      • 2011: Micro SOA
  • 6. Non-relational by Necessity
      • Moved to NR because of the "4S"
      • Speed
      • Stability
      • Scaling
      • Simplicity
      • But…
      • MySQL can go a LONG way
      • Takes right team, right reasons (+ patience)
      • NR offerings simply too compelling to focus on scaling MySQL
  • 7. Wordnik’s 5 Whys for NoSQL
  • 8. Why #1: Speed bumps with MySQL
      • Inserting data fast (50k recs/second) caused MySQL mayhem
      • Maintaining indexes largely to blame
      • Operations for consistency unnecessary but "cannot be turned off"
      • Devised twisted schemes to avoid client blocking
      • Aka the "master/slave tango"
  • 9. Why #2: Retrieval Complexity
      • Objects typically mapped to tables
      • Object hierarchy always => inner + outer joins
      • Lots of static data, so why join?
      • “Noun” is not getting renamed in my code’s lifetime!
      • Logic like this is probably in application logic
      • Since storage is cheap
      • I’ll choose speed
  • 10. Why #2: Retrieval Complexity
      • One definition = 10+ joins
      • 50 requests per second!
  • 11. Why #2: Retrieval Complexity
      • Embedding objects in rows "sort of works"
      • Filtering gets really nasty
      • Native XML in MySQL?
      • If a full table-scan is OK…
      • OK, then cache it!
      • Layers of caching introduced layers of complexity
      • Stale data/corruption
      • Object versionitis
      • Cache stampedes
  • 12. Why #3: Object Modeling
      • Object models being compromised for sake of persistence
      • This is backwards!
      • Extra abstraction for the wrong reason
      • OK, then performance suffers
      • In-application joins across objects
      • "Who ran the fetch all query against production?!" (any sysadmin)
      • "My zillionth ORM layer that only I understand" (and can maintain)
  • 13. Why #4: Scaling
      • Needed "cloud friendly storage"
      • Easy up, easy down!
      • Startup: Sync your data, and announce to clients when ready for business
      • Shutdown: Announce your departure and leave
      • Adding MySQL instances was a dance
      • Snapshot + bin files
          mysql> change master to MASTER_HOST='db1', MASTER_USER='xxx', MASTER_PASSWORD='xxx', MASTER_LOG_FILE='master-relay.000431', MASTER_LOG_POS=1035435402;
  • 14. Why #4: Scaling
      • What about those VMs?
      • So convenient! But… they kind of suck
      • Can the database succeed on a VM?
      • VM performance:
      • Memory, CPU or I/O: pick only one
      • Can your database really reduce CPU or disk I/O with lots of RAM?
  • 15. Why #5: Big Picture
      • BI tools use relational constraints for discovery
      • Is this the right reason for them?
      • Can we work around this?
      • Let’s have a BI tool revolution, too!
      • True service architecture makes relational constraints impractical/impossible
      • Distributed sharding makes relational constraints impractical/impossible
  • 16. Why #5: Big Picture
      • Is your app smarter than your database?
      • The logic line is probably blurry!
      • What does count(*) really mean when you add 5k records/sec?
      • Maybe eventual consistency is not so bad…
      • 2PC? Do some reading and decide!
          http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
  • 17. Ok, I’m in!
      • I thought deciding was easy!?
      • Many quickly maturing products
      • Divergent features tackle different needs
      • Wordnik spent 8 weeks researching and testing NoSQL solutions
      • This is a long time! (for a startup)
      • Wrote ODM classes and migrated our data
      • Surprise! There were surprises
      • Be prepared to compromise
  • 18. Choice Made, Now What?
      • We went with MongoDB ***
      • Fastest to implement
      • Most reliable
      • Best community
      • Why?
      • Why #1: Fast loading/retrieval
      • Why #2: Fast ODM (50 tps => 1000 tps!)
      • Why #3: Document Models === Object models
      • Why #4: MMF => Kernel-managed memory + RS
      • Why #5: It’s 2011, is there no progress?
  • 19. More on Why MongoDB
      • Testing, testing, testing
      • Used our migration tools to load test
      • Read from MySQL, write to MongoDB
      • We loaded 5+ billion documents, many times over
      • In the end, one server could…
      • Insert 100k records/sec sustained
      • Read 250k records/sec sustained
      • Support concurrent loading/reading
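To put the slide 19 figures in perspective, here is a small back-of-the-envelope sketch. It only divides the numbers quoted on the slide (5 billion documents, 100k inserts/sec, 250k reads/sec); the object and method names are made up for illustration.

```scala
object LoadEstimate {
  // Whole seconds needed to move `docs` documents at a sustained rate.
  def seconds(docs: Long, docsPerSecond: Long): Long = docs / docsPerSecond

  // The same duration expressed in hours.
  def hours(docs: Long, docsPerSecond: Long): Double =
    seconds(docs, docsPerSecond) / 3600.0
}
```

At the quoted insert rate, one full load of 5 billion documents takes 50,000 seconds, or roughly 13.9 hours; reading them back at 250k records/sec takes 20,000 seconds, or about 5.6 hours. That is why repeating the load "many times over" counts as a serious test.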
  • 20. Migration & Testing
      • Iterated ODM mapping multiple times
      • Some issues
      • Type safety
          cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
      • Dates as Strings
          obj.put("a_date", "2011-12-31") != obj.put("a_date", new Date("2011-12-31"))
      • Storage size
          obj.put("very_long_field_name", true) >> obj.put("vsfn", true)
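The type-safety issue on slide 20 arises because a value stored as an Int comes back boxed as an Integer, and `asInstanceOf[Long]` on it throws a `ClassCastException`. One way a mapper might guard against this is sketched below. `SafeFields` and `asLong` are hypothetical names, not part of Wordnik's mapper or the MongoDB driver.

```scala
object SafeFields {
  // Widen an untyped field value to Long.
  // Returns None for anything that is not an Int or a Long,
  // instead of throwing the way asInstanceOf[Long] would.
  def asLong(value: Any): Option[Long] = value match {
    case i: Int  => Some(i.toLong)
    case l: Long => Some(l)
    case _       => None
  }
}
```

A caller would write `SafeFields.asLong(cur.next.get("iWasAnIntOnce"))` and handle the `None` case explicitly, so a field that changed width during migration does not break reads.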
  • 21. Migration & Testing
      • Expect data model iterations
      • Wordnik migrated table to Mongo collection "as-is"
      • Easier to migrate, test
      • _id field used same MySQL PK
      • Auto increment?
      • Used MySQL to "check out" sequences
      • One row per Mongo collection
      • Run out of sequences => get more
      • Need exclusive locks here!
  • 22. Migration & Testing
      • Sequence generator in-process
          SequenceGenerator.checkout("doc_metadata,100")
      • Sequence generator as web service
      • Centralized UID management
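The check-out scheme from slides 21 and 22 can be sketched as a block allocator: each process reserves a range of ids from a central store and hands them out locally until the range runs dry. This is a minimal sketch, not Wordnik's `SequenceGenerator`. `SequenceStore`, `InMemorySequenceStore`, and `BlockSequenceGenerator` are hypothetical names, and the in-memory store stands in for the MySQL row (one per collection) that would be updated under an exclusive lock.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the central sequence table.
trait SequenceStore {
  // Atomically reserves `count` ids for `name` and returns the first one.
  def reserve(name: String, count: Int): Long
}

final class InMemorySequenceStore extends SequenceStore {
  private val next = mutable.Map.empty[String, Long]

  // `synchronized` plays the role of the exclusive row lock.
  def reserve(name: String, count: Int): Long = synchronized {
    val first = next.getOrElse(name, 1L)
    next(name) = first + count
    first
  }
}

final class BlockSequenceGenerator(store: SequenceStore, blockSize: Int) {
  // Per collection: (next id to hand out, exclusive end of the reserved block).
  private val ranges = mutable.Map.empty[String, (Long, Long)]

  def nextId(name: String): Long = synchronized {
    val (cur, end) = ranges.getOrElse(name, (0L, 0L))
    // Block exhausted (or never reserved): check out a fresh one.
    val (id, limit) =
      if (cur >= end) {
        val first = store.reserve(name, blockSize)
        (first, first + blockSize)
      } else (cur, end)
    ranges(name) = (id + 1, limit)
    id
  }
}
```

With a block size of 100, a process touches the central store once per 100 inserts. The trade-off is that ids are unique but not strictly ordered across processes, and unused ids in a block are lost on restart.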
  • 23. Migration & Testing
      • Expect data access pattern iterations
      • So much more flexibility!
      • Reach into objects
          > db.dictionary_entry.find({"hdr.sr":"cmu"})
      • Access to a whole object tree at query time
      • Overwrite a whole object at once… when desired
      • Not always! This clobbers the whole record
          > db.foo.save({_id:18727353,foo:"bar"})
      • Update a single field:
          > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
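The difference between the `save` and `$set` calls on slide 23 can be modelled with plain immutable maps. This is only an illustration of the semantics, with no database or driver involved. `UpdateSemantics`, `replace`, and `set` are hypothetical names, and nested dotted paths are not modelled.

```scala
object UpdateSemantics {
  type Doc = Map[String, Any]

  // Whole-document replace, as save does for an existing _id:
  // any field missing from the replacement is gone afterwards.
  def replace(stored: Doc, replacement: Doc): Doc = replacement

  // $set-style update: only the named fields change, the rest survive.
  def set(stored: Doc, changes: Doc): Doc = stored ++ changes
}
```

Given a stored document with fields `_id`, `foo`, and `other`, replacing it with `{_id, foo}` silently drops `other`, while setting `foo` alone leaves `other` intact.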
  • 24. Flip the Switch
      • Migrate production with zero downtime
      • We temporarily halted loading data
      • Added a switch to flip between MySQL/MongoDB
      • Instrument, monitor, flip it, analyze, flip back
      • Profiling your code is key
      • What is slow?
      • Build this in your app from day 1
  • 26. Flip the Switch
      • Storage selected at runtime
          val h = shouldUseMongoDb match {
            case true => new MongoDbSentenceDAO
            case _ => new MySQLDbSentenceDAO
          }
          h.find(...)
      • Hot-swappable storage via configuration
      • It worked!
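The runtime switch on slide 26 could be extended so the flag is consulted on every call, which allows flipping back and forth without restarting. The sketch below is one possible shape, not Wordnik's code. The `SentenceDAO` trait, its `find(id)` signature, and the in-memory implementations are assumptions standing in for the real MongoDB and MySQL DAOs.

```scala
object StorageSwitch {
  // Hypothetical DAO interface; the real find(...) signature is not shown in the slides.
  trait SentenceDAO {
    def backend: String
    def find(id: Long): Option[String]
  }

  // In-memory stand-in for a real MongoDB- or MySQL-backed DAO.
  final class InMemorySentenceDAO(val backend: String, rows: Map[Long, String])
      extends SentenceDAO {
    def find(id: Long): Option[String] = rows.get(id)
  }

  // Delegates each call to whichever backend the flag selects at that moment.
  final class SwitchableSentenceDAO(
      useMongo: () => Boolean,
      mongo: SentenceDAO,
      mysql: SentenceDAO
  ) extends SentenceDAO {
    private def active: SentenceDAO = if (useMongo()) mongo else mysql
    def backend: String = active.backend
    def find(id: Long): Option[String] = active.find(id)
  }
}
```

Passing the flag as a function (for example, one that reads a configuration value) keeps callers unaware of which store served the request, which is what makes the "flip it, analyze, flip back" loop cheap.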
  • 27. Then What?
      • Watch our deployment, many iterations to mapping layer
      • Settled on in-house, type-safe mapper
          https://github.com/fehguy/mongodb-benchmark-tools
      • Some gotchas (of course)
      • Locking issues on long-running updates (more in a minute)
      • We want more of this!
      • Migrated shared files to Mongo GridFS
      • Easy-IT
  • 28. Performance + Optimization
      • Loading data is fast!
      • Fixed collection padding, similarly-sized records
      • Tail of collection is always in memory
      • Append faster than MySQL in every case tested
      • But... random access started getting slow
      • Indexes in RAM? Yes
      • Data in RAM? No, > 2TB per server
      • Limited by disk I/O and seek performance
      • EC2 + EBS for storage?
  • 29. Performance + Optimization
      • Moved to physical data center
      • DAS & 72GB RAM => great uncached performance
      • Good move? Depends on use case
      • If "access anything anytime", not many options
      • You want to support this?
  • 30. Performance + Optimization
      • Inserts are fast, how about updates?
      • Well… update => find object, update it, save
      • Lock acquired at "find", released after "save"
      • If hitting disk, lock time could be large
      • Easy answer, pre-fetch on update
      • Oh, and NEVER do "update all records" against a large collection
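The "pre-fetch on update" tip from slide 30 amounts to reading the document first, so that any disk access happens during the read and the update then works on data already in memory. The sketch below shows the call order only. `Collection`, `setField`, and `FakeCollection` are hypothetical and do not mirror the MongoDB driver, and the fake store merely counts updates that touch a document not previously read.

```scala
import scala.collection.mutable

object PrefetchUpdate {
  // Hypothetical minimal collection API; not the MongoDB driver's interface.
  trait Collection {
    def findOne(id: Long): Option[Map[String, Any]]
    def setField(id: Long, field: String, value: Any): Unit
  }

  // Read first so the document is likely in memory, then issue the update.
  // Returns false when the document does not exist.
  def prefetchThenSet(c: Collection, id: Long, field: String, value: Any): Boolean =
    c.findOne(id) match {
      case Some(_) => c.setField(id, field, value); true
      case None    => false
    }

  // Toy store that counts updates hitting a document not yet "in memory".
  final class FakeCollection(initial: Map[Long, Map[String, Any]]) extends Collection {
    private val docs = mutable.Map.empty[Long, Map[String, Any]] ++= initial
    private val resident = mutable.Set.empty[Long]
    var coldUpdates: Int = 0

    def findOne(id: Long): Option[Map[String, Any]] = {
      val doc = docs.get(id)
      if (doc.isDefined) resident += id
      doc
    }

    def setField(id: Long, field: String, value: Any): Unit =
      docs.get(id).foreach { doc =>
        if (!resident(id)) coldUpdates += 1
        resident += id
        docs(id) = doc + (field -> value)
      }
  }
}
```

In the toy store a direct `setField` on an unread document increments `coldUpdates`, while `prefetchThenSet` never does. In a real deployment the benefit depends on the document staying in memory between the read and the update.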
  • 31. Performance + Optimization
      • Indexes
      • Can't always keep index in RAM. MMF "does its thing"
      • Right-balanced b-tree keeps necessary index hot
      • Indexes hit disk => mute your pager
  • 32. More Mongo, Please!
      • We modeled our word graph in Mongo
      • 50M Nodes
      • 80M Edges
      • 80 S edge fetch
  • 33. More Mongo, Please!
      • Analytics rolled-up from aggregation jobs
      • Send to Hadoop, load to Mongo for fast access
  • 34. What’s next
      • Liberate our models
      • Stop worrying about how to store them (for the most part)
      • New features almost always NR
      • Some MySQL left
      • Less on each release
  • 35. Questions?
      • See more about Wordnik APIs: http://developer.wordnik.com
      • Migrating from MySQL to MongoDB: http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
      • Maintaining your MongoDB Installation: http://www.slideshare.net/fehguy/mongo-sv-tony-tam
      • Swagger API Framework: http://swagger.wordnik.com
      • Mapping Benchmark: https://github.com/fehguy/mongodb-benchmark-tools
      • Wordnik OSS Tools: https://github.com/wordnik/wordnik-oss