© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DAT204
How Thermo Fisher Is Reducing Mass
Spectrometry Experiment Times from Days to
Minutes with MongoDB & AWS
World leader in serving science
Revenues of $17 billion
50,000 employees
50 countries
A Mass Spectrometer tells you…
What’s in there and how much
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to Minutes w/ MongoDB Atlas on AWS
Making the world cleaner and safer
Mars Organic Molecule Analyzer (MOMA) will take a modified Thermo Linear Ion Trap Mass Spectrometer to Mars in 2020
What beer looks like in a mass spec
Demo
Instrument
MongoDB
MS Instrument
Connect
Demo: instrument connect
Demo: remote monitoring a mass spectrometer
Why does Thermo use MongoDB?
ThermoFisher apps using MongoDB
XML → MongoDB
Starting on MongoDB
Oracle → MongoDB
SQLite → MongoDB
Postgres → MongoDB
Amazon DynamoDB → MongoDB Atlas
Scientific apps = humongous data
Big molecules = big data
instrument {
    UserId : "dr.ennis@poldark.net",
    MachineName : "TRACEFINDER8",
    Location : "Austin",
    AcquisitionStationName : "TSQ 8000",
    LastErrorEventDate : "2016-09-05",
    LastErrorEventValue : null,
    RuntimeEstimate : {
        MeasuredElaspedDuration : 0.21966,
        Confidence : "HighConfidence"
    },
    RunManagerStatus : {
        Status : "Acquire",
        Sequence : "Testosterone",
        SampleName : "Drugx",
        VialPosition : "1",
        Rawfile : "2pg_161029205505",
        Instmethod : "1x.meth",
        Instrument : "TSQ 8000",
        IsPaused : false,
        Operator : "Fred"
    }
}
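A document shaped like this one suits partial updates, where only the changed fields are sent to the database. The sketch below is a pure-Python stand-in for that idea, assuming dot-notation paths in the style of MongoDB's `$set` operator. The helper `apply_set` and the "Paused" status value are hypothetical and only illustrate the semantics; they are not taken from the talk.

```python
import copy


def apply_set(document, changes):
    """Return a copy of `document` with each dot-notation path in `changes` set.

    Hypothetical helper that mimics how `$set` treats paths such as
    "RunManagerStatus.IsPaused"; missing parent documents are created.
    """
    updated = copy.deepcopy(document)
    for path, value in changes.items():
        *parents, leaf = path.split(".")
        node = updated
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return updated


# A trimmed copy of the instrument document from the slide.
instrument = {
    "MachineName": "TRACEFINDER8",
    "RunManagerStatus": {"Status": "Acquire", "IsPaused": False, "Operator": "Fred"},
}

# Only the two fields that changed are described; "Paused" is an assumed value.
changes = {"RunManagerStatus.IsPaused": True, "RunManagerStatus.Status": "Paused"}
updated = apply_set(instrument, changes)
```

With PyMongo, the same `changes` dictionary would typically be sent as `collection.update_one(filter, {"$set": changes})`, so the rest of the document never travels over the wire.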
Why MongoDB was chosen
• Performance
• Developer productivity
• Cost effective
• Runs anywhere
• Rich feature set
• Achieved legal and regulatory approval
MongoDB is a Swiss army knife
• Hierarchical data
• Relational data
• Queues
• File storage
• Device state
Amazon SQS
Amazon S3
Amazon IoT
Join example
• Version 3.2 introduced the $lookup operator
• SQL query
• MongoDB C# driver query
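The SQL and C# queries from the slide are not in the transcript, so the following is only a sketch of what a `$lookup` stage looks like, written as a Python aggregation pipeline. The collection names (`samples`, `compounds`) and field names are assumptions for illustration. `emulate_lookup` is a hypothetical pure-Python helper that reproduces the left-outer-join shape of the result so the example runs without a database.

```python
# Aggregation pipeline with a single $lookup stage (names are hypothetical).
lookup_pipeline = [
    {
        "$lookup": {
            "from": "compounds",         # collection to join with
            "localField": "_id",         # field on the parent (samples) side
            "foreignField": "sampleId",  # field on the child (compounds) side
            "as": "compounds",           # output array field on each parent
        }
    }
]


def emulate_lookup(parents, children, local_field, foreign_field, as_field):
    """Pure-Python left outer join producing the same result shape as $lookup."""
    joined = []
    for parent in parents:
        matches = [c for c in children if c.get(foreign_field) == parent.get(local_field)]
        joined.append({**parent, as_field: matches})
    return joined


samples = [{"_id": 1, "name": "Drugx"}, {"_id": 2, "name": "Blank"}]
compounds = [
    {"sampleId": 1, "compound": "Testosterone"},
    {"sampleId": 1, "compound": "Caffeine"},
]

stage = lookup_pipeline[0]["$lookup"]
result = emulate_lookup(
    samples, compounds, stage["localField"], stage["foreignField"], stage["as"]
)
```

Against a real deployment the pipeline would be passed to `db.samples.aggregate(lookup_pipeline)`. A parent with no matching children still appears in the output, with an empty array.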
MongoDB has caught up to relational DBs
"Notably, we show that the MUPG (match, unwind, project, group) fragment is already at least as expressive as full relational algebra over (the relational view of) a single collection, and in particular able to express arbitrary joins."
– Bolzano University in Italy
The evolution of MongoDB
Headline Features by Release
• 1.0 (2009)
• 2.4 (GA 2013): Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring
• 2.6 (GA 2014): $out, Index Intersection, Text Search, Field-Level Redaction, LDAP & x509, Auditing
• 3.0 (GA 2015): Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing ++, Ops Manager
• 3.2 (GA 2015): Document Validation, $lookup, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System
• 3.4 (GA 2016): Linearizable reads, Intra-cluster compression, Views, Log Redaction, Graph Processing, Decimal, Collations, Faceted Navigation, Spark Connector ++, Zones ++, Aggregation ++, Auto-balancing ++, ARM, Power, zSeries, BI Connector ++, Compass ++, Hardware Monitoring, Server Pool, LDAP Authorization, Encrypted Backups, Cloud Foundry Integration
• Atlas
MySQL vs. MongoDB
Database schema
MySQL schema
MongoDB schema
Inserting data: MongoDB vs. MySQL
• Inserting 1,615 chemical compound records into two parent-child tables.
• To optimize the MySQL query, we turned off foreign keys during insert and used a string builder to create a bulk insert SQL statement. This improved insert performance by a factor of 360.
• Compare to MongoDB.
Database | Milliseconds | Lines of code
MySQL not optimized | 147,600 (2.5 minutes) | 21
MySQL optimized | 410 | 40
MongoDB | 68 | 1
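The factors quoted on this slide can be checked directly from the table. The short sketch below only redoes that arithmetic; the dictionary keys repeat the table's row labels, and the variable names are ours.

```python
# Insert timings from the slide, in milliseconds.
timings_ms = {
    "MySQL not optimized": 147_600,
    "MySQL optimized": 410,
    "MongoDB": 68,
}


def speedup(slow, fast):
    """How many times faster the `fast` row is than the `slow` row."""
    return timings_ms[slow] / timings_ms[fast]


mysql_gain = speedup("MySQL not optimized", "MySQL optimized")
mongo_vs_optimized = speedup("MySQL optimized", "MongoDB")
unoptimized_minutes = timings_ms["MySQL not optimized"] / 1000 / 60
```

The MySQL optimization yields the factor of 360 stated on the slide, the unoptimized run works out to about 2.46 minutes, and MongoDB comes out roughly 6 times faster than the optimized MySQL insert.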
Selecting data: MongoDB vs. MySQL
• Query 600,000 rows of SampleCompound result data.
• To optimize the MySQL select query, we built a dictionary to look up child records for each parent. This improved performance by a factor of about 300. Optimization effort: 2 engineers and 2 weeks.
Database | Seconds | Lines of code
MySQL not optimized | 2,400 (40 minutes) | 20
MySQL optimized | 8.2 | 29
MongoDB | 17.5 | 7
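The dictionary optimization described above can be sketched as follows. This is not the team's code: the field names `id` and `parent_id` are assumptions, and the two functions only contrast a scan per parent with a single pass that indexes children by parent key.

```python
from collections import defaultdict


def attach_children_naive(parents, children):
    """Scan every child for every parent: O(parents * children) comparisons."""
    return {
        p["id"]: [c for c in children if c["parent_id"] == p["id"]]
        for p in parents
    }


def attach_children_indexed(parents, children):
    """Index children by parent key once, then do constant-time lookups."""
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child["parent_id"]].append(child)
    return {p["id"]: by_parent.get(p["id"], []) for p in parents}


parents = [{"id": 1}, {"id": 2}, {"id": 3}]
children = [
    {"parent_id": 1, "value": "a"},
    {"parent_id": 2, "value": "b"},
    {"parent_id": 1, "value": "c"},
]

naive = attach_children_naive(parents, children)
indexed = attach_children_indexed(parents, children)
```

Both functions return the same mapping. With hundreds of thousands of rows the indexed version avoids rescanning the child list for each parent, which is the kind of gain the slide reports.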
Update: MongoDB vs. MySQL
Migrating to MongoDB reduced code by about 3.4x
Data layer lines of code | SQLite: 4,271 | MongoDB: 1,260
MongoDB compared to DynamoDB
MongoDB | DynamoDB
Anywhere | AWS
Rich ad-hoc query language + IDE | No ad-hoc query language
Many operators (joins, aggregation, etc.) | Fewer operators
Excellent performance | Excellent performance
Easy to deploy (with Atlas) | Easy to deploy each table
Adding tables requires no configuration changes | Adding tables requires additional configuration and cost
Easy to use from AWS services but not natively integrated | Native integration with AWS services: IAM, VPC, Lambda, Kinesis
Released in 2009 | Released in 2012
MongoDB vs. S3 performance
Downloading a 220 KB object from MongoDB was 7x faster than from Amazon S3 when cold, and 3x faster when warm
Operation | MongoDB | Amazon S3
Retrieve document first time | 68 ms | 468 ms
Retrieve document second time | 13 ms | 38 ms
MongoDB vs. S3 performance
MongoDB is 11x faster than S3 in the use case of partial document loading
Measure | MongoDB | S3
Data size | 400 bytes | 2.1 MB
Performance | 19 ms | 214 ms
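Partial document loading means asking the database for a few fields rather than downloading the whole object, which is why the MongoDB side of the table moves 400 bytes and the S3 side moves 2.1 MB. The sketch below imitates an inclusion projection in pure Python. The `project` helper, the `Spectra` field and its size are assumptions for illustration only.

```python
def project(document, fields):
    """Keep only the listed top-level fields, like an inclusion projection."""
    return {name: document[name] for name in fields if name in document}


# Hypothetical result document: small status fields plus a bulky payload.
result_doc = {
    "_id": "2pg_161029205505",
    "Status": "Acquire",
    "Operator": "Fred",
    "Spectra": [0.0] * 10_000,  # stand-in for the large part of the document
}

summary = project(result_doc, ["_id", "Status", "Operator"])

# Ratio of the two timings reported on the slide (214 ms vs. 19 ms).
speedup = 214 / 19
```

With PyMongo the projection would be passed as the second argument, for example `collection.find_one(filter, {"Status": 1, "Operator": 1})`, so the bulky field is never sent to the client. The timing ratio from the slide works out to about 11.3.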
Reducing processing from days to minutes
Frameworks used to parallelize algorithms
• AWS Lambda
• Docker and Amazon ECS
• Spark and Elastic Map Reduce
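All three frameworks serve the same pattern: split the data into independent chunks, process the chunks concurrently, and collect the results. The sketch below shows that pattern with a local thread pool from the Python standard library. It is a stand-in, not AWS Lambda, Amazon ECS or Spark code, and `process_scan` is a hypothetical placeholder for the real per-scan algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def process_scan(scan):
    """Hypothetical per-scan work: total intensity of one spectrum."""
    return sum(scan)


def process_all(scans, workers=4):
    """Fan the scans out to a pool of workers and gather results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_scan, scans))


# Three tiny made-up scans; real spectra would be far larger.
scans = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
totals = process_all(scans)
```

Because each scan is processed independently, the same function could be moved behind a Lambda handler, a container task or a Spark map step without changing its logic.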
Parallel data processing
Why Atlas?
• Easy
• Performant
• Seamless Migration
• Robust
• No downtime, even when scaling up
Building MongoDB Atlas on Amazon Web Services
Operations burden
PATCHES
UPGRADES
SECURITY
BACKUPS
RECOVERY
99.999% UPTIME
UPSCALE
DOWNSCALE
PERFORMANCE
UAT
STAGING
MONITORING
ALERTS
PROVISION
CONFIGURE
INSTALL
Database as a service for MongoDB
• Automated
• Available On-Demand
• Secure
• Highly Available
• Automated Backups
• Elastically Scalable
Cluster features
Fully managed MongoDB clusters
Customer only needs to choose the shape and size of the cluster
● Instance size (CPU and RAM)
● Replication factor
● Number of shards
● Disk space
● Disk speed
(Screenshot of create dialog)
Security features
● VPC peering
● IP address whitelist
● SCRAM-SHA-1 authentication: readWriteAnyDatabase, enableSharding, clusterMonitor
● SSL: using well-known CA, trust system CAs by default
Key components
● Backup
● Automation
● Monitoring
Customer container with replica set
(Diagram: AWS Account X, Region Y; VPC (Customer N); Subnet A in Availability Zone A, Subnet B in Availability Zone B, Subnet C in Availability Zone C; one mongod on port 27017 in each subnet)
Customer container with sharded cluster
(Diagram: AWS Account X, Region Y; VPC (Customer N); Subnet A in Availability Zone A, Subnet B in Availability Zone B, Subnet C in Availability Zone C; each subnet labeled shard0, S, shard1, S, shard2, config)
IP firewall using security groups
One security group per VPC applied to all Amazon EC2 instances
Three classes of security rules:
● MongoDB traffic between cluster members
● MongoDB traffic between application and clusters
● SSH traffic between production support jump box and EC2 instance
(Diagram labels: App Server, Jump Box, three mongod instances on port 27017)
VPC peering
Your VPC: Elastic LB, CIDR Block: 10.0.0.0/16
Atlas VPC: AZ 1, AZ 2, AZ 3, CIDR Block: 172.31.248.0/21
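VPC peering requires that the two CIDR blocks do not overlap. The sketch below checks the two blocks from the slide with Python's standard `ipaddress` module. The helper name `can_peer` is ours, and the check covers only the address ranges, not any other peering prerequisites.

```python
import ipaddress

# CIDR blocks as given on the slide.
your_vpc = ipaddress.ip_network("10.0.0.0/16")
atlas_vpc = ipaddress.ip_network("172.31.248.0/21")


def can_peer(a, b):
    """Two VPCs are candidates for peering only if their ranges are disjoint."""
    return not a.overlaps(b)


peerable = can_peer(your_vpc, atlas_vpc)
atlas_addresses = atlas_vpc.num_addresses  # size of the /21 block
```

Both blocks sit in private address space and do not intersect, so routes can point at either side without ambiguity. The /21 block holds 2,048 addresses.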
"We want Prime to be such a good value, you’d be irresponsible not to be a member."
– Jeff Bezos
Questions?
Thank you!
Remember to complete your evaluations!

Editor's Notes

  • #3: ThermoFisher is the biggest company you’ve never heard of. We strive to be the world leader in serving science. We have 50,000 employees around the world. Our goal is to make the world healthier, cleaner and safer.
  • #4: One of the products we make is a mass spectrometer. At the core of the instrument is a metal cylinder the size of a ping-pong ball called an Orbitrap, which spins ionized molecules around for distances of several kilometers in a fraction of a second and measures their masses very accurately. It turns out there are quite a few applications for this capability.
  • #5: ThermoFisher mass spectrometry instruments are used to detect pollutants: if it is bad for you, our instruments will detect it. One of our customers is the Karolinska Institute in Sweden (the same university responsible for giving out Nobel Prizes), and they process 100k samples per year, serving all of Sweden. Each of their high-resolution instruments produces 100 TB of data per year.
  • #6: For me, making the world a cleaner, safer place is personally meaningful. My son Landon was born with a cleft lip and palate, which is caused at least in part by exposure of the baby at a very early age (pea size) to some environmental condition: mercury, lead, a volatile organic. So preventing other children from being born with birth defects, and helping them have safe and healthy lives, is one thing that motivates me to come to work every day.
  • #7: The next mission to Mars in 2020 will carry a mass spec known as the Mars Organic Molecule Analyzer, or MOMA, which contains a design based on a ThermoFisher Linear Ion Trap Mass Spectrometer. The Mars rover is not running MongoDB, but as the NASA trend toward commercial products continues and Thermo increasingly adopts MongoDB, maybe MongoDB will ship on a Mars rover some day. You definitely couldn’t run DynamoDB on the Mars rover, but you could run Mongo. ---- http://science.gsfc.nasa.gov/sed/bio/veronica.t.pinnick https://ep70.eventpilot.us/web/planner.php?id=ASMS16 Mars Organic Molecule Analyzer (MOMA) Mass Spectrometer: Performance Testing in GC-MS and LD-MS Modes of Operation
  • #8: Our mass spectrometers are used in major sporting events to ensure an even playing field by detecting banned performance-enhancing drugs. [optional] If an athlete is using synthetic testosterone, the instruments are sufficiently sensitive and the analytical techniques sufficiently advanced to detect the difference between synthetic and natural testosterone. [extra] We have a marketing contract with CBS under which the CSI TV shows use ThermoFisher equipment. [reference] http://www.nbcnews.com/storyline/2016-rio-summer-olympics/rio-olympics-top-anti-doping-scientist-cheats-will-probably-be-n573531
  • #9: So this is what beer looks like in a mass spec. This is 100 samples of various types of beer. Each one of the variations in these peaks represents the unique flavonoids that make a product unique and give it a distinct smell and flavor. Our mass spectrometers are used for product authenticity studies.
  • #10: Any MythBusters fans out there? Adam Savage actually gave the keynote at MongoDB World 2016 in New York, so that is why I am a Mongo fan, never mind the technical merits. In 2009 the MythBusters, Adam and Jamie, used a ThermoFisher mass spectrometer to determine whether soda cans have rat pee on them. Really great episode, just search for “Rat Pee Soda”. In the experiment, they take 1000 soda cans and let rats run and pee all over them. Then they take soda cans from local convenience stores and compare the two sets of cans using a black light. Under the black light, both sets look similar, with organic material glowing on both. However, when they take the rat pee cans and the convenience store cans to the Stanford analytical lab, the mass spectrometer is able to conclusively determine that no rat pee is found on the convenience store cans. [reference] Episode 135 http://www.dailymotion.com/video/x2n9enp (Starting at minute 7:30, Jamie and Adam visit the Stanford lab and use Thermo mass specs)
  • #11: Jamie says, quote, “These mass spectrometers are extremely accurate, they can detect down to a femtomole, and if it says they aren’t in there, it’s not in there.” Adam was very relieved by this result and drank a soda.
  • #13: To keep things interesting, I am going to do a live demo. This is always a risky proposition when remotely monitoring a complex instrument that is more expensive than my house over a network that is potentially unpredictable. Let me focus on one of our applications that just rolled out to production, called “Instrument Connect”, built using MongoDB Atlas. “Instrument Connect” allows our customers to connect their mass spectrometers to the ThermoFisher cloud built on AWS. Customers can monitor instrument status from anywhere in the world and receive notifications of any errors that occur. Instrument data is streamed up to the cloud, where it can take advantage of the incredible processing power of AWS, and users from around the world can collaborate on the experiments and results. The database that stores instrument status is MongoDB Atlas. We also built a prototype integration with Amazon Alexa that allows us to control the instrument with voice commands. [Demo outline] Open the MS Instrument Connect dashboard. Open the Atlas dashboard.
  • #14: This is the mass spec we will be remote monitoring. I didn’t have the budget to bring the instrument with me on stage so I’ll use remote desktop. Humor: Apparently this is the only shirt I own.
  • #16: ThermoFisher is increasingly using MongoDB in its applications.
  • #18: Mass Spectrometers have become so sensitive that they can measure the mass of a molecule down to the electron. This results in a huge amount of data.
  • #19: The rich query language includes partial document updates.
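A minimal sketch of what a partial document update could look like, using field names from the instrument document shown on the earlier slide. The helper name `build_status_update` is hypothetical, and the snippet only builds the filter and update documents; passing them to a driver call such as PyMongo's `update_one` is assumed, not shown running.

```python
def build_status_update(machine_name, status, sample_name):
    """Return (filter, update) documents that change only two nested fields.

    Hypothetical helper for illustration; the field names follow the
    instrument document shown on the slide.
    """
    query_filter = {"MachineName": machine_name}
    update = {
        # $set with dotted paths rewrites just these fields and leaves
        # the rest of the stored document untouched.
        "$set": {
            "RunManagerStatus.Status": status,
            "RunManagerStatus.SampleName": sample_name,
        }
    }
    return query_filter, update


query_filter, update = build_status_update("TRACEFINDER8", "Acquire", "Drugx")
# With a driver such as PyMongo, this pair would be passed as
# collection.update_one(query_filter, update). That call is not made here.
```

The point of the dotted paths is that the client never has to read, modify, and rewrite the whole instrument document to change its status.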
  • #20: MongoDB can store many types of data. Using MongoDB allows us to simplify our infrastructure. It also allows us to use a single set of tools for managing our data and our applications.
  • #21: Now that MongoDB supports join operations, we can store both relational and document data in the same database. This greatly expands the types of applications that can be built on MongoDB and simplifies our deployment, since we only have one database rather than two.
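As a rough illustration of the join support mentioned above, here is a sketch of an aggregation pipeline that uses the `$lookup` stage. The collection name `users`, the assumption that `UserId` matches the `_id` of a user document, and the helper name are all hypothetical; only the pipeline document is built here.

```python
def instruments_with_owner_pipeline(location):
    """Build a pipeline that joins instruments to their owning user.

    Hypothetical schema: an 'instruments' collection whose UserId field
    matches the _id of documents in a 'users' collection.
    """
    return [
        # Narrow down to one site first so the join touches fewer documents.
        {"$match": {"Location": location}},
        # Left outer join: matching user documents land in the 'owner' array.
        {
            "$lookup": {
                "from": "users",
                "localField": "UserId",
                "foreignField": "_id",
                "as": "owner",
            }
        },
    ]


pipeline = instruments_with_owner_pipeline("Austin")
# With a driver such as PyMongo this would be run as
# db.instruments.aggregate(pipeline). That call is not made here.
```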
  • #23: MongoDB has climbed to the number 4 slot on the DB-Engines ranking of most popular databases. The ranking is based on metrics including job postings, Stack Overflow questions, and Google searches. Mongo is behind only Oracle, MySQL, and SQL Server. Oracle was first released in 1979, SQL Server in 1989, MySQL in 1995, and MongoDB in 2009. It is remarkable that MongoDB has made up so much ground on relational database technology, which is 40 years old, and MongoDB doesn’t show any sign of slowing down.
  • #24: Let me talk for a moment about some performance, scalability, and cost comparisons that we did between MySQL and MongoDB. We apply the same scientific rigor as our customers when deciding which database to use.
  • #25: (remove for AWS)
  • #26: MySQL not optimized: 21 lines. MySQL optimized: 40 lines. MongoDB: 1 line. TODO: run test with larger data set.
  • #27: MySQL not optimized: 21 lines. MySQL optimized: 40 lines. MongoDB: 1 line. If I were to reduce my presentation to one slide, this would be that slide. This is a staggeringly awesome improvement in developer productivity.
  • #28: MySQL not optimized: 21 lines. MySQL optimized: 40 lines. MongoDB: 1 line.
  • #29: Similar number of lines of code and similar performance. SQL injection: a nice advantage of MongoDB is that queries are strongly typed documents rather than concatenated strings, so there is no chance of SQL injection. After all these years, SQL injection is still the number one security threat. TODO: measure performance
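To make the injection point concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in relational database. The table, its rows, and the malicious input are made up for illustration. It contrasts a string-concatenated SQL query with a parameterized one and with a document-style filter of the kind a MongoDB driver accepts.

```python
import sqlite3

# Throwaway in-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instruments (name TEXT, operator TEXT)")
conn.executemany(
    "INSERT INTO instruments VALUES (?, ?)",
    [("TSQ 8000", "Fred"), ("Instrument B", "Alice")],
)

malicious = "Fred' OR '1'='1"

# Vulnerable: user input is spliced into the SQL text, so the quote
# characters change the structure of the WHERE clause.
unsafe_sql = "SELECT name FROM instruments WHERE operator = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the input is bound as a value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM instruments WHERE operator = ?", (malicious,)
).fetchall()

# Document-style filter: the input is just a string value inside a
# structure, so there is no query text for it to break out of.
mongo_filter = {"Operator": malicious}

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every row, while the parameterized query returns none. The document filter behaves like the parameterized case as long as the user input is kept as a plain string value.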
  • #30: TraceFinder, the application used at the major sporting event in Rio this summer, switched from XML and SQLite to MongoDB. We could probably reduce this even further, but there is a dramatic decrease in cyclomatic complexity. TODO: measure cyclomatic complexity.
  • #31: Here I am contrasting DynamoDB and MongoDB. As usual, the answer to which database is best for your application is “It depends”, but here is some information to help you make that choice. Both of these databases are very good, and I think both will continue to grow in popularity at a much faster rate than relational databases. As my product manager says, “I don’t need better decisions, I need better choices.” And as a customer it is great to have multiple good database choices. The most obvious difference is that Dynamo is an AWS-only service and Mongo runs anywhere. Rich query language: With MongoDB, I can answer questions like which instruments had the highest utilization last month, or what the pump pressure is where I see my pumps begin to fail. And I can find this out without writing any code, using ad-hoc queries. With MongoDB, I can take advantage of rich features like joins, document validation, strongly typed queries, the decimal data type, views, graph queries, grouping, the aggregation pipeline, map-reduce, the native Spark connector, etc. According to DB-Engines, MongoDB and DynamoDB are the two top-rated databases in the category of document DBs. MongoDB has a score of 325 and DynamoDB a score of 29. Native integration with AWS: For example, if you want to take advantage of the native triggers available to execute a Lambda statement
  • #32: Please don’t interpret this slide as saying you should always use MongoDB over S3. That would not be wise. S3 would far outperform MongoDB in other scenarios. In this particular case, MongoDB is a much better choice. This measurement was taken by running C# code from an EC2 instance in the AWS US-East region. The title of this slide might strike you as odd, comparing S3 with MongoDB. S3 is a powerful AWS service that can be used to store both multi-gigabyte files and tiny JSON objects. It is a key-value store, but by carefully selecting keys you can use S3 like a simple database with tables and rows: a set of S3 objects with the same key prefix can function like a database table. The advantage is that you have a very inexpensive, serverless, highly available database. But as your application gets more complex, you miss out on the rich query capabilities of a full relational or document database. For our real-time chromatogram we realized a couple of orders of magnitude in savings in network and CPU consumption on our application servers: instead of downloading the entire S3 object and filtering it down, we were able to do the filtering on the database. [Reference] Performance measurement code: "C:\_git\CloudAgent\srcapi\Ironclad.Bootstrap\Repo\RealtimeChroDalBootstrap.cs" [Note] JSON was serialized to S3 using Newtonsoft, which produces objects 20% larger than Mongo BSON. (Storage on disk is even more of a contrast.)
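The key-prefix idea described above can be sketched with a plain dictionary standing in for an S3 bucket. Everything here is an assumption for illustration: the `table/row_id` key layout, the helper names, and the second sample row. No AWS calls are made.

```python
import json

# In-memory stand-in for an S3 bucket: object key -> serialized JSON body.
bucket = {}


def put_row(table, row_id, row):
    """Store one 'row' as an object whose key starts with the table name."""
    bucket[f"{table}/{row_id}"] = json.dumps(row)


def scan_table(table):
    """Return every row in a 'table' by listing keys that share its prefix."""
    prefix = f"{table}/"
    return [
        json.loads(body)
        for key, body in sorted(bucket.items())
        if key.startswith(prefix)
    ]


put_row("instruments", "TRACEFINDER8", {"MachineName": "TRACEFINDER8", "Location": "Austin"})
put_row("instruments", "EXAMPLE2", {"MachineName": "EXAMPLE2", "Location": "Elsewhere"})
put_row("users", "fred", {"Operator": "Fred"})

# Any filtering happens on the client, after every object has been fetched.
all_instruments = scan_table("instruments")
in_austin = [row for row in all_instruments if row["Location"] == "Austin"]
```

The last two lines show the cost this note describes: with a plain key-value store the application downloads the whole "table" and filters it locally, whereas a database can apply the filter before sending anything back.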
  • #35: With all the time we are saving writing and optimizing data layer code, we are able to invest in improving our algorithms, improving the user experience, and improving the processing infrastructure.
  • #37: We had used MongoDB in a single-server configuration, but we did not have expertise in cluster management, and didn’t necessarily want to acquire it. When Atlas was announced in July of this year, we immediately jumped on board; ease of deploying MongoDB was the one thing holding us back. With a weekend of work I switched from Dynamo to MongoDB in one day. Switching databases two months before going to production is not something I necessarily recommend. On Monday I asked my boss, “Did you notice anything different about the application, any downtime?” He said no. I told him that I had switched the database to Mongo, and he was not enthusiastic. Switching has turned out to be a great decision, and we have other applications looking to make the same switch. Robustness: We were storing some of our data in DynamoDB and some in S3 because Dynamo is expensive. But you can’t do partial updates of S3 documents. We have since moved this data to MongoDB and have improved the robustness and performance significantly. We had a significant number of outages on Dynamo because short spikes in traffic exceeded the throughput provisioned on our Dynamo tables. Rate of development: Adding a new collection is much easier than adding a table in Dynamo. I don’t have to write a CloudFormation script each time I want to provision a new table. I don’t have to provision read/write capacity for each table of the database independently. Data analytics: Writing ad-hoc queries to answer questions like “give me a count of instruments per user” or “what are my most active instruments” was not possible with DynamoDB because it doesn’t (didn’t) have a standalone query language or IDE. Ability to run outside the cloud as well as inside the cloud.
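The “count of instruments per user” question above can be expressed as a short aggregation pipeline. The sketch below builds that pipeline and, so the result can be checked without a server, computes the same counts in plain Python over made-up sample documents. The `UserId` field name comes from the instrument document on the earlier slide; the sample users and the output field name `instrumentCount` are assumptions.

```python
from collections import Counter


def instruments_per_user_pipeline():
    """Group instrument documents by UserId and count them, busiest first."""
    return [
        {"$group": {"_id": "$UserId", "instrumentCount": {"$sum": 1}}},
        {"$sort": {"instrumentCount": -1}},
    ]


# Made-up sample documents, used only to show what the pipeline computes.
sample_instruments = [
    {"UserId": "dr.ennis@poldark.net", "MachineName": "TRACEFINDER8"},
    {"UserId": "dr.ennis@poldark.net", "MachineName": "EXAMPLE2"},
    {"UserId": "someone@example.com", "MachineName": "EXAMPLE3"},
]

# Plain-Python equivalent of the $group and $sort stages above.
counts = Counter(doc["UserId"] for doc in sample_instruments).most_common()

pipeline = instruments_per_user_pipeline()
# With a driver such as PyMongo this would be run as
# db.instruments.aggregate(pipeline). That call is not made here.
```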