How SourceForge is Using MongoDB
Rick Copeland
@rick446 [email_address]

SF.net “BlackOps”: FossFor.us
  • User Editable!
  • Web 2.0! (ish)
  • Not Ugly!

Moving to NoSQL
  • FossFor.us used CouchDB (NoSQL)
  • “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
  • Scaling up to the level of SF.net needs research
      • CouchDB
      • MongoDB
      • Tokyo Cabinet/Tyrant
      • Cassandra
      • ... and others

Rewriting “Consume”
  • Most traffic on SF.net hits 3 types of pages:
      • Project Summary
      • File Browser
      • Download
  • Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net
  • Original goal is 1 MongoDB document per project
      • Later split release data because some projects have lots of releases
  • Periodic updates via RSS and AMQP from “Develop”

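The "one document per project, releases split out later" idea can be sketched with plain Python dicts. This is only an illustration: the field names (`releases`, `project_id`, `version`, `files`) and the helper `split_releases` are hypothetical and are not taken from SF.net's actual schema.

```python
from copy import deepcopy


def split_releases(project):
    """Split one combined project document into a project document
    and a list of separate release documents (hypothetical layout)."""
    project_doc = deepcopy(project)
    # Remove the embedded release list from the project document.
    releases = project_doc.pop("releases", [])
    # Each release becomes its own document that points back at its project.
    release_docs = [
        dict(release, project_id=project_doc["_id"]) for release in releases
    ]
    return project_doc, release_docs


# Illustrative data only.
combined = {
    "_id": "example-project",
    "name": "Example Project",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}

project_doc, release_docs = split_releases(combined)
```

Keeping releases in their own documents means the project document stays small for the summary page, while a project with many releases does not grow a single document without bound.
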
Deployment Architecture (diagram)
  • Load Balancer / Proxy
  • Apache mod_wsgi / TG 2.0 + MongoDB Slave (×4)
  • Master DB Server: MongoDB Master
  • Gobble Server
  • Develop

Deployment Architecture (revised) (diagram)
  • Load Balancer / Proxy
  • Apache mod_wsgi / TG 2.0 (×4)
  • Master DB Server: MongoDB Master
  • Gobble Server
  • Develop
  • Scalability is good
  • Single-node performance is good, too

SF.net Downloads
  • Allow non-sf.net projects to use SourceForge mirror network
  • Stats calculated in Hadoop and stored/served from MongoDB
  • Same deployment architecture as Consume (4 web, 1 db)

Allura (SF.net “beta” devtools)
  • Rewrite developer tools with new architecture
  • Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
  • Single MongoDB replica set manually sharded by project
  • Release early & often

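The slide does not say how projects are assigned to shards, so the following is only one possible sketch of "manual sharding by project": a routing function that maps a project's short name to a database name. The constant `DATABASE_COUNT`, the `project-data-N` naming, and the `overrides` table are all assumptions made for illustration.

```python
import zlib

# Assumed number of databases to spread projects across.
DATABASE_COUNT = 4


def database_for_project(shortname, overrides=None):
    """Return the name of the database that holds a project's data."""
    overrides = overrides or {}
    # An explicit table lets an operator pin a large project somewhere specific.
    if shortname in overrides:
        return overrides[shortname]
    # Otherwise use a stable checksum so the same project always maps
    # to the same database across processes and restarts.
    bucket = zlib.crc32(shortname.encode("utf-8")) % DATABASE_COUNT
    return "project-data-%d" % bucket
```

A checksum is used rather than the built-in `hash()` because string hashing in Python is randomized per process, which would route the same project to different databases on different web servers.
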
What We Liked
  • Performance, performance, performance – Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
  • Schemaless server allows fast schema evolution in development, making many migrations unnecessary
  • Replication is easy, making scalability and backups easy
      • Keep a “backup slave” running
      • Kill backup slave, copy off database, bring back up the slave
      • Automatic re-sync with master
  • Query Language
      • You mean I can have performance without map-reduce?
  • GridFS

Pitfalls
  • Too-large documents
      • Store less per document
      • Return only a few fields
  • Ignoring indexing
      • Watch your server log; bad queries show up there
  • Ignoring your data’s schema
  • Using many databases when one will do
  • Using too many queries

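"Return only a few fields" corresponds to passing a projection as the second argument of a query. A minimal sketch, assuming `collection` is a PyMongo-style collection object; the function name `load_project_summary` and the field names `shortname`, `name`, and `summary` are hypothetical.

```python
def load_project_summary(collection, shortname):
    """Fetch only the fields a summary page needs, not the whole document."""
    return collection.find_one(
        {"shortname": shortname},              # query filter
        {"name": 1, "summary": 1, "_id": 0},   # projection: include two fields, drop _id
    )
```

With a projection the server sends back only the listed fields, which matters most when documents have grown large.
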
Ming – an “Object-Document Mapper?”
  • Your data has a schema
      • Your database can define and enforce it
      • It can live in your application (as with MongoDB)
      • Nice to have the schema defined in one place in the code
  • Sometimes you need a “migration”
      • Changing the structure/meaning of fields
      • Adding indexes
      • Sometimes lazy, sometimes eager
  • Queuing up all your updates can be handy
  • Python dicts are nice; objects are nicer

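A "lazy" migration upgrades a document when it is loaded, rather than rewriting the whole collection up front. The sketch below uses plain dicts and is not Ming's migration API; the `schema_version` field, the `body` to `text` rename, and the function name are assumptions chosen for illustration.

```python
CURRENT_VERSION = 2  # hypothetical current schema version


def migrate_wiki_page(doc):
    """Upgrade a raw wiki page document to the current shape, in place."""
    version = doc.get("schema_version", 1)
    if version < 2:
        # Hypothetical change: version 1 stored the page text under "body".
        if "body" in doc:
            doc["text"] = doc.pop("body")
        version = 2
    doc["schema_version"] = version
    return doc
```

An eager migration would run the same function over every document in one pass; the lazy form spreads that cost across normal reads and leaves untouched documents in the old shape until they are next loaded.
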
Ming Concepts
  • Inspired by SQLAlchemy
  • Group of classes to which you map your collections
  • Each class defines its schema, including indexes
  • Convenience methods for loading/saving objects and ensuring indexes are created
  • Migrations
  • Unit of Work – great for web applications
  • MIM – “Mongo in Memory” nice for unit tests

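The Unit of Work pattern queues changed objects and writes them in one step, for example at the end of a web request. The class below is a generic sketch of the pattern, not Ming's implementation; the `UnitOfWork` name and the `save` callable are hypothetical.

```python
class UnitOfWork:
    """Collect modified documents and persist them together on flush()."""

    def __init__(self, save):
        self._save = save   # callable that persists a single document
        self._dirty = []    # documents waiting to be written

    def register(self, doc):
        """Mark a document as needing a write; duplicates are ignored."""
        if not any(doc is pending for pending in self._dirty):
            self._dirty.append(doc)

    def flush(self):
        """Write every pending document and return how many were written."""
        for doc in self._dirty:
            self._save(doc)
        count = len(self._dirty)
        self._dirty = []
        return count

    def clear(self):
        """Discard pending writes, e.g. when a request fails."""
        self._dirty = []
```

In a web application, a request handler would register objects as it modifies them, call `flush()` once on success, and call `clear()` on error so that a failed request writes nothing.
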
Ming Example

    from ming import schema
    from ming.orm import MappedClass
    from ming.orm import (FieldProperty, ForeignIdProperty,
                          RelationProperty)

    class WikiPage(MappedClass):

        class __mongometa__:
            session = session  # ORM session, assumed to be created elsewhere
            name = 'wiki_page'  # MongoDB collection name

        _id = FieldProperty(schema.ObjectId)
        title = FieldProperty(str)
        text = FieldProperty(str)
        comments = RelationProperty('WikiComment')

    MappedClass.compile_all()  # Lets ming know about the mapping

Open Source
  • Ming
      • http://sf.net/projects/merciless/
      • MIT License
  • Allura
      • http://sf.net/p/allura/
      • Apache License

Future Work
  • mongos
  • New Allura Tools
  • Migrating legacy SF.net projects to Allura
  • Stats all in MongoDB rather than Hadoop?
  • Better APIs to access your project data

Questions?
Rick Copeland @rick446 [email_address]

MongoATL: How Sourceforge is Using MongoDB
