Pierre-Louis Gottfrois
Bastien Murzeau
Apéro Ruby Bordeaux, 8 November 2011
• Brief introduction

• Practical case

• Map / Reduce
What is MongoDB?

MongoDB is a NoSQL, schemaless,
document-oriented database
Schemaless

• Very useful in ‘agile’ development (iterations, quick changes,
  flexibility for developers)

• Supports features that, in relational databases, would be:
 • nearly impossible (storing an open-ended number of elements, e.g. tags)

 • too complex for what they are (migrations)
Document-oriented
• MongoDB stores documents, not rows

 • documents are stored as BSON (binary JSON)

• the query syntax is as rich as SQL

• the ‘embedded’ documents mechanism solves
  a good number of the problems encountered
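
To make the document model concrete, here is a minimal sketch in plain JavaScript (no database needed). The `post` and `draft` objects, their field names and the `commentAuthors` helper are invented for illustration and do not come from the talk.

```javascript
// Hypothetical blog post stored as a single document.
// All field names here are illustrative assumptions.
const post = {
  _id: 1,
  title: "Hello MongoDB",
  tags: ["nosql", "ruby"],   // open-ended list, no join table needed
  comments: [                // embedded documents live inside the parent
    { author: "alice", text: "Nice talk" },
    { author: "bob", text: "Thanks" }
  ]
};

// A second document in the same collection may have a different shape.
const draft = { _id: 2, title: "Draft", tags: [] };

// Reading embedded data needs no join: it travels with the parent document.
function commentAuthors(doc) {
  return (doc.comments || []).map(function (c) { return c.author; });
}

console.log(commentAuthors(post));  // [ 'alice', 'bob' ]
console.log(commentAuthors(draft)); // []
```

The `|| []` fallback is the price of having no schema: code that reads documents has to tolerate missing fields.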
Document-oriented

• Documents are stored in a collection
 (in Ruby on Rails: a model)

• part of this data is indexed
 to optimize performance

• a document is not a trash can!
Storing large volumes of data
• MongoDB (and other NoSQL databases) perform better
 at horizontal scaling
 • servers are added to increase storage capacity
   ("sharding")
 • which also provides better availability
 • optimized load balancing between nodes
 • the growth is transparent to the application
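
The idea behind sharding can be sketched as routing each document to one of several servers based on a key. The sketch below is an illustration only: `SHARD_COUNT`, `hashKey` and `shardFor` are hypothetical names, and MongoDB's real mechanism (chunks of shard-key ranges moved around by a balancer) is more elaborate than a plain modulo.

```javascript
// Illustration only: route documents to shards by hashing a key.
// This is NOT MongoDB's actual sharding algorithm.
const SHARD_COUNT = 3; // assumed number of servers

// Small deterministic string hash, kept within unsigned 32-bit range.
function hashKey(key) {
  const s = String(key);
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// The same key always lands on the same shard.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Distribute twelve hypothetical documents over the shards.
const shards = [];
for (let i = 0; i < SHARD_COUNT; i++) shards.push([]);
for (let id = 1; id <= 12; id++) {
  shards[shardFor(id, SHARD_COUNT)].push({ _id: id });
}

shards.forEach(function (docs, i) {
  console.log("shard " + i + ": " + docs.length + " documents");
});
```

The application only ever calls `shardFor`; adding capacity means changing the routing, not the application code, which is the "transparent" part of the last bullet.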
Practical case
• The ORM becomes an ODM; the reference gem is mongoid
  • or: MongoMapper, DataMapper
• Creating an application based on MongoDB (NoSQL)
  • rails new nosql
  • edit the Gemfile
    •   gem 'mongoid'

    •   gem 'bson_ext'

  • bundle install
  • rails generate mongoid:config
Practical case
• edit config/application.rb: comment out rails/all and require only the railties you need
  • #require 'rails/all'
  • require "action_controller/railtie"
  • require "action_mailer/railtie"
  • require "active_resource/railtie"
  • require "rails/test_unit/railtie"
Practical case
class Subject
  include Mongoid::Document
  include Mongoid::Timestamps

  has_many :scores,     :as => :scorable, :dependent => :delete, :autosave => true
  has_many :requests,   :dependent => :delete
  belongs_to :author,   :class_name => 'User'
end

class Conversation
  include Mongoid::Document
  include Mongoid::Timestamps

  field :public,            :type => Boolean, :default => false

  has_many :scores,         :as => :scorable, :dependent => :delete
  has_and_belongs_to_many   :subjects
  belongs_to :timeline
  embeds_many :messages
end
Map Reduce
Example

A "ticket" collection

{ "id" : 1, "day" : 20111017, "checkout" : 100 }
{ "id" : 2, "day" : 20111017, "checkout" : 42 }
{ "id" : 3, "day" : 20111017, "checkout" : 215 }
{ "id" : 4, "day" : 20111017, "checkout" : 73 }
Problem

• We want to
 • Compute the sum of ‘checkout’ over all objects in our
    ticket collection

 • Be able to distribute this operation over the network
 • Be fast!
• We don’t want to
 • Go over all objects again when an update is made
Map : emit(checkout)

The ‘map’ function emits (selects) the checkout value
of each object in our collection

{ "id" : 1, "day" : 20111017, "checkout" : 100 }  ->  100
{ "id" : 2, "day" : 20111017, "checkout" : 42 }   ->  42
{ "id" : 3, "day" : 20111017, "checkout" : 215 }  ->  215
{ "id" : 4, "day" : 20111017, "checkout" : 73 }   ->  73
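
The map step can be tried outside MongoDB with a few lines of plain JavaScript. In this sketch the `emit` function and the `emitted` array are stand-ins written for the example (MongoDB provides its own `emit` to map functions); the ticket data is the collection from the slide, using `_id` as in the shell session.

```javascript
// The four tickets from the example collection.
const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 }
];

// Stand-in for the emit() that MongoDB makes available to map functions.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Same shape as a MongoDB map function: `this` is the current document.
const map = function () {
  emit(null, this.checkout);
};

// Run map once per document, binding the document to `this`.
tickets.forEach(function (doc) {
  map.call(doc);
});

console.log(emitted.map(function (e) { return e.value; })); // [ 100, 42, 215, 73 ]
```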
Reduce : sum(checkout)

                    430
          142                 288
     100       42        215       73

{ "id" : 1, "day" : 20111017, "checkout" : 100 }
{ "id" : 2, "day" : 20111017, "checkout" : 42 }
{ "id" : 3, "day" : 20111017, "checkout" : 215 }
{ "id" : 4, "day" : 20111017, "checkout" : 73 }
Reduce function

The ‘reduce’ function applies the algorithmic logic
to each key and the values received for it from the ‘map’ function

This function has to be commutative, associative and ‘idempotent’
to be called recursively or in a distributed system

reduce(k, [A, B]) == reduce(k, [B, A])
reduce(k, [A, B, C]) == reduce(k, [A, reduce(k, [B, C])])
reduce(k, [reduce(k, [A, B])]) == reduce(k, [A, B])
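
The requirements on ‘reduce’ can be checked by hand for the sum used in this talk. The sketch below is plain JavaScript; the values `A`, `B`, `C` are three of the checkout amounts, and the `average` function is a counter-example added for illustration, not something from the slides.

```javascript
// Sum reducer, same logic as the one used in the talk.
const reduce = function (key, values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum;
};

const A = 100, B = 42, C = 215;

// Order of the values must not matter.
const commutative = reduce(null, [A, B]) === reduce(null, [B, A]);

// A partial result may be mixed with raw values.
const associative =
  reduce(null, [A, B, C]) === reduce(null, [A, reduce(null, [B, C])]);

// Reducing an already reduced value must not change it.
const idempotent =
  reduce(null, [reduce(null, [A, B, C])]) === reduce(null, [A, B, C]);

// Counter-example (assumption for illustration): a naive average
// gives a different answer when fed a partial result.
const average = function (key, values) {
  return reduce(key, values) / values.length;
};
const averageBreaks =
  average(null, [A, B, C]) !== average(null, [A, average(null, [B, C])]);

console.log(commutative, associative, idempotent, averageBreaks); // true true true true
```

An average can still be computed with map/reduce by emitting a sum and a count and dividing only at the very end.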
Inherently Distributed

                    430
          142                 288
     100       42        215       73

(one leaf per ticket, ids 1 to 4)
Distributed
Since the ‘map’ function emits the objects to be reduced
and the ‘reduce’ function processes the emitted
objects independently, the work can be distributed
across multiple workers.

map -> reduce
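
A minimal sketch of that distribution, in plain JavaScript: the emitted values are split into chunks, each chunk is reduced on its own (as a separate worker would do), and the partial results are reduced once more. The `chunk` helper and the chunk size of 2 are assumptions made for the example; no real workers or network are involved.

```javascript
// Sum reducer, same logic as in the talk.
const reduce = function (key, values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum;
};

// Values emitted by map for the four tickets.
const values = [100, 42, 215, 73];

// Hypothetical helper: split a list into chunks of a given size.
function chunk(list, size) {
  const out = [];
  for (let i = 0; i < list.length; i += size) {
    out.push(list.slice(i, i + size));
  }
  return out;
}

// Each "worker" reduces only its own chunk.
const partials = chunk(values, 2).map(function (part) {
  return reduce(null, part);
});

// A final reduce combines the partial results.
const total = reduce(null, partials);

console.log(partials); // [ 142, 288 ]
console.log(total);    // 430
```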
Logarithmic Update

For the same reason, when an object is updated, we
don’t have to reprocess every object.

We can call the ‘map’ function on the updated
objects only.
Ticket 3 is updated ("checkout" : 215 becomes 210); the tree still holds the old values:

                    430
          142                 288
     100       42        215       73

The leaf for ticket 3 is re-emitted:

                    430
          142                 288
     100       42        210       73

Its parent is reduced again:

                    430
          142                 283
     100       42        210       73

Then the root is reduced again:

                    425
          142                 283
     100       42        210       73

{ "id" : 1, "day" : 20111017, "checkout" : 100 }
{ "id" : 2, "day" : 20111017, "checkout" : 42 }
{ "id" : 3, "day" : 20111017, "checkout" : 210 }
{ "id" : 4, "day" : 20111017, "checkout" : 73 }
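
The update walk shown in these diagrams can be sketched as a tree of partial sums stored in a flat array. This is an illustration of the idea only: `buildSumTree` and `updateLeaf` are hypothetical helpers, the leaf count is assumed to be a power of two, and the slides do not claim that MongoDB keeps such a tree internally.

```javascript
// Build a complete binary tree of partial sums in a flat array.
// tree[1] is the root, node i has children 2i and 2i+1,
// and the leaves start at index n (n = number of leaves, a power of two).
function buildSumTree(leaves) {
  const n = leaves.length;
  const tree = new Array(2 * n).fill(0);
  for (let i = 0; i < n; i++) tree[n + i] = leaves[i];
  for (let i = n - 1; i >= 1; i--) tree[i] = tree[2 * i] + tree[2 * i + 1];
  return tree;
}

// Change one leaf and recompute only its ancestors.
// Returns how many internal nodes were recomputed.
function updateLeaf(tree, index, value) {
  const n = tree.length / 2;
  let i = n + index;
  tree[i] = value;
  let recomputed = 0;
  for (i = Math.floor(i / 2); i >= 1; i = Math.floor(i / 2)) {
    tree[i] = tree[2 * i] + tree[2 * i + 1];
    recomputed++;
  }
  return recomputed;
}

const tree = buildSumTree([100, 42, 215, 73]);
console.log(tree[1]); // 430

// Ticket 3 (leaf index 2) changes from 215 to 210.
const steps = updateLeaf(tree, 2, 210);
console.log(tree[3], tree[1], steps); // 283 425 2
```

With four leaves only two nodes are recomputed; with n leaves the walk touches about log2(n) nodes, hence "logarithmic".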
Let’s do some code!
$> mongo

> db.tickets.save({ "_id": 1, "day": 20111017, "checkout": 100 })
> db.tickets.save({ "_id": 2, "day": 20111017, "checkout": 42 })
> db.tickets.save({ "_id": 3, "day": 20111017, "checkout": 215 })
> db.tickets.save({ "_id": 4, "day": 20111017, "checkout": 73 })

> db.tickets.count()
4

> db.tickets.find()
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
...

> db.tickets.find({ "_id": 1 })
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
> var map = function() {
... emit(null, this.checkout)
}

> var reduce = function(key, values) {
... var sum = 0
... for (var index in values) sum += values[index]
... return sum
}
Temporary Collection
> sumOfCheckouts = db.tickets.mapReduce(map, reduce)
{
  "result" : "tmp.mr.mapreduce_123456789_4",
  "timeMillis" : 8,
  "counts" : { "input" : 4, "emit" : 4, "output" : 1 },
  "ok" : 1
}

> db.getCollectionNames()
[
  "tickets",
  "tmp.mr.mapreduce_123456789_4"
]

> db[sumOfCheckouts.result].find()
{ "_id" : null, "value" : 430 }
Persistent Collection
> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })

> db.getCollectionNames()
[
  "sumOfCheckouts",
  "tickets",
  "tmp.mr.mapreduce_123456789_4"
]

> db.sumOfCheckouts.find()
{ "_id" : null, "value" : 430 }

> db.sumOfCheckouts.findOne().value
430
Reduce by Date
> var map = function() {
... emit(this.day, this.checkout)
... }

> var reduce = function(key, values) {
... var sum = 0
... for (var index in values) sum += values[index]
... return sum
}
> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })

> db.sumOfCheckouts.find()
{ "_id" : 20111017, "value" : 430 }
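
To see how the key passed to `emit` drives the grouping, here is a small in-memory imitation of the map/reduce cycle in plain JavaScript. The `mapReduce` and `emit` functions below are written for this example and are not MongoDB's implementation; the fifth ticket, dated 20111018, is hypothetical and only there to produce a second group.

```javascript
// Tickets from the talk, plus one hypothetical ticket on another day.
const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 },
  { _id: 5, day: 20111018, checkout: 50 } // hypothetical
];

let groups; // key -> emitted values, filled by emit() during a run

// Stand-in for MongoDB's emit(): collect values under their key.
function emit(key, value) {
  const k = JSON.stringify(key);
  if (!groups.has(k)) groups.set(k, { key: key, values: [] });
  groups.get(k).values.push(value);
}

// Imitation of the map/reduce cycle: map every document, then reduce per key.
function mapReduce(docs, map, reduce) {
  groups = new Map();
  docs.forEach(function (doc) { map.call(doc); });
  const out = [];
  groups.forEach(function (g) {
    // MongoDB does not call reduce for a key with a single value.
    const value = g.values.length === 1 ? g.values[0] : reduce(g.key, g.values);
    out.push({ _id: g.key, value: value });
  });
  return out;
}

const map = function () { emit(this.day, this.checkout); };
const reduce = function (key, values) {
  let sum = 0;
  for (const index in values) sum += values[index];
  return sum;
};

const result = mapReduce(tickets, map, reduce);
console.log(JSON.stringify(result));
// [{"_id":20111017,"value":430},{"_id":20111018,"value":50}]
```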
What we can do
Scored Subjects per User
Subject   User   Score
   1       1       2
   1       1       2
   1       2       2
   2       1       2
   2       2      10
   2       2       5
Scored Subjects per User (reduced)
Subject   User   Score
   1       1       4
   1       2       2
   2       1       2
   2       2      15
$> mongo

> db.scores.save({ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 2, "subject_id": 1, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 3, "subject_id": 1, "user_id": 2, "score": 2 })
> db.scores.save({ "_id": 4, "subject_id": 2, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 5, "subject_id": 2, "user_id": 2, "score": 10 })
> db.scores.save({ "_id": 6, "subject_id": 2, "user_id": 2, "score": 5 })

> db.scores.count()
6

> db.scores.find()
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
...

> db.scores.find({ "_id": 1 })
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
> var map = function() {
... emit([this.user_id, this.subject_id].join("-"),
...      {subject_id: this.subject_id, user_id: this.user_id, score: this.score});
... }

> var reduce = function(key, values) {
... var result = {user_id: "", subject_id: "", score: 0};
... values.forEach(function (value) {
...   result.score += value.score;
...   result.user_id = value.user_id;
...   result.subject_id = value.subject_id;
... });
... return result
... }
ReducedScores Collection
> db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" })

> db.getCollectionNames()
[
  "reduced_scores",
  "scores"
]

> db.reduced_scores.find()
{ "_id" : "1-1", "value" : { "user_id" : 1, "subject_id" : 1, "score" : 4 } }
{ "_id" : "1-2", "value" : { "user_id" : 1, "subject_id" : 2, "score" : 2 } }
{ "_id" : "2-1", "value" : { "user_id" : 2, "subject_id" : 1, "score" : 2 } }
{ "_id" : "2-2", "value" : { "user_id" : 2, "subject_id" : 2, "score" : 15 } }

> db.reduced_scores.findOne().value.score
4
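
The composite key and the shape of the reduced value can be checked without a database. The sketch below replays the six scores in plain JavaScript; the `groups` and `reduced` objects are scaffolding invented for the example, while the `reduce` function follows the one in the shell session.

```javascript
// The six scores saved in the shell session.
const scores = [
  { _id: 1, subject_id: 1, user_id: 1, score: 2 },
  { _id: 2, subject_id: 1, user_id: 1, score: 2 },
  { _id: 3, subject_id: 1, user_id: 2, score: 2 },
  { _id: 4, subject_id: 2, user_id: 1, score: 2 },
  { _id: 5, subject_id: 2, user_id: 2, score: 10 },
  { _id: 6, subject_id: 2, user_id: 2, score: 5 }
];

// Reduce function from the talk: sums scores, keeps the ids.
const reduce = function (key, values) {
  const result = { user_id: "", subject_id: "", score: 0 };
  values.forEach(function (value) {
    result.score += value.score;
    result.user_id = value.user_id;
    result.subject_id = value.subject_id;
  });
  return result;
};

// Group the values by the composite key "user_id-subject_id".
const groups = {};
scores.forEach(function (s) {
  const key = [s.user_id, s.subject_id].join("-");
  groups[key] = groups[key] || [];
  groups[key].push({ subject_id: s.subject_id, user_id: s.user_id, score: s.score });
});

// Reduce each group.
const reduced = {};
Object.keys(groups).forEach(function (key) {
  reduced[key] = reduce(key, groups[key]);
});

console.log(reduced["1-1"].score, reduced["2-2"].score); // 4 15

// The result has the same shape as an emitted value,
// so it can safely be passed through reduce again.
const again = reduce("2-2", [reduced["2-2"]]);
console.log(again.score); // 15
```

Returning the same shape that `map` emits is what lets a reduced value be reduced again together with new values.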
Dealing with Rails Query

ruby-1.9.2-p180 :007 > ReducedScores.first
 => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId('...'),
"subject_id"=>BSON::ObjectId('...'), "score"=>4.0}>

ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count
 => 2

ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value['score']
 => 4.0

ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value['score']
 => 2.0
Questions?
