MongoDB - Course 4

The document provides a comprehensive overview of MongoDB, covering its evolution, key characteristics, use cases, and criticisms. It details CRUD operations, data extraction techniques, and aggregation methods, along with practical examples and exercises. Additionally, it discusses MongoDB's flexible schema design, high availability, and scalability, while also addressing its limitations regarding ACID guarantees.

Uploaded by

rioluriolu.gm

MongoDB

Table of contents

1. Introduction
2. Querying MongoDB
a. Simple extraction and various simple commands
b. Aggregation
3. Data Modeling

Chapter 1

SQL and NoSQL evolution

● 1970: Dr. E. F. Codd published the paper “A Relational Model of Data for Large
Shared Data Banks”
● 1974: IBM developed SQL
● 1979: Oracle released the first commercially available RDBMS
● 1986: The first SQL standard was published
● 1998: Carlo Strozzi coined the term NoSQL for an open-source relational
database that did not use SQL; the term was later reused for databases that
relax some SQL constraints (ACID) in favor of availability and scalability

MongoDB key characteristics and use cases
Document-oriented database

MongoDB key characteristics and use cases
Key characteristics

● General-purpose database
○ Not the case for many other NoSQL databases (e.g. graph-oriented)
● Flexible schema design
○ Document-oriented approach: attributes are not defined in advance
● Built for high availability
● Built to scale
● Aggregation framework
○ Provides powerful transformation capabilities
● Native replication
● Security features
● JSON-based (documents are stored as BSON, a binary JSON format)
● Supports MapReduce

MongoDB key characteristics and use cases
What is the use case for MongoDB?

● Data integration, providing a single view of the data
● Internet of Things
● Mobile applications
● Real-time analytics
● Personalization
● Catalog management
● Content management

MongoDB key characteristics and use cases
MongoDB criticism

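● Schema-less nature
● Lack of proper ACID guarantees across documents (a write to a single document is atomic, but multi-document transactions only arrived in MongoDB 4.0)

⇒ Hard to ensure consistency, isolation...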

MongoDB key characteristics and use cases
MongoDB binaries

● mongod
○ starts the MongoDB server
● mongo
○ opens the MongoDB shell
● mongodump
○ creates a database dump (BSON + JSON files)
● mongorestore
○ restores a database dump

Chapter 2

CRUD using the shell

● Show the current database
○ db
● List the databases
○ show dbs
● Switch to a database (it is created on first write if it does not exist)
○ use <database_name>
● Insert a document
○ db.<collection_name>.insert({...DOCUMENT...})
○ db.books.insert({title: 'mastering mongoDB', isbn: '101'})
○ db.books.insert({title: 'mongoDB cookbook', isbn: '1331'})
● Find documents
○ db.<collection_name>.find({...MATCHING_OBJECT...})
○ db.books.find({isbn: '1331'})
● Delete documents (all those that match)
○ db.<collection_name>.remove({...MATCHING_OBJECT...})
○ db.books.remove({isbn: '101'})
● Delete a document by its _id
○ db.<collection_name>.remove({_id: ObjectId(OBJECT_ID_STRING)})
○ db.books.remove({_id: ObjectId('592035f6141daf984112d07f')})

CRUD using the shell

● Update a document (replacement)
○ db.<collection_name>.update({...MATCHING_OBJECT...},{...REPLACEMENT_OBJECT...})
○ db.books.update({isbn: '1331'},{price: 30})
=> Replaces the first matched document with the new object: every field except _id is lost
● Update a document (2): modify only some fields
○ db.<collection_name>.update({...MATCHING_OBJECT...},{$set: {...UPDATING_OBJECT...}})
○ db.books.update({isbn: '1331'},{$set: {price: 30, title: 'Mongo'}})
● Find all documents
○ db.<collection_name>.find()
● Find documents and pretty print them in the console
○ db.<collection_name>.find({...MATCHING_OBJECT...}).pretty()
○ db.books.find().pretty()
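
The difference between the two update forms can be illustrated without a server. The following is a minimal in-memory sketch in plain JavaScript, not MongoDB's implementation: replaceDocument and setFields are hypothetical helper names, and the numeric _id is made up for the illustration.

```javascript
// In-memory sketch only: no MongoDB server is involved.

// Replacement-style update: only _id survives, everything else comes from the new object.
function replaceDocument(doc, replacement) {
  return Object.assign({ _id: doc._id }, replacement);
}

// $set-style update: existing fields are kept, the listed ones are overwritten or added.
function setFields(doc, fields) {
  return Object.assign({}, doc, fields);
}

const book = { _id: 1, title: "mongoDB cookbook", isbn: "1331" };

const replaced = replaceDocument(book, { price: 30 });
const patched = setFields(book, { price: 30, title: "Mongo" });

console.log(replaced); // { _id: 1, price: 30 }
console.log(patched);  // { _id: 1, title: 'Mongo', isbn: '1331', price: 30 }
```

The first result shows why a plain update object is risky: title and isbn are gone.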

Scripting for the mongo shell

Main reason for using the shell: the mongo shell is a JavaScript shell
> var title = 'MongoDB in a nutshell'
> title
MongoDB in a nutshell
> db.books.insert({title: title, isbn: 102})
WriteResult({ "nInserted" : 1 })
> db.books.find()
{ "_id" : ObjectId("59203874141daf984112d080"), "title" : "MongoDB in a nutshell", "isbn" : 102 }

Scripting for the mongo shell

Using scripts and functions

> queryBooksByIsbn = function(isbn) { return db.books.find({isbn: isbn}) }
> queryBooksByIsbn("101")
{ "_id" : ObjectId("592035f6141daf984112d07f"), "title" : "mastering mongoDB", "isbn" : "101", "price" : 30 }

Executing script files:

From the system shell:
mongo FILE_PATH
From the mongo shell:
load(FILE_PATH_STRING)

Scripting for the mongo shell
Batch inserts using the shell

authorMongoFactory = function() {
  // One insert operation is sent to the server per iteration
  for (var loop = 0; loop < 1000; loop++) {
    db.books.insert({name: "MongoDB factory book" + loop})
  }
}
Although this works, it sends 1000 separate insert operations to the server, which is inefficient.
Prefer a bulk write:
fastAuthorMongoFactory = function() {
  var bulk = db.books.initializeUnorderedBulkOp();
  // Operations are only queued here and sent together by execute()
  for (var loop = 0; loop < 1000; loop++) {
    bulk.insert({name: "MongoDB factory book" + loop})
  }
  bulk.execute();
}
If the operations must be executed in the order in which they were added, use
db.books.initializeOrderedBulkOp();

Scripting for the mongo shell
Batch inserts using the shell

Alternative

db.collection.bulkWrite(
[ <operation 1>, <operation 2>, ... ],
{
writeConcern : <document>,
ordered : <boolean>
}
)

Administration

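● db.dropDatabase()
○ drops the current database
● db.getCollectionNames()
○ lists the collections of the current database
● db.copyDatabase(fromDB, toDB)
○ copies a database (deprecated in MongoDB 4.0, removed in 4.2)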

Simple Extraction

Simple extraction
find()

db.<collection>.find(
{...SELECTOR_OBJECT...},
{...PROJECTION_OBJECT...}
)

Simple extraction
$exists

$exists filters documents on the presence (true) or absence (false) of a field

db.<collection>.find({<FIELD> : {$exists: <BOOL>}},{})

Simple extraction
Projection

The projection selects which properties of the matching documents are returned.

db.<collection>.find({},{<FIELD> : <BOOL>})

When <BOOL> is

● true, only the specified properties are shown (plus _id, unless it is explicitly excluded)
● false, the specified properties are hidden
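
To make the two projection modes concrete, here is a minimal in-memory sketch in plain JavaScript. It only approximates the behaviour described above for flat documents; project is a hypothetical helper and the customer document is invented for the illustration.

```javascript
// In-memory sketch only: approximates find()'s projection for flat documents.
function project(doc, spec) {
  const fields = Object.keys(spec).filter((key) => key !== "_id");
  // Inclusion mode as soon as one non-_id field is set to true.
  const including = fields.some((key) => spec[key]);
  const out = {};
  for (const key of Object.keys(doc)) {
    if (key === "_id") {
      // _id is returned unless it is explicitly excluded.
      if (spec._id !== false && spec._id !== 0) out._id = doc._id;
    } else if (including ? Boolean(spec[key]) : spec[key] === undefined) {
      out[key] = doc[key];
    }
  }
  return out;
}

const customer = { _id: 7, firstName: "Jane", lastName: "Doe", city: "Orlando" };

const namesOnly = project(customer, { firstName: true, lastName: true });
const withoutCity = project(customer, { city: false });
const firstNameNoId = project(customer, { firstName: true, _id: false });

console.log(namesOnly);     // { _id: 7, firstName: 'Jane', lastName: 'Doe' }
console.log(withoutCity);   // { _id: 7, firstName: 'Jane', lastName: 'Doe' }
console.log(firstNameNoId); // { firstName: 'Jane' }
```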

Simple extraction
$eq, $gt, $lt, $gte, $lte, $ne

Comparison operators

db.<collection>.find({
  <FIELD>: {<OPERATOR>:<VALUE>}
},{})

$eq   equal
$gt   greater than
$lt   lower than
$gte  greater than or equal
$lte  lower than or equal
$ne   not equal

Example:

db.products.find({price: {$lte: 50}},{})

Simple extraction
$in, $nin

Membership operators: $in matches values listed in the array, $nin values not listed in it

db.<collection>.find({
  <FIELD>: {<OPERATOR>:<VALUES_ARRAY>}
},{})

Example:

db.customers.find({City: {$in: ['Orlando', 'Baton Rouge']}})

Simple extraction
$and, $or, $nor, $not

Logical operators

db.<collection>.find({
  <OPERATOR>: [<EXPR1>, <EXPR2>,...]
},{})

$and  Joins query clauses with a logical AND: returns all documents that match the conditions of both clauses.
$or   Joins query clauses with a logical OR: returns all documents that match the conditions of either clause.
$nor  Joins query clauses with a logical NOR: returns all documents that fail to match both clauses.
$not  Inverts the effect of a query expression: returns the documents that do not match it. Unlike the other three, it applies to a single operator expression: <FIELD>: {$not: {<OPERATOR>: <VALUE>}}

Example:

db.products.find({$and: [
  {City: 'Orlando'},{Price: {$lt:50}}
]},{})

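
How a selector combining comparison, membership and logical operators is evaluated can be sketched in a few lines of plain JavaScript. This is a simplified in-memory model, not MongoDB's matcher: it handles scalar fields only and ignores MongoDB's type-ordering rules, matches is a hypothetical helper, and the product documents are invented.

```javascript
// In-memory sketch only: a tiny selector evaluator for flat documents.
const comparisons = {
  $eq: (a, b) => a === b,
  $ne: (a, b) => a !== b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $in: (a, values) => values.includes(a),
  $nin: (a, values) => !values.includes(a),
};

function matches(doc, selector) {
  // Every top-level entry of a selector must hold (implicit AND).
  return Object.entries(selector).every(([key, condition]) => {
    if (key === "$and") return condition.every((s) => matches(doc, s));
    if (key === "$or") return condition.some((s) => matches(doc, s));
    if (key === "$nor") return !condition.some((s) => matches(doc, s));
    if (condition !== null && typeof condition === "object" && !Array.isArray(condition)) {
      // {field: {$op: value, ...}}: every listed operator must hold.
      return Object.entries(condition).every(([op, value]) => comparisons[op](doc[key], value));
    }
    return doc[key] === condition; // {field: value} is shorthand for $eq
  });
}

const products = [
  { name: "cup", City: "Orlando", Price: 20 },
  { name: "plate", City: "Orlando", Price: 80 },
  { name: "fork", City: "Atlanta", Price: 5 },
];

const namesOf = (selector) => products.filter((p) => matches(p, selector)).map((p) => p.name);

const cheapInOrlando = namesOf({ $and: [{ City: "Orlando" }, { Price: { $lt: 50 } }] });
const atlantaOrExpensive = namesOf({ $or: [{ City: "Atlanta" }, { Price: { $gte: 80 } }] });
const neither = namesOf({ $nor: [{ City: "Atlanta" }, { Price: { $gte: 80 } }] });
const inList = namesOf({ City: { $in: ["Atlanta", "Baton Rouge"] } });

console.log(cheapInOrlando);     // [ 'cup' ]
console.log(atlantaOrExpensive); // [ 'plate', 'fork' ]
console.log(neither);            // [ 'cup' ]
console.log(inList);             // [ 'fork' ]
```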
Simple extraction
$type

Selects the documents where the value of the field is an instance of the specified BSON type(s).

db.<collection>.find({
<FIELD>: {$type : <TYPESTRING>}
},{})

Example:

db.products.find({$and: [
{Price: {$type:'double'}}
]},{})

Simple extraction
$mod

Select documents where the value of a field divided by a divisor has the specified remainder

db.<collection>.find({
<FIELD>: {$mod : [<DIVISOR>,<REMAINDER>]}
},{})

Example:

db.products.find({$and: [
{Price: {$mod:[2,0]}}
]},{})

Simple extraction
$regex

Provides regular expression capabilities for pattern matching strings in queries. (PCRE 8.41)

db.<collection>.find({
  <FIELD>: {$regex : /REGEX/, $options : <STRING_OPTIONS>}
},{})

db.<collection>.find({
  <FIELD>: /REGEX/OPTIONS
},{})

Options:

i  Case insensitivity to match upper and lower cases.
m  For patterns that include anchors (i.e. ^ for the start, $ for the end), match at the beginning or end of each line for strings with multiline values. Without this option, these anchors match at beginning or end of the string.
x  “Extended” capability to ignore all white space characters in the $regex pattern unless escaped or included in a character class.

Example:

db.products.find({productName: /cup/i})

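
The effect of the i and m options can be tried directly with JavaScript regular expressions, as in the sketch below. Note the assumption: JavaScript's regex engine is not PCRE and has no x option, so this only illustrates i and m; the product names are invented.

```javascript
// Plain JavaScript regular expressions, used only to illustrate the i and m options.
const caseInsensitive = /cup/i;
const anchoredPerString = /^cup/;
const anchoredPerLine = /^cup/m;

const productNames = ["Coffee Cup", "cupboard", "Red\ncup"];

const withI = productNames.filter((name) => caseInsensitive.test(name));
const withoutM = productNames.filter((name) => anchoredPerString.test(name));
const withM = productNames.filter((name) => anchoredPerLine.test(name));

console.log(withI.length);    // 3: "Cup" matches thanks to i
console.log(withoutM.length); // 1: only "cupboard" starts with "cup"
console.log(withM.length);    // 2: ^ also matches after the line break in "Red\ncup"
```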
Simple extraction
$text

Performs a text search on the content of the fields indexed with a text index.

Requires a text index: db.<collection>.createIndex({<FIELD>: "text"})
db.<collection>.find({
  $text: {
    $search: <TEXT_TO_SEARCH>,
    $language: <LANGUAGE>,
    $caseSensitive: <BOOL>,
    $diacriticSensitive: <BOOL>
  }
},{})

Simple extraction
$text

Performs a text search on the content of the fields indexed with a text index.

Example:

db.products.find({
$text: {
$search: 'fry',
$language: 'english',
$diacriticSensitive: true
}
})

Simple extraction
$where

Use the $where operator to pass either a string containing a JavaScript expression or a full JavaScript
function to the query system. (Since 3.6, $expr should be preferred.)

db.<collection>.find({
  $where: <JS_EXPRESSION>
})

Example:

db.embeddedOrders.find({$where: 'this.Orders.length == 1'})

Simple extraction
$expr (3.6)

Allows the use of aggregation expressions within the query language. $expr can build query expressions
that compare fields from the same document, in find() as well as in a $match stage.

db.<collection>.find({
  $expr: <EXPRESSION>
})

Example:

db.monthlyBudget.find({$expr:{$gt:["$spent","$budget"]}})
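
An ordinary selector compares a field with a constant, whereas the $expr query above compares two fields of the same document. In plain JavaScript terms it behaves roughly like the filter below; this is an in-memory sketch with invented sample documents, not the way the server evaluates the query.

```javascript
// In-memory sketch only: what {$expr: {$gt: ["$spent", "$budget"]}} selects.
const monthlyBudget = [
  { category: "food", budget: 400, spent: 450 },
  { category: "rent", budget: 900, spent: 900 },
  { category: "travel", budget: 200, spent: 120 },
];

// Keep the documents whose own spent field exceeds their own budget field.
const overBudget = monthlyBudget.filter((doc) => doc.spent > doc.budget);

console.log(overBudget.map((doc) => doc.category)); // [ 'food' ]
```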

Simple extraction
.distinct()

Finds the distinct values for a specified field across a single collection or view and returns the results in an
array.

db.collection.distinct(
field,
query
)

Simple extraction
.count()

Returns the count of documents that would match a find() query for the collection or view

db.collection.count(
query,
options
)

Simple extraction
Exercises

1. Get the customers without a company
2. List the distinct cities where customers working for “Yodo” or “Kare” live
3. Count the number of products sold at a price greater than 50
4. Display the first and last name of the customer having a *****@patch.com
email address
5. Find the purple products costing more than 20
6. Find the products that have their color in their name

Aggregation

Aggregation
aggregate()

db.<collection>.aggregate([
  {<STAGE1>},
  {<STAGE2>},
  {<STAGE3>},
  ...
])

Aggregation is performed step by step: the documents produced by one stage are
used as input for the following stage.
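
The step-by-step idea can be sketched as function composition in plain JavaScript. This is an in-memory illustration rather than the real aggregation engine: runPipeline, match, sortBy and limit are hypothetical helpers loosely modelled on the $match, $sort and $limit stages, and the product documents are invented.

```javascript
// In-memory sketch only: each stage is a function from documents to documents.
function runPipeline(documents, stages) {
  // The output of one stage becomes the input of the next one.
  return stages.reduce((current, stage) => stage(current), documents);
}

const match = (predicate) => (docs) => docs.filter(predicate);
// order is 1 for ascending, -1 for descending; numeric fields only.
const sortBy = (field, order) => (docs) =>
  [...docs].sort((a, b) => order * (a[field] - b[field]));
const limit = (n) => (docs) => docs.slice(0, n);

const products = [
  { name: "a", price: 30 },
  { name: "b", price: 80 },
  { name: "c", price: 120 },
  { name: "d", price: 60 },
];

const topTwoExpensive = runPipeline(products, [
  match((p) => p.price > 50),
  sortBy("price", -1),
  limit(2),
]);

console.log(topTwoExpensive.map((p) => p.name)); // [ 'c', 'b' ]
```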

Aggregation Pipeline Stages
$match

Filters the documents to pass only the documents that match the specified
condition(s) to the next pipeline stage.

db.<collection>.aggregate([
{$match: <QUERY>},
...
])

Example:

db.products.aggregate([{$match: {price: {$gt:50}}}])

Aggregation Pipeline Stages
$group

Groups documents by some specified expression and outputs to the next stage a document for each
distinct grouping. The output documents contain an _id field which contains the distinct group by key.

db.<collection>.aggregate([
  {$group: {
    _id: <EXPRESSION>,
    <FIELD1>: { <ACC1> : <EXPRESSION1> },
    <FIELD2>: { <ACC2> : <EXPRESSION2> },
    ...
  }},
  ...
])

Accumulators: $avg, $first, $last, $max, $min, $push, $addToSet, $stdDevPop, $stdDevSamp, $sum

Aggregation Pipeline Stages
$group

Groups documents by some specified expression and outputs to the next stage a document for each
distinct grouping. The output documents contain an _id field which contains the distinct group by key.

Example (assuming products documents with color and price fields):

db.products.aggregate([
  {$group: {
    _id: "$color",
    averagePrice: { $avg : "$price" },
    count: { $sum : 1 }
  }}
])

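
What $group computes can be approximated in plain JavaScript: documents are bucketed by a key, then each accumulator is applied to a bucket. The sketch below is an in-memory model under that assumption; groupBy, sum, avg and addToSet are hypothetical helpers and the sales documents are invented. Unlike this sketch, MongoDB does not guarantee the order of the output groups.

```javascript
// In-memory sketch only: bucket by key, then apply one accumulator per output field.
function groupBy(docs, keyOf, accumulators) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return [...groups.entries()].map(([key, members]) => {
    const out = { _id: key }; // the group key ends up in _id, as in $group
    for (const [field, accumulate] of Object.entries(accumulators)) {
      out[field] = accumulate(members);
    }
    return out;
  });
}

const sum = (field) => (docs) => docs.reduce((total, d) => total + d[field], 0);
const avg = (field) => (docs) => sum(field)(docs) / docs.length;
const addToSet = (field) => (docs) => [...new Set(docs.map((d) => d[field]))];

const sales = [
  { city: "Orlando", color: "red", amount: 10 },
  { city: "Atlanta", color: "red", amount: 30 },
  { city: "Orlando", color: "blue", amount: 20 },
];

const byCity = groupBy(sales, (d) => d.city, {
  total: sum("amount"),
  average: avg("amount"),
  colors: addToSet("color"),
});

console.log(byCity);
// [ { _id: 'Orlando', total: 30, average: 15, colors: [ 'red', 'blue' ] },
//   { _id: 'Atlanta', total: 30, average: 30, colors: [ 'red' ] } ]
```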
Aggregation Pipeline Stages
$unwind

Deconstructs an array field from the input documents to output a document for each element. Each output
document is the input document with the value of the array field replaced by the element.

db.<collection>.aggregate([
{
$unwind:
{
path: <FIELD_PATH>,
includeArrayIndex: <STRING>,
preserveNullAndEmptyArrays: <BOOL>
}
}
,
...
])

To specify a field path, prefix the field name with a dollar sign $ and enclose in quotes.
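
The effect of $unwind with its default options can be sketched in plain JavaScript as follows. This is an in-memory approximation: unwind is a hypothetical helper that takes a bare field name instead of a "$"-prefixed path, and the documents are invented. As with the real stage when preserveNullAndEmptyArrays is not set, documents whose array is empty, null or missing produce no output.

```javascript
// In-memory sketch only: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // missing or null: dropped
    if (!Array.isArray(value)) return [doc];               // non-array value: passed through
    // Copy the document once per element, replacing the array by that element.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const customers = [
  { _id: 1, Orders: ["A", "B"] },
  { _id: 2, Orders: [] },
  { _id: 3 },
];

const unwound = unwind(customers, "Orders");

console.log(unwound); // [ { _id: 1, Orders: 'A' }, { _id: 1, Orders: 'B' } ]
```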

Aggregation Pipeline Stages
$lookup

Performs a left outer join to an unsharded collection in the same database to filter in documents from the
“joined” collection for processing.

db.<collection>.aggregate([
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
},
...
])
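
The left outer join performed by $lookup can be pictured with the following in-memory sketch in plain JavaScript. It assumes scalar join fields and ignores everything the server does for performance; lookup is a hypothetical helper whose parameters mirror the from, localField, foreignField and as options, and both collections are invented.

```javascript
// In-memory sketch only: attach to each input document the matching documents of "from".
function lookup(docs, from, localField, foreignField, as) {
  return docs.map((doc) => ({
    ...doc,
    // Left outer join: a document without any match still appears, with an empty array.
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const customers = [
  { _id: 1, name: "Jane" },
  { _id: 2, name: "John" },
];
const orders = [
  { _id: 10, customerId: 1, total: 25 },
  { _id: 11, customerId: 1, total: 40 },
];

const customersWithOrders = lookup(customers, orders, "_id", "customerId", "orders");

console.log(customersWithOrders[0].orders.length); // 2
console.log(customersWithOrders[1].orders.length); // 0
```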

Aggregation Pipeline Stages
$sort

Sorts all input documents and returns them to the pipeline in sorted order.

db.<collection>.aggregate([
{
$sort: {
<field1>: <sort order>,
<field2>: <sort order>,
...
}
},
...
])

Aggregation Pipeline Stages
$limit

Limits the number of documents passed to the next stage in the pipeline.

db.<collection>.aggregate([
{
$limit: <POSITIVE_NUMBER>
},
...
])

Aggregation Pipeline Stages
$out

Takes the documents returned by the aggregation pipeline and writes them to a specified collection. The
$out operator must be the last stage in the pipeline.

db.<collection>.aggregate([
  ...,
  {
    $out: <OUTPUT_COLLECTION_STRING>
  }
])

Aggregation Pipeline Stages
Exercises

1. Create a new, denormalized collection where:
a. Customers have Orders that are composed of OrderLines
2. Count the number of orders for each customer working at Yodo
a. Display only the first name, the last name and the number of orders
3. Display the sales amount and the quantity ordered for each product ordered
by Yodo employees
4. What is the name of the product most bought by women?
5. What is the average order value for the people living in Atlanta?
6. Determine the sales amount both by color and by city
7. For each customer living in Orlando, determine how many orders included
leaf-related products
8. Compare the sales amounts of cherry-related and tomato-related products

MapReduce
Exercise

1. Find the average number of products (order lines) per order for each customer of Yodo

+ Exercises 3 and 6 from the Aggregation Pipeline Stages exercises

Chapter 3

Data Modelling
Flexible schema

Unlike SQL databases, MongoDB collections do not require their documents to
have the same schema.

● Fields and data types can vary from one document to another.

In practice, the documents of a particular collection should share a similar schema.

● Those similarities can be enforced by rules. This is called schema validation.

Data Modelling
Document structure

● Relational databases are designed on the basis of real-world relations
● Document-oriented databases are designed on the basis of how the
application will use the data

MongoDB supports both embedded data (denormalisation) and references
(normalisation).

Data Modelling
Embedded data

Embedded data is a kind of denormalisation.

Purpose: allow the application to retrieve and manipulate related data in a
single operation.

Use:

● One-to-one “contains” relationships between data
Ex: the ID card of a person...
● When a child document is always viewed in the context of its parent
Ex: a customer's list of delivery addresses

Data Modelling
References

References are used to normalise.

Purpose: provide more flexibility and avoid data duplication.

Use:

● when embedding would result in duplication of data
● to represent more complex many-to-many relationships
● to model large hierarchical data sets

Data Modeling
Rules of Thumb for MongoDB Schema Design

Modeling One-to-Few: embedded documents
Example: Person ↔ Address
Addresses are embedded in the person

Modeling One-to-Many: references (many ~ hundreds)
The parent references the children
Example: Product ↔ Parts
The product has an array with references to all its parts

Modeling One-to-Squillions: references (many ~ squillions)
Each child has a reference to its parent (otherwise the size of the parent document would explode)
Example: Host ↔ Log
Each Log document has a reference to its host
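
The three rules of thumb lead to three different document shapes, sketched below as plain JavaScript objects. All field names, identifiers and values are assumptions made up for the illustration; real collections would typically use ObjectId values for _id and for the references.

```javascript
// One-to-few: the addresses are embedded in the person document.
const person = {
  _id: "person-1",
  name: "Jane Doe",
  addresses: [
    { street: "1 Main Street", city: "Orlando" },
    { street: "2 Side Street", city: "Atlanta" },
  ],
};

// One-to-many: the parent holds an array of references to its children.
const product = { _id: "product-1", name: "bike", parts: ["part-1", "part-2"] };
const parts = [
  { _id: "part-1", name: "wheel" },
  { _id: "part-2", name: "saddle" },
  { _id: "part-3", name: "unrelated part" },
];

// One-to-squillions: each child holds a reference to its parent.
const host = { _id: "host-1", name: "web-01" };
const logs = [
  { _id: "log-1", host: "host-1", message: "started" },
  { _id: "log-2", host: "host-1", message: "stopped" },
];

// Following the references by hand (the application or a $lookup stage would do this).
const productParts = parts.filter((part) => product.parts.includes(part._id));
const hostLogs = logs.filter((log) => log.host === host._id);

console.log(person.addresses.length);          // 2
console.log(productParts.map((p) => p.name));  // [ 'wheel', 'saddle' ]
console.log(hostLogs.length);                  // 2
```

In the last shape the host document stays small however many log entries exist, which is the point of the one-to-squillions rule.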

Data Modeling
JSON Schema

Add validation at collection creation:

db.createCollection(<collection_name>, {
  validator: {
    $jsonSchema: {
    }
  }
})

Add validation to an existing collection:

db.runCommand({
  collMod: <collection_name>,
  validator: {
    $jsonSchema: {
    }
  }
})

Documentation: https://docs.mongodb.com/v3.6/core/schema-validation/index.html#json-schema
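
As an illustration of what goes inside $jsonSchema, here is a sketch of a validator for the books collection used in the CRUD examples. The rules themselves (required title and isbn, optional non-negative price) are assumptions chosen for the example, not requirements stated in the course; the keywords bsonType, required, properties, minimum and description are standard $jsonSchema keywords.

```javascript
// Hypothetical validator for the books collection: the rules are illustrative assumptions.
const bookValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["title", "isbn"],
    properties: {
      title: { bsonType: "string", description: "must be a string and is required" },
      isbn: { bsonType: "string", description: "must be a string and is required" },
      price: { bsonType: "double", minimum: 0, description: "must be a non-negative double if present" },
    },
  },
};

// In the mongo shell the object would be passed to the server like this:
//   db.createCollection("books", { validator: bookValidator })
//   db.runCommand({ collMod: "books", validator: bookValidator })
console.log(JSON.stringify(bookValidator, null, 2));
```

With such a validator in place, an insert such as {title: 'x'} would be rejected because isbn is missing.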
Library System
Exercise

Assume there is a library system with the following properties.

The library contains one or several copies of the same book. Every copy of a book has a copy number
and is located at a specific location in a shelf. A copy is identified by the copy number and the ISBN
number of the book. Every book has a unique ISBN, a publication year, a title, an author, and a number of
pages. Books are published by publishers. A publisher has a name as well as a location.

Within the library system, books are assigned to one or several categories. A category can be a
subcategory of exactly one other category. A category has a name and no further properties. Each reader
needs to provide his/her family name, his/her first name, his/her city, and his/her date of birth to register at
the library. Each reader gets a unique reader number. Readers borrow copies of books. Upon borrowing
the return date is stored.

How would you model this situation in a document-oriented database ?

Source: ETH Zurich - Data Modelling and Databases Lectures – Spring 2018
Data Modeling
Exercise 2

Transform this ER model into document schemas

Data Modeling
Exercise 3

Implement the following document schema:

Customers are always known by their first name and last name, which are strings.
Their address may also be known; it consists of a street, a zip code (an integer
between 1000 and 9999) and a city.

No other fields are allowed.
