BDTT Lab 2023 24 Week9

University of Salford, MSc Data Science

Module: Big Data Tools and Techniques

Date: Trimester 2, 2023-2024
Session: Workshop Week 9
Topic: MongoDB
Tools: Jupyter Notebook and MongoDB Atlas
Instructors: Dr Kaveh Kiani, Dr Taha Mansouri, and Nathan Topping.
Objectives:
After completing this workshop, you will be able to:
➢ Prepare a MongoDB Atlas account
➢ Implement queries and aggregations on Atlas
➢ Connect to and interact with Atlas through Python
➢ Design a pipeline to process data

Table of Contents
Part 1: Fire up the Atlas workspace

Part-2: MongoDB

Part-3: Working with Atlas

Part-4: Aggregation Framework

Part-5: Working with MongoDB using Python

References

Part 1: Fire up the Atlas workspace

1- Log in to your free Atlas account through the link below:

https://account.mongodb.com/account/login?signedOut=true

2- Make sure your current IP address is listed as a trusted IP. If it is not, go to Network Access and add
your current IP address.

3- Click on the Database tab.

4- Click on Browse Collections. Now you can see the list of databases on the left, and the content
of a selected collection on the right.

Part-2: MongoDB
MongoDB is a cross-platform, document-oriented NoSQL database. It is designed to store and manage
unstructured or semi-structured data. Unlike traditional relational databases, MongoDB uses a flexible
document model, which allows developers to store and query data in a more intuitive and natural way.
Data in MongoDB is stored in documents, which are similar to JSON objects and can contain any number
of fields, arrays, and sub-documents. This makes it easy to store complex data structures and to modify
them as requirements change.
MongoDB stores data records as documents (specifically BSON documents) which are gathered in
collections. You can create secondary indexes on these collections, join them together, and use the
powerful aggregation framework embedded in MongoDB. A database stores one or more collections of
documents.
In this workshop we mostly read data from MongoDB. To select all documents in a
collection, pass an empty document as the query filter parameter to the find method. The query filter
parameter determines the selection criteria: db.inventory.find( {} )

This operation uses a filter predicate of {}, which corresponds to the following SQL statement:
SELECT * FROM inventory

To specify equality conditions, use <field>:<value> expressions in the query filter document:
{ <field1>: <value1>, ... }

The following example selects from the inventory collection all documents where the status equals
"D": db.inventory.find( { status: "D" } )

This operation uses a filter predicate of { status: "D" }, which corresponds to the following SQL
statement: SELECT * FROM inventory WHERE status = "D"

Part-3: Working with Atlas
1- On Atlas, expand the sample_mflix database

2- Select the movies collection

3- Filter the movies by the cast member “John Ott”

4- Search for those movies that have won more than one award.

Note: To access a field inside a nested document, you can use dot notation (for example, awards.wins).

Note: To use comparison operators such as greater than, less than, greater than or equal to, and so
on, pair the operator with the intended value in a nested document. For example, to check
whether the value of a field is less than or equal to 5, you can use {"field_Name": {$lte: 5}}

5- Find those movies that have won more than one award and that have the USA as their country.

6- You can also specify a projection list to just show the information needed. Find those movies
whose genre is “Short” and just project the title:
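
The filters from steps 3 to 6 can also be written as Python dictionaries, which is the form pymongo expects later in this workshop. The sketch below is illustrative only: the field names (cast, awards.wins, countries, genres, title) are assumed from the sample_mflix dataset, and find_titles is a hypothetical helper, not part of Atlas or pymongo.

```python
# Filter documents for steps 3-6, written as Python dictionaries.
# Field names are assumed from the sample_mflix.movies collection.

# Step 3: movies whose cast array contains "John Ott".
cast_filter = {"cast": "John Ott"}

# Step 4: more than one award. "awards.wins" uses dot notation to
# reach the wins field inside the nested awards document.
awards_filter = {"awards.wins": {"$gt": 1}}

# Step 5: several conditions in one filter are combined with a logical AND.
awards_usa_filter = {"awards.wins": {"$gt": 1}, "countries": "USA"}

# Step 6: a filter plus a projection that keeps only the title.
short_filter = {"genres": "Short"}
title_only = {"title": 1, "_id": 0}


def find_titles(collection, query, projection=title_only):
    """Return the matching documents as a list (hypothetical helper).

    `collection` is expected to behave like a pymongo Collection.
    """
    return list(collection.find(query, projection))
```

In the Atlas filter bar the same conditions are typed without the Python variable names, for example {"awards.wins": {$gt: 1}, countries: "USA"}.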

Challenge-1

Go to the sample_restaurants database and, in the restaurants collection, find those restaurants that have
a grade score greater than or equal to 10 and that serve American cuisine.

Part-4: Aggregation Framework

The Aggregation Framework in MongoDB is a powerful data processing tool that allows you to perform
complex data analysis on collections of documents in a database. It provides a set of operators that can
be used to perform data filtering, grouping, sorting, and data transformations. With the Aggregation
Framework, you can combine data from multiple collections, perform calculations on data, and analyse
data in real-time. This makes it a very powerful tool for business intelligence and data analysis
applications.

Some of the key features of the Aggregation Framework in MongoDB include:

• Pipelined data processing: The aggregation framework allows you to combine multiple operators
into a single pipeline, where the output of one operator is the input to the next operator. This
makes it easy to perform complex data transformations and analysis.
• Extensive operator set: The Aggregation Framework provides a wide range of operators for data
filtering, grouping, sorting, and transformations. This includes operators for conditional logic,
arithmetic operations, string manipulation, date manipulation, and more.
• Integration with MongoDB: The Aggregation Framework is tightly integrated with MongoDB,
which means that it can take advantage of MongoDB's scalability, replication, and sharding
features.

The MongoDB Aggregation Framework includes several stages that can be used to perform various data
processing operations. The stages are applied in a pipeline, where the output of one stage becomes the
input of the next stage. Commonly used stages in the Aggregation Framework are:

• $match: This stage is used to filter documents based on certain criteria. It works like a query filter
and can use various comparison operators to filter documents.
• $project: This stage is used to select certain fields from documents and project them in the output.
It can also be used to create new fields or transform existing fields.
• $group: This stage is used to group documents by a specified field or fields. It can also perform
various aggregate functions such as sum, average, and count on the grouped data.
• $sort: This stage is used to sort the output documents based on one or more fields. It can sort in
ascending or descending order.
• $limit: This stage is used to limit the number of documents returned in the output.
• $skip: This stage is used to skip a specified number of documents in the input before processing.

• $unwind: This stage is used to break up an array field into separate documents, each containing
a single value from the array.
• $lookup: This stage is used to perform a left outer join between two collections.
• $facet: This stage is used to perform multiple aggregation operations on the same set of input
documents. It returns multiple sets of documents, each representing the result of a separate
aggregation operation.

These stages can be combined in different ways to perform a wide range of data processing operations
on MongoDB collections.
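
To make the pipeline idea concrete, here is a small pure-Python imitation of three stages. It is not how MongoDB implements them and it does not talk to a database; the functions match, group_avg and sort_desc and the sample documents are made up for illustration. It only shows how the output of one stage feeds the next.

```python
from collections import defaultdict


def match(docs, predicate):
    """Imitates $match: keep only documents that satisfy the predicate."""
    return (d for d in docs if predicate(d))


def group_avg(docs, key, field):
    """Imitates $group with $avg: average `field` per value of `key`."""
    buckets = defaultdict(list)
    for d in docs:
        buckets[d[key]].append(d[field])
    return [{"_id": k, "avg": sum(v) / len(v)} for k, v in buckets.items()]


def sort_desc(docs, field):
    """Imitates $sort in descending order."""
    return sorted(docs, key=lambda d: d[field], reverse=True)


# Made-up sample documents, for illustration only.
movies = [
    {"title": "A", "genre": "Drama", "rating": 8.0},
    {"title": "B", "genre": "Drama", "rating": 6.0},
    {"title": "C", "genre": "Comedy", "rating": 5.0},
    {"title": "D", "genre": "Comedy", "rating": 3.0},
]

# The output of each stage is the input of the next one.
matched = match(movies, lambda d: d["rating"] > 4)
grouped = group_avg(matched, "genre", "rating")
result = sort_desc(grouped, "avg")
print(result)
```

Movie D is filtered out by the first stage, so the Comedy average is computed from movie C alone.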

1- Click on the movies collection and select the Aggregation tab. Then click on “create new”.

2- Another window will pop up; select Confirm.

3- From the stages drop-down list, select $match and specify directors as “Sam Raimi”.

4- Add another stage and select $project as its type. Then exclude _id and include title and the imdb
rating in the output.

5- Add another stage and select $group. The objective is to calculate the average imdb rating for
those movies that are directed by “Sam Raimi”.

These are the stages of your pipeline.

6- By clicking on “EXPORT TO LANGUAGE”, you can export the pipeline into other programming
languages.
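
As a rough idea of what the three stages from steps 3 to 5 look like once written in Python, consider the sketch below. The field names are assumed from sample_mflix.movies, the output name avg_rating and the helper average_rating are made up, and grouping with _id set to None (one group for all documents) is one possible choice rather than necessarily the one made in the original exercise.

```python
# The three stages built in steps 3-5, as a Python list of stage documents.
sam_raimi_pipeline = [
    # Keep only movies whose directors array contains "Sam Raimi".
    {"$match": {"directors": "Sam Raimi"}},
    # Drop _id, keep the title and the nested imdb.rating field.
    {"$project": {"_id": 0, "title": 1, "imdb.rating": 1}},
    # Put all remaining documents in one group and average their rating.
    {"$group": {"_id": None, "avg_rating": {"$avg": "$imdb.rating"}}},
]


def average_rating(collection, pipeline=sam_raimi_pipeline):
    """Run the pipeline on a pymongo-like collection and return the average.

    Returns None when the pipeline produces no documents.
    """
    results = list(collection.aggregate(pipeline))
    return results[0]["avg_rating"] if results else None
```

With a pymongo collection object for movies, average_rating(movies) would return a single number.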

You can join collections through their shared keys. To this end, you need to use the $lookup stage.

The activity is to count the number of comments for each movie. There are two collections,
movies and comments. You need to join them by movie_id in the comments collection and _id
in the movies collection. Only movies from the 1980s are taken into account.

7- Select the movies collection and the Aggregation tab.

8- In the current stage, select $match and filter year between 1980 and 1990.

9- Add a new stage and select $lookup as its type.

Note: from names the collection that you want to join to. let binds the _id of the movies collection to a
temporary variable, id, and makes it accessible inside the pipeline. In pipeline you can define any
aggregation stages; however, you have to match the primary and foreign keys yourself. Finally, as
defines the name of the output field that holds the joined documents.

10- Add the $count stage inside the $lookup pipeline. Because the count is computed inside that
pipeline, you do not need to add any other stage.
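
The two stages from steps 8 to 10 could be written in Python roughly as follows. This is a sketch under assumptions: the 1980s are taken to mean 1980 up to, but not including, 1990; the output names count and movie_comments are made up; and comments_per_movie_pipeline is a hypothetical helper.

```python
def comments_per_movie_pipeline(start_year=1980, end_year=1990):
    """Build the $match + $lookup pipeline described in steps 8-10.

    The decade is assumed to mean start_year <= year < end_year.
    """
    return [
        {"$match": {"year": {"$gte": start_year, "$lt": end_year}}},
        {
            "$lookup": {
                "from": "comments",       # collection to join to
                "let": {"id": "$_id"},    # expose movies._id as $$id
                "pipeline": [
                    # Join condition: comments.movie_id equals movies._id.
                    {"$match": {"$expr": {"$eq": ["$movie_id", "$$id"]}}},
                    # Collapse the matching comments into a single count.
                    {"$count": "count"},
                ],
                "as": "movie_comments",   # name of the output array field
            }
        },
    ]
```

Inside the inner pipeline, $movie_id refers to a field of the comments document, while $$id refers to the variable defined in let. A movie without comments ends up with an empty movie_comments array, because $count emits nothing for empty input.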

Challenge-2

Calculate the average number of tomatoes viewer reviews for those movies that were produced
after 1920, have English as a language, have a tomatoes viewer rating greater than 3.5, and have
mflix comments.

Part-5: Working with MongoDB using Python

In this part we use Jupyter Notebook. To this end you need to install Anaconda on your system. Anaconda
is a widely used data science platform, which helps users manage a collection of more than 7,500
open-source packages. Anaconda Distribution lets individuals easily search for,
install, and run thousands of Python/R packages and access a large library of community content and
support. It also makes creating, saving and loading programs very straightforward.
Installing Anaconda
Before installing Anaconda distribution, check the system requirements listed below:
System requirements:
• Operating system: Windows 8 or newer, 64-bit macOS 10.13+, or Linux, including Ubuntu, RedHat,
CentOS 7+, and others.
(If your operating system is older than what is currently supported, you can find older versions of the
Anaconda installers that might work for you on Anaconda’s archive page.)
• Minimum 5 GB disk space to download and install.
Downloading and installing on Windows:
You can download Anaconda installers from
https://www.anaconda.com/products/distribution
If you want to download the Windows, Python 3.9, 64-Bit Graphical Installer, right-click on the download
button and click “save link as”. Save Anaconda3-2022.05-Windows-x86_64.exe on your local computer
(e.g., in the Downloads folder).

NB: You only need to carry out this step if you are using your own device and have not previously
installed Anaconda. If you have previously installed Anaconda or are using a university device, please
skip to step 3 of the tasks below (installing pymongo).

If you want to install Anaconda on another operating system, click on “Get Additional
Installer”. This will take you to the bottom of the page where you can find other versions of the
Anaconda installers. Right-click on the option that works for you and click on “save link as” to
download the installer.

Go to your Downloads folder and double-click the installer to launch it. (If you encounter issues
during installation, temporarily disable your anti-virus software, then re-enable it
after the installation.)
Click on the Next button.

Read the licensing terms and click on “I Agree”.

In the next step, choose “Just me” option and click on the Next button. (Only select an install for All Users
if you need to install for all users’ accounts on the computer. This requires Windows Administrator
privileges).

Select a destination folder to install Anaconda. You can use the Browse button to change the location
(the directory path should not contain spaces or Unicode characters). After choosing the install
location, click on the Next button.

In the next step, check “Register Anaconda3 as my default Python 3.9” and click on the Install button.

Please wait while Anaconda3 is being installed. It will take a few minutes. Then the Next button will be
enabled.

In the next dialog box, click on the Next button. Finally, you should see the “Completing Anaconda3
Setup” dialog box. Click the Finish button to complete the installation. (If you wish to read more about
Anaconda.org and how to get started with Anaconda, check the boxes “Anaconda Distribution Tutorial”
and “Getting started with Anaconda”.)

Now open Anaconda and carry on with the following tasks.
1- Open Anaconda and launch Jupyter Notebook

2- In the Jupyter Notebook create a new notebook.

3- pymongo is the library required to work with MongoDB in Python. Install it from your Jupyter
notebook. You need to do this just once.

Note: MongoClient is a class provided by pymongo. To instantiate it, you pass a URL containing most of the
information required to access MongoDB Atlas. This URL is your
connection string, so first you should collect it from Atlas.
4- Go to Atlas, select your database and click on the Connect button.

5- Choose “connect your application” as your connection method.

6- Select Python as the driver and 3.6 or later as the version. Then copy the generated connection string. You
have to substitute your own password into it and check “include full driver code example”.

7- Go back to Jupyter Notebook, import pymongo, define your url based on step 6 and
instantiate a client.
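
A minimal sketch of step 7 is shown below, assuming pymongo has been installed (for example with pip install pymongo). The user name, password and cluster host are placeholders, and build_atlas_uri and connect are hypothetical helpers; in practice you would paste the exact string generated by Atlas in step 6.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username, password, host):
    """Assemble an Atlas-style connection string.

    `host` is the cluster address shown in your own connection string;
    the value used below is a placeholder, not a real cluster.
    """
    # Percent-encode the credentials so characters such as '@' or ':'
    # do not break the URI.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?retryWrites=true&w=majority"


def connect(uri):
    """Create a client for the given connection string."""
    # Imported here so build_atlas_uri can be tried without pymongo installed.
    from pymongo import MongoClient

    return MongoClient(uri)


# Placeholder credentials and host, for illustration only.
uri = build_atlas_uri("student", "p@ss:word", "cluster0.example.mongodb.net")
print(uri)
```

Calling client = connect(uri) with your real connection string gives the client object used in the following steps. Avoid committing a notebook that contains your password.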

8- We can list the databases available to this client object through the following command:

9- Select the sample_mflix database.

10- Now list its collections.

11- Select the movies collection and assign it to a new variable.

12- Count the number of documents in this collection.
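
Steps 8 to 12 can be sketched in one small function. The function name explore is made up; the calls it makes (list_database_names, list_collection_names, count_documents and indexing with square brackets) are standard pymongo methods, and client is assumed to be the MongoClient created in step 7.

```python
def explore(client, db_name="sample_mflix", coll_name="movies"):
    """Walk through steps 8-12 for a pymongo-like client."""
    databases = client.list_database_names()    # step 8: database names
    db = client[db_name]                        # step 9: select the database
    collections = db.list_collection_names()    # step 10: collection names
    movies = db[coll_name]                      # step 11: select the collection
    total = movies.count_documents({})          # step 12: {} matches everything
    return databases, collections, total
```

In a notebook you would more likely run each line in its own cell and inspect the result before moving on.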

Note: there are two methods to read from a collection: find_one() and find(). find_one() returns the
first document, in natural order, that satisfies the given condition(s).
13- Find a movie.

14- Find a movie with Salma Hayek in the cast. To do so, pass the find_one method a dictionary
containing a field name and the associated condition.
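
Steps 13 and 14 might look like the sketch below, where movies is assumed to be the collection object from step 11 and the two function names are made up. The cast field name is assumed from the sample_mflix dataset.

```python
def first_movie(movies):
    """Step 13: with no filter, find_one returns the first document found."""
    return movies.find_one()


def first_movie_with(movies, actor):
    """Step 14: first document whose cast array contains `actor`.

    find_one returns None when nothing matches.
    """
    return movies.find_one({"cast": actor})
```

For example, first_movie_with(movies, "Salma Hayek") returns a single dictionary rather than a list.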

Note: Most of the time, find_one() is not what we want, as we typically want all
documents that satisfy a condition. For this we use the find() method. find() does not return the
documents directly; instead it returns a cursor object. We can store the cursor in a variable, iterate
over it to access the documents, and dump them as JSON. The dumps function of a JSON library
gives us nicely formatted output.
15- Find all movies with Salma Hayek in the cast and print them.
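
One way to do step 15 is sketched below. It uses the standard json module with default=str, which is an assumption made here so that values such as ObjectId or datetime are written as strings; the helper names are made up. pymongo also ships bson.json_util.dumps, which may be what the original worksheet used.

```python
import json


def dump_documents(cursor):
    """Materialise a cursor and render it as indented JSON text.

    default=str is an assumption made here so that values JSON cannot
    represent natively (ObjectId, datetime) are written as strings.
    """
    documents = list(cursor)  # iterating the cursor fetches the documents
    return json.dumps(documents, indent=2, default=str)


def movies_with(movies, actor):
    """Step 15: all movies whose cast contains `actor`, as JSON text."""
    return dump_documents(movies.find({"cast": actor}))
```

In a notebook, print(movies_with(movies, "Salma Hayek")) shows the formatted documents. A cursor can only be iterated once, so keep the list if you need the documents again.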

16- What if you do not need all the fields? In this situation you specify
a projection list. The second dictionary passed to find() is the projection list; it contains field
names, each with either a 1 (meaning that you want to show it) or a 0 (meaning you want to omit it).
Find the movies with Salma Hayek in the cast and print just their titles.

17- You can also limit the number of documents returned by pymongo through limit(). Show just
two documents that satisfy the above conditions.
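
Steps 16 and 17 can be combined in a short sketch. The function name titles_with and the max_docs parameter are made up; find with a projection dictionary and the cursor method limit are standard pymongo calls.

```python
def titles_with(movies, actor, max_docs=None):
    """Steps 16-17: project only the title and optionally cap the result size."""
    # Second argument is the projection: show title, hide _id.
    cursor = movies.find({"cast": actor}, {"title": 1, "_id": 0})
    if max_docs is not None:
        cursor = cursor.limit(max_docs)  # limit() returns the cursor itself
    return list(cursor)
```

Note that _id is shown by default, so it has to be switched off explicitly if only the title is wanted.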

18- You can also express the above query with the aggregation framework.
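
A possible aggregation version of the find, projection and limit combination is sketched here. The helper name titles_pipeline is made up, and the field names are assumed from sample_mflix.movies.

```python
def titles_pipeline(actor, max_docs=2):
    """Aggregation counterpart of find(filter, projection).limit(n)."""
    return [
        {"$match": {"cast": actor}},            # same role as the find() filter
        {"$project": {"_id": 0, "title": 1}},   # same role as the projection list
        {"$limit": max_docs},                   # same role as limit()
    ]
```

With a pymongo collection, list(movies.aggregate(titles_pipeline("Salma Hayek"))) would run it.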

19- The next operation is sorting. The sort() method takes two parameters: the key and the sort
order, pymongo.ASCENDING or pymongo.DESCENDING. Sort the movies with Salma Hayek in the cast by
production year in ascending order.
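
Step 19 could be sketched as follows. The constants are redefined locally with the same values pymongo uses (1 and -1) so that the snippet does not depend on the import; movies_by_year is a made-up helper, and the year field name is assumed from the dataset.

```python
ASCENDING = 1     # same value as pymongo.ASCENDING
DESCENDING = -1   # same value as pymongo.DESCENDING


def movies_by_year(movies, actor, direction=ASCENDING):
    """Step 19: titles and years for `actor`, sorted by production year."""
    cursor = movies.find({"cast": actor}, {"title": 1, "year": 1, "_id": 0})
    # sort() is applied by the server before documents are returned.
    return list(cursor.sort("year", direction))
```

In an aggregation pipeline the equivalent stage would be {"$sort": {"year": 1}}.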

Note: Aggregation is a pipeline. Pipelines are composed of stages, which are broad units of work.
Within stages, expressions are used to specify individual units of work. Expressions are functions. Each
stage is like an assembly station and does a specific task. For example, $match checks a specific
condition and selects the documents that fulfil that condition, $project filters out unnecessary fields, and
$group collects documents together.
Let’s look at a function called add in Python and in the aggregation framework.
Python

Aggregation Framework
{"$add": ["$a", "$b"]}; all stage and operator names in the aggregation framework have $ before them,
and a quoted name such as "$a" refers to the field a of the current document.
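
The comparison can be written out as a sketch. The Python function is the obvious two-argument add; the $project stage around $add and the output field name total are assumptions added here to show where such an expression would sit.

```python
def add(a, b):
    """Plain Python: add two values."""
    return a + b


# Aggregation framework: the same idea as an expression inside a stage.
# "$a" and "$b" are field paths: they read fields a and b of each document.
add_stage = {"$project": {"_id": 0, "total": {"$add": ["$a", "$b"]}}}

print(add(2, 3))
```

The Python function is called once with two values, whereas the $add expression is evaluated by the server once per document flowing through the stage.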

20- Count the number of movies directed by Sam Raimi
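
Step 20 can be approached in two ways, sketched below with made-up helper names and the output name movie_count chosen for illustration: a direct count_documents call, or a pipeline ending in a $count stage.

```python
def count_by_director(movies, director):
    """Count with a query filter (no pipeline needed)."""
    return movies.count_documents({"directors": director})


def count_by_director_pipeline(director):
    """The same count expressed as an aggregation pipeline."""
    return [
        {"$match": {"directors": director}},
        {"$count": "movie_count"},   # emits one document: {"movie_count": n}
    ]
```

The pipeline form returns no document at all when nothing matches, whereas count_documents returns 0.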

21- Run the exported pipeline of the first aggregation in Part-4 in Python.

Challenge 3
Implement the pipeline designed for challenge-2 in Python.

References:
• MongoDB University. MongoDB offers a range of online courses and certifications that
cover various aspects of MongoDB development, administration, and deployment.
These courses are self-paced
(https://learn.mongodb.com/).
• https://www.mongodb.com/docs/manual/tutorial/query-documents/
