BDA Unit-4 (1)
MONGODB
MongoDB is an open source DBMS. MongoDB programs create and manage databases. MongoDB
manages the collection and document data store. MongoDB functions query and access the
required information. The functions include viewing, querying, changing, visualizing and running
transactions. Changing includes updating, inserting, appending or deleting.
MongoDB is a non-relational, NoSQL, distributed, open source, document-based, cross-platform,
scalable, indexed, replicated and fault-tolerant database with a flexible data model.
MongoDB is a document-oriented NoSQL database used for high-volume data storage. Instead of
using tables and rows as in traditional relational databases, MongoDB makes use of collections and
documents. Documents consist of key-value pairs, which are the basic unit of data in MongoDB.
Collections contain sets of documents and are the equivalent of relational database tables.
MongoDB is developed and managed by MongoDB, Inc. under the SSPL (Server Side Public License)
and was initially released in February 2009. MongoDB came to light around the mid-2000s. It stores
data in JSON-like documents, and the data store uses dynamic schemas.
It also provides official driver support for popular languages such as C, C++, C# and .NET, Go,
Java, Node.js, Perl, PHP, Python (including the asynchronous Motor driver), Ruby (including the
Mongoid mapper), Scala and Swift, so that we can create an application using any of these
languages. Many companies, such as Facebook, Nokia, eBay, Adobe and Google, use MongoDB to
store large amounts of data.
The typical MongoDB applications are content management and delivery systems, mobile
applications, user data management, gaming, e-commerce, analytics, archiving and logging.
Working of MongoDB:
MongoDB is a database server, and the data is stored in its databases. In other words, the MongoDB
environment gives us a server that we can start and then create multiple databases on it.
Because MongoDB is a NoSQL database, the data is stored in collections and documents.
Hence the database, collections and documents are related hierarchically: a database holds
collections, and each collection holds documents.
Storage Format - BSON (Binary JSON): MongoDB uses BSON, a binary-encoded extension of JSON,
which supports additional data types like binary data and dates alongside nested documents and
arrays, improving query efficiency and storage performance.
Data Storage and Querying: Data is stored in collections of documents, making it highly
flexible. MongoDB supports powerful indexing and real-time queries, ensuring faster
read/write operations. Developers can store, retrieve, and update data using MongoDB Query
Language (MQL), in which queries are written as JSON-like documents.
Schema Flexibility: MongoDB allows schema-less data storage (collections can
store documents with different structures). Fields are not predefined, allowing dynamic
updates. This is suitable for applications where data structures frequently change.
Data Relationships in MongoDB: Unlike SQL databases that rely on foreign keys and JOIN
operations, MongoDB supports:
Embedded Documents – Store related data in a single document (reducing the need for
joins).
Reference Documents – Use unique identifiers to establish relationships between
documents.
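The two approaches can be sketched in plain JavaScript, without a running server. This is only an illustration: the customer and order fields and the ordersOf helper are hypothetical names, and the array filter stands in for the second query that a reference requires.

```javascript
// Embedded: the order data lives inside the customer document.
const customerEmbedded = {
  _id: 1,
  name: "Asha",
  orders: [{ orderId: 111, product: "Notebook", quantity: 4 }],
};

// Referenced: orders are separate documents that carry the customer's _id.
const customers = [{ _id: 1, name: "Asha" }];
const orders = [
  { _id: 111, customerId: 1, product: "Notebook", quantity: 4 },
  { _id: 112, customerId: 1, product: "Pen", quantity: 10 },
];

// Resolving a reference takes a second lookup (here a plain array filter).
function ordersOf(customerId) {
  return orders.filter((order) => order.customerId === customerId);
}

// Embedded: one read returns the customer together with the orders.
console.log(customerEmbedded.orders.length);
// Referenced: the orders are fetched separately through the identifier.
console.log(ordersOf(1).map((order) => order.product));
```

Embedding suits data that is always read together with its parent, while references avoid duplicating data that is shared by many documents.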
Scalability - Horizontal vs Vertical Scaling: MongoDB is designed for horizontal scaling,
meaning it can:
Distribute data across multiple servers using sharding.
Support automatic load balancing and replication for high availability.
Handle big data and distributed workloads efficiently.
MongoDB Example 1: We have a database named PRIW. Inside this database we have two
collections, and each of these collections holds two documents. In these documents, we store our
data in the form of fields.
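The hierarchy in Example 1 can be modelled as nested plain-JavaScript objects. This is a minimal sketch: only the database name PRIW comes from the example, while the collection names, field names and the countDocuments helper are assumptions made for illustration.

```javascript
// Server -> database -> collections -> documents -> fields.
// Only "PRIW" comes from the example; every other name is hypothetical.
const server = {
  PRIW: {                                   // a database
    students: [                             // a collection
      { _id: 1, name: "Asha", branch: "CSE" },      // a document made of fields
      { _id: 2, name: "Ravi", branch: "ECE" },
    ],
    courses: [                              // a second collection
      { _id: 10, title: "Big Data Analytics" },
      { _id: 11, title: "Databases", credits: 4 },  // fields may differ per document
    ],
  },
};

// Count the documents held by each collection of a database.
function countDocuments(database) {
  const counts = {};
  for (const [collectionName, documents] of Object.entries(database)) {
    counts[collectionName] = documents.length;
  }
  return counts;
}

console.log(countDocuments(server.PRIW));
```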
MongoDB Example 2: The example below shows how a document can be modeled in MongoDB:
1. The _id field is added by MongoDB to uniquely identify the document in the collection.
2. Note that the order data (OrderID, Product, and Quantity), which in an RDBMS would
normally be stored in a separate table, is stored in MongoDB as an embedded document
inside the customer document itself. This is one of the key differences in how data is
modeled in MongoDB.
Key Components of MongoDB Architecture: Below are a few of the common terms used in
MongoDB:
1. _id: This is a field required in every MongoDB document. The _id field represents a unique
value in the MongoDB document. The _id field is like the document's primary key. If you
create a new document without an _id field, MongoDB will automatically create the field. So,
for example, in the customer example above, MongoDB will add a 24-character hexadecimal
unique identifier (an ObjectId) to each document in the collection.
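The automatically generated identifier is the hexadecimal form of a 12-byte ObjectId, whose first 4 bytes hold the creation time in seconds since the Unix epoch. The sketch below decodes that timestamp from the string alone; the function name and the sample value are illustrative and not part of the original notes.

```javascript
// Extract the creation time embedded in an ObjectId's 24-character hex string.
// Layout (12 bytes): 4-byte timestamp, 5-byte random value, 3-byte counter.
function objectIdTimestampSeconds(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  // The first 8 hex characters are the 4-byte Unix timestamp in seconds.
  return parseInt(hex.slice(0, 8), 16);
}

const sampleId = "507f1f77bcf86cd799439011"; // illustrative value
const seconds = objectIdTimestampSeconds(sampleId);
console.log(seconds, new Date(seconds * 1000).toISOString());
```

Because the timestamp comes first, sorting documents by _id roughly orders them by creation time.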
MongoDB Features:
Following are features of MongoDB:
1. MongoDB is a document data store in which one collection holds different documents. Data is
stored in the form of JSON-style documents. The number of fields, content and size of the
document can differ from one document to another.
2. A MongoDB database is a physical container for collections. Each DB gets its own set of files on
the file system. A number of DBs can run on a single MongoDB server. By default, MongoDB
stores its data files in a data folder (/data/db). The database server of MongoDB is mongod and
the client is mongo (replaced by mongosh in recent versions).
3. The document structure is more in line with how developers construct their classes and objects
in their respective programming languages. Developers will often say that their classes are not
rows and columns but have a clear structure with key-value pairs.
4. The rows (or documents, as they are called in MongoDB) do not need to have a schema defined
beforehand. Instead, the fields can be created on the fly.
5. The data model available within MongoDB allows you to represent hierarchical relationships,
to store arrays, and other more complex structures more easily.
6. Scalability: MongoDB environments are very scalable. Companies across the world have
defined clusters, some of them running 100+ nodes with millions of documents
within the database.
7. Querying, indexing, and real-time aggregation allow accessing and analyzing the data
efficiently.
8. Storing of data is flexible, and the data store consists of JSON-like documents. This implies that
the fields can vary from document to document and the data structure can be changed over time.
JSON has a standard structure and is a scalable way of describing hierarchical data.
9. No complex Joins.
10. Distributed DB makes availability high, and provides horizontal scalability.
11. Atomic operations can be performed on a single document. Multi-document ACID transactions
are supported only from MongoDB 4.0 onward; before that, single-document atomic operations
served as the alternative to the ACID transaction requirement of a relational DB.
12. Fast in-place updates: The DB does not have to allocate a new memory location and write a full
new copy of the object in case of data updates. This results in high performance for frequent
update use cases. For example, incrementing a counter does not require fetching the document
from the server. Instead, the increment operation can simply be sent to the server (for example,
with the $inc update operator).
13. Conversion/mapping of application objects to data store objects is not needed.
MongoDB Datatypes:
MongoDB supports many datatypes. Some of them are
Datatype       Description
Integer        Stores a numerical value. An integer can be 32-bit or 64-bit, depending upon your server.
Double         Stores floating-point values.
Boolean        Stores a Boolean (true/false) value.
String         The most commonly used datatype for storing data. A string in MongoDB must be valid UTF-8.
Arrays         Stores arrays, lists or multiple values under one key.
Min/Max keys   Used to compare a value against the lowest and highest BSON elements.
Timestamp      Handy for recording when a document has been modified or added.
Object         Used for embedded documents.
Null           Stores a Null value.
Symbol         Used identically to a string; however, it is generally reserved for languages that use a specific symbol type.
Date           Stores the current date or time in UNIX time format. You can specify your own date and time by creating a Date object and passing the day, month and year into it.
Object ID      Stores the document's ID.
MongoDB Query Language and Database Commands:
MongoDB commands for querying the database are given below:
Command                          Functionality
mongo                            Starts the MongoDB shell (mongo is the MongoDB client). The default database in MongoDB is test.
db.help()                        Runs help. This displays the list of database commands.
db.stats()                       Gets statistics about the current database.
use <database name>              Switches to the database; it is created when data is first stored in it.
db                               Outputs the name of the current database.
show dbs                         Gets the list of all the databases.
db.dropDatabase()                Drops the current database.
db.<collection name>.insert()    Creates a collection (if it does not exist) by inserting a document.
db.<collection name>.find()      Views all documents in a collection.
db.<collection name>.update()    Updates a document.
db.<collection name>.remove()    Deletes a document.
The following explains sample usages of the commands:
To create a database, command use: The use command switches to a database, which is created
when data is first stored in it. For example, the command use lego creates a database named lego.
(A sample database is created to demonstrate subsequent queries. Lego is an international toy
brand.) The default database in MongoDB is test.
To check the current database, command db: The db command shows that the lego database is
selected.
To get a list of all the databases, command show dbs: This command shows the names of all
the databases.
To drop a database, command db.dropDatabase(): This command drops the current database. Run
the use lego command before the db.dropDatabase() command to drop the lego database. If no
database is selected, the default database test will be dropped.
To create a collection, command insert(): The easiest way to create a collection is to insert a
record (a document consisting of keys (field names) and values) into it. A new
collection will be created if the collection does not exist. The following statement
demonstrates the creation of a collection named lego whose documents have three fields
(ProductCategory, ProductId and ProductName) in the lego database:
db.lego.insert(
  {
    "ProductCategory": "Airplane",
    "ProductId": 10725,
    "ProductName": "Lost Temple"
  }
)
To insert an array of documents into a collection, command insert(): The insert command can also
be used to insert multiple documents into a collection at one time.
db.lego.insert(
  [
    {
      "ProductCategory": "Airplane",
      "ProductId": 10725,
      "ProductName": "Lost Temple"
    },
    {
      "ProductCategory": "Airplane",
      "ProductId": 31047,
      "ProductName": "Propeller Plane"
    },
    {
      "ProductCategory": "Airplane",
      "ProductId": 31049,
      "ProductName": "Twin Spin Helicopter"
    }
  ]
)
To view all documents in a collection, command db.<collection name>.find(): The find
command is equivalent to the select query of RDBMS. Thus, "SELECT * FROM lego" can be written as
db.lego.find() in MongoDB. MongoDB creates a unique ObjectId ("_id") on its own. This is the
primary key of the collection. The command db.<collection name>.find().pretty() gives a prettier
look.
To update a document, command db.<collection name>.update(): The update command is used
to change a field value. By default, the multi attribute is false. If {multi: true} is not written, then
it will update only the first matching document.
To delete a document, command db.<collection name>.remove(): The remove command is used
to delete a document. The query db.<collection name>.remove({"ProductId": 10725}) removes
the document whose ProductId is 10725.
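The difference between updating one document and updating many can be sketched with an in-memory stand-in for the lego collection. This is a simulation in plain JavaScript, not MongoDB code: updateDocs, removeDocs and matches are hypothetical helpers, the InStock field is invented for the demonstration, and the update helper only mimics setting fields on matching documents.

```javascript
// In-memory stand-in for a collection; updateDocs/removeDocs are hypothetical
// helpers that imitate the behaviour described above, not MongoDB functions.
const lego = [
  { ProductCategory: "Airplane", ProductId: 10725, ProductName: "Lost Temple" },
  { ProductCategory: "Airplane", ProductId: 31047, ProductName: "Propeller Plane" },
  { ProductCategory: "Airplane", ProductId: 31049, ProductName: "Twin Spin Helicopter" },
];

// True when every key in the query equals the document's value (equality only).
function matches(doc, query) {
  return Object.entries(query).every(([key, value]) => doc[key] === value);
}

// Set fields on the first match, or on all matches when multi is true.
function updateDocs(collection, query, fields, { multi = false } = {}) {
  let modified = 0;
  for (const doc of collection) {
    if (!matches(doc, query)) continue;
    Object.assign(doc, fields);
    modified += 1;
    if (!multi) break;
  }
  return modified;
}

// Remove every matching document in place and return how many were removed.
function removeDocs(collection, query) {
  let removed = 0;
  for (let i = collection.length - 1; i >= 0; i -= 1) {
    if (matches(collection[i], query)) {
      collection.splice(i, 1);
      removed += 1;
    }
  }
  return removed;
}

const first = updateDocs(lego, { ProductCategory: "Airplane" }, { InStock: true });
const all = updateDocs(lego, { ProductCategory: "Airplane" }, { InStock: false }, { multi: true });
const removed = removeDocs(lego, { ProductId: 10725 });
console.log(first, all, removed, lego.length); // 1 3 1 2
```

Without the multi option only the first matching document changes, which is why the first call reports 1 while the second reports 3.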
Uses of MongoDB:
MongoDB is a popular NoSQL database known for its flexibility, scalability, and performance. It is
widely used in various applications across different industries. Here are some common uses of
MongoDB:
1. Content Management Systems (CMS): MongoDB’s flexible schema and powerful query
capabilities make it an ideal choice for content management systems. It can efficiently handle
diverse content types and structures, enabling dynamic and scalable content management
solutions.
2. E-commerce Platforms: E-commerce platforms benefit from MongoDB’s ability to store and
retrieve large amounts of product data quickly. Its flexible schema supports dynamic product
catalogs, user profiles, shopping carts, and transaction histories.
3. Real-Time Analytics: MongoDB is well-suited for real-time analytics applications due to its
high-performance data ingestion and querying capabilities. It can handle large volumes of data
in real-time, making it ideal for monitoring, fraud detection, and personalized
recommendations.
4. Internet of Things (IoT): IoT applications generate vast amounts of data from sensors and
devices. MongoDB’s scalability and flexible data model allow it to efficiently store and process
this data, enabling real-time analysis and decision-making for IoT systems.
5. Gaming Applications: Gaming applications generate complex data structures, such as player
profiles, scores, achievements, and game states. MongoDB’s document-based model allows for
efficient storage and retrieval of this data, supporting high-performance gaming experiences.
6. Log Management and Analysis: Organizations use MongoDB to store and analyze log data
from various sources. Its ability to handle large volumes of unstructured data makes it ideal for
logging, monitoring, and troubleshooting applications and infrastructure.
7. Customer Relationship Management (CRM): CRM systems use MongoDB to manage
customer data, interactions, and sales pipelines. Its ability to handle complex relationships and
unstructured data enables more personalized and effective customer engagement strategies.
8. Social Networks: Social networking applications require a database that can handle complex
relationships, user-generated content, and real-time interactions. MongoDB’s flexibility and
scalability make it an excellent choice for building social networks and community platforms.
9. Big Data Applications: MongoDB is used in big data applications for its ability to store and
process large volumes of diverse data types. It integrates well with big data technologies like
Hadoop and Spark, enabling advanced data analytics and processing.
10. Healthcare Systems: Healthcare applications use MongoDB to manage patient records, clinical
data, and medical images. Its flexible schema allows for the efficient storage of complex
healthcare data, supporting better patient care and data analysis.
Advantages of MongoDB:
It avoids complex join operations, since related data can be embedded in a single document.
It is a schema-less NoSQL database. We need not design the schema of the database when
we are working with MongoDB.
It provides great flexibility to the fields in the documents.
A collection can contain heterogeneous data.
It provides high performance, availability and scalability.
It supports geospatial data and queries efficiently.
It is a document-oriented database and the data is stored in BSON documents.
It also supports multi-document ACID transactions (starting from MongoDB 4.0).
It does not use SQL, so classic SQL injection attacks do not apply (query inputs must still be
validated).
It is easily integrated with Big Data Hadoop.
Disadvantages of MongoDB:
High Memory Usage – Requires additional storage
No Complex Joins – Relies on embedding or referencing instead
Limited Document Size – Maximum 16MB per document
Nesting Limits – Supports up to 100 levels of nested documents
ii. Document Oriented: In MongoDB, all the data is stored in documents instead of tables as
in RDBMS. In these documents, the data is stored in fields (key-value pairs) instead of rows
and columns, which makes the data much more flexible in comparison to RDBMS. Each
document contains its own unique object id.
iii. Using JavaScript Object Notation (JSON): JSON is extremely expressive. MongoDB
actually does not use JSON but BSON, which is Binary JSON. It is an open standard. It is used to
store complex data structures.
iv. Creating or Generating a Unique Key (Indexing): Each JSON document should have a
unique identifier. It is the _id key. It is similar to the primary key in relational databases. This
facilitates searching for documents based on the unique identifier. If the data is not indexed, the
database must scan every document for the specified query, which takes a lot of time and is
inefficient. An index is automatically built on the unique identifier. It is your choice to either
provide unique values yourself or have MongoDB generate them.
v. Support for Dynamic Queries: MongoDB has extensive support for dynamic queries. This is
in keeping with traditional RDBMS, wherein we have static data and dynamic queries.
CouchDB, another document-oriented, schema-less NoSQL database and MongoDB's biggest
competitor, works on quite the reverse philosophy. It has support for dynamic data and static
queries.
vi. Storing Binary Data: A MongoDB document can hold binary data directly, up to the 16 MB
document size limit. This usually suffices for photographs (such as a profile picture) or small
audio clips. However, if one wishes to store larger files such as movie clips, MongoDB has
another solution: GridFS, an API that provides methods that make it easy to store large files.
It stores the metadata (data about data along with the context information) in a collection
called "files". It then breaks the data into small pieces called chunks and stores them in the
"chunks" collection (fs.files and fs.chunks by default). This process takes care of the need for
easy scalability.
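The chunking step can be sketched in plain JavaScript. This is a simplified illustration under stated assumptions: toChunks is a hypothetical helper, the 256-byte chunk size is chosen only to keep the demonstration small, and the field names files_id, n and data follow the layout of GridFS chunk documents.

```javascript
// Split a file's bytes into GridFS-style chunk documents plus one metadata document.
// toChunks is a hypothetical helper; a real driver performs this step internally.
function toChunks(fileId, filename, bytes, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n += 1) {
    // Each chunk records which file it belongs to and its position in the file.
    chunks.push({ files_id: fileId, n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  const fileDoc = { _id: fileId, filename, length: bytes.length, chunkSize };
  return { fileDoc, chunks };
}

const bytes = new Uint8Array(1000); // stand-in for a file's contents
const { fileDoc, chunks } = toChunks(1, "clip.bin", bytes, 256);
console.log(fileDoc.length, chunks.length, chunks[3].data.length); // 1000 4 232
```

Reassembling the file means reading the chunks for one files_id in order of n and concatenating their data.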
vii. Replication: Replication is the process of creating and maintaining multiple copies of a
database or its data across different locations or servers. It provides data redundancy and high
availability. It helps to recover from hardware failure and service interruptions. In MongoDB,
the replica set has a single primary and several secondaries. Each write request from the client
is directed to the primary. The primary logs all write requests into its Oplog (operations log).
The Oplog is then used by the secondary replica members to synchronize their data. This way
there is strict adherence to consistency. The clients usually read from the primary. However, the
client can also specify a read preference that will then direct the read operations to a secondary.
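The write path described above can be imitated with a toy model in plain JavaScript. It is a sketch only: the Member and ReplicaSet classes are hypothetical, synchronization is triggered by hand, and real replica sets add elections, acknowledgements and failure handling that are left out here.

```javascript
// Toy replica set: the primary applies each write and appends it to its oplog;
// secondaries catch up by replaying the oplog entries they have not applied yet.
class Member {
  constructor() {
    this.data = new Map();
    this.applied = 0; // number of oplog entries already applied
  }
  apply(entry) {
    this.data.set(entry.key, entry.value);
    this.applied += 1;
  }
}

class ReplicaSet {
  constructor(secondaryCount) {
    this.primary = new Member();
    this.secondaries = Array.from({ length: secondaryCount }, () => new Member());
    this.oplog = [];
  }
  // Every write goes to the primary and is logged in the oplog.
  write(key, value) {
    const entry = { key, value };
    this.primary.apply(entry);
    this.oplog.push(entry);
  }
  // Each secondary replays the part of the oplog it has not seen.
  sync() {
    for (const secondary of this.secondaries) {
      while (secondary.applied < this.oplog.length) {
        secondary.apply(this.oplog[secondary.applied]);
      }
    }
  }
}

const rs = new ReplicaSet(2);
rs.write("a", 1);
rs.write("a", 2);
console.log(rs.secondaries[0].data.get("a")); // undefined: not yet synchronized
rs.sync();
console.log(rs.secondaries[0].data.get("a")); // 2
```

The first read shows why reading from a secondary can return stale data until it has replayed the oplog.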
viii. Sharding: Sharding is akin to horizontal scaling. Horizontal scaling, also known as scaling out,
is a method of increasing system capacity or performance by adding more machines or servers
to distribute the workload across a larger number of units. It means that the large dataset is
divided and distributed over multiple servers or shards. Each shard is an independent database
and collectively they constitute a logical database.
Sharding reduces the amount of data that each shard needs to store and manage. For
example, if the dataset was 1 TB in size and we were to distribute this over four shards,
each shard would house just 256 GB of data. As the cluster grows, the amount of data that
each shard stores and manages decreases.
Sharding reduces the number of operations that each shard handles. For example, if we
were to insert data, the application needs to access only the shard that houses that
data.
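How a shard key decides where a document lives can be sketched as follows. This is an illustration only: shardFor is a hypothetical helper and its string hash is a simple stand-in, not the hash function MongoDB uses.

```javascript
// Route documents to shards by hashing the shard key (a simple illustrative hash,
// not the one MongoDB uses).
function shardFor(shardKey, shardCount) {
  let hash = 0;
  for (const ch of String(shardKey)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep an unsigned 32-bit integer
  }
  return hash % shardCount;
}

const shardCount = 4;
const shards = Array.from({ length: shardCount }, () => []);
for (let productId = 1; productId <= 1000; productId += 1) {
  shards[shardFor(productId, shardCount)].push({ ProductId: productId });
}

console.log(shards.map((shard) => shard.length)); // number of documents on each shard
console.log(1024 / shardCount); // GB per shard for a 1 TB (1024 GB) dataset: 256
```

Because the same key always hashes to the same shard, an insert or lookup for one key touches a single shard rather than the whole cluster.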
ix. Updating Information In-Place: MongoDB updates the information in-place. This implies
that it updates the data wherever it is available. It does not allocate separate space and the
indexes remain unaltered. MongoDB favors lazy writes. It writes to the disk once every
second. Reading from and writing to disk is slow compared to reading from and writing to
memory. The fewer the reads and writes that we perform to the disk, the better the
performance. This makes MongoDB faster than competitors that write almost
immediately to the disk. However, there is a tradeoff: with lazy writes, MongoDB makes no
guarantee that data will be stored safely on the disk unless a stricter write concern (for example,
journaled writes) is requested.
x. Aggregation: It allows operations to be performed on grouped data to obtain a single or
computed result. It is similar to the SQL GROUP BY clause. It provides three
different kinds of aggregation, i.e. the aggregation pipeline, the map-reduce function, and
single-purpose aggregation methods.
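The grouping idea can be shown in plain JavaScript. This is a minimal sketch: the sales documents and the groupTotals helper are hypothetical, and the function only imitates what a grouping stage computes.

```javascript
// Group documents by category and compute a count and a total per group,
// which is the kind of result a GROUP BY query produces.
const sales = [
  { category: "Airplane", amount: 30 },
  { category: "Castle", amount: 50 },
  { category: "Airplane", amount: 20 },
];

function groupTotals(docs) {
  const groups = new Map();
  for (const { category, amount } of docs) {
    // Start a new group the first time a category is seen.
    const group = groups.get(category) ?? { _id: category, count: 0, total: 0 };
    group.count += 1;
    group.total += amount;
    groups.set(category, group);
  }
  return [...groups.values()];
}

console.log(groupTotals(sales));
// [ { _id: 'Airplane', count: 2, total: 50 }, { _id: 'Castle', count: 1, total: 50 } ]
```

In the shell, a roughly equivalent aggregation pipeline on a hypothetical sales collection would be db.sales.aggregate([{ $group: { _id: "$category", count: { $sum: 1 }, total: { $sum: "$amount" } } }]).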
xi. High Performance: MongoDB offers very high performance and data persistence compared
to other databases, due to features like scalability, indexing, replication, etc.
xii. Dynamic Schema: Dynamic schema implies that documents in the same collection do not need
to have the same set of fields or structure. Also, the same field may contain different types of
data in different documents.
The table below provides a comparison of RDBMS and MongoDB databases.
RDBMS                 MongoDB
Database              Data store
Table                 Collection
Column                Key
Value                 Value
Records/Rows/Tuple    Document/Object
Joins                 Embedded documents
Index                 Index
Primary key           Primary key (_id) is the default key provided by MongoDB itself.
Any relational DB has a typical schema design that shows the number of tables and the
relationships between these tables, whereas in MongoDB there is no such schema-level concept of
relationships.
xiii. Rich Queries and Other DB Functionalities: MongoDB offers a rich set of features and
functionality compared to those offered by simple key-value stores. They are comparable to
those offered by any RDBMS. MongoDB has a complete query language, highly functional
secondary indexes (including text search and geospatial), and a powerful aggregation
framework for data analysis. MongoDB provides functionalities and features for more diverse
data types than a relational DB, and at scale. The table below compares the features of
MongoDB with those of an RDBMS.
The ability to derive a document-based data model is also a distinct advantage of MongoDB.
Storing data in the form of BSON (Binary JSON) helps to store the data in a
very rich way, since a document can hold arrays and other documents.
concept.
Reference      Foreign Key                           Linking documents from different collections, similar to foreign keys in tables.
Shard          Horizontal Partition                  Distributing data across multiple machines (sharding).
Replica Set    Cluster (Master-Slave Replication)    A group of servers maintaining the same data set for redundancy.
Aggregation    Complex Queries (GROUP BY, JOIN)      Framework for data processing and analytics (similar to SQL operations).
Schema-less    Schema-based                          MongoDB collections do not enforce document structure, whereas RDBMS tables have fixed schemas.