Unit 5
STRUCTURE
5.1 Introduction
5.7 Document-Oriented
5.11 Summary
5.12 Keywords
5.15 References
Explain the Key Features of Database and Core Server Tools
Describe the Principles of Schema Design
5.1 INTRODUCTION
If you’ve built web applications in recent years, you’ve probably used a relational database as
the primary data store, and it probably performed acceptably. Most developers are familiar
with SQL, and most of us can appreciate the beauty of a well-normalized data model, the
necessity of transactions, and the assurances provided by a durable storage engine. And even
if we don’t like working with relational databases directly, a host of tools, from
administrative consoles to object-relational mappers, helps alleviate any unwieldy
complexity. Simply put, the relational database is mature and well known. So, when a small
but vocal cadre of developers starts advocating alternative data stores, questions about the
viability and utility of these new technologies arise. Are these new data stores replacements
for relational database systems? Who’s using them in production, and why? What are the
trade-offs involved in moving to a nonrelational database? The answers to those questions rest
on the answer to this one: why are developers interested in MongoDB?
MongoDB is a database management system designed for web applications and internet
infrastructure. The data model and persistence strategies are built for high read and write
throughput and the ability to scale easily with automatic failover. Whether an application
requires just one database node or dozens of them, MongoDB can provide surprisingly good
performance. If you’ve experienced difficulties scaling relational databases, this may be great
news. But not everyone needs to operate at scale. Maybe all you’ve ever needed is a single
database server. Why then would you use MongoDB?
It turns out that MongoDB is immediately attractive, not because of its scaling strategy, but
rather because of its intuitive data model. Given that a document-based data model can
represent rich, hierarchical data structures, it’s often possible to do without the complicated
multi-table joins imposed by relational databases. For example, suppose you’re modelling
products for an e-commerce site. With a fully normalized relational data model, the
information for any one product might be divided among dozens of tables. If you want to get
a product representation from the database shell, you’ll need to write a complicated SQL
query full of joins. As a consequence, most developers will need to rely on a secondary piece
of software to assemble the data into something meaningful.
With a document model, by contrast, most of a product’s information can be represented
within a single document. When you open the MongoDB JavaScript shell, you can easily get
a comprehensible representation of your product with all its information hierarchically
organized in a JSON-like structure. You can also query for it and manipulate it. MongoDB’s
query capabilities are designed specifically for manipulating structured documents, so users
switching from relational databases experience a similar level of query power. In addition,
most developers now work with object-oriented languages, and they want a data store that
better maps to objects. With MongoDB, the object defined in the programming language can
be persisted “as is,” removing some of the complexity of object mappers.
JSON is an acronym for JavaScript Object Notation. As we’ll see shortly, JSON structures
are comprised of keys and values, and they can nest arbitrarily deep. They’re analogous to the
dictionaries and hash maps of other programming languages.
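To make the idea concrete, here is a minimal sketch of a product held as one nested structure. It is plain JavaScript rather than a stored MongoDB document, and every field name and value in it is a hypothetical placeholder.

```javascript
// Hypothetical product document: every field name and value is illustrative.
const product = {
  sku: "9092",
  name: "Extra Large Wheelbarrow",
  pricing: { retail: 589700, sale: 489700 }, // nested object
  tags: ["tools", "gardening"],              // array of strings
  reviews: [                                 // array of nested objects
    { user: "alice", rating: 5 },
    { user: "bob", rating: 4 }
  ]
};

// The whole hierarchy travels together, so nothing has to be joined back up.
const averageRating =
  product.reviews.reduce((sum, r) => sum + r.rating, 0) / product.reviews.length;

console.log(JSON.stringify(product, null, 2)); // the JSON text form of the object
console.log(averageRating);
```

Keys map to values, and a value may itself be an object or an array, which is what lets one document carry data that a normalized relational schema would spread over several tables.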
If the distinction between a tabular and object representation of data is new to you, then you
probably have a lot of questions. Rest assured that by the end of this chapter I’ll have
provided a thorough overview of MongoDB’s features and design goals, making it
increasingly clear why developers from companies like Geek.net (SourceForge.net) and The
New York Times have adopted MongoDB for their projects. We’ll see the history of
MongoDB and lead into a tour of the database’s main features. Next, we’ll explore some
alternative database solutions and the so-called NoSQL movement,[2] explaining how
MongoDB fits in. Finally, I’ll describe in general where MongoDB works best and where an
alternative data store might be preferable.
The umbrella term NoSQL was coined in 2009 to lump together the many nonrelational
databases gaining in popularity at the time.
Data within the most common types of databases in operation today is typically modelled in
rows and columns in a series of tables to make processing and data querying efficient. The
data can then be easily accessed, managed, modified, updated, controlled, and organized.
Most databases use structured query language (SQL) for writing and querying data.
Types of databases
There are many different types of databases. The best database for a specific organization
depends on how the organization intends to use the data.
Relational databases
Relational databases became dominant in the 1980s. Items in a relational database are
organized as a set of tables with columns and rows. Relational database technology
provides the most efficient and flexible way to access structured information.
Object-oriented databases
Distributed databases
A distributed database consists of two or more files located in different sites. The
database may be stored on multiple computers, located in the same physical location, or
scattered over different networks.
Data warehouses
A central repository for data, a data warehouse is a type of database specifically designed
for fast query and analysis.
NoSQL databases
Graph databases
A graph database stores data in terms of entities and the relationships between entities.
OLTP databases
An OLTP database is a fast, transaction-oriented database designed for large
numbers of transactions performed by multiple users.
These are only a few of the several dozen types of databases in use today. Other, less
common databases are tailored to very specific scientific, financial, or other functions. In
addition to the different database types, changes in technology development approaches
and dramatic advances such as the cloud and automation are propelling databases in
entirely new directions. Some of the latest databases include
Open-source databases
An open-source database system is one whose source code is open source; such
databases could be SQL or NoSQL databases.
Cloud databases
Multimodel database
Document/JSON database
Self-driving databases
The newest and most ground-breaking type of database, self-driving databases (also
known as autonomous databases) are cloud-based and use machine learning to automate
database tuning, security, backups, updates, and other routine management tasks
traditionally performed by database administrators.
With massive data collection from the Internet of Things transforming life and industry
across the globe, businesses today have access to more data than ever before.
Forward-thinking organizations can now use databases to go beyond basic data storage and
transactions to analyse vast quantities of data from multiple systems. Using database and
other computing and business intelligence tools, organizations can now leverage the data they
collect to run more efficiently, enable better decision-making, and become more agile and
scalable. Optimizing access and throughput to data is critical to businesses today because
there is more data volume to track. It’s critical to have a platform that can deliver the
performance, scale, and agility that businesses need as they grow over time.
Self-driving databases are the wave of the future—and offer an intriguing possibility for
organizations that want to use the best available database technology without the headaches
of running and operating that technology.
Self-driving databases use cloud-based technology and machine learning to automate many
of the routine tasks required to manage databases, such as tuning, security, backups, and
updates. With these tedious tasks automated, database
administrators are freed up to do more strategic work. The self-driving, self-securing, and
self-repairing capabilities of self-driving databases are poised to revolutionize how
companies manage and secure their data, enabling performance advantages, lower costs, and
improved security.
The first autonomous database was announced in late 2017, and multiple independent
industry analysts quickly recognized the technology and its potential impact on computing.
A 2021 Wikibon report praised autonomous database technology, saying, “Oracle has
by far the best Tier-1 Cloud Database Platform…Wikibon believes Oracle has the strongest
Cloud Database Platform with Autonomous Database.”
And KuppingerCole’s 2021 Leadership Compass said, “The Oracle Autonomous
Database, which completely automates provisioning, management, tuning, and upgrade
processes of database instances without any downtime, not just substantially increases
security and compliance of sensitive data stored in Oracle Databases but makes a compelling
argument for moving this data to the Oracle Cloud.”
Because Oracle Autonomous Database is built on the highly available and scalable
architecture of Oracle Exadata, it’s possible to easily scale the database deployment as needs
grow.
MongoDB Features
Each database contains collections, which in turn contain documents. Documents in the same
collection can differ from one another in their number of fields, their size, and their
content.
The document structure is more in line with how developers construct their classes and
objects in their respective programming languages. Developers will often say that their
classes are not rows and columns but have a clear structure with key-value pairs.
The rows (or documents, as they are called in MongoDB) don't need to have a schema defined
beforehand. Instead, the fields can be created on the fly.
The data model available within MongoDB allows you to represent hierarchical relationships,
to store arrays, and other more complex structures more easily.
Below are a few of the reasons to start using MongoDB:
Ad hoc queries - MongoDB supports searching by field, range queries, and regular
expression searches. Queries can be made to return specific fields within documents.
Replication - MongoDB can provide high availability with replica sets. A replica set
consists of two or more MongoDB instances. Each replica set member may act in the
role of the primary or secondary replica at any time. The primary replica is the main
server, which interacts with the client and performs all the read/write operations. The
secondary replicas maintain a copy of the primary's data using built-in replication.
When a primary replica fails, the replica set automatically fails over to a secondary,
which then becomes the primary server.
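The three ad hoc query styles listed above can be sketched in plain JavaScript against an in-memory array. This only imitates what the server does: the employees data and the project helper are hypothetical, and the comments show the shell call each filter roughly corresponds to.

```javascript
// Hypothetical in-memory "collection"; documents need not share the same fields.
const employees = [
  { name: "Asha", age: 31, dept: "sales" },
  { name: "Ben", age: 45, dept: "engineering", remote: true },
  { name: "Anika", age: 28 }
];

// Keep only the requested fields of a document (a simple projection).
function project(doc, fields) {
  const out = {};
  for (const f of fields) {
    if (f in doc) out[f] = doc[f];
  }
  return out;
}

// Search by field; in the shell: db.employees.find({ dept: "sales" })
const byField = employees.filter(d => d.dept === "sales");

// Range query; in the shell: db.employees.find({ age: { $gte: 30, $lt: 50 } })
const byRange = employees.filter(d => d.age >= 30 && d.age < 50);

// Regular expression search returning one field;
// in the shell: db.employees.find({ name: /^A/ }, { name: 1, _id: 0 })
const byRegex = employees
  .filter(d => /^A/.test(d.name))
  .map(d => project(d, ["name"]));

console.log(byField, byRange, byRegex);
```

Note that the third document has no dept field at all, yet it sits in the same collection and is still matched by the regular expression query.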
As we have seen from the Introduction section, the data in MongoDB has a flexible schema.
Unlike in SQL databases, where you must have a table's schema declared before inserting
data, MongoDB’s collections do not enforce document structure. This sort of flexibility is
what makes MongoDB so powerful.
What are the needs of the application – Look at the business needs of the application and
determine what data, and what types of data, it requires. Decide the structure of the
documents accordingly.
What are data retrieval patterns – If you foresee a heavy query usage then consider the
use of indexes in your data model to improve the efficiency of queries.
Are frequent inserts, updates and removals happening in the database? Reconsider the
use of indexes or incorporate sharding if required in your data modelling design to
improve the efficiency of your overall MongoDB environment.
Indexing
Fields in a MongoDB document can be indexed with primary and secondary indices.
Replication
MongoDB provides high availability with replica sets. A replica set consists of two or more
copies of the data. Each replica-set member may act in the role of primary or secondary
replica at any time. All writes and reads are done on the primary replica by default.
Secondary replicas maintain a copy of the data of the primary using built-in replication.
When a primary replica fails, the replica set automatically conducts an election process to
determine which secondary should become the primary. Secondaries can optionally serve
read operations, but that data is only eventually consistent by default.
If the replicated MongoDB deployment only has a single secondary member, a separate
daemon called an arbiter must be added to the set. It has a single responsibility, which is to
resolve the election of the new primary. As a consequence, an idealized distributed
MongoDB deployment requires at least three separate servers, even in the case of just one
primary and one secondary.
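The arithmetic behind the three-server requirement can be sketched as follows. The sketch assumes the usual rule that a primary is elected only with the votes of a strict majority of the voting members; the function names are hypothetical.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the surviving members still elect a primary after some members fail?
function canElectPrimary(votingMembers, failedMembers) {
  const survivors = votingMembers - failedMembers;
  return survivors >= majority(votingMembers);
}

console.log(canElectPrimary(2, 1)); // primary + secondary, one member fails
console.log(canElectPrimary(3, 1)); // primary + secondary + arbiter, one member fails
```

With two voting members, one survivor is not a majority of two, so no election can succeed. Adding an arbiter raises the member count to three, and the two survivors of a single failure then form a majority.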
Load Balancing
MongoDB scales horizontally using sharding.[31] The user chooses a shard key, which
determines how the data in a collection will be distributed. The data is split into ranges (based
on the shard key) and distributed across multiple shards. (A shard is a master with one or
more replicas.) Alternatively, the shard key can be hashed to map to a shard – enabling an
even data distribution.
MongoDB can run over multiple servers, balancing the load or duplicating data to keep the
system up and running in case of hardware failure.
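The difference between range-based and hashed distribution can be illustrated with a small sketch. The range boundaries, shard names, and the toyHash function are all assumptions made for the example; in particular, toyHash is not the hash function MongoDB uses.

```javascript
// Range-based routing: each shard owns a contiguous range of shard-key values.
// The boundaries are hypothetical and keys are assumed to be non-negative.
const ranges = [
  { shard: "shard0", min: 0, max: 1000 }, // 0 <= key < 1000
  { shard: "shard1", min: 1000, max: 2000 },
  { shard: "shard2", min: 2000, max: Infinity }
];

function routeByRange(key) {
  return ranges.find(r => key >= r.min && key < r.max).shard;
}

// Hash-based routing: hash the key first, then map the hash to a shard.
// toyHash is a stand-in chosen for brevity, not MongoDB's hash function.
function toyHash(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h;
}

function routeByHash(key, shardCount) {
  return "shard" + (toyHash(key) % shardCount);
}

const keys = [1000, 1001, 1002, 1003, 1004, 1005];
console.log(keys.map(routeByRange));              // neighbouring keys share one shard
console.log(keys.map(k => routeByHash(k, 3)));    // neighbouring keys are spread out
```

Range-based routing keeps neighbouring key values together, which suits range queries but can concentrate sequential inserts on one shard. Hashing trades that locality for a more even spread.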
File Storage
MongoDB can be used as a file system, called GridFS, with load balancing and data
replication features over multiple machines for storing files.
This function, called grid file system, is included with MongoDB drivers. MongoDB exposes
functions for file manipulation and content to developers. GridFS can be accessed using
the mongofiles utility or plugins for Nginx and lighttpd. GridFS divides a file into parts, or
chunks, and stores each of those chunks as a separate document.
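The chunking idea can be sketched without a database. The field names files_id, n, and data follow those of GridFS chunk documents, while the helper names, the file identifier, and the tiny 4-byte chunk size are assumptions made for the demonstration (the GridFS default chunk size is 255 KB).

```javascript
// Split a file's bytes into chunk documents, one document per chunk.
function toChunks(fileId, bytes, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId, // which file this chunk belongs to
      n,                // position of the chunk within the file
      data: bytes.subarray(offset, offset + chunkSize)
    });
  }
  return chunks;
}

// Reassemble the file by concatenating the chunks in order of n.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map(c => c.data));
}

const file = Buffer.from("hello, gridfs!"); // 14 bytes
const chunks = toChunks("file-1", file, 4); // chunks of 4, 4, 4 and 2 bytes
console.log(chunks.length);
console.log(fromChunks(chunks).toString());
```

Because each chunk is an ordinary document, the chunks of one large file can be replicated and distributed like any other data.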
Aggregation
MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-
reduce function, and single-purpose aggregation methods.
Map-reduce can be used for batch processing of data and aggregation operations. But
according to MongoDB’s documentation, the Aggregation Pipeline provides better
performance for most aggregation operations.
The aggregation framework enables users to obtain the kind of results for which the SQL
GROUP BY clause is used. Aggregation operators can be strung together to form a pipeline –
analogous to Unix pipes. The aggregation framework includes the $lookup operator which
can join documents from multiple collections, as well as statistical operators such as standard
deviation.
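The pipeline idea can be sketched in plain JavaScript, with each stage written as a function from an array of documents to a new array. The orders data and the stage helpers are hypothetical; they imitate, in a much reduced form, the $match, $group, and $sort stages.

```javascript
// Hypothetical order documents.
const orders = [
  { customer: "ann", status: "paid", total: 40 },
  { customer: "bob", status: "paid", total: 25 },
  { customer: "ann", status: "paid", total: 10 },
  { customer: "bob", status: "open", total: 99 }
];

// Each stage takes an array of documents and returns a new array.
const match = predicate => docs => docs.filter(predicate);

const groupSum = (keyField, sumField) => docs => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d[keyField], (totals.get(d[keyField]) || 0) + d[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

const sortDesc = field => docs => [...docs].sort((a, b) => b[field] - a[field]);

// Run the stages in order, feeding each stage's output to the next,
// much as a Unix pipe connects commands.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Roughly: SELECT customer, SUM(total) ... WHERE status = 'paid' GROUP BY customer
const result = runPipeline(orders, [
  match(d => d.status === "paid"),
  groupSum("customer", "total"),
  sortDesc("total")
]);
console.log(result);
```

In the shell, the corresponding pipeline would be passed to db.orders.aggregate as an array of $match, $group, and $sort stage documents.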
JavaScript can be used in queries, aggregation functions (such as MapReduce), and sent
directly to the database to be executed.
Capped Collections
MongoDB supports fixed-size collections called capped collections. This type of collection
maintains insertion order and, once the specified size has been reached, behaves like a
circular queue.
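The circular-queue behaviour can be imitated with a small class. This is a toy model under one simplifying assumption: it limits the number of documents, whereas a real capped collection is bounded by its size in bytes (with an optional document limit). The class and variable names are hypothetical.

```javascript
// A toy capped collection bounded by document count.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    if (this.docs.length === this.maxDocs) {
      this.docs.shift(); // discard the oldest document to make room
    }
    this.docs.push(doc);
  }

  // Documents come back in insertion order.
  find() {
    return [...this.docs];
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ event: i });
}
console.log(log.find().map(d => d.event)); // only the three most recent events remain
```

This behaviour suits data such as logs, where only the most recent entries matter and old ones can be overwritten without an explicit delete.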
Transactions
MongoDB has claimed support for multi-document ACID transactions since the 4.0 release in
June 2018. This claim was found not to be true, as MongoDB violated snapshot isolation.
5.4 KEY FEATURES OF DATABASE AND CORE SERVER TOOLS
Before we proceed, let’s briefly discuss what a database is. A database is an organized
collection of structured data that eases accessibility and management of data. Now, let’s
understand the basic functions of an electronic database. The purpose of a database is to assist
in organizing and storing large volumes of data, which ultimately improves data accessibility.
Hence, you can improve your data analysis and get actionable insights without any delays by
using database software. Data can be quickly and efficiently found in a database; this allows
multiple users to access and modify it accordingly.
Now that we know what databases are used for, let’s move on to what database management
software (DBMS) is and why organizations need it. The volume of data is increasing rapidly
worldwide, making it difficult for companies to manage their data and gain valuable insights.
This makes database management an absolute necessity. But what does database management mean?
Simply put, database management refers to the manipulation of data by an organization to
meet company objectives. This gives rise to the need for a database management system.
Database Management Software or DBM software is used for storing, manipulating, and
managing data, such as format, names of fields, and record and file structures in a database
environment. Users can construct their own databases using a DBMS to satisfy their business
requirements. For example, dBase was one of the first DBMS for micro-computers. Database
design also supports the creation, design, implementation, and maintenance of an
organization-wide data management system.
To interact with a database, a DBMS package generally uses SQL queries. It receives a
command from a database administrator (DBA) and prompts the system to perform the
necessary action. These instructions can be about loading, retrieving, or modifying existing
data in the system.
Over the years, new DBMS software has been introduced with different architecture and
application focus. One such example of database software is advanced database systems that
meet the requirements of modern-day database applications in terms of offering data
modelling, data integration capabilities, support for multimedia data, etc.
What Type of Information is Stored in a Database?
The purpose of a database is to store different data in several ways. Some of the types of data
that can be stored in database software are:
Textual data
Numerical data
Binary data
Database management software features data independence, as the storage mechanism and
formats can be changed without altering the entire application within the database. The top
database software includes MySQL, Microsoft SQL Server, Microsoft Access DBMS,
Oracle, IBM DB2, and FoxPro.
The following are some examples of database applications. MySQL, a common and free
DBMS, is high-performing database software that helps enterprise users build scalable
database applications. Similarly, the features of FoxPro include creating,
adding, editing, and removing information from a database.
In a database, the chances of data duplication are quite high as several users use one
database. A DBMS reduces data repetition and redundancy by creating a single data
repository that can be accessed by multiple users, even allowing easy data mapping
while performing ETL.
Most organizational data is stored in large databases. A DBMS helps maintain these
databases by enforcing user-defined validation and integrity constraints, such as user-
based access.
Enhanced Security
When handling large amounts of data, security becomes the top-most concern for all
businesses.
Database management software doesn’t allow full access to anyone except the database
administrator or the departmental head. Only they can modify the database and control
user access, making the database more secure. All other users are restricted, depending
on their access level.
In addition to the features mentioned above, it is also essential to look for certain qualities in
a database system. For example, it should represent logical structures of the problem,
eliminate redundant data storage, and offer seamless data access with DBMS tools.
DBMS Language
To communicate database updates and queries, the DBMS language is used. Different types
of database languages are explained below:
Data Control Language (DCL): It is used to control access to the saved data by granting
access to a user or revoking it.
What are the different types of database management systems? These can be broadly
classified into four types. The most popular types of DBMS software include:
Hierarchical
Network
Relational
A relational model is one of the most extensively used arrangements for organizing
databases. It normalizes data and organizes it as logically independent tables. You can
perform operations like “Select” and “Join” on these tables. The data is stored in fixed
structures and manipulated using SQL.
Shared data depicts relationships between different tables. As data in a table can
reference similar data in another table, it preserves the reliability of the connections
between them. This is called referential integrity, which is a critical concept in this
database model.
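Referential integrity can be illustrated with a short check written in plain JavaScript. The customers and orders tables and the findOrphans helper are hypothetical; a relational DBMS would enforce the same rule itself through a foreign key constraint.

```javascript
// Hypothetical tables: each order row refers to a customer through customerId.
const customers = [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }];
const orders = [
  { id: 10, customerId: 1 },
  { id: 11, customerId: 2 },
  { id: 12, customerId: 7 } // no customer has id 7
];

// Return the child rows whose foreign key matches no primary key in the parent table.
function findOrphans(children, foreignKey, parents, primaryKey) {
  const known = new Set(parents.map(p => p[primaryKey]));
  return children.filter(c => !known.has(c[foreignKey]));
}

const orphans = findOrphans(orders, "customerId", customers, "id");
console.log(orphans); // rows that would violate referential integrity
```

A database that enforces referential integrity would reject order 12 at insert time rather than let the dangling reference exist.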
Object-Oriented
The object-oriented model describes a database as a group of objects, which stores both
values and operations/methods. Objects with similar values and operations are grouped
as classes.
One of the main advantages of DBMS is that it allows users (onsite as well as remote) to
easily share the data by following the correct authorization protocols. It provides
operators access to well-managed data. As a result, they can rapidly respond to variations
in the environment.
By using database management software, you can get speedy responses to impromptu
queries as the data is properly managed and up to date. In case of any ad hoc query, the
database software returns a response (known as the query result set) to the application.
The threats of data security breaches become more pronounced when several users
access the database. Database management software offers better implementation of
data confidentiality and safety guidelines through controlled user access.
Better Decision-Making
The MongoDB shell is an interactive JavaScript shell. As such, it lets you run JavaScript
code directly in the shell or execute it as a standalone JavaScript file.
Subsequent sections that deal with using the shell to access the database and create and
manipulate collections and documents provide examples that are written in JavaScript. To
follow those examples, you need to understand at least some of the fundamental aspects of
the JavaScript language.
The MongoDB shell is the go-to tool for experimenting with the database, running ad-hoc
queries, and administering running MongoDB instances. When you’re writing an application
that uses MongoDB, you’ll use a language driver (like MongoDB’s Ruby gem) rather than
the shell, but the shell is likely where you’ll test and refine these queries. Any and all
MongoDB queries can be run from the shell.
If you’re completely new to MongoDB’s shell, know that it provides all the features that
you’d expect of such a tool; it allows you to examine and manipulate data and administer the
database server itself. MongoDB’s shell differs from others, however, in its query language.
Instead of employing a standardized query language such as SQL, you interact with the
server using the JavaScript programming language and a simple API. This means that you
can write JavaScript scripts in the shell that interact with a MongoDB database. If you’re not
familiar with JavaScript, rest assured that only a superficial knowledge of the language is
necessary to take advantage of the shell, and all examples in this chapter will be explained
thoroughly. The MongoDB API in the shell is similar to most of the language drivers, so it’s
easy to take queries you write in the shell and run them from your application.
As you probably know by now, MongoDB stores its information in documents, which can be
printed out in JSON (JavaScript Object Notation) format. You’d probably like to store
different types of documents, like users and orders, in separate places.
This means that MongoDB needs a way to group documents, similar to a table in an RDBMS.
In MongoDB, this is called a collection.
MongoDB divides collections into separate databases. Unlike the usual overhead that
databases produce in the SQL world, databases in MongoDB are just namespaces to
distinguish between collections. To query MongoDB, you’ll need to know the database (or
namespace) and collection you want to query for documents. If no other database is specified
on start-up, the shell selects a default database called test. As a way of keeping all the
subsequent tutorial exercises under the same namespace, let’s start by switching to the
tutorial database.
Updates require at least two arguments. The first specifies which documents to update, and
the second defines how the selected
documents should be modified. The first few examples demonstrate modifying a single
document, but the same operations can be applied to many documents, even an entire
collection, as we show at the end of this section. But keep in mind that by default the update()
method updates a single document.
There are two general types of updates, with different properties and use cases. One type of
update involves applying modification operations to a document or documents, and the other
type involves replacing the old document with a new one.
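The contrast between the two types of update can be sketched on a plain JavaScript object. The helper functions and the user document are hypothetical: applySet imitates a $set modification, and replaceDocument imitates a replacement in which only the _id is carried over.

```javascript
// Targeted update: apply a $set-style modification, leaving other fields alone.
function applySet(doc, fieldsToSet) {
  return { ...doc, ...fieldsToSet };
}

// Replacement update: the new document replaces the old one; only _id is kept.
function replaceDocument(doc, replacement) {
  return { _id: doc._id, ...replacement };
}

const user = { _id: 1, username: "smith", country: "Canada" };

// Shell analogue: an update whose second argument is { $set: { country: "France" } }
const afterSet = applySet(user, { country: "France" });

// Shell analogue: an update whose second argument is the plain document { country: "France" }
const afterReplace = replaceDocument(user, { country: "France" });

console.log(afterSet);     // username is still present
console.log(afterReplace); // username is gone
```

The second result shows why replacement needs care: any field left out of the new document is lost.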
MongoDB must be properly installed and running for MongoDB to run JavaScript and to
enter the Mongo shell interface. Execute the mongo command to enter the interface and then
type db.version() to obtain the current version number of MongoDB.
Mongo Shell should be included in the default installation of MongoDB. If it is not working
properly, the below instructions will help with the Mongo Shell installation process. The
Linux package for Mongo Shell is mongodb-org-shell and can be installed with the distro’s
package manager.
Use the YUM installer (yum install) for Red Hat distros of Linux, or the APT repository
(apt-get install) for Debian-based distros, such as Ubuntu or Linux Mint.
To run a JavaScript file (.js file extension) with Mongo Shell, execute the mongo command
in a terminal window, followed by the domain name or IP address of the server running
the MongoDB service. The database name, preceded by a slash (e.g., /some_db), comes
directly after the address, and the JavaScript filename is the last part of the command.
For example, a file called mongo-test.js could be run against a local server with a
command of the following form:
    mongo 127.0.0.1/some_db mongo-test.js
NOTE: If the terminal is not already in the directory that contains the file, the absolute
path must be specified along with the filename. In the command, 127.0.0.1 is the IPv4
address of the local host server; the localhost domain name can be used in its place.
MongoDB’s JavaScript shell makes it easy to play with data and get a tangible sense of
documents, collections, and the database’s particular query language. Think of the following
walkthrough as a practical introduction to MongoDB.
You’ll begin by getting the shell up and running. Then you’ll see how JavaScript represents
documents, and you’ll learn how to insert these documents into a MongoDB collection.
To verify these inserts, you’ll practice querying the collection. Then it’s on to updates.
Finally, we’ll finish out the CRUD operations by learning to remove data and drop
collections.
Indexes are very important in any database, and with MongoDB it's no different. With the use
of Indexes, performing queries in MongoDB becomes more efficient. If you had a collection
with thousands of documents with no indexes, and then you query to find certain documents,
then in such case MongoDB would need to scan the entire collection to find the documents.
But if you had indexes, MongoDB would use these indexes to limit the number of documents
that had to be searched in the collection.
Indexes are special data sets that store a small part of the collection's data. Because an
index holds only part of the data, it is quicker to read than the collection itself. An index
stores the value of a specific field, or of a set of fields, ordered by the value of the field.
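The saving can be made visible with a small sketch that counts how many entries are examined with and without an index. It rests on simplifying assumptions: the collection is an in-memory array, and the index is a sorted array searched by binary search, whereas a real index is a tree structure maintained by the server. All names and data are hypothetical.

```javascript
// Hypothetical collection of 1,000 employee documents in scrambled order.
const collection = [];
for (let i = 0; i < 1000; i++) {
  collection.push({ Employeeid: (i * 7919) % 1000, name: "emp" + i });
}

// Collection scan: examine documents one by one until a match is found.
function scan(docs, id) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.Employeeid === id) return { doc, examined };
  }
  return { doc: null, examined };
}

// The "index": only the indexed field's value plus the document's position,
// kept sorted by that value.
const index = collection
  .map((doc, position) => ({ key: doc.Employeeid, position }))
  .sort((a, b) => a.key - b.key);

// Index lookup: binary search over the sorted entries.
function lookup(id) {
  let lo = 0, hi = index.length - 1, examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (index[mid].key === id) {
      return { doc: collection[index[mid].position], examined };
    }
    if (index[mid].key < id) lo = mid + 1; else hi = mid - 1;
  }
  return { doc: null, examined };
}

const target = 993;
console.log("scan examined:", scan(collection, target).examined);
console.log("index examined:", lookup(target).examined);
```

For 1,000 entries the binary search never needs more than ten comparisons, while a scan may have to touch every document.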
Although indexes are good for queries, having too many indexes can slow down other
operations such as insert, delete, and update. If insert, delete, and update operations are
carried out frequently on documents, then the indexes need to change just as often, which
is pure overhead for the collection.
The example below shows what field values could constitute an index in a
collection. An index can either be based on just one field in the collection, or it can be based
on multiple fields in the collection. In the example below, the Employeeid "1" and Employee
Code "AA" are used to index the documents in the collection. So, when a query search is
made, these indexes will be used to find the required documents quickly and efficiently in the
collection.
So even if the search query is based on the Employee Code "AA", that document would be
returned.
Code Explanation
The createIndex method is used to create an index based on the "Employeeid" of the
document.
The '1' parameter indicates that when the index is created with the "Employeeid" field
values, they should be sorted in ascending order. Please note that this index is separate
from the index on the _id field (the _id field uniquely identifies each document in the
collection), which MongoDB creates automatically. The entries of the new index are ordered
by Employeeid; the documents themselves are not reordered.
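The meaning of the 1 parameter can be sketched by turning a key specification such as { Employeeid: 1 } into a sort order for index entries. The comparatorFor helper and the sample documents are hypothetical; the sketch assumes only the convention that 1 means ascending and -1 means descending.

```javascript
// Build a comparator from an index key specification such as { Employeeid: 1 }.
// 1 means ascending and -1 means descending.
function comparatorFor(keySpec) {
  const fields = Object.entries(keySpec);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const docs = [
  { Employeeid: 3, EmployeeCode: "CC" },
  { Employeeid: 1, EmployeeCode: "AA" },
  { Employeeid: 2, EmployeeCode: "BB" }
];

// The sorted copies stand for the order of the index entries;
// the collection itself (docs) is left untouched.
const ascendingOrder = [...docs].sort(comparatorFor({ Employeeid: 1 }));
const descendingOrder = [...docs].sort(comparatorFor({ Employeeid: -1 }));

console.log(ascendingOrder.map(d => d.Employeeid));
console.log(descendingOrder.map(d => d.Employeeid));
```

The same helper accepts more than one field, which mirrors an index based on multiple fields: ties on the first field are broken by the next one.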
Types of Indexing
Clustered
Non-clustered
Both clustered and non-clustered indexes are stored and searched as B-trees, a data structure
similar to a binary tree. A B-tree is a “self-balancing tree data structure that maintains sorted
data and allows searches, sequential access, insertions, and deletions in logarithmic time.”
Basically, it creates a tree-like structure that sorts data for quick searching.
Clustered Indexes
Clustered indexes are the unique index per table that uses the primary key to organize the
data that is within the table. The clustered index ensures that the primary key is stored in
increasing order, which is also the order the table holds in memory.
Non-Clustered Indexes
Non-clustered indexes are sorted references for a specific field, from the main table, that
hold pointers back to the original entries of the table.
They are used to increase the speed of queries on the table by creating columns that are
more easily searchable. Non-clustered indexes can be created by data analysts/
developers after a table has been created and filled.
Note: Non-clustered indexes are not new tables. Non-clustered indexes hold the field that
they are responsible for sorting and a pointer from each of those entries back to the full
entry in the table.
When to Use Indexes
Indexes are meant to speed up the performance of a database, so use indexing whenever it
significantly improves the performance of your database. As your database becomes larger
and larger, the more likely you are to see benefits from indexing.
When data is written to the database, the original table (the clustered index) is updated first
and then all of the indexes off of that table are updated. Every time a write is made to the
database, the indexes are unusable until they have updated. If the database is constantly
receiving writes, then the indexes will never be usable. This is why indexes are typically
applied to databases in data warehouses that get new data updated on a scheduled basis (off-
peak hours) and not production databases which might be receiving new writes all the time.
5.7 DOCUMENT-ORIENTED
In fact, the records in MongoDB that represent documents are stored as BSON, a lightweight
binary form of JSON. It uses field: value pairs that correspond to JavaScript property: value
pairs that define the values stored in the document. Little translation is necessary to convert
MongoDB records back into JSON strings that you might be using in your application.
MongoDB itself doesn’t enforce a schema, but every application needs some basic internal
standards about how its data is stored. This exploration of principles sets the stage for the
second part of the chapter, where we examine the design of an e-commerce schema in
MongoDB. Along the way, you’ll see how this schema differs from an equivalent RDBMS
schema, and you’ll learn how the typical relationships between entities, such as one-to-many
and many-to-many, are represented in MongoDB. The e-commerce schema presented here
will also serve as a basis for our discussions of queries, aggregation, and updates in
subsequent chapters.
Because documents are the raw materials of MongoDB, we’ll devote the final portion of this
chapter to some of the many details you might encounter when thinking through your own
schemas. This involves a more detailed discussion of databases, collections, and documents
than you’ve seen up to this point. But if you read to the end, you’ll be familiar with most of
the obscure features and limitations of document data in MongoDB. You may also find
yourself returning to this final section of the chapter later on, as it contains many of the
“gotchas” you’ll encounter when using MongoDB in the wild.
For example, a document in MongoDB might be structured similar to the following, with
name, version, languages, admin, and paths fields:
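As a rough illustration, a document with those five fields could look like the Python dictionary below. All of the values are invented for the example; only the field names come from the text. Serializing it with the standard json module shows how little translation separates the in-code object from its JSON form.

```python
import json

# Hypothetical document: the values are made up, the field names follow the text.
project = {
    "name": "New Project",                        # string
    "version": 1,                                 # integer
    "languages": ["JavaScript", "HTML", "CSS"],   # array
    "admin": {"name": "Brad", "password": "****"},            # embedded object
    "paths": {"temp": "/tmp", "project": "/opt/project"},     # embedded object
}

# Convert to a JSON string and back again.
as_json = json.dumps(project, indent=2)
round_trip = json.loads(as_json)
```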
Notice that the document structure contains fields/properties that are strings, integers, arrays,
and objects, just as in a JavaScript object. The Table below lists the different data types for
field values in the BSON document.
Type                       Number
Double                     1
String                     2
Object                     3
Array                      4
Binary data                5
Object ID                  7
Boolean                    8
Date                       9
Null                       10
Regular expression         11
JavaScript                 13
Symbol                     14
JavaScript (with scope)    15
32-bit integer             16
Timestamp                  17
64-bit integer             18
Table 5.1 Different data types for field values in the BSON document
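Field names cannot contain null characters, dots (.), or dollar signs ($). In addition, the
_id field name is reserved for the document's unique identifier, which defaults to an Object ID.
In the layout described here, the Object ID is a 12-byte value that consists of the following
parts:
A 4-byte timestamp (seconds since the Unix epoch)
A 3-byte machine identifier
A 2-byte process ID
A 3-byte counter, starting from a random value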
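To see how such an ID breaks down, the sketch below splits a 24-character hexadecimal Object ID into its parts. It assumes the older 12-byte layout of a 4-byte timestamp, 3-byte machine identifier, 2-byte process ID, and 3-byte counter; newer MongoDB releases replace the machine and process fields with a single random value, so treat the two middle fields as illustrative. The function name split_object_id and the sample ID are invented for the example.

```python
from datetime import datetime, timezone

def split_object_id(hex_id):
    """Split a 12-byte Object ID (given as 24 hex characters) into its fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an Object ID is 12 bytes (24 hex characters)")
    return {
        "timestamp": int.from_bytes(raw[0:4], "big"),   # seconds since the epoch
        "machine": raw[4:7].hex(),                      # assumed legacy field
        "process_id": int.from_bytes(raw[7:9], "big"),  # assumed legacy field
        "counter": int.from_bytes(raw[9:12], "big"),
    }

parts = split_object_id("507f1f77bcf86cd799439011")
created = datetime.fromtimestamp(parts["timestamp"], tz=timezone.utc)
```

Because the timestamp comes first, IDs generated later sort after IDs generated earlier, at one-second granularity.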
The maximum size of a document in MongoDB is 16MB, a limit that prevents queries from
consuming an excessive amount of RAM or causing intensive hits to the file system. You might
never come close to this, but you still need to keep the maximum document size in mind when
designing complex data types that embed file data in your documents.
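One way to keep the limit in view during design is to estimate a document's encoded size before storing it. The sketch below follows the BSON layout for a handful of common value types only (null, Boolean, integers, doubles, strings, binary data, embedded objects, and arrays); the helper names bson_size and fits are invented here, and a real driver's own encoder should be treated as the authority.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16MB limit described above

def _value_size(value):
    """Size in bytes of one encoded value, for a limited set of types."""
    if value is None:
        return 0
    if isinstance(value, bool):    # check bool first: bool is a subclass of int
        return 1
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8
    if isinstance(value, float):
        return 8
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1   # length + bytes + null
    if isinstance(value, bytes):
        return 4 + 1 + len(value)                   # length + subtype + data
    if isinstance(value, dict):
        return bson_size(value)
    if isinstance(value, list):    # arrays are encoded as documents keyed "0", "1", ...
        return bson_size({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")

def bson_size(doc):
    """Estimate the encoded size of a dict: length prefix, elements, terminator."""
    size = 4 + 1
    for key, value in doc.items():
        size += 1 + len(key.encode("utf-8")) + 1    # type byte + key + null
        size += _value_size(value)
    return size

def fits(doc):
    """Return True if the estimated size is within the document limit."""
    return bson_size(doc) <= MAX_DOCUMENT_BYTES
```

A document that embeds a 17MB file as binary data would fail the fits check, which is a hint that the file belongs outside the document.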
Documents map to the objects in your code, so they are much more natural to work with.
There is no need to decompose data across tables, run expensive JOINs, or integrate a
separate ORM layer. Data that is accessed together is stored together, so you have less code
to write, and your users get higher performance.
A document’s schema is dynamic and self-describing, so you don’t need to predefine it in
the database. Fields can vary from document to document, and you can modify the
structure at any time, avoiding disruptive schema migrations. Some document
databases offer JSON Schema so you can optionally enforce rules governing document
structures.
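The sketch below shows both halves of that idea with plain Python dictionaries: documents in one collection carry different fields, and an optional rule set flags the ones that break it. The validate helper and the rule format are invented for illustration and are far simpler than JSON Schema itself.

```python
def validate(doc, rules):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected_type in rules.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

# Hypothetical collection: each document has its own set of fields.
products = [
    {"name": "Shovel", "price": 19.99},
    {"name": "Seeds", "price": 2.5, "tags": ["garden"]},
    {"name": "Hose"},
]

# Optional rules: every product needs a string name and a float price.
rules = {"name": str, "price": float}
report = [validate(product, rules) for product in products]
```

Extra fields such as tags pass untouched, which mirrors the flexibility described above; only the agreed minimum is enforced.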
Powerful: Query Data Any Way You Need
Database schema design is the process of choosing the best representation for a data set,
given the features of the database system, the nature of the data, and the application
requirements. The principles of schema design for relational database systems are well
established. With RDBMSs, you’re encouraged to shoot for a normalized data model,[1]
which helps to ensure generic queryability and avoid updates to data that might result in
inconsistencies. Moreover, the established patterns prevent developers from wondering how
to model, say, one-to-many and many-to-many relationships. But schema design is never an
exact science, even with relational databases. Application functionality and performance are
the ultimate masters in schema design, so every “rule” has exceptions.
[1] A simple way to think about a “normalized data model” is that information is never stored
more than once. Thus, a one-to-many relationship between entities will always be split into at
least two tables.
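The one-to-many case can be made concrete with a small sketch. Below, the same orders appear first in normalized form (two "tables" linked by an order_id key) and then as documents with the line items embedded. The data and the embed helper are invented for illustration.

```python
# Normalized form: two tables, linked by order_id.
orders = [
    {"order_id": 1, "customer": "kbanker"},
    {"order_id": 2, "customer": "smith"},
]
line_items = [
    {"order_id": 1, "sku": "9092", "qty": 1},
    {"order_id": 1, "sku": "10027", "qty": 2},
    {"order_id": 2, "sku": "9092", "qty": 1},
]

def embed(orders, line_items):
    """Build one document per order, with its line items nested inside."""
    docs = []
    for order in orders:
        items = [
            {"sku": item["sku"], "qty": item["qty"]}
            for item in line_items
            if item["order_id"] == order["order_id"]
        ]
        docs.append({
            "_id": order["order_id"],
            "customer": order["customer"],
            "line_items": items,
        })
    return docs

order_docs = embed(orders, line_items)
```

In the embedded form, reading one order takes a single lookup instead of a join across two tables.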
If you’re coming from the RDBMS world, you may be troubled by MongoDB’s lack of hard
schema design rules. Good practices have emerged, but there’s still usually more than one
good way to model a given data set. The premise of this section is that principles can drive
schema design, but the reality is that those principles are pliable. To get you thinking, here
are a few questions you can bring to the table when modelling data with any database system:
What Are Your Application Access Patterns?
You need to pin down the needs of your application, and this should inform not only your
schema design but also which database you choose. Remember, MongoDB isn’t right for
every application. Understanding your application access patterns is by far the most
important aspect of schema design. The idiosyncrasies of an application can easily demand a
schema that goes against firmly held data modelling principles. The upshot is that you must
ask numerous questions about the application before you can determine the ideal data model.
What’s the read/write ratio? Will queries be simple, such as looking up a key, or more
complex? Will aggregations be necessary? How much data will be stored?
What’s the Basic Unit of Data?
In an RDBMS, you have tables with columns and rows. In a key-value store, you have keys
pointing to amorphous values. In MongoDB, the basic unit of data is the BSON document.
What Are the Capabilities of Your Database?
Once you understand the basic data type, you need to know how to manipulate it. RDBMSs
feature ad hoc queries and joins, usually written in SQL, while simple key-value stores permit
fetching values only by a single key. MongoDB also allows ad hoc queries, but it has no
relational-style joins (later releases added a $lookup aggregation stage that covers some of
the same ground). Databases also diverge in the kinds of updates they permit. With an
RDBMS, you can update records in sophisticated ways using SQL and wrap multiple updates
in a transaction to get atomicity and rollback. MongoDB was designed without transactions in
the traditional sense (multi-document transactions arrived in later releases), but it supports a
variety of atomic update operations that can work on the internal structures of a complex
document. With simple key-value stores, you might be able to update a value, but every
update will usually mean replacing the value completely.
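The contrast between targeted updates and whole-value replacement can be sketched as follows. The operator names $set, $inc, and $push are real MongoDB update operators, but apply_update is an invented in-memory stand-in that only mimics their shape; it says nothing about how the server guarantees atomicity.

```python
def apply_update(doc, update):
    """Apply a small subset of MongoDB-style update operators to a dict in place."""
    for op, changes in update.items():
        for path, value in changes.items():
            # Walk dotted paths such as "stats.logins" down to the parent object.
            *parents, leaf = path.split(".")
            target = doc
            for key in parents:
                target = target.setdefault(key, {})
            if op == "$set":
                target[leaf] = value
            elif op == "$inc":
                target[leaf] = target.get(leaf, 0) + value
            elif op == "$push":
                target.setdefault(leaf, []).append(value)
            else:
                raise ValueError(f"unsupported operator: {op}")
    return doc

# Document store style: change only the parts that need changing.
user = {"_id": 1, "name": "kbanker", "stats": {"logins": 4}, "tags": ["admin"]}
apply_update(user, {
    "$inc": {"stats.logins": 1},
    "$push": {"tags": "staff"},
    "$set": {"address.city": "NYC"},
})

# Simple key-value style: the whole value is replaced on every update.
kv_store = {"user:1": '{"name": "kbanker", "logins": 4}'}
kv_store["user:1"] = '{"name": "kbanker", "logins": 5}'
```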
What Makes a Good Unique Id or Primary Key for a Record?
There are exceptions, but many schemas, regardless of the database system, have some
unique key for each record. Choosing this key carefully can make a big difference in how you
access your data and how it’s stored. If you’re designing a users collection, for example,
should you use an arbitrary value, a legal name, a username, or a social security number as
the primary key? It turns out that neither legal names nor social security numbers are unique
or even applicable to all users within a given dataset. In MongoDB, choosing a primary key
means picking what should go in the _id field. The automatic object IDs are good defaults, but
not ideal in every case.
The best schema designs are always the product of deep knowledge of the database you’re
using, good judgment about the requirements of the application at hand, and plain old
experience. A good schema often requires experimentation and iteration, such as when an
application scales and performance considerations change. Don’t be afraid to alter your
schema when you learn new things; only rarely is it possible to fully plan an application
before its implementation. The examples in this chapter have been designed to help you
develop a good sense of schema design in MongoDB. Having studied these examples, you’ll
be well-prepared to design the best schemas for your own applications.
Databases
The show dbs command lists all the databases on the server.
A database name can use almost any character in the ASCII range, but it cannot be an empty
string and it cannot contain a dot (".") or a space (" ").
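The naming rules stated above are easy to check in code. The helper below covers only the restrictions this section mentions (empty string, dot, space); the function name is invented, and a real server may reject additional characters.

```python
def is_valid_db_name(name):
    """Check a database name against the rules given in this section only."""
    return name != "" and "." not in name and " " not in name

candidates = ["inventory", "", "my.db", "my db", "orders_2024"]
accepted = [name for name in candidates if is_valid_db_name(name)]
```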
Documents
The document is the unit of data storage in a MongoDB database. Documents use a JSON
style for storing data. JSON (JavaScript Object Notation) is a lightweight, human-readable
format used to interchange data between applications. A simple example of a JSON document
is as follows:
{ "site": "w3resource.com" }
Collections
Databases are crucial to building applications. They store data that make our applications
work like they should. A database query is a request for a database’s data so we can retrieve
or manipulate it. But when should we query a database, and what exactly are we doing?
At a very high level, a query is a question. When we talk about queries in relation to other
people, we expect some sort of answer in return. This is no different for computers when we
perform database queries.
A database query works the same way, and it is most closely associated with a CRUD
(create, read, update, delete) operation: it is a request to access data in a database in order
to retrieve or manipulate it.
This allows us to perform logic with the information we get in response to the query. There
are several different approaches to queries, from using query strings, to writing with a query
language, to using query by example (QBE) or an API layer such as GraphQL or REST.
Query Parameters
Query Parameters are put on the end of a URL as part of a query string. This is how search
engines grab search results for parameters a user inputs in a search bar. You can also add
query parameters to the end of an endpoint to aid in pagination.
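As a small illustration, the sketch below builds a query string for a search with pagination and then reads the parameters back, using Python's standard urllib.parse module. The endpoint URL and the parameter names q, page, and limit are assumptions made for the example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

base = "https://example.com/api/articles"   # hypothetical endpoint
params = {"q": "mongodb", "page": 2, "limit": 10}

# Append the parameters to the end of the URL as a query string.
url = f"{base}?{urlencode(params)}"

# On the receiving side, parse the query string back into values.
parsed = parse_qs(urlparse(url).query)       # every value arrives as a list of strings
page = int(parsed["page"][0])
limit = int(parsed["limit"][0])
offset = (page - 1) * limit                  # number of records to skip for this page
```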
Query by Example
With query by example (QBE), the query is written under the hood for you. QBE was
developed alongside the structured query language (SQL), which we’ll go over in the next
section.
More than likely there is a graphical user interface that a user fills out. Once submitted, the
query is built under the hood. This prevents missing-input bugs, because the query is built
only from the information it is given, as opposed to a prebuilt query that expects specific
information.
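A minimal sketch of that idea: the form below stands in for what a user typed into a graphical interface, and the query is assembled only from the fields that were filled in. The helper names build_query and run_query, and the sample records, are invented for illustration.

```python
def build_query(form):
    """Keep only the fields the user actually filled in."""
    return {field: value for field, value in form.items() if value not in ("", None)}

def run_query(records, query):
    """Return the records that match every field in the query."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in query.items())
    ]

# Hypothetical data and a partly filled-in form.
restaurants = [
    {"name": "Morris Park Bake Shop", "borough": "Bronx", "cuisine": "Bakery"},
    {"name": "Riviera Caterer", "borough": "Brooklyn", "cuisine": "American"},
    {"name": "Wild Asia", "borough": "Bronx", "cuisine": "American"},
]
form = {"borough": "Bronx", "cuisine": "", "name": None}

query = build_query(form)
results = run_query(restaurants, query)
```

Because the empty cuisine and name fields never reach the query, there is no placeholder left unfilled to cause an error.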
Query language is what allows us to actually take action on databases. It allows us to create,
read, update, and delete items on our database, as well as more advanced queries like filtering
and counting.
Structured Query Language (SQL) is the most famous of the query languages. SQL grew up
alongside the Query by Example (QBE) system developed by IBM in the 1970s. It serves as
the basis of relational databases.
With SQL, we can store, retrieve, and manipulate data using simple code snippets, called
queries, in an RDBMS (relational database management system).
The data is stored in the RDBMS in a structured way, where there are relations between the
different entities and variables in the data.
These relations are defined by the database schema, which specifies the relation between
various entities and the organization of data for the entities.
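The sketch below runs a few such queries against an in-memory SQLite database using Python's standard sqlite3 module. The customers and orders tables are invented for the example; the schema declares the relation between them, and the statements cover create, read, update, delete, filtering, and counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database

# The schema defines the entities and the relation between them.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    )
""")

# Create
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Ada"), (2, "Linus")])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(1, 30.0), (1, 12.5), (2, 99.0)])

# Update and delete, each filtered with a WHERE clause
conn.execute("UPDATE orders SET total = 15.0 WHERE total = 12.5")
conn.execute("DELETE FROM orders WHERE total > 50")

# Read: join the two tables, then count and sum each customer's orders
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
```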
5.11 SUMMARY
With MongoDB, the object defined in the programming language can be persisted “as
is,” removing some of the complexity of object mappers.
JSON is an acronym for JavaScript Object Notation. As we’ll see shortly, JSON
structures are composed of keys and values, and they can nest arbitrarily deep.
They’re analogous to the dictionaries and hash maps of other programming languages.
The umbrella term NoSQL was coined in 2009 to lump together the many
nonrelational databases gaining in popularity at the time.
Schemaless − MongoDB is a document database in which one collection holds
different documents. The number of fields, the content, and the size of the document
can differ from one document to another.
Some of the types of data that can be stored in database software are textual data,
numerical data, binary data, and date and time.
The threats of data security breaches become more pronounced when several users
access the database. A database management software offers better implementation of
data confidentiality and safety guidelines through controlled user access.
With the use of Indexes, performing queries in MongoDB becomes more efficient. If
you had a collection with thousands of documents with no indexes, and then you
query to find certain documents, then in such case MongoDB would need to scan the
entire collection to find the documents.
5.12 KEYWORDS
Data web - Web data services refer to service-oriented architecture (SOA) applied to
data sourced from the World Wide Web and the Internet as a whole. Web data
services enable maximal mashup, reuse, and sharing of structured data (such as
relational tables), semi-structured information (such as Extensible Markup Language
(XML) documents), and unstructured information (such as RSS feeds, content from
Web applications, commercial data from online business sources). Web data services
may support business-to-consumer (B2C) and business-to-business (B2B)
information-sharing requirements. Increasingly, enterprises are including Web data
services in their SOA implementations, as they integrate mashup-style user-driven
information sharing into business intelligence, business process management,
predictive analytics, content management, and other applications, according to
industry analysts.
WDI - Web data integration (WDI) is the process of aggregating and managing data
from different websites into a single, homogeneous workflow. This process includes
data access, transformation, mapping, quality assurance and fusion of data. Data that
is sourced and structured from websites is referred to as "web data". WDI is an
extension and specialization of data integration that views the web as a collection of
heterogeneous databases.
Web Analytics - Web analytics is the process of analysing the behaviour of visitors to
a website. This involves tracking, reviewing, and reporting data to measure web
activity, including the use of a website and its components, such as webpages, images,
and videos.
Data collected through web analytics may include traffic sources, referring sites, page
views, paths taken and conversion rates. The compiled data often forms a part of
customer relationship management analytics (CRM analytics) to facilitate and
streamline better business decisions.
DDL - A Data Definition Language (DDL) is a computer language used to create and
modify the structure of database objects in a database. These database objects include
views, schemas, tables, indexes, etc. This term is also known as data description
language in some contexts, as it describes the fields and records in a database table. It
is used to establish and modify the structure of objects in a database by dealing with
descriptions of the database schema. Unlike data manipulation language (DML)
commands that are used for data modification purposes, DDL commands are used for
altering the database structure such as creating new tables or objects along with all
their attributes (data type, table name, etc.).
1. Carry out research on how to create and optimize SQL Server indexes for better
performance.
___________________________________________________________________________
___________________________________________________________________________
2. Write a MongoDB query to display the fields restaurant_id, name, borough, and cuisine
for all the documents in the collection restaurant.
___________________________________________________________________________
___________________________________________________________________________
5.14 UNIT END QUESTIONS
A. Descriptive Questions
Short Questions
1. Define Database.
Long Questions
a. C++
b. JavaScript
c. C
d. All of these
b. MongoDB can store the business subject in the minimal number of documents
d. All of these
3. In which year was MongoDB released?
a. 2008
b. 2009
c. 2010
d. 2011
a. 100 s
b. 60 s
c. 1 s
d. 100 ms
Answers
5.15 REFERENCES
Reference Books
Rai, K. (2017). Deployment of Data Base as a Service and connecting it with the local
server. International Journal of Engineering and Computer Science. Published.
Kanth Aluvalu, R., & A. Jabbar, M. (2018). Handling data analytics on unstructured data
using mongo DB. International Journal of Engineering & Technology, 7(2.12), 344.
Textbooks
Malik, U., Goldwasser, M., & Johnston, B. (2019). SQL for Data Analytics:
Perform fast and efficient data analysis with the power of SQL. Packt Publishing.
Websites
https://www.happiestminds.com/Insights/big-data-hadoop/
https://livebook.manning.com/book/mongodb-in-action-second-edition/chapter-2/126
https://kb.objectrocket.com/mongo-db/use-mongodb-to-run-javascript-957